Abstract
One important issue in Bayesian estimation is the determination of an effective informative prior. In hierarchical Bayes models, the uncertainty of hyperparameters in a prior can be further modeled via their own priors, namely, hyper priors. This study introduces a framework to construct hyper priors for both the mean and the variance hyperparameters for estimating the treatment effect in a two-group randomized controlled trial. Assuming a random sample of treatment effect sizes is obtained from past studies, the hyper priors can be constructed based on the sampling distributions of the effect size mean and precision. The performance of the hierarchical Bayes approach was compared with the empirical Bayes approach (hyperparameters are fixed values or point estimates) and the ordinary least squares (OLS) method via simulation. The design factors for data generation included the sample treatment effect size, treatment/control group size ratio, and sample size. Each generated data set was analyzed using the hierarchical Bayes approach with three hyper priors, the empirical Bayes approach with twelve priors (including correct and inaccurate priors), and the OLS method. Results indicated that the proposed hierarchical Bayes approach generally outperformed the empirical Bayes approach and the OLS method, especially with small samples. When more sample effect sizes were available, the treatment effect was estimated more accurately regardless of the sample sizes. Practical implications and future research directions are discussed.
Keywords
In behavioral science research, the treatment effect is commonly estimated with a randomized controlled trial (RCT) using the frequentist framework, such as by least squares and maximum likelihood (ML) methods. A potential problem is capitalizing on chance, which means that the statistical inference is subject to sampling fluctuations and may not represent the true population effect especially with limited data. Assuming that some prior knowledge of parameters independently of the sample data is available, one idea is to allow the inclusion of prior knowledge into statistical inferences and derive a set of possible solutions based on the priors specified. The Bayesian inference, as a popular alternative to its frequentist counterparts, is able to serve this purpose by incorporating prior knowledge with a likelihood function to form the posterior distribution of the parameter. The posterior inference is based on a probability distribution that quantifies the uncertainty of the parameter.
The two distinctive approaches in Bayesian inferences are the empirical Bayes approach and the hierarchical Bayes approach, with the latter also referred to as full Bayes approach. The two approaches differ in how the hyperparameters (parameters in a prior distribution) are specified. In the empirical Bayes approach, hyperparameters of a prior are fixed values or point estimates, determined by aggregated quantities such as the measures of moments from related studies. In practice, the values of hyperparameters are rarely known a priori from limited samples and may hardly reflect the true distributions of model parameters, potentially leading to distorted Bayesian inferences (Depaoli, 2014). In the hierarchical Bayes approach, the uncertainty of hyperparameters can be further modeled via their own priors with additional parameters, namely, hyper priors (Carlin & Louis, 2008; Gelman et al., 2013). For example, in an RCT, the treatment effect can be assigned a normal prior (Level 1), in which the mean and precision (reciprocal of the variance) hyperparameters follow certain distributions represented by the hyper priors (Level 2), allowing randomness modeled at both levels.
In spite of adding a second level in the model hierarchy, the specification of hyper priors is still crucial and can affect the modeling results to varying extents (Malinverno & Briggs, 2004; Miranda-Moreno et al., 2013). A few studies have investigated hierarchical models in various fields of research, including road safety analyses (Heydari et al., 2014; Lord & Miranda-Moreno, 2008; Miranda-Moreno et al., 2013; Yu & Abdel-Aty, 2013) and educational research (Gelman, 2006b; Jacobucci & Grimm, 2018). All demonstrated that the use of properly constructed informative hyper priors can enhance parameter estimates. Nevertheless, little research has been centered on developing informative hyper priors based on historical data for the treatment effect in an RCT. Our study addresses this gap by introducing a framework to construct informative hyper priors for both the mean and the precision hyperparameters for the treatment effect in an RCT. The performance of the hierarchical Bayes approach was also compared with the empirical Bayes approach (using correct and inaccurate priors) and the ordinary least squares (OLS) method.
Bayesian Estimation
Bayesian methods treat all parameters as random; each follows a specific distribution. Based on Bayes’s theorem the posterior distribution
where
Due to the intensive computation for the normalized constant, the posterior is typically approximated by randomly drawing a large number of samples from the posterior distributions of parameters using Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler (Geman & Geman, 1984). The Gibbs sampler takes a random walk of the state space according to the target posterior distributions. As the number of draws increases, the MCMC chain will eventually converge to a stationary posterior distribution. The convergence can be checked using the trace plots or the proportional scale reduction method (Gelman & Rubin, 1992). Earlier samples before the convergence are discarded as the burn-ins. Samples drawn after the convergence can be used to construct the posterior distribution and make Bayesian inferences based on properties of the posterior distribution (e.g., mean, median, quantile, and variance).
Bayesian methods have a number of advantages over frequentist methods. First, information from similar past studies with respect to the parameters of interest can be combined in a form of prior distributions to derive posterior distributions. Assigning accurate and strong informative priors has been shown to improve parameter estimates especially in small samples (Heydari et al., 2014; Lee, 2007; Lord & Miranda-Moreno, 2008). Second, with the control of prior distributions (i.e., the amount of information given to the parameters), Bayesian methods can estimate complex statistical models that may not be estimable using frequentist methods (Heydari et al., 2014; Muthén & Asparouhov, 2012). Third, Bayesian statistics can adopt hierarchical models in which a range of prior choices can be tested through hyper priors to improve accuracy of parameter estimation (Gelman et al., 2013).
Bayesian Method to Estimating Treatment Effect
To investigate the treatment effect in an RCT, consider a regression model
where for the ith person,
where mean
Depending on the availability of prior knowledge, the prior can range from noninformative (broad knowledge about parameters and trivial impact on posteriors) to informative (definite knowledge and substantial impact). A noninformative prior with a low precision (i.e., large variance) contributes little information, leaving the posterior estimate close to the frequentist estimate. Noninformative priors may be assigned to nontarget model parameters (e.g.,
Empirical Bayes Model
In an empirical Bayes model, prior hyperparameters µ and τ in Equation (3) are given constant values, which can be obtained in a variety of ways. One easy way is to assign a noninformative prior with small τ, which yet produces little impact on the formation of the posterior. Informative priors or weakly informative priors can also be used, in which the prior mean µ and precision τ are often determined based on expert opinions and historical data such as previous meta-analytic research findings. For example, meta-analytic estimates of the effect sizes can be incorporated as informative priors in a Bayesian analysis (Steel et al., 2015).
Imposing meaningful informative priors has been shown to improve parameter estimates especially with small sample sizes (Lee, 2007; Liang & Yang, 2014; Steel et al., 2015). Nonetheless, given that the true prior distribution is mostly unknown in practice, one should be cautious about the impact of assigning inaccurate informative priors on parameter estimation (Depaoli, 2014; Finucane, 2018; Liang et al., 2018; Little, 2006). Finucane (2018) conducted a simulation study to examine the impact of correct and various inaccurate priors for estimating the effect size of an intervention in RCTs. Results showed that the use of inaccurate priors could lead to misleading posterior inferences of parameters. In addition, although data-driven informative priors have been widely used in empirical Bayesian data analyses, they are limited in that the variability of the hyperparameters cannot be quantified. A sensitivity test with various prior choices is always recommended to examine the stability of results in an empirical Bayes model. A general guideline from Depaoli and van de Schoot (2017) suggests that results from noninformative and informative priors are first compared, and then informative priors altering the mean and variance hyperparameters upward and downward can be further investigated.
Hierarchical Bayes Model
We propose to use a hierarchical Bayes model with hyper priors to investigate the treatment effect in RCTs. The use of hyper priors in practical Bayesian analyses is not uncommon (e.g., Malinverno & Briggs, 2004; Natesan et al., 2016). Our study, notwithstanding, is unique by describing a new framework for formulating informative hyper priors in the context of RCTs.
In a hierarchical Bayes RCT, we assume that a random sample of treatment effect sizes from a population of effect sizes can be obtained from previous studies or experiments. The goal is to utilize information from these sample effect sizes to develop informative hyper priors that characterize the distribution of the sample effect size mean and precision, respectively. A common distribution for the effect size mean is the normal distribution, and for the effect size precision is the gamma distribution. To quantify the randomness in the mean and precision hyperparameters, hyper priors are placed on hyperparameters
and
Specifically, each mean hyperparameter
The question becomes how we can use a random sample of effect sizes to derive the four parameter values (
The above equations indicate that the normal hyper prior on
The formulation of informative hyper priors for
Note that alternative approaches are available for formulating hyper priors through prior knowledge. For example, Yu and Abdel-Aty (2013) introduced four approaches of formulating informative priors, for negative binomial and Poisson lognormal models, including (1) the two-stage Bayesian updating approach (first, obtain posterior parameter estimates with noninformative priors, and then use those posterior estimates to develop informative priors in a later Bayesian analysis), (2) the ML approach (use ML estimates for hyperparameters), (3) the method of moments (e.g., use mean and variance of the variable), and (4) the expert experience method. Findings indicated that the two-stage Bayesian updating approach performed the best given sufficient data, and data-driven informative priors generally enhanced the parameter estimates compared with noninformative priors.
The Current Investigation
To our knowledge, much simulation research in education and psychology to date has focused on manipulating empirical priors to improve posterior parameter estimates such as setting the prior hyperparameter values to be higher or lower than the true values (e.g., Finch & Miller, 2019; Marcoulides, 2018). Among the research on empirical priors, developing data-driven priors has drawn attention and has been shown to improve model estimation. For example, McNeish (2016) demonstrated how to use ML estimates as the hyperparameter values to construct data-driven priors to address the small-sample problem. Marcoulides (2017) proposed a Bayesian synthesis approach utilizing a sequential process to create a single posterior distribution using multiple available data sets. More specifically, given m sample data sets containing prior information, estimates from one data set will be used as the prior to estimate the next data set. This process is repeated until information from all m data sets is incorporated and a final posterior distribution is obtained. This approach has been shown to be highly effective given large sample sizes with the fused data. Nevertheless, the use of data-driven priors needs to take extreme cautions. As pointed out by Darnieder (2011), one problem with data-dependent priors is that the data are used twice: one for obtaining the prior and the other in the process of combining the prior and the likelihood. Darnieder suggested to make an adjustment to the likelihood function to make it conditional on the data that provide prior information.
We classify the aforementioned data-driven approaches as empirical Bayes approaches, because estimates from sample data were used as hyperparameter values instead of being used to form hyper priors for the hyperparameters. The main distinctions between hyper priors and empirical priors are how the priors are assigned and how many levels of hierarchy are present. If the data-driven priors are assigned directly to the model parameters, there is only one level of hierarchy, and they are empirical priors. If the data-driven priors are employed to the hyperparameters of the priors on model parameters, then there are two levels of hierarchy, and they are data-driven hyper priors. Our study used the latter approach (i.e., hierarchical Bayes approach), by which the data-driven hyper priors were assigned to the mean and variance hyperparameters of the prior for the treatment effect, instead of being directly imposed on the treatment effect parameter. Accordingly, our approach differentiates itself from the empirical prior studies mentioned earlier.
In fact, little research has applied hierarchical Bayes models in educational research. As one example, Gelman (2006b) used hyper priors on variance parameters in multilevel modeling of school data and recommended the use of hyper priors estimated from the data. However, the example provided in Gelman’s study did not explicitly indicate how the hyper priors were derived from the data. Outside educational research, hyper priors have received wider attention in road safety research (Farid et al., 2017; Heydari et al., 2014; Miranda-Moreno et al., 2013; B.-J. Park et al., 2010; Yu & Abdel-Aty, 2013). Nonetheless, in road safety analyses, the outcome of interest is typically crash/accident occurrences (the counts), and thus the common adopted models are count models, such as Poisson, negative binomial, Poisson–lognormal, and Poisson–gamma models. These models are rarely used in RCTs where the outcome variable is typically continuous and approximately normally distributed. Thus, the analytical framework proposed in the road safety research may not be generalizable to the context of RCT research.
Furthermore, across various fields of research, studies that utilized hierarchical Bayes approaches mostly focused on formulating the hierarchy of variance hyperparameters to improve estimation accuracy using hyper priors. Few studies have focused on constructing hyper priors for both the mean and the variance hyperparameters. In addition, noninformative or rather vague hyper priors are usually adopted in practice that allow for data “speaking for itself,” which, however, may not work well in all cases especially when the sample is small and not a representative of the population (Miranda-Moreno et al., 2013). The use of accurate informative hyper priors has unique benefits for modeling data with small sample sizes. An effective prior distribution would regularize the posterior estimate to be close to the population value. With that said, one most important but difficult task in a Bayesian analysis for applied researchers is to determine effective informative priors, specifically in this study, informative hyper priors, which is the main issue addressed in our study.
Therefore, the primary goal of the current study was to examine the hierarchical Bayes approach described above through a simulation study varying in the sample treatment effect size, treatment/control group size ratio, and sample size. As a comparison, we also analyzed the same simulated data sets using the empirical Bayes approach with various prior choices and the OLS method.
Method
Data Generation
Data were generated based on a two-group RCT design following the steps below repeatedly for 500 replications using the computer program R (R Core Team, 2019). First, data for the treatment group were generated following a normal distribution N(β1, 1), where β1 is the treatment effect size in Equation (2). Second, data for the control group were generated following a standard normal distribution N(0, 1). Third, the two groups of data were combined into one data set, in which a group identifier was assigned, with 0 indicating the control group and 1 indicating the treatment group.
The manipulated factors included four sample treatment effects β1 (0, 0.1, 0.2, and 0.3), three ratios of the treatment group size to control group size, nT/nC (1:1, 1:2, and 1:4), and four total sample sizes (60, 120, 480, and 960), creating a total of 48 conditions for data generation. Specifically, the sample treatment effects as standardized mean differences ranging from 0 to 0.3 represent no to medium impacts (Cohen, 1988; Ferguson, 2009). Larger treatment effects were not considered because they are often estimated well using conventional approaches with a sufficiently large sample size. We assumed that sample treatment effects come from a population of effect sizes following a normal distribution N(0.1, 0.42), 1 consistent with Finucane (2018). Hence, the sample treatment effect equal to 0.1 represents a scenario where the sample treatment effect to be estimated is equal to the population mean of effect sizes. The treatment effect of 0 indicates 0.25 SDs below the population mean, and treatment effects of 0.2 and 0.3 indicate 0.25 and 0.5 SDs above the population mean, respectively. For the ratio of treatment to control group sizes, a ratio of 1:1 indicates an equal sample size between groups. A ratio of 1:4 indicates that the treatment and control groups account for 1/5 and 4/5 of the data, respectively, which is commonly encountered in practice. A wide range of sample sizes from 60 to 960 was considered. Relatively small sample sizes were of particular interest because the use of accurate informative priors has been shown to improve model estimation (Depaoli, 2014; Finch & Miller, 2019; Lee, 2007; Liang & Yang, 2014).
Data Analysis
Data analyses were conducted using R. For Bayesian estimation, JAGS (Just Another Gibbs Sampler; Plummer, 2003) was used to obtain the Bayesian inferences through the R package rjags (Plummer et al., 2016). The Gibbs sampler (Geman & Geman, 1984) was used to draw posterior samples with two MCMC chains. For each chain, we first conducted an adaption phase with 1,000 iterations, and then performed the burn-in phase with another 1,000 iterations. After that, we ran 10,000 iterations for each chain and summarized the posterior distributions of parameters using the samples drawn from both chains. Trace plots were examined to ensure the convergence. Each generated data set was estimated using three approaches: hierarchical Bayes approach (three hyper priors), empirical Bayes approach (twelve priors), and the OLS approach. Accordingly, the data analysis conditions were 768 (48 data generation conditions × 16 estimation approaches).
Hierarchical Bayes Approach
With the hierarchical Bayes approach, we assumed that a collection of sample effect sizes are readily available from previous studies. Information from these effect sizes can be aggregated to construct informative hyper priors through hierarchical modeling. We first defined the population distribution of effect sizes to follow N(0.1, 0.42). Then for each replication, we randomly drew from the population distribution 10, 20, and 100 sample effect sizes, which represent typical numbers of effect sizes that may be obtained in practice. Based on the sample effect sizes generated, the four hyperparameters
The Bayesian inferences were evaluated using the bias, root mean square error (RMSE), and stated probability (StProb). The bias was computed as the average difference between the estimated and true treatment effects across replications, expressed as
where
Bias indicates the systematic estimation bias resulting from the estimation approach used. RMSE evaluates both the systematic bias and the sampling variation.
The StProb is defined as the proportion of posterior treatment effect estimates greater than a cutoff value L as
Empirical Bayes Approach
With the empirical Bayes approach, the hyperparameters of μ (mean) and τ (precision) in the empirical prior in Equation (3) were given fixed values. The prior mean was examined at four levels: 0, 0.1, 0.2, and 0.3. The prior mean of 0.1 was consistent with the population treatment effect, and 0, 0.2, and 0.3 indicated inaccurate representations of the population treatment effect. The prior SD (
The prior distribution of N(0.1, 0.42) was consistent with the population distribution of effect sizes, and we expect posterior results to be the most accurate with this prior. The prior distributions with an incorrect mean and a small variance are considered incorrect informative priors, for example, N(0.3, 0.22). This mirrors the scenario when a researcher is very certain about an incorrect belief on the magnitude of the treatment effect; adopting an incorrect informative prior may distort posterior estimates. Notwithstanding, research has shown that the use of inaccurate priors with relatively large variances may still yield more precise parameter estimates than the use of noninformative priors (Depaoli, 2014; Finch & Miller, 2019). The prior settings in empirical Bayes models allowed for investigating the impact of a variety of priors on the estimation results. The estimation of treatment effects was evaluated using the bias and RMSE.
Ordinary Least Squares Approach
As a frequentist counterpart to the Bayesian estimation, the OLS approach was also used to estimate the treatment effect. The OLS was implemented in R using the lm() function. Because prior knowledge is not considered in OLS, estimates are influenced by distinctive characteristics of a specific sample. If the sample obtained is biased against the population (i.e., sampling bias exists), the sample treatment effect estimate may also be biased against the population treatment effect. The outcomes from the OLS were evaluated using the bias and RMSE.
Results
We aggregated and reported results for each analysis condition across replications. The outcomes of interest (bias, RMSE, and StProb) for varying analysis conditions are presented. Results for the treatment/control group ratio of 1:2 were not reported because the patterns were similar to those for other treatment/control group ratios.
Performance of Hierarchical Priors
Figure 1 shows the bias for posterior treatment effect estimates using the hierarchical Bayes approach. The figure was disaggregated by the sample treatment effects and the treatment/control group ratios and included the numbers of effect sizes used for constructing the hyper priors and the sample sizes. In general, when the sample treatment effect was equal to the population mean effect size (0.1), the estimates were the most accurate regardless of the number of effect sizes obtained from prior research. The posterior means were underestimated when the sample treatment effect was below the population mean effect size and overestimated when the sample effect was above the population mean effect size. As the number of effect sizes increased, an obvious decrease in bias was observed for all sample sizes, and the difference in bias across sample sizes diminished. This was because more accurate hyperparameters for the hyper priors were obtained with an increasing number of effect sizes (see Equations 4 and 5). A more accurate prior dragged the posterior away from the incorrect likelihood and thus provided a treatment effect estimate closer to the population mean effect size.

Bias of posterior treatment effect estimates using the hierarchical Bayes approach.
It is notable that estimates from smaller samples were more accurate than those from larger samples. It is counterintuitive because conventionally larger sample sizes lead to more accurate parameter estimates. In our scenario, however, when the sample treatment effect is biased in a larger sample, it indicated a stronger but less accurate likelihood function for the treatment effect. As sample sizes increased, the likelihood was more likely to influence the formation of posteriors, and if the likelihood was incorrect, it could produce more biased posterior estimates even with informative accurate priors. The role of priors was more important in smaller samples. With an increasing number of effect sizes available from past studies, the hyper priors became more informative and accurate and regularize parameter estimates in smaller samples more efficiently.
Figure 2 shows the RMSE of estimating the treatment effects using hyper priors. When the sample treatment effects were between 0 and 0.2 (0.1 is the population effect size), RMSE were smaller with larger sample sizes. This pattern of RMSE was opposite to the pattern of bias where larger samples produced larger bias (Figure 1). As RMSE considers both the bias and the sampling variation, this indicated that the sampling errors in large samples were small enough to counterweight the large bias in computing RMSE. Nevertheless, when the treatment effect was 0.3, larger samples tended to yield larger RMSE. This may be because the bias in large samples was large enough, so even combining with trivial sampling variation, the RMSE was still the greatest. The differences in RMSE across sample sizes reduced as the number of effect sizes increased. This indicated that when sufficient prior knowledge was available, the hyper priors could yield reasonably low RMSE regardless of the sample sizes. With balanced group sizes (i.e., treatment/control group ratio = 1:1), the RMSE were slightly lower than those with unbalanced groups sizes (i.e., treatment/control group ratio = 1:4).

Root mean square errors of posterior treatment effect estimates using the hierarchical Bayes approach.
The stated probabilities at the upper 25% (StProb1), 50% (StProb2), and 75% (StProb3) of the posterior distribution of the treatment effect were evaluated. Their summary plots are provided as supplemental materials. StProb2 indicates the median of the posterior distribution, consistent with the bias pattern in Figure 1. The StProb2 were closely around the theoretical 50% when the sample treatment effect was equal to the population effect size mean; the StProb2 were below or above 50% to varying extents if the sample treatment effect was lower or higher than the population effect size mean, respectively. When the number of effect sizes were relatively large (e.g., 100), the StProb2 were the closest to the theoretical 50% StProb for all sample sizes. For the treatment effect of 0.1 (population effect size mean), comparing StProb1 and StProb3 to their theoretical probabilities of 25% and 75%, respectively, revealed that when sample sizes were 480 or less, the posterior distribution of the treatment effect was platykurtic compared with the population distribution of effect sizes N(0.1, 0.42). With a large sample size of 960 and a balance group design, the three StProb matched the theoretical probabilities of 25%, 50%, and 75%. For other sample treatment effects, platykurtic distributions were also observed in most conditions.
Performance of Empirical Priors
Figures 3 and 4 show the bias and RMSE using empirical priors when the sample treatment effect was equal to 0.1 (population effect). The figures were disaggregated by the prior means and treatment/control group ratios and included the prior SDs and sample sizes. Based on the bias of the posterior treatment effect, it was revealed that the estimates were the most accurate regardless of the prior SDs and sample sizes when the prior mean was equal to the population effect size mean of 0.1. When the prior mean was deviated from the population effect, using a noninformative prior with a large SD resulted in the least bias because the prior had little effect on the formation of the posterior. With larger sample sizes (which provided strong and correct likelihood), the bias decreased even with inaccurate informative priors (i.e., incorrect prior mean, small variance). For RMSE, similar patterns were observed across all conditions; larger samples yielded smaller RMSE. Compared with hierarchical models by which the bias was nearly zero for all sample sizes when the treatment effect equaled to 0.1, the bias with empirical priors was heavily dependent on the prior mean, prior SD, and sample size. The bias with empirical priors could go up to 0.15 when using an incorrect informative prior N(0.3, 0.22) in small samples (n = 60). Furthermore, the RMSE with hyper priors were up to around 0.15, whereas RMSE with empirical priors could go up to 0.28 in a balance design and 0.35 in an unbalance design.

Bias of treatment effect estimates using the empirical Bayes approach when treatment effect size is equal to 0.1.

Root mean square errors of treatment effect estimates using the empirical Bayes approach when treatment effect size is equal to 0.1.
Figures 5 and 6 show the bias and RMSE using empirical priors when the sample treatment effect was equal to 0.3 (0.2 above the population effect). In all conditions, the treatment effects were overestimated to varying extents. The bias was reduced as the prior mean, prior SD, and sample size decreased. With a prior distribution equal to the distribution of population effect sizes N(0.1, 0.42), the bias ranged between 0.13 and 0.20, with the lowest bias associated with the smallest sample size of 60. In contrast to hierarchical Bayes models (see Figure 1), the bias with hyper priors ranged from 0.02 to 0.16 across all conditions; particularly with 100 effect sizes as prior information, the bias was not larger than 0.10. For RMSE using the empirical Bayes approach, when the prior was noninformative, the smallest RMSE were associated with the greatest sample sizes. When the prior was informative, RMSE generally decreased due to the decrease in sampling variation. The RMSE with empirical priors could go up to 0.42, whereas the RMSE with hyper priors went no higher than 0.19. Results of bias and RMSE when sample treatment effects equal to 0 and 0.2 using the empirical Bayes approach can be found in the online supplemental material.

Bias of treatment effect estimates using the empirical Bayes approach when treatment effect size is equal to 0.3.

Root mean square errors of treatment effect estimates using the empirical Bayes approach when treatment effect size is equal to 0.1.
Performance of Ordinary Least Squares
Figures 7 and 8 show the bias and RMSE using the OLS method, in which no priors were used. In general, bias was similar across all sample sizes and reflected the difference between the sample treatment effect and population mean effect size. In other words, the treatment effect estimates depended heavily on the sample data using OLS. When the sample treatment effect was not representative of the population effect, the OLS estimates were biased with respect to the population effect. The RMSE showed that larger samples yielded smaller RMSE. Because RMSE is a function of bias and sampling variation and bias was similar across sample sizes, this indicated that the larger RMSE in smaller samples mainly resulted from the larger sampling errors. A similar pattern was observed for different treatment/control group ratios. The treatment/control group ratios did not seem to affect the bias, yet unbalanced groups were shown to have larger RMSE.

Bias of treatment effect estimates using the ordinary least squares approach.

Root mean square errors of treatment effect estimates using the ordinary least squares approach.
Discussion
One important practical consideration in a Bayesian analysis is to develop an effective prior distribution that may improve parameter estimation utilizing information from past studies. This article proposed a framework to develop informative hyper priors using a hierarchical Bayes approach for estimating the treatment effect in RCTs, presuming a random sample of effect sizes are available from prior research. The present study contributes to the understanding of the hierarchical Bayes approach in comparison with the empirical Bayes and OLS approaches for estimating the treatment effect under various scenarios.
Findings from our study indicated that the hierarchical Bayes approach generally outperformed the traditional empirical Bayes approach and the OLS method, especially with small samples. With the hierarchical Bayes approach, when more sample effect sizes were obtained from past studies, the treatment effect was estimated closer to the population effect size regardless of the sample sizes. This highlights the advantages of using an accurate, small-variance hyper prior on regularizing the posterior distribution of the treatment estimate. Even the number of sample effect sizes obtained was small (e.g., 10), the treatment effect estimates were still improved to varying extents; estimates improved notably more with smaller sample sizes, because the use of informative hyper priors may dominate the posterior formation more than the likelihood function. With larger sample sizes, data likelihood is more likely to dominate the posterior distribution; however, an associated risk is that an inaccurate data likelihood leads to biased treatment effect estimates without strong accurate priors. In addition, the RMSE from the hierarchical Bayes approach were generally lower than those from the empirical Bayes and OLS approaches. The above findings are partially in line with prior research (Lord & Miranda-Moreno, 2008; Miranda-Moreno et al., 2013), which have shown that the use of informative hyper priors led to more accurate parameter estimation. Note that they used hyper priors only on the precision parameters, unlike the present study.
The use of hyper priors provided a more accurate treatment effect estimate, even compared with using a correct prior distribution, that is, N(0.1, 0.42), in an empirical Bayes model, particularly in scenarios where the sample treatment effect was not equal to the population effect size. The correct prior distribution quantifies the distribution of all possible effect sizes in the population, which has a positive effect on the posterior formation. In contrast, the hyper priors describe the sampling distributions of the effect size mean and precision, respectively. Given a sufficient number of effect sizes obtained, hyper priors not only can provide an accurate estimate of the effect size mean but also can allow a much smaller variability of the effect size mean—compared with N(0.1, 0.42). This is analogous to the population distribution and the sampling distribution of the mean. In other words, the sampling distribution always has a smaller variance than the population distribution when the number of effect sizes obtained from past studies is greater than one. Therefore, the use of hyper priors formulated based on the proposed method in this study mostly provided more precise parameter estimates than the use of empirical priors. The advantage of using hyper priors was more obvious with smaller sample sizes and when the treatment effect from a random sample was different from the population effect size. It is notable that, if the treatment effect was equal to the population effect size mean (e.g., in our case, 0.1), the hierarchical Bayes approach performed similarly to the empirical Bayes approach that assigned a correct prior mean (the same as the population effect size mean) or a noninformative prior. For other empirical models (with other prior specifications), the hierarchical Bayes approach also outperformed.
As the treatment effect in an RCT is usually unknown in practice, it is important that the hyper prior on the mean hyperparameter reflects the distribution of the sample effect size mean meaningfully. Research has suggested that the mean hyperparameter has notable impact on the parameter estimates (Finucane, 2018; Xiao et al., 2019). We demonstrated that the use of an effective hyper prior on the mean hyperparameter as described in this article could reasonably quantify the distribution of the mean hyperparameter and thus lead to accurate treatment effect estimates. As previous studies on hyper priors mostly focused on developing hyper priors for the variance hyperparameters (Gelman, 2006b; Miranda-Moreno et al., 2013), our study sheds light on the importance of formulating hyper priors for the mean hyperparameter.
For applied researchers, the results from this study encourage using the hierarchical Bayes approach over the empirical Bayes and OLS approaches to estimate treatment effects in RCTs. The hierarchical Bayes approach is advantageous because it provides similar or most often more accurate parameter estimates than the other two approaches. The empirical Bayes results heavily depend on the prior distributions assigned, and a sensitivity test is always necessary to examine the impact of priors (e.g., Liang, 2020). With the OLS method, all estimates are capitalizing on sampling fluctuation, and no previous evidence about the treatment effect can be incorporated. Although a sensitivity test can also be performed on the hyper prior distributions, the current investigation shows that the hyper priors formed using the method proposed resulted in low bias and RMSE especially with small sample sizes and/or a large number of obtainable effect sizes. In practice, as the sample size increases, we would expect the sample to be more representative of the population. Therefore, a large biased sample as we simulated in some conditions may be less often seen in empirical research.
Based on the discussion above, in real data applications, to improve the estimation of the treatment effect in RCTs, we recommend to first search for relevant studies with necessary statistics to calculate the effect size representing the treatment effect to be estimated. After a set of effect sizes are obtained from past studies, the researcher can compute the hyperparameter values for hyper priors following formulas in Equations (6) through (9). Then the hierarchical Bayes model can be implemented to obtain the posterior estimate of the treatment effect. As demonstrated in this study, even prior information from a small number of effect sizes could help improve parameter estimates.
The benefits of using hyper priors in estimating treatment effects in RCTs could lead to possible extensions of the hierarchical Bayes approach as future research directions. One possible extension would be to apply hyper priors in random effects models (aka, multilevel models), where selected parameters are assumed to be random at a higher level. Although research on hyper priors has emerged in multilevel modeling (Gelman, 2006a, 2006b), how to formulate hyper priors using past evidence has not been fully addressed and investigated. Another research direction would be to extend hierarchical Bayes models to structural equation modeling to improve parameter estimates, such as factor loadings, structural coefficients, and factor covariances. Moreover, as models become more complex, assessing model fit and conducting model selections among a set of candidate models are typically a common practice. Investigating various Bayesian model fit evaluation and selection methods (Garnier-Villarreal & Jorgensen, 2020; Gelman et al., 2013; Liang & Luo, 2019; Vehtari et al., 2017) in the context of hierarchical Bayes models would also be a future direction of research.
Last, the integrated nested Laplace approximation (INLA) technique has been introduced recently as an alternative to the MCMC simulation technique due to its fast implementation for making Bayesian inferences (e.g., Rue et al., 2009; Serhiyenko et al., 2016). Because of INLA’s efficiency, more models may be run within a much shorter time period, which allows for the sensitivity tests of hyper priors with a wide range of hyperparameter values. It would be worth exploring the use of informative hyper priors in INLA models and understanding how various hyper priors affect the results. In addition, it would be interesting to compare the results from the INLA and MCMC models and observe whether conclusions may change depending on different model estimation techniques.
Supplemental Material
EPM_supplements – Supplemental material for Hierarchical Bayes Approach to Estimate the Treatment Effect for Randomized Controlled Trials
Supplemental material, EPM_supplements for Hierarchical Bayes Approach to Estimate the Treatment Effect for Randomized Controlled Trials by Xinya Liang, Akihito Kamata and Ji Li in Educational and Psychological Measurement
Footnotes
Appendix
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
Notes
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
