The effect of an independent variable on a dependent variable is often evaluated with hypothesis testing. Sometimes, multiple studies are available that test the same hypothesis. In such studies, the dependent variable and the main predictors might differ, while they do measure the same theoretical concepts. In this article, we present a Bayesian updating method that can be used to quantify the joint evidence in multiple studies regarding the effect of one variable of interest. We apply our method to four studies on how trust in social and economic exchange depends on experience from previous exchange with the same partner. In addition, we examine five hypothetical situations in which the results from the separate studies are less clear-cut than in our trust example.
Researchers may have several data sets that can be used to address a research question with respect to the relation between two variables. The main variables of interest may be operationalized in different ways, be measured on different scales, and the statistical model used to relate both variables may differ between studies. This article shows how Bayesian updating can be used to summarize the evidence in the data sets for hypotheses on the relation between the two variables. Before introducing our method, we briefly present a case study that we use for illustrative purposes and we sketch the limitations of meta-analysis if studies address the same relation, but employ different variables and models.
Research in economic sociology and social dilemma research on trust problems in exchange relations (see Raub and Buskens 2008; Buskens and Raub 2013, for overviews) often studies how trust depends on prior exchange and interactions between the partners. For example, Batenburg, Raub, and Snijders (2003) study the extent to which buyers of information technology (IT) products trust their sellers using a survey on about 1,000 buyers of IT products. One of their hypotheses is that if the buyer has had positive experiences with the seller from transactions in the past, the buyer trusts the seller more in the present transactions. To test this hypothesis, they analyze the effect of the variable “past” (a measure for the amount of positive past experiences) on the dependent variable “lack-of-trust,” which is measured by the extent to which the buyer invests in management of the relation such as writing a contract to prevent the seller from untrustworthy behavior. Batenburg et al. (2003) use a linear regression model with additional independent variables to test this hypothesis. Rooks et al. (2000) and Buskens and Raub (2002) test the same hypothesis with similar variables, using a vignette experiment with hypothetical transactions. In this experiment, purchase managers decide how much time and effort they want to invest to prevent untrustworthy behavior of their seller, while the past experiences of the buyer with the seller are one of the variables describing the hypothetical transactions. They use a linear regression model, too. Two further studies have been used to test the same hypothesis. Buskens and Weesie (2000) use another vignette experiment to test whether past experiences have an effect on trust of students in a secondhand car dealer. Here, trust is measured by the choice between two dealers. Thus, trust is measured as a dichotomous variable. Consequently, Buskens and Weesie use a probit analysis to test the hypothesis. 
Finally, Buskens, Raub, and Van der Veer (2010) test whether past experiences have an effect on trust in a laboratory experiment in which subjects have to decide whether to trust another subject. Because subjects play a series of these interactions with the same partner, subjects can make their behavior conditional on the partner’s behavior in the past. The choice between trusting or not trusting is again a dichotomous variable and is analyzed via a three-level logistic regression. Table 1 provides an overview of the studies.
Note. $y$ denotes the dependent variable and $x_1$ the predictor of interest.
a. “Network embeddedness” means “network of the exchange partners with third parties.”
In each study, the authors are interested in whether past has a positive effect on trust. In null hypothesis testing, one usually tests whether past has no impact on trust versus whether it has a (positive or negative) effect. Another way to evaluate a positive (or negative) effect is to quantify the evidence for the three hypotheses $H_0\colon \beta = 0$, $H_>\colon \beta > 0$, and/or $H_<\colon \beta < 0$. Royall (1997) describes how evidence for the hypotheses at hand can be quantified using the likelihood ratio test. In Bayesian model selection, one can also quantify evidence for several hypotheses at hand using the Bayes factor (BF), which is equal to the likelihood ratio test for point hypotheses (like $H_0$) and can be seen as a generalization of the likelihood ratio test for other hypotheses. The BF gives the relative support for each hypothesis, enabling statements of the type “$H_>$ is 10 times as likely as $H_0$.”
Combining Effect Sizes Versus Updating Evidence
To combine multiple studies, one can employ (Bayesian) meta-analysis (among others, Lipsey and Wilson 2000; Cooper, Hedges, and Valentine 2009). We briefly discuss meta-analysis and its limitations. Subsequently, we describe our own method. Table 2 provides an overview of the differences between the methods.
Table 2. Meta-Analysis Versus Bayesian Updating.

| | Meta-Analysis | Bayesian Updating |
|---|---|---|
| Effect size | Required | Not required (a) |
| Design | Equal across studies | Equal or unequal across studies |
| Main results | Estimate of effect size (or parameter) and corresponding $p$ value or confidence interval | Evidence; that is, posterior model probabilities |

a. Our method uses the parameter estimates and their standard errors of each study, but it does not require that they can be transformed into one effect size, like Cohen’s $d$ or an odds ratio.
Meta-analysis can be based on the parameter estimate and its standard error or on a corresponding effect size for each study. When one is interested in the parameter $\beta$, the hypotheses to be tested are $\beta = 0$ versus a two-sided ($\beta \neq 0$) or one-sided ($\beta > 0$ or $\beta < 0$) alternative. Meta-analysis results in a parameter estimate or an estimate of effect size based on all studies and a corresponding $p$ value. This estimate is only interpretable when the parameters or effect sizes are comparable. Hence, one cannot use parameters from another type of model. Also, one cannot use effect sizes that cannot be transformed into one type (e.g., the hazard ratio and the odds ratio). In addition, the design of the model should be the same in all studies; that is, the predictors in the model should be the same for all studies. The reason is that the parameter estimate or effect size is a conditional one and, therefore, might change when predictors are added or discarded. Thus, parameter estimates and effect sizes cannot be compared across studies with different models. Combining multiple studies with different models can also be based on $p$ values. However, a drawback of this method is, among other things, that $p$ values reflect not only effect size but also the number of observations (Lipsey and Wilson 2000; Cooper et al. 2009).
Note that the types and designs of the models employed in our case study differ in various ways (see Table 1). Each of the studies tests (among other things) whether there is an effect of the variable past (a measure for the amount of positive past experiences) on trust. In every study, trust and past are measured by different variables. Actually, in the first two studies, the effect of past on lack-of-trust is inspected. Consequently, we multiplied the estimates of the first two studies by minus one. Also, trust is measured on a different scale in each model. Therefore, the studies employ different models. Each model also includes different sets of other predictors. Despite all these differences, the predictor past measures the same concept in all studies. Nevertheless, meta-analysis cannot be employed to combine the four studies regarding trust.
To combine multiple studies of different types and designs, but regarding one theoretical concept, we introduce a Bayesian updating method. In this method, as opposed to meta-analysis, the hypotheses do not address one specific parameter $\beta$, that is, a parameter that is the same in all studies. Instead, they cover an underlying effect and use the parameters $\beta^t$ (for $t = 1, \ldots, T$) of the $T$ studies, since these may not be comparable. Nevertheless, they are indicative of the same underlying effect. The method can be employed to evaluate the following hypotheses:

$$H_0\colon \beta^1 = 0, \ldots, \beta^T = 0; \qquad H_>\colon \beta^1 > 0, \ldots, \beta^T > 0; \qquad H_<\colon \beta^1 < 0, \ldots, \beta^T < 0. \tag{1}$$
Notably, our method does not combine estimates but the evidence for a positive ($H_>$), negative ($H_<$), and null effect ($H_0$) of the predictor of interest (which measures one theoretical concept, say, past) on the dependent variable (which measures one theoretical concept, say, trust).
The input for our method is the estimate of the parameter of interest ($\hat\beta$) and its standard error ($\hat\sigma$). This input can be obtained in two ways: from the data or by simply using the values of the parameter estimates reported in the studies. Note that all the necessary information in the data with respect to $\beta$ is adequately summarized by $\hat\beta$ and $\hat\sigma$. For the four studies that we use as an example, the parameter estimates corresponding to past ($\hat\beta^t$) and the standard errors ($\hat\sigma^t$) are provided in Table 3. Thus, this method does not require an effect size, like Cohen’s $d$ or an odds ratio, or comparable parameter estimates, and different types of models as well as different sets of predictors may be used in each study.
Table 3. The Parameter Estimates ($\hat\beta^t$ and $\hat\sigma^t$ for Study $t$) and Corresponding One-Sided $p$ Values of the Four Studies for Trust.
It should be stressed that the researcher has to make sure that the $\beta^t$s do reflect comparable relationships between the two key variables. Although adding a control variable usually affects only the magnitude of $\beta^t$, it sometimes changes the sign of $\beta^t$. Hence, one should pay attention to the model specification. For instance, if in one study the relation between trust and past is examined in a regression model and in a second study this relation is examined with a logistic regression and is also modeled as being conditional on various predictors that characterize the network of the exchange partners with third parties, one should be careful that both models do inspect the same theoretical relationship between trust and past. It is up to the researcher to decide whether the $\beta^t$s reflect the same theoretical relationships.
Figure 1 shows how the parameter estimates of all studies are used for updating the evidence/support for the three hypotheses. We now briefly sketch the updating (a more detailed discussion follows in a subsequent section). First, we assume that all three hypotheses are equally likely and initialize the so-called prior model probabilities ($\pi_m^0$) of each hypothesis to be $1/3$ for $m \in \{0, >, <\}$; that is, $m$ can take on one of the values in the set comprising the subscripts of $H_0$, $H_>$, and $H_<$. The prior model probability (PrMP) is a number on a scale of 0 to 1, which quantifies the weight attached to the hypothesis at hand. Subsequently, we start with one study and use its parameter estimate and standard error to calculate or approximate the likelihood. Based on the likelihood, the BF for each hypothesis is determined. Bear in mind that the BF quantifies the support in the data for one hypothesis relative to another. We employ $BF_{mu}$, that is, the BF of $H_m$ (i.e., $H_0$, $H_>$, or $H_<$) versus a hypothesis without constraints on the parameter of interest. If, for instance, $BF_{mu}$ equals ten, then $H_m$ has 10 times more support than the hypothesis without constraints. Since the unconstrained hypothesis is not of importance in this article, $BF_{mu}$ is just a useful technical tool. Based on these BFs and the initial PrMPs, the posterior model probabilities (PMPs; $\pi_m^1$) for the three hypotheses ($H_m$) can be assessed, which reflect the evidence/support in the data for the three hypotheses when evaluating solely Study 1. Then, these PMPs are used as PrMPs in the calculation of the PMPs for the evaluation of two studies ($\pi_m^2$). This process is repeated for all $T$ studies (resulting in $\pi_m^T$).
Figure 1. Bayesian updating in case of $T$ studies, where $\pi_m^0$ represents the prior model probability, $\pi_m^t$ the posterior model probability after evaluating $t$ studies, $\hat\beta^1$ and $\hat\sigma^1$ are the parameter estimates for Study 1, and $BF_{mu}^t$ is the Bayes factor for Hypothesis $H_m$ versus the unconstrained hypothesis in Study $t$, with $m \in \{0, >, <\}$.
In the remainder of this article, we first elaborate on the concepts likelihood, prior, posterior, BFs, and PMPs. Second, we describe our proposed method of using multiple studies to quantify the evidence for $H_0$, $H_>$, and $H_<$. Third, we illustrate how to apply the method by combining the evidence from the studies on how trust depends on previous experience. Additionally, we inspect five hypothetical situations in which the results from the separate studies are less clear-cut than in our example on trust, where the results in each study consist of significant positive effects (see the $p$ values in Table 3). Moreover, we investigate the sensitivity of the method with respect to the prior distribution that is needed as input for computing the BF and introduce ways to deal with sensitivity.
Information in the Data
The method proposed here can be applied when different models are used in different studies (e.g., a regression model and a logistic regression model), provided that a function of $\mu_i$, the expectation of $y_i$, can be written as a linear combination of the predictors in all these models (see McCullagh and Nelder 1989; McCulloch and Searle 2005):

$$g(\mu_i) = \alpha + \sum_{j=1}^{k} \beta_j x_{ji}, \qquad i = 1, \ldots, n,$$

where $g(\mu_i)$ is a function of $\mu_i$, $\alpha$ denotes a constant, $\beta_j$ the parameter that corresponds to $x_{ji}$, the $j$th predictor for observation $i$, $n$ the number of observations, and $k$ the number of predictors. In regression models, $g(\mu_i) = \mu_i$, and in logistic regression models, $g(\mu_i) = \log\{\mu_i / (1 - \mu_i)\}$.
There is one key variable among the predictors, namely, $x_1$. In our example, the key variable is “past.” Thus, we are interested in $\beta_1$, the parameter corresponding to $x_1$, written as $\beta$ for short. Observe that the $\beta$s of different studies might not be comparable, but they do all reflect the effect of the key variable on the same theoretical concept. For each of the studies, we know or can calculate the estimate $\hat\beta$ and the standard error $\hat\sigma$ (see Table 3). This enables us to approximate the true likelihood of $\beta$ by $L(\beta \mid \hat\beta, \hat\sigma)$, which is a normal distribution with mean $\hat\beta$ and variance $\hat\sigma^2$:

$$L(\beta \mid \hat\beta, \hat\sigma) = \frac{1}{\sqrt{2\pi\hat\sigma^2}} \exp\left\{-\frac{(\beta - \hat\beta)^2}{2\hat\sigma^2}\right\}.$$
More details on large-sample inference and normal approximations can be found in, for example, Chapter 4 of Gelman et al. (2004). Bear in mind that many distributions can be approximated by a normal distribution for a large number of observations and some even for a moderate number of observations. Related to this, a maximum likelihood estimator follows asymptotically a normal distribution (under some regularity conditions and when the conditions for consistency of maximum likelihood estimator are satisfied: Ferguson 1996). Furthermore, the central limit theorem states that (under some conditions) the mean of a sufficiently large number of independent random variables will be approximately normally distributed (Rice 1995).
Note that the likelihood quantifies the support in the data for each value of $\beta$. In the next section, it will be shown how a combination of prior and likelihood renders the posterior distribution, where the prior will be used to quantify the complexity and the posterior the fit of a hypothesis. Hence, as in model selection using information criteria, like the Akaike information criterion (Akaike 1973, 1974), Bayesian model selection employs a trade-off between fit and complexity. To simplify notation, the dependence on is implied throughout below.
Prior and Posterior
To evaluate the hypotheses of interest in (1), one first needs to specify a prior distribution for $\beta$ under $H_m$. We use a so-called conjugate prior (more details can be found in Gelman et al. 2004). This implies that the prior distribution of the parameter is a normal distribution, since the likelihood of $\beta$ is approximately a normal distribution, and will result in a normal posterior as discussed below. To determine the prior distribution for $H_m$ ($m \in \{>, <\}$), we first need to specify the prior distribution of the parameter for the case where there are no restrictions. We refer to this as the unconstrained prior, which is defined by

$$p(\beta \mid H_u) = N(\beta_0, \tau_0^2), \tag{3}$$

where $H_u$ denotes the unconstrained hypothesis, $\beta_0$ the prior mean, and $\tau_0^2$ the prior variance.
Subsequently, the prior distribution for $H_m$ ($m \in \{>, <\}$) is determined by

$$p(\beta \mid H_m) = \frac{p(\beta \mid H_u)\, I_{H_m}(\beta)}{\int p(\beta \mid H_u)\, I_{H_m}(\beta)\, d\beta},$$

where the indicator function $I_{H_m}(\beta)$ equals one if the argument is true, that is, if the parameter value is in accordance with the constraints imposed by $H_m$, and zero otherwise. Thus, the prior for $H_m$ is proportional to the unconstrained prior when the parameters are in accordance with $H_m$ and otherwise it is zero. Note that the integral in the denominator is a normalizing constant, which is needed to make $p(\beta \mid H_m)$ a density, that is, to let $p(\beta \mid H_m)$ integrate to 1. One needs to specify the parameters of the prior distribution (3), that is, $\beta_0$ and $\tau_0^2$.
We want the a priori belief in $H_>$ and $H_<$ to be the same. Therefore, we choose $\beta_0 = 0$ such that 50 percent of the prior is in agreement with $H_>$ and 50 percent with $H_<$ (more details can be found in Jeffreys 1961; Berger and Mortera 1999; Mulder, Hoijtink, and Klugkist 2010). Bear in mind that this implies that the complexity of both hypotheses is the same, namely 50 percent. According to the authors mentioned, BFs computed based on such a prior are well calibrated for the selection between $H_>$ and $H_<$. Subsequently, we need to deal with $\tau_0^2$, the variance of the prior. The prior variance should be chosen such that the prior is vague/noninformative, that is, such that it has little influence on the results. But it should not be too vague, because then $H_0$ will receive the highest support even when it is not true. This is known as the Lindley paradox (Lindley 1957). Grounding the prior variance on the data avoids priors that are too vague (Berger and Pericchi 1996, 2004). In our method, a value for $\tau_0^2$ is determined using the confidence intervals of $\beta$ for all the studies, analogous to the approach proposed in Klugkist, Laudy, and Hoijtink (2005) and Kuiper and Hoijtink (2010). In each of the studies, one can compute the 99 percent confidence interval of $\beta$. The 99 percent confidence interval in Study $t$ is

$$[\hat\beta^t - 2.576\,\hat\sigma^t,\ \hat\beta^t + 2.576\,\hat\sigma^t],$$

where $\hat\beta^t$ is the parameter estimate of $\beta$ in Study $t$ and $\hat\sigma^t$ the standard error of $\hat\beta^t$ in Study $t$. The 99 percent prior credibility interval of $\beta$ is given by

$$[\beta_0^t - 2.576\,\tau_0^t,\ \beta_0^t + 2.576\,\tau_0^t] = [-2.576\,\tau_0^t,\ 2.576\,\tau_0^t],$$

since $\beta_0^t = 0$ for all $t$, where $\beta_0^t$ is the prior mean of $\beta$ in Study $t$ and $\tau_0^t$ the prior standard deviation of $\beta$ in Study $t$. To let the prior credibility interval include the confidence interval based on the data of one study, the bounds of the prior credibility interval must embed the most extreme bound of the confidence interval of this study, that is, $2.576\,\tau_0^t = |\hat\beta^t| + 2.576\,\hat\sigma^t$. Figure 2 depicts the 99 percent confidence intervals of $\beta$ for all four studies in our example. Here, one must set equal to , , , and for , and , respectively, which leads to , and , respectively. In the section “Example,” we will show that this method to determine $\tau_0^2$ has satisfactory properties.
Figure 2. The 99 percent confidence interval of $\beta$ for each of the four studies and the four resulting 99 percent prior credibility intervals.
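The rule for choosing the prior variance can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the software that accompanies the article: the function name `prior_variance` is hypothetical, the prior is assumed to be centred at zero, and the 99 percent bounds are assumed to come from the standard normal quantile (about 2.576). The inputs are the "No Effects" estimates and standard errors listed in Table 5.

```python
from statistics import NormalDist


def prior_variance(beta_hat, se, level=0.99):
    """Smallest zero-centred prior variance whose credibility interval
    embeds the confidence interval of one study (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 2.576 for 99 percent
    extreme_bound = abs(beta_hat) + z * se       # CI bound farthest from zero
    tau0 = extreme_bound / z                     # solves z * tau0 = extreme_bound
    return tau0 ** 2


# "No Effects" estimates and standard errors from Table 5, Studies 1 to 4.
studies = [(0.007, 0.029), (0.014, 0.054), (0.078, 0.093), (0.151, 0.179)]
variances = [prior_variance(b, s) for b, s in studies]
print([round(v, 3) for v in variances])
```

Rounded to three decimals, the printed values can be compared with the prior variances reported for the "No Effects" situation in Table 6.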
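The posterior is proportional to the prior times the likelihood. It can be used to quantify the fit of hypotheses. If, for example, the posterior has a mean of 1.5 and a variance of 1, 93 percent of the posterior is in agreement with $H_>$ and 7 percent with $H_<$. This implies that the data support $H_>$ more than $H_<$. Stated otherwise, the fit of $H_>$ to the data is better than that of $H_<$. Since both the prior and the likelihood are normal distributions, the posterior is a normal distribution as well (Gelman et al. 2004). To determine the posterior distribution for $H_m$ ($m \in \{>, <\}$), we first need to specify the posterior distribution of the parameter for the case where there are no restrictions. We refer to this as the unconstrained posterior, which is defined by

$$p(\beta \mid \hat\beta, \hat\sigma, H_u) = N(\beta_N, \tau_N^2), \tag{4}$$

with

$$\beta_N = \frac{\beta_0 / \tau_0^2 + \hat\beta / \hat\sigma^2}{1 / \tau_0^2 + 1 / \hat\sigma^2} \quad\text{and}\quad \tau_N^2 = \frac{1}{1 / \tau_0^2 + 1 / \hat\sigma^2},$$

where $\beta_0$ and $\tau_0^2$ are the prior mean and prior variance, and $\hat\beta$ and $\hat\sigma$ the parameter estimate and its standard error.
The posterior for $H_m$ ($m \in \{>, <\}$) is then given by

$$p(\beta \mid \hat\beta, \hat\sigma, H_m) = \frac{L(\beta \mid \hat\beta, \hat\sigma)\, p(\beta \mid H_m)}{\int L(\beta \mid \hat\beta, \hat\sigma)\, p(\beta \mid H_m)\, d\beta}, \tag{5}$$

where $L(\beta \mid \hat\beta, \hat\sigma)$ is the likelihood and $p(\beta \mid H_m)$ the prior for $H_m$.
Thus, the posterior for $H_m$ is proportional to the unconstrained posterior when the parameters are in accordance with $H_m$ and otherwise it is zero. Bear in mind that the integral in the denominator is a normalizing constant.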
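The conjugate normal update and the notion of fit can be illustrated with a short Python sketch. The function names `unconstrained_posterior` and `fit_proportions` are hypothetical, and the code assumes the standard precision-weighted formulas for a normal prior combined with a normal approximation of the likelihood.

```python
from statistics import NormalDist


def unconstrained_posterior(beta_hat, se, prior_var, prior_mean=0.0):
    """Mean and variance of the normal posterior for a normal prior and a
    normal approximation of the likelihood (hypothetical helper)."""
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * beta_hat)
    return post_mean, post_var


def fit_proportions(post_mean, post_var):
    """Shares of the unconstrained posterior with beta > 0 and beta < 0."""
    below_zero = NormalDist(post_mean, post_var ** 0.5).cdf(0.0)
    return 1.0 - below_zero, below_zero


# The example from the text: posterior mean 1.5 and posterior variance 1.
f_pos, f_neg = fit_proportions(1.5, 1.0)
print(round(f_pos, 2), round(f_neg, 2))
```

For the example in the text, the two printed shares correspond to the 93 percent and 7 percent quoted for the positive and negative hypotheses.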
In the next section, it will be shown how the prior and posterior distribution can be used to compute BFs and PMPs.
Bayes Factors and Posterior Model Probabilities
To quantify the evidence for the hypotheses at hand, one can calculate BFs or PMPs. The BF gives the support of a hypothesis relative to another hypothesis. Thus, $BF_{mu}$ quantifies the relative support of $H_m$ with respect to the unconstrained hypothesis $H_u$. If $BF_{mu}$ equals 1, the data do not support $H_m$ more than the unconstrained hypothesis. If $BF_{mu}$ is larger (smaller) than one, the data support $H_m$ more (less) than the unconstrained hypothesis. Evidently, the more extreme (above or below one) the value of the BF is, the stronger the evidence (for or against, respectively, the hypothesis of interest) is. The BF is calculated as the ratio of the marginal likelihoods of two hypotheses (e.g., Chib 1995; Kass and Raftery 1995), where the marginal likelihood for $H_m$, $p(\hat\beta \mid H_m)$, is the normalizing constant in (5). From (5) it follows that

$$p(\hat\beta \mid H_m) = \frac{L(\beta \mid \hat\beta, \hat\sigma)\, p(\beta \mid H_m)}{p(\beta \mid \hat\beta, \hat\sigma, H_m)}$$

for any value of $\beta$ in agreement with $H_m$, where $p(\beta \mid H_m)$ and $p(\beta \mid \hat\beta, \hat\sigma, H_m)$ are the prior and posterior distribution of the parameter for hypothesis $H_m$, respectively, which are described in the previous section. Thus, for $H_m$ and $H_{m'}$ ($m, m' \in \{0, >, <\}$),

$$BF_{mm'} = \frac{p(\hat\beta \mid H_m)}{p(\hat\beta \mid H_{m'})} = \frac{BF_{mu}}{BF_{m'u}},$$

where $BF_{mu}$ and $BF_{m'u}$ are convenient vehicles for the computation of $BF_{mm'}$. As is elaborated in, for example, Klugkist et al. (2005), $BF_{mu}$ is the BF for a constrained hypothesis (like $H_>$) with respect to the unconstrained hypothesis, that is, the case where there are no restrictions on the parameters. As already mentioned, since the unconstrained hypothesis is not of importance in this article, $BF_{mu}$ is just a useful technical tool. For $m \in \{>, <\}$,

$$BF_{mu} = \frac{f_m}{c_m},$$

where $c_m$ and $f_m$ are the proportions of the unconstrained prior in equation (3) and the unconstrained posterior in equation (4), respectively, in agreement with the constraints of hypothesis $H_m$ for $m \in \{>, <\}$. As elaborated above, since $\beta_0 = 0$, $c_m = 0.5$ for $m \in \{>, <\}$, because half of the prior distribution is in agreement with $H_>$ and the other half with $H_<$. If, for instance, the posterior has a mean of 1.5 and a variance of 1, $f_> = 0.93$.
The Savage–Dickey representation (Dickey 1971) offers an easy way of calculating $BF_{mu}$ for $H_0$ (i.e., $BF_{0u}$), namely

$$BF_{0u} = \frac{p(\beta = 0 \mid \hat\beta, \hat\sigma, H_u)}{p(\beta = 0 \mid H_u)}.$$

One only has to evaluate the unconstrained posterior and prior density at $\beta = 0$ to compute $BF_{0u}$.
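The three BFs against the unconstrained hypothesis can be sketched as follows. This is an illustrative sketch, assuming a normal unconstrained prior and posterior; the function name `bayes_factors` and the dictionary keys are hypothetical. The BF for the null hypothesis uses the density ratio at zero (Savage–Dickey), and the BFs for the positive and negative hypotheses use the ratio of posterior to prior proportions in agreement with each hypothesis.

```python
from statistics import NormalDist


def bayes_factors(post_mean, post_var, prior_var, prior_mean=0.0):
    """BFs of H_0, H_> and H_< against the unconstrained hypothesis."""
    prior = NormalDist(prior_mean, prior_var ** 0.5)
    post = NormalDist(post_mean, post_var ** 0.5)
    c_neg, f_neg = prior.cdf(0.0), post.cdf(0.0)   # complexity and fit of H_<
    return {
        "0": post.pdf(0.0) / prior.pdf(0.0),        # Savage-Dickey density ratio
        ">": (1.0 - f_neg) / (1.0 - c_neg),         # fit / complexity of H_>
        "<": f_neg / c_neg,                         # fit / complexity of H_<
    }


# Posterior mean 1.5 and variance 1 as in the text; the prior variance of 4
# is an arbitrary illustrative value, not one used in the article.
bf = bayes_factors(post_mean=1.5, post_var=1.0, prior_var=4.0)
print({m: round(v, 3) for m, v in bf.items()})
```

With a prior centred at zero, the BFs for the positive and negative hypotheses always sum to 2, because their fits sum to 1 and each has complexity 0.5.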
A PMP for hypothesis $H_m$, $\pi_m$, gives the relative support for $H_m$ in a finite set of hypotheses (Kass and Raftery 1995). For $m \in \{0, >, <\}$,

$$\pi_m = \frac{\pi_m^0\, BF_{mu}}{\sum_{m' \in \{0, >, <\}} \pi_{m'}^0\, BF_{m'u}},$$

where $\pi_m^0$ is the PrMP of hypothesis $H_m$, which represents the degree of belief of a researcher in each hypothesis before observing the data. An uninformative choice is to set the PrMPs equal for all hypotheses. In the example, $\pi_m^0$ then equals $1/3$ for $m \in \{0, >, <\}$. When , , and , the PMP values for the three hypotheses (with equal PrMPs) are , , and . Note that it can also be seen from the PMPs, among other things, that the support for is times higher than the support for . Another choice, in case there is previous research, is to set the PrMPs equal to the PMPs of a previous study. We will elaborate on this in the next section, where we explain how one can combine several studies.
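The step from BFs to PMPs is a simple normalization, sketched below. The function name `posterior_model_probabilities` is hypothetical, and the BF values in the usage example are made up for illustration; they are not taken from the article.

```python
def posterior_model_probabilities(bf, prior_probs=None):
    """Turn BFs against the unconstrained hypothesis into PMPs."""
    if prior_probs is None:                      # equal PrMPs by default
        prior_probs = {m: 1.0 / len(bf) for m in bf}
    weights = {m: prior_probs[m] * bf[m] for m in bf}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


# Made-up BFs for illustration only.
pmp = posterior_model_probabilities({"0": 0.5, ">": 1.8, "<": 0.2})
print({m: round(p, 2) for m, p in pmp.items()})
```

With equal PrMPs, the ratio of two PMPs equals the ratio of the corresponding BFs, which is how statements about relative support can be read off directly.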
Updating Evidence From Multiple Studies
The results of several studies, that is, the evidence for the hypotheses at hand, can be combined/updated by setting the PrMP of Study $t$ equal to the PMP of Study $t - 1$. In the first study, the PrMP will be set equal for all hypotheses (i.e., $\pi_m^0 = 1/3$ for $m \in \{0, >, <\}$). Let $\pi_m^{t-1}$ and $\pi_m^t$ represent the PrMPs and PMPs, respectively, for hypothesis $H_m$ in Study $t$, let $BF_{mu}^t$ be $BF_{mu}$ for Study $t$, and let $T$ be the total number of studies to combine. Then, for $t = 1, \ldots, T$,

$$\pi_m^t = \frac{\pi_m^{t-1}\, BF_{mu}^t}{\sum_{m' \in \{0, >, <\}} \pi_{m'}^{t-1}\, BF_{m'u}^t}.$$

When the $T$ studies are combined, this results in an overall PMP for $H_m$ ($\pi_m^T$). It can be shown for $m \in \{0, >, <\}$ that $\pi_m^T$ is calculated by

$$\pi_m^T = \frac{\pi_m^0 \prod_{t=1}^{T} BF_{mu}^t}{\sum_{m' \in \{0, >, <\}} \pi_{m'}^0 \prod_{t=1}^{T} BF_{m'u}^t}.$$

From this it can be seen that the order of the studies does not influence the outcome of the method.
Having discussed how the method of quantifying evidence for the hypotheses of interest works in general, we illustrate this method in the following section by combining the four studies described in the introduction and summarized in Tables 1 and 3. In addition, we will study the sensitivity of the method with respect to $\tau_0^2$.
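The updating scheme and its order invariance can be checked numerically with a small sketch. All function names are hypothetical and the BFs are made-up illustrative values; the sketch only assumes equal PrMPs before the first study.

```python
from itertools import permutations
from math import isclose, prod

HYPOTHESES = ("0", ">", "<")


def update(prior_probs, bf):
    """One updating step: the PrMPs of this study are the PMPs of the previous one."""
    weights = {m: prior_probs[m] * bf[m] for m in HYPOTHESES}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def combine_sequentially(bf_per_study):
    """Start from equal PrMPs and update once per study."""
    probs = {m: 1.0 / len(HYPOTHESES) for m in HYPOTHESES}
    for bf in bf_per_study:
        probs = update(probs, bf)
    return probs


def combine_closed_form(bf_per_study):
    """Overall PMPs from the product of the BFs over studies (equal PrMPs)."""
    products = {m: prod(bf[m] for bf in bf_per_study) for m in HYPOTHESES}
    total = sum(products.values())
    return {m: p / total for m, p in products.items()}


# Made-up BFs for three studies, for illustration only.
bfs = [
    {"0": 1.4, ">": 1.2, "<": 0.8},
    {"0": 0.3, ">": 1.9, "<": 0.1},
    {"0": 0.9, ">": 0.6, "<": 1.4},
]
reference = combine_closed_form(bfs)
for order in permutations(bfs):
    result = combine_sequentially(order)
    assert all(isclose(result[m], reference[m]) for m in HYPOTHESES)
print({m: round(p, 3) for m, p in reference.items()})
```

Every ordering of the three studies gives the same overall PMPs as the closed-form product, which is the order invariance noted in the text.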
Example
In the illustration, we use the studies of Batenburg et al. (2003), Buskens and Raub (2002), Buskens and Weesie (2000), and Buskens et al. (2010). Each of these studies tests (among other things) whether there is an effect of the variable past (a measure for the amount of positive past experiences) on trust. As mentioned before, because of the different types and designs of the models (Table 1), meta-analysis cannot be employed to combine the four studies regarding trust. But we can combine these studies via Bayesian updating. Notably, our method does not combine estimates but evidence for null, positive, and negative effects. In all studies, the parameter estimate and the standard error of the coefficient of past is calculated (see Table 3). Hence, we can implement our method.
For this example, the optimal values of the prior variance for Studies and are , and , respectively. The panel “$\tau_0^2$” in Table 4 shows, for the different steps of the method, the updated PMPs $\pi_m^t$ for hypothesis $H_m$ after adding Study $t$ for $t = 1, \ldots, 4$, $m \in \{0, >, <\}$, and the optimal values of $\tau_0^2$ for Studies 1, 2, 3, and 4. The column for $t = 1$ displays the PMPs for only one study, namely Study 1, and employs equal PrMPs of the three hypotheses. Here, the support for $H_>$ is high and that for $H_<$ is low. The column for $t = 2$ displays the PMPs after adding Study 2 and uses the PMPs of $t = 1$ as the PrMPs. Like for $t = 1$, the support for $H_>$ is high. When the third study is added, there is full support for $H_>$ and none for $H_0$ and $H_<$, namely $\pi_0^3 \approx 0$ and $\pi_<^3 \approx 0$. The same remains valid when including the fourth study. From the overall PMP values ($\pi_m^4$), it follows that we favor $H_>$ over $H_0$ and $H_<$. Furthermore, it can be said that the support for $H_>$ is $\pi_>^4 / \pi_m^4$ times higher than the support for $H_m$ for $m \in \{0, <\}$. In the example, the support for $H_>$ is practically infinite, since in all studies a (large) positive effect of past on trust was found. Note that when one combines studies with mixed effects, that is, not solely positive effects, the value of $\pi_>^T / \pi_m^T$ (for $m \in \{0, <\}$) is more illustrative.
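Table 4. PMP Values $\pi_m^t$ for Hypothesis $H_m$ After Adding Study $t$ for Prior Variances $\tfrac{1}{2}\tau_0^2$, $\tau_0^2$, and $2\tau_0^2$.

| Prior variance | $m$ / $t$ | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $\tfrac{1}{2}\tau_0^2$ | 0 | 0.030 | 0.003 | 5.671e−31 | 1.536e−50 |
| | > | 0.966 | 0.997 | 1.000 | 1.000 |
| | < | 0.004 | 7.809e−05 | 0.000 | 0.000 |
| $\tau_0^2$ | 0 | 0.022 | 0.002 | 6.212e−32 | 3.660e−52 |
| | > | 0.976 | 0.998 | 1.000 | 1.000 |
| | < | 0.002 | 2.418e−05 | 0.000 | 0.000 |
| $2\tau_0^2$ | 0 | 0.020 | 0.002 | 2.787e−32 | 8.596e−53 |
| | > | 0.978 | 0.998 | 1.000 | 1.000 |
| | < | 0.002 | 1.139e−05 | 0.000 | 0.000 |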
To increase confidence in the conclusions obtained, one could carry out a sensitivity study using $\tfrac{1}{2}\tau_0^2$ and $2\tau_0^2$ for Study $t$. As can be seen in Table 4, the results for these values are for all practical purposes the same. The conclusion is that $H_>$ is the preferred hypothesis when combining the four studies and has much more support than $H_0$ and $H_<$.
In general, the following guidelines will be employed. First, examine the results for $\tau_0^2$. When one of the hypotheses renders the highest overall PMP value, the sensitivity of the results with respect to the prior specification should be checked (see the next step). Otherwise, the studies under investigation cannot distinguish between some or all of the hypotheses. Hence, more studies are required. Second, inspect the stability of the results by examining the results for $\tfrac{1}{2}\tau_0^2$ and $2\tau_0^2$. When the results are stable, one can conclude that the hypothesis with the highest PMP value is the preferred one. In case the results are not stable, more studies are needed to draw conclusions. These decision rules will be applied in the next section to situations that are less clear-cut than in the example above.
An Examination of Hypothetical Situations
The example illustrates that combining four studies with persuasive evidence for $H_>$ (namely, a significant positive effect in all four studies) renders high to full support for $H_>$. Since this is to be expected, we additionally examine five hypothetical situations, that is, situations based on artificial data. They are depicted in Table 5, where $p$ represents the one-sided $p$ value (regarding $H_>$) corresponding to the reported parameter estimates ($\hat\beta^t$ and $\hat\sigma^t$ for Study $t$). Effects for which $\hat\beta^t$ is larger than zero are called positive effects and those smaller than zero, negative effects. Positive effects with a $p$ value smaller than .05 are called significant positive effects, those with a $p$ value larger than .05 and smaller than .10, small positive effects, and those with a $p$ value larger than .10 and smaller than .50, positive null effects. Negative effects with a $p$ value larger than .95 are referred to as significant negative effects, those with a $p$ value smaller than .95 and larger than .90, small negative effects, and those with a $p$ value smaller than .90 but larger than .50, negative null effects. Five hypothetical situations are distinguished: (1) Positive null effects: all effects, although not significant, are positive, namely, $p = .400$ for each study. (2) Insignificant positive effects: all effects are small and positive with $p = .060$ for each study. (3)–(5) Mixed effects (situations with positive, negative, and null effects). In the first mixed effect situation, there is a significant positive effect ($p = .030$), a significant negative effect ($p = .970$, leading to $p = .030$ for a test regarding $H_<$), a small positive effect ($p = .100$), and a positive null effect ($p = .400$). In the second mixed effect situation, there is a significant positive effect (with a lower $p$ value, namely $p = .010$), a small negative effect ($p = .940$, leading to $p = .060$ for $H_<$), and two small positive effects ($p = .060$). The third one resembles the second; the only difference is that the significant positive effect is replaced by a positive null effect with $p = .378$. In all situations, the standard errors of $\hat\beta^t$ are set equal to the ones of the example.
The $p$ values were chosen based on the type of situation, which resulted in the $\hat\beta^t$ values depicted in Table 5.
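Table 5. The Parameter Estimates ($\hat\beta^t$ and $\hat\sigma^t$ for Study $t$) and Corresponding One-Sided $p$ Values for Five Hypothetical Situations.

| $t$ | $\hat\sigma^t$ | No Effects $\hat\beta^t$ | $p$ | Small Effects $\hat\beta^t$ | $p$ | Mixed Effects 1 $\hat\beta^t$ | $p$ | Mixed Effects 2 $\hat\beta^t$ | $p$ | Mixed Effects 3 $\hat\beta^t$ | $p$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.029 | 0.007 | .400 | 0.045 | .060 | 0.055 | .030 | 0.068 | .010 | 0.009 | .378 |
| 2 | 0.054 | 0.014 | .400 | 0.084 | .060 | −0.102 | .970 | −0.084 | .940 | −0.084 | .940 |
| 3 | 0.093 | 0.078 | .400 | 0.175 | .060 | 0.153 | .100 | 0.175 | .060 | 0.175 | .060 |
| 4 | 0.179 | 0.151 | .400 | 0.337 | .060 | 0.151 | .400 | 0.337 | .060 | 0.337 | .060 |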
Table 6 displays the overall PMPs for these five situations for $\tau_0^2$, $\tfrac{1}{2}\tau_0^2$, and $2\tau_0^2$. The latter two are inspected to examine the sensitivity of the conclusions to the choice of the prior. The $\tau_0^2$ values are given in Table 6.
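Table 6. Overall PMP Values $\pi_m^4$ for Hypothesis $H_m$ for Prior Variances $\tfrac{1}{2}\tau_0^2$, $\tau_0^2$, and $2\tau_0^2$.

Values of $\tau_0^2$ per study:

| $t$ | No Effects | Small Effects | Mixed Effects 1 | Mixed Effects 2 | Mixed Effects 3 |
|---|---|---|---|---|---|
| 1 | 0.001 | 0.002 | 0.003 | 0.003 | 0.001 |
| 2 | 0.004 | 0.008 | 0.009 | 0.008 | 0.008 |
| 3 | 0.015 | 0.026 | 0.023 | 0.026 | 0.026 |
| 4 | 0.056 | 0.096 | 0.056 | 0.096 | 0.096 |

Overall PMP values:

| Prior variance | $m$ | No Effects | Small Effects | Mixed Effects 1 | Mixed Effects 2 | Mixed Effects 3 |
|---|---|---|---|---|---|---|
| $\tfrac{1}{2}\tau_0^2$ | 0 | 0.427 | 0.016 | 0.250 | 0.038 | 0.223 |
| | > | 0.524 | 0.984 | 0.717 | 0.960 | 0.752 |
| | < | 0.050 | 1.160e−04 | 0.033 | 0.000 | 0.024 |
| $\tau_0^2$ | 0 | 0.544 | 0.015 | 0.320 | 0.040 | 0.286 |
| | > | 0.429 | 0.985 | 0.661 | 0.959 | 0.700 |
| | < | 0.027 | 3.116e−05 | 0.019 | 0.001 | 0.014 |
| $2\tau_0^2$ | 0 | 0.715 | 0.021 | 0.482 | 0.065 | 0.440 |
| | > | 0.272 | 0.979 | 0.507 | 0.935 | 0.552 |
| | < | 0.012 | 1.248e−05 | 0.010 | 2.842e−04 | 0.008 |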
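The whole procedure, including the sensitivity check, can be sketched end to end for the "No Effects" situation. This is a hedged sketch rather than the authors' software: the function names are hypothetical, and it assumes a zero prior mean, a prior variance set by embedding the 99 percent confidence interval (normal quantile), equal PrMPs, and that the sensitivity analysis multiplies each study's prior variance by a common factor `scale` (0.5, 1, or 2).

```python
from math import prod
from statistics import NormalDist

Z99 = NormalDist().inv_cdf(0.995)   # about 2.576


def study_bayes_factors(beta_hat, se, scale=1.0):
    """BFs of H_0, H_>, H_< versus the unconstrained hypothesis for one study."""
    tau0_sq = scale * ((abs(beta_hat) + Z99 * se) / Z99) ** 2   # prior variance
    post_var = 1.0 / (1.0 / tau0_sq + 1.0 / se ** 2)
    post_mean = post_var * beta_hat / se ** 2                   # prior mean is 0
    prior = NormalDist(0.0, tau0_sq ** 0.5)
    post = NormalDist(post_mean, post_var ** 0.5)
    f_neg = post.cdf(0.0)                                       # fit of H_<
    return {
        "0": post.pdf(0.0) / prior.pdf(0.0),                    # density ratio at 0
        ">": (1.0 - f_neg) / 0.5,                               # complexity is 0.5
        "<": f_neg / 0.5,
    }


def overall_pmps(studies, scale=1.0):
    """Overall PMPs after combining all studies, starting from equal PrMPs."""
    bfs = [study_bayes_factors(b, s, scale) for b, s in studies]
    products = {m: prod(bf[m] for bf in bfs) for m in ("0", ">", "<")}
    total = sum(products.values())
    return {m: p / total for m, p in products.items()}


# "No Effects" estimates and standard errors from Table 5, Studies 1 to 4.
no_effects = [(0.007, 0.029), (0.014, 0.054), (0.078, 0.093), (0.151, 0.179)]
for scale in (0.5, 1.0, 2.0):
    pmps = overall_pmps(no_effects, scale)
    print(scale, {m: round(p, 3) for m, p in pmps.items()})
```

Under these assumptions, the printed values can be compared with the "No Effects" column of Table 6; small differences may arise because the table inputs are rounded.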
Table 6 shows that combining four studies with a positive null effect lead to support for both and . This is supported by the results for , but not by that for , where receives most support. Since our results are not supported by the sensitivity analysis, more studies should be collected and added. Moreover, the support for is 1.27 higher than for , which is not compelling evidence.
Combining four studies with a small positive effect (each not significant at ) renders profound support for . The sensitivity analysis shows stability, that is, is preferred over the other two hypotheses for and as well. Thus, even though the four studies did not find a significant positive effect, combining them does lead to compelling support for a positive effect, since all four studies comprise small positive effects.
In the first mixed effect situation, where there is a significant positive effect, a significant negative effect, a positive null effect, and a small positive effect, has the highest support. The sensitivity analysis shows that the support for decreases and that for increases for increasing prior variances. The support for is about 3, 2, and 1 times higher than for for , and , respectively. Because of this variability, more studies should be collected and added to be able to draw conclusions.
In the second mixed effect situation, where there is a small negative effect and two small positive effects besides a positive effect, has the highest support for all three values. In this situation, one can conclude that has profoundly more support than or .
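In the third mixed effect situation, which equals the second one except that the significant positive effect has a lower value, the positive-effect hypothesis receives the highest support for all prior variance values. For the middle and the smallest prior variance, the support for the positive-effect hypothesis is 2.45 and 3.37 times higher than that for the no-effect hypothesis, respectively. For the largest prior variance, this is only 1.26. Nevertheless, we conclude that the positive-effect hypothesis has the highest support and has 2.45 and 50 times more support than the no-effect and the negative-effect hypothesis, respectively.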
In sum, which hypothesis receives the highest support evidently depends on the types of effect (positive, negative, null) and their values. When the results are not sensitive to the prior variance, one can report how many times larger the support for one hypothesis is than for another at the chosen prior variance. In other situations, more studies should be collected and added.
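The support ratios quoted for Table 6 are ratios of posterior model probabilities. The short sketch below recomputes them from the table values. The labels `v1`, `v2`, and `v3` for the three prior variance values (in table order) and the function name `support_ratio` are illustrative assumptions, not notation from the article.

```python
# Posterior model probabilities transcribed from Table 6.
# Each tuple is (no effect, positive effect, negative effect).
# "v1", "v2", "v3" are illustrative labels (an assumption) for the three
# prior variance values, in the order in which the table lists them.
PMP = {
    "No Effects": {
        "v1": (0.427, 0.524, 0.050),
        "v2": (0.544, 0.429, 0.027),
        "v3": (0.715, 0.272, 0.012),
    },
    "Mixed Effects 1": {
        "v1": (0.250, 0.717, 0.033),
        "v2": (0.320, 0.661, 0.019),
        "v3": (0.482, 0.507, 0.010),
    },
    "Mixed Effects 3": {
        "v1": (0.223, 0.752, 0.024),
        "v2": (0.286, 0.700, 0.014),
        "v3": (0.440, 0.552, 0.008),
    },
}

INDEX = {"no_effect": 0, "positive": 1, "negative": 2}


def support_ratio(situation, prior, numerator, denominator):
    """Ratio of two posterior model probabilities for one table cell block."""
    probs = PMP[situation][prior]
    return probs[INDEX[numerator]] / probs[INDEX[denominator]]


if __name__ == "__main__":
    # No-effect versus positive-effect support when all studies show null effects.
    print(round(support_ratio("No Effects", "v2", "no_effect", "positive"), 2))
    # Positive-effect versus no-effect support in the third mixed situation.
    for prior in ("v1", "v2", "v3"):
        print(prior, round(support_ratio("Mixed Effects 3", prior, "positive", "no_effect"), 2))
    # Positive-effect versus negative-effect support in the third mixed situation.
    print(round(support_ratio("Mixed Effects 3", "v2", "positive", "negative"), 1))
```

Because the table reports rounded probabilities, a recomputed ratio can differ from the published one in the last digit.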
Conclusion
This article introduces a Bayesian updating method to quantify the evidence for the hypotheses of interest (i.e., no effect, positive effect, and negative effect) from multiple studies with possibly different types of models and designs. Specifically, one obtains an overall posterior model probability for each of these three hypotheses, which reflects the relative support and allows statements of the type “one hypothesis is 10 times more likely than another.”
In terms of the example of the effect of positive past experiences on trust, we see that combining the four studies increases the confidence in the hypothesis that there is indeed a positive effect of past on trust. Although the example is illustrative, the result might not be surprising, given that the hypothesis under consideration is supported in all separate studies. The evaluations of the five hypothetical situations show that, even with limited or mixed evidence in individual studies, our method helps distinguish between situations with more and less convincing evidence when different studies are combined.
From the results of the illustration, one can see that our Bayesian updating method is useful for choosing the best of a set of hypotheses when multiple studies address one theoretical concept. The method is practical because it can be used even when only parameter estimates and standard errors from the different studies are available. The original data need not be available, and even if they were, this would not lead to a better evaluation of the hypotheses, because all the information our Bayesian updating needs is contained in the parameter estimates and their standard errors. To facilitate the use of our Bayesian updating method, we provide software that implements the method in R, as described in more detail in the online appendix (which can be found at http://smr.sagepub.com/supplemental/).
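To illustrate why estimates and standard errors can suffice, the following is a minimal sketch of sequential updating of posterior model probabilities, written in Python rather than the R software mentioned above. It is not the authors' implementation. It assumes a normal approximation for each estimate, a zero-centered normal prior with variance `prior_var` that is truncated to the positive or negative half-line under the two directional hypotheses, and equal prior model probabilities. All function names and the study values in the usage part are hypothetical.

```python
import math

HYPOTHESES = ("no_effect", "positive", "negative")


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def marginal_likelihoods(estimate, se, prior_var):
    """Marginal likelihood of one estimate under each hypothesis (assumed model)."""
    se2 = se ** 2
    total_var = se2 + prior_var
    # Posterior of the effect under the untruncated normal prior.
    post_mean = estimate * prior_var / total_var
    post_sd = math.sqrt(prior_var * se2 / total_var)
    prob_positive = normal_cdf(post_mean / post_sd)
    unconstrained = normal_pdf(estimate, 0.0, total_var)
    return {
        "no_effect": normal_pdf(estimate, 0.0, se2),
        # The factor 2 renormalizes the prior after truncation to a half-line.
        "positive": 2.0 * unconstrained * prob_positive,
        "negative": 2.0 * unconstrained * (1.0 - prob_positive),
    }


def update(pmp, estimate, se, prior_var):
    """Use the current model probabilities as the prior for the next study."""
    marginals = marginal_likelihoods(estimate, se, prior_var)
    unnormalized = {h: pmp[h] * marginals[h] for h in HYPOTHESES}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


def combine(studies, prior_var):
    """Update equal prior model probabilities with (estimate, se) pairs in turn."""
    pmp = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
    for estimate, se in studies:
        pmp = update(pmp, estimate, se, prior_var)
    return pmp


if __name__ == "__main__":
    # Made-up estimates and standard errors, for illustration only.
    studies = [(0.05, 0.03), (0.02, 0.04), (-0.01, 0.05), (0.08, 0.03)]
    for prior_var in (0.01, 0.1, 1.0):  # a simple sensitivity check
        print(prior_var, combine(studies, prior_var))
```

Under these assumptions the result does not depend on the order in which the studies are entered, and rerunning `combine` with several values of `prior_var` mirrors the kind of sensitivity analysis reported for Table 6.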
Footnotes
Acknowledgments
The authors acknowledge the constructive comments by three anonymous reviewers and by the SMR editor.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This study has been funded by the Netherlands Organization for Scientific Research NWO-VICI-453-05-002 (Hoijtink). Buskens acknowledges support from the High Potential program “Dynamics of Cooperation, Networks, and Institutions” of Utrecht University. Raub acknowledges support from NWO-PIONIER-S-96-168 and PGS-50-370.