An assessment of predictive performance of Zellner’s g-priors in Bayesian model averaging

Abstract

When making predictions and inferences, data analysts often face the challenge of selecting the best model among competing models, because a large number of regressors produces a large model space. Bayesian model averaging (BMA) is a technique designed to account for the uncertainty inherent in the model selection process. In Bayesian analysis the choice of prior distribution is a delicate issue, and posterior model probabilities (PMP) under model uncertainty are typically sensitive to the specification of the prior distribution. This research identified a set of eleven candidate default priors (Zellner’s g-priors) prominent in the literature and applicable in Bayesian model averaging. A new robust g-prior specification for the regression coefficients in BMA is investigated, and its predictive performance is assessed alongside the other g-prior structures in the literature. The predictive abilities of these g-prior structures are assessed using log predictive scores (LPS) and log marginal likelihood (LML). The sensitivity of posterior results to the choice of g-prior structure was demonstrated using simulated and real-life data. The simulated data, drawn from a multivariate normal distribution, were first used as generated to demonstrate the predictive performance of the g-prior structures and were then contaminated and used for the same purpose. Similarly, the real-life data were analysed both after normalization and as obtained. Empirical findings reveal that, under these different conditions, the new g-prior structure exhibited robust, competitive and consistent predictive ability when compared with the g-prior structures identified from the literature. The new g-prior offers a sound, fully Bayesian approach that features the virtues of prior input and predictive gains that minimise the risk of misspecification.

Keywords

Zellner’s g-priors; Bayesian model averaging; model uncertainty; large model space; posterior model probability; predictive performance

1. Introduction

In regression analysis, picking a single model among competing models tends to ignore the uncertainty associated with the specification of the selected model, so that the strength of evidence is overstated through $p$-values that are too small (Clyde & George, 2004). Thus, Box (1976) states that “all models are wrong, but some are useful”, while according to Nielsen et al. (2014), Einstein also said that “models should be made as small as possible but not simpler”. Three major difficulties arise when putting the Bayesian approach into practice with large model spaces: the choice of the prior distributions, the computation of the integrated likelihood, and the estimation of $\tau$, the posterior distribution over the model space (Adeleke & Ogundeji, 2009; García-Donato & Martínez-Beneito, 2013). When dealing with a large model space, these basic components of Bayesian analysis are better handled by an approach that computes a weighted average of the estimates from all the competing models. This method, which has become attractive to many practitioners, is called model averaging and incorporates model uncertainty into the analysis. In recent years there has also been increasing interest in forecasting methods that utilise large data sets, and Bayesian Model Averaging (BMA) methods have been widely employed in this area (Feldkircher et al., 2012; Hanson et al., 2014; Li & Clyde, 2015). BMA is an appropriate framework for handling model uncertainty because it considers the useful information provided by all competing models in the set. In BMA, the posterior model probability measures the performance of each model in the set relative to the others (Rossi et al., 2005).

Prior distributions play a crucial role in Bayesian probability theory, and it is attractive to have conditional distributions with a closed form under sampling (Okafor, 1999; Lee, 2004). Zellner (1983, 1986) proposed a procedure for evaluating a conjugate prior distribution referred to as Zellner’s informative g-prior, or simply the g-prior. The g-prior has been widely used in Bayesian analysis of multiple regression models because analytical results are more readily available, it is computationally efficient and it has a simple interpretation (Davison, 2008). The benchmark g-prior structure has proven popular in BMA, since it leads to simple closed-form expressions of posterior quantities and because it reduces prior elicitation to the choice of a single hyperparameter $g$. The predictive ability of these g-priors constitutes the focus of this research (Fernandez et al., 2001a; Eicher et al., 2011). This study supports the application of BMA to achieve better predictive performance and to account for the uncertainty associated with selecting a single model from a large model space for parameter estimation.

2. Methodology

2.1 Bayesian model averaging

Bayesian Model Averaging (BMA) is a technique designed to account for the uncertainty inherent in the model selection process; it focuses on which regressors to include in the analysis. By averaging across a large set of models, one can determine which variables are relevant to the data generating process for a given set of priors used in the analysis (Hoeting et al., 1999). Consider a linear regression model with constant term $\beta_{0}$ and $k$ potential explanatory variables $x_{1},x_{2},\ldots,x_{k}$ of the form:

$\displaystyle y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\ldots+\beta_{k}x_{k}+\varepsilon$ (1)

This gives rise to $2^{k}$ possible sampling models (indexed $M_{j},j=1,2,\ldots,2^{k}$), depending on whether we include or exclude each of the regressors. Once the model space has been determined, the posterior distribution of any coefficient of interest (say $\beta_{h}$), given the data $D$, is:

$P({\beta_{h}|D})=\sum\limits_{j=1}^{2^{k}}{P({\beta_{h}|D,M_{j}})}P({M_{j}|D})$ (2)

BMA uses each model’s posterior probability, $P({M_{j}|D})$ as weights. Each model (a set of variables) receives a weight and the final estimates are constructed as a weighted average of the parameter estimates from each of the models. BMA includes all of the variables within the analysis, but shrinks the impact of certain variables towards zero through the model weights. These weights are the key feature for estimation via BMA and will depend upon a number of key features of the averaging exercise including the choice of prior specified (Montgomery et al., 2010).

The posterior model probability of $M_{j}$ is given by:

$P({M_{j}|D})=P({D|M_{j}})\frac{P({M_{j}})}{P(D)}=P({D|M_{j}})\frac{P({M_{j}})}{\sum\limits_{i=1}^{2^{k}}{P({D|M_{i}})P({M_{i}})}}$ (3)

where

$\displaystyle P({D|M_{j}})=\int{P({D|\beta^{j},M_{j}})}P({\beta^{j}|M_{j}})d\beta^{j}$ (4)

and $\beta^{j}$ is the vector of parameters from model $M_{j}$ , $P({\beta^{j}|M_{j}})$ is a prior probability distribution assigned to the parameters of model $M_{j}$ and $P({M_{j}})$ is the prior probability that $M_{j}$ is the true model.

The model-averaged posterior means and variances of the coefficients $\beta=({\beta_{0},\beta_{1},\ldots,\beta_{k}})$ are then constructed from the estimates $\hat{\beta}_{j}=E[{\beta|D,M_{j}}]$ obtained under each model $M_{j}$ (the posterior standard deviation is the square root of the variance):

$\displaystyle E[{\beta|D}]=\sum\limits_{j=1}^{2^{k}}{\hat{\beta}_{j}P({M_{j}|D})}$ (5)

$\displaystyle V[{\beta|D}]=\sum\limits_{j=1}^{2^{k}}{({\textit{Var}[{\beta|D,M_{j}}]+\hat{\beta}_{j}^{2}})P({M_{j}|D})}-E[{\beta|D}]^{2}$ (6)
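The weighting in equations (3), (5) and (6) can be illustrated with a minimal Python sketch. It assumes that the log marginal likelihoods and the per-model posterior means and variances of a single coefficient are already available; the function names and the three-model toy inputs are hypothetical and are not taken from the study.

```python
import math

def posterior_model_probs(log_marglik, log_prior=None):
    """Equation (3): normalise P(D|M_j) P(M_j) over the models (via log-sum-exp)."""
    m = len(log_marglik)
    if log_prior is None:
        log_prior = [-math.log(m)] * m  # assumption: uniform model prior
    log_w = [a + b for a, b in zip(log_marglik, log_prior)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def bma_moments(means, variances, pmp):
    """Equations (5)-(6): model-averaged mean and variance of one coefficient."""
    mean = sum(b * p for b, p in zip(means, pmp))
    second = sum((v + b * b) * p for b, v, p in zip(means, variances, pmp))
    return mean, second - mean * mean

# Made-up inputs for three models; the second model excludes the regressor,
# so its posterior mean and variance for this coefficient are both zero.
pmp = posterior_model_probs([-10.0, -11.0, -13.0])
mean, var = bma_moments([0.5, 0.0, 0.7], [0.04, 0.0, 0.09], pmp)
```

Working on the log scale avoids numerical underflow, since marginal likelihoods of realistic data sets are far too small to exponentiate directly.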

2.2 Bayesian model averaging under Zellner’s g-prior

The Bayesian framework calls for specifying a prior distribution on the model’s parameters $\beta$ and $\sigma^{2}$. The bulk of the BMA literature (Raftery et al., 1997; Hoeting et al., 1999; Chipman et al., 2001) favours the natural-conjugate approach, which puts a conditionally normal prior on the regression coefficients. For a model $M_{j}$:

$({\beta_{j}|\sigma^{2},M_{j}})\sim N({\underline{\beta}_{j},\sigma^{2}\underline{V}_{j}})$ (7)

where $\underline{\beta}_{j}$ is the prior mean and $\underline{V}_{j}$ is a prior covariance hyperparameter that is difficult to elicit, given the many combinations possible in model selection problems. BMA applications have therefore opted for a common uninformative prior centred at zero, with the variance structure given by Zellner’s g-prior, i.e. $\underline{V}_{j}=g({{X}^{\prime}_{j}X_{j}})^{-1}$ (Zellner, 1986; Agliari & Parisetti, 1988). The prior covariance is thus assumed to be proportional to the covariance expression $({{X}^{\prime}_{j}X_{j}})^{-1}$ arising from the sample, with the scalar $g$ determining how much importance is attributed to the prior. The conditional prior on the coefficients is therefore

$\beta_{j}|\sigma^{2},M_{j}\sim N({0,\sigma^{2}g({{X}^{\prime}_{j}X_{j}})^{-1}})$ (8)

which is determined, apart from the data, by the scalar hyperparameter $g$ alone. The g-prior structure has proven popular in BMA, since it leads to simple closed-form expressions of posterior statistics and because it reduces prior elicitation to the choice of a single hyperparameter $g$. This applies in particular to the resulting marginal likelihood:

$P({Y|M_{j},X})\propto({1+g})^{-\frac{k_{j}}{2}}\left({1-\frac{g}{1+g}R_{j}^{2}}\right)^{-\frac{n-1}{2}}$ (9)

where $R_{j}^{2}$ is the OLS R-squared for model $M_{j}$, $k_{j}$ is the number of regressors included in $M_{j}$ and $n$ is the number of observations.
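To make the trade-off in equation (9) concrete, the following minimal sketch evaluates its logarithm for two hypothetical models. The function name and the $R^{2}$ values are illustrative assumptions; only the formula itself comes from equation (9).

```python
import math

def log_marglik_gprior(r2, k_j, n, g):
    """Log of the right-hand side of equation (9), up to an additive constant."""
    shrink = g / (1.0 + g)
    size_penalty = -0.5 * k_j * math.log1p(g)          # (1 + g)^(-k_j / 2)
    fit_reward = -0.5 * (n - 1) * math.log(1.0 - shrink * r2)
    return size_penalty + fit_reward

n = 40
g = n  # unit information prior, structure 1 in Table 1
small = log_marglik_gprior(r2=0.60, k_j=2, n=n, g=g)   # hypothetical 2-regressor model
large = log_marglik_gprior(r2=0.62, k_j=5, n=n, g=g)   # hypothetical 5-regressor model
```

With these made-up inputs the smaller model has the larger log marginal likelihood: the gain in $R^{2}$ from 0.60 to 0.62 does not compensate for the factor $(1+g)^{-k_{j}/2}$ incurred by three extra regressors.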

Table 1

Summary of identified g-prior structures examined

S/N	Structure of g-prior	Comments/Sources
1	$g=n$	Unit Information Prior (UIP) based on number of observations. Kass and Wasserman (1996).
2	$g=\max(n,K^{2})$	Corresponds to the benchmark prior suggested by Fernandez et al. (2001b).
3	$g=K^{2}$	Conforms to the risk inflation criterion by Foster and George (1994).
4	$g=\frac{1}{n}$	It is in the spirit of the “unit information priors” of Kass and Wasserman (1996).
5	$g=\frac{k}{n}$	Here, we assign more information to the prior as we have more regressors in the model.
6	$g=\sqrt{\frac{1}{n}}$	This is an intermediate case, where we choose a smaller asymptotic penalty term for large models than in the Schwarz criterion.
7	$g=\sqrt{\frac{k}{n}}$	The prior information increases with the number of regressors in the model (Fernandez et al., 2001a).
8	$g=\ln(n^{3})$	Asymptotically mimics the Hannan-Quinn criterion with $C_{HQ}=3$ (Fernandez et al., 2001b, p. 395).
9	$g=\frac{1}{\ln(n^{3})}$	The Hannan-Quinn criterion, $C_{HQ}=3$ as $n$ becomes large. Hannan and Quinn (1979).
10	$g=\frac{\ln(k+1)}{\ln(n)}$	Prior information decreases even slower with sample size and there is asymptotic convergence to the Hannan-Quinn criterion with $C_{HQ}=1$.
11	$g=\frac{1}{k^{2}}$	This prior is suggested by the risk inflation criterion (RIC). Foster and George (1994).

The elicitation of $g$ and the optimal choice of $g$ constitute the focus of this research. Theoretically, the different choices of $g$ are based on two major considerations (Liang et al., 2008; Hoeting et al., 1999; Fernandez et al., 2001a; Eicher et al., 2007). The first is consistency, i.e. choosing $g$ such that BMA asymptotically uncovers “the true model”. The second is the role of $g$ as a model size penalty term that favours parsimonious models. In this respect Fernandez et al. (2001a) as well as Foster and George (1994) demonstrated how $g$ can be calibrated to asymptotically mimic popular information criteria such as the Risk Inflation Criterion (RIC) or the Bayesian Information Criterion (BIC) by adjusting $g$ to their respective model size penalties. The Bayesian framework further calls for defining prior model probabilities $P(M_{j})$ for all models in the model space, $j\in\{{1,2,\ldots,2^{k}}\}$. The most common model prior in the literature (Raftery et al., 1997; Raftery et al., 2010) is the uniform distribution, which assigns equal prior probability to all models, so that $P(M_{j})=1/2^{k}$. Given the prior model probabilities, we can calculate the posterior model probabilities $P({M|Y,X,g})$, conditional on $({Y,X,g})$, which serve as the model weights in Bayesian model averaging:

$P({M|Y,X,g})=\frac{P({Y|M,X,g})P(M)}{P({Y|X})}=\frac{P({Y|M,X,g})P(M)}{\sum\nolimits_{j=1}^{2^{k}}{P({Y|M_{j},X,g})P({M_{j}})}}$ (10)

BMA inference thus hinges on posterior model probabilities and, in turn, on model priors $P(M)$ and marginal likelihoods $P({Y|M,X})$ (and thus the hyperparameter $g$ ) (Raftery et al., 2010).

This research identified a set of eleven candidate default priors (Zellner’s informative g-prior that is based on a sample of $n$ observations and $k$ regression coefficients of independent variables) advocated in literature (Eicher et al., 2007), see Table 1.
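For a given sample size and number of regressors, the eleven structures in Table 1 reduce to eleven numbers. The sketch below evaluates them; it assumes that $K$ in structures 2 and 3 denotes the same total number of regressors as $k$, and the function name is hypothetical.

```python
import math

def table1_g_values(n, k):
    """Value of g for each structure in Table 1, keyed by serial number.

    Assumption: K in structures 2 and 3 equals k, the number of regressors.
    """
    return {
        1: n,
        2: max(n, k ** 2),
        3: k ** 2,
        4: 1 / n,
        5: k / n,
        6: math.sqrt(1 / n),
        7: math.sqrt(k / n),
        8: math.log(n ** 3),
        9: 1 / math.log(n ** 3),
        10: math.log(k + 1) / math.log(n),
        11: 1 / k ** 2,
    }

g_sim = table1_g_values(n=40, k=10)   # smallest simulated design
g_fls = table1_g_values(n=72, k=41)   # dimensions of the FLS data
```

Under this reading, structures 2 and 3 coincide whenever $k^{2}\geq n$, which holds for every design in this study and is consistent with the results tables reporting them on a single row.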

2.3 Predictive criteria for g-priors

To assess the predictive ability of the g-priors, predictive criteria like Log Predictive Score (LPS) and Log Marginal Likelihood (LML) were employed.

2.3.1 Log predictive score (LPS)

The log predictive score (LPS) assesses both the sharpness of a predictive distribution and the statistical consistency between the distributional forecasts and the observations (Kadane & Lazar, 2004). The analysis requires splitting the data set into a training set $D^{T}$ and a hold-out set $D^{H}$. The training sample is used to derive the BMA results, and the hold-out sample is used to gauge predictive performance on independent data.

The predictive ability of any model is measured by the sum of the logarithms of the posterior predictive ordinates for the observations in the hold-out set. The log score for any given model $M_{k}$ is the negative of this sum:

$-\sum\nolimits_{{\theta}^{\prime}\in D^{H}}{\log P({{\theta}^{\prime}|M_{k},D^{T}})},$ (11)

where $P({{\theta}^{\prime}|M_{k},D^{T}})$ is the posterior predictive ordinate, i.e. the predictive density of model $M_{k}$ evaluated at the observed value ${\theta}^{\prime}$. The predictive log score for BMA over $K$ models is then

$\textit{LPS}=-\sum\nolimits_{{\theta}^{\prime}\in D^{H}}{\log\left\{{\sum\nolimits_{k=1}^{K}{P({{\theta}^{\prime}|M_{k},D^{T}})}P({M_{k}|D^{T}})}\right\}}.$ (12)

The log predictive score is a proper scoring rule for assessing predictive performance; a smaller LPS indicates better prediction, so the choice of $g$ that yields the smallest LPS is preferred (Fernandez et al., 2001b).
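Equation (12) can be sketched in a few lines of Python. The sketch assumes that the predictive densities of each model at each hold-out observation, and the posterior model probabilities from the training sample, have already been computed; the function name and the numbers are made up for illustration.

```python
import math

def log_predictive_score(pred_dens, weights):
    """Equation (12).

    pred_dens[i][m]: predictive density of model m at hold-out observation i.
    weights[m]: posterior probability of model m given the training set.
    """
    total = 0.0
    for row in pred_dens:
        mixture = sum(d * w for d, w in zip(row, weights))  # BMA predictive density
        total -= math.log(mixture)
    return total

# Two hold-out observations and two models (made-up densities and weights)
pred_dens = [[0.30, 0.10],
             [0.20, 0.40]]
weights = [0.7, 0.3]
lps = log_predictive_score(pred_dens, weights)
```

Because the score is a negative log density, assigning higher predictive density to the values that were actually observed lowers the LPS.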

2.3.2 Log marginal likelihood (LML)

The marginal likelihood or the model evidence is the probability of observing the data given a specific model and is defined as:

$P({Y|M})=\int{P({Y|\theta,M})P({\theta|M})d\theta}$ (13)

If we have two models $M_{1}$ and $M_{2}$, we can compare their marginal likelihoods, $P(Y|M_{1})$ and $P(Y|M_{2})$, and prefer the model with the larger value (Kass & Raftery, 1995). The ratio of the two marginal likelihoods, the Bayes factor, converts the prior odds into the posterior odds:

$\frac{P(M_{1}|Y)}{P({M_{2}|Y})}=\frac{P(M_{1})}{P({M_{2}})}\frac{P(Y|M_{1})}{P({Y|M_{2}})}$ (14)

For more than two models, we can compute the marginal likelihood of each and ask which is the largest in the set. The marginal likelihood (sometimes also called the “evidence”) is thus the fundamental quantity in Bayesian model comparison; it is simply the likelihood of the data integrated over all parameter values.
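As a small worked example of equation (14), the sketch below converts two log marginal likelihoods into posterior odds. The numerical values are hypothetical and serve only to show the arithmetic.

```python
import math

def posterior_odds(log_ml_1, log_ml_2, prior_1=0.5, prior_2=0.5):
    """Equation (14): prior odds multiplied by the Bayes factor."""
    bayes_factor = math.exp(log_ml_1 - log_ml_2)
    return (prior_1 / prior_2) * bayes_factor

# Hypothetical log marginal likelihoods that differ by 2 units
odds = posterior_odds(-644.5, -646.5)
```

A difference of 2 in log marginal likelihood corresponds to a Bayes factor of $e^{2}\approx 7.4$, so with equal prior probabilities the first model is about seven times more probable a posteriori than the second.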

Table 2

Predictive ability under different choices of g-priors examined using log predictive scores (LPS)

		Original simulated data			Contaminated simulated data		
S/N	g-structure	$k=10$	$k=30$	$k=60$	$k=10$	$k=30$	$k=60$
	Model space ($2^{k}$)	1024	$1.07\times10^{9}$	$1.15\times10^{18}$	1024	$1.07\times10^{9}$	$1.15\times10^{18}$
1	$g=n$	15.92499	15.6923	14.28824	18.6964	18.2288	17.0922
2 and 3	$g=\max(n,K^{2})$ and $g=K^{2}$	16.06603	15.89193	12.63347	18.7209	21.5723	94.3346
4	$g=\frac{1}{n}$	16.16908	16.15871	16.15829	18.3137	18.3112	18.3090
5	$g=\frac{k}{n}$	16.16908	15.90161	15.69209	18.3340	18.3692	18.4506
6	$g=\sqrt{\frac{1}{n}}$	16.14201	16.08401	16.08089	18.3259	18.3142	71.5153
7	$g=\sqrt{\frac{k}{n}}$	16.09956	15.8829	15.74021	18.3549	18.3898	18.3951
8	$g=\ln(n^{3})$	16.03177	15.63037	14.8992	18.6219	19.9744	20.2829
9	$g=\frac{1}{\ln(n^{3})}$	16.15476	16.11925	16.11811	18.3197	18.3119	18.3047
10	$g=\frac{\ln(k+1)}{\ln(n)}$	16.08797	15.86806	15.76994	18.3665	18.4005	18.3956
11	$g=\frac{1}{k^{2}}$	16.17271	16.17446	16.17502	18.3124	18.3114	18.3114
12	$g=\frac{n}{\sqrt{k}}$	16.03232	15.6432	15.24474	18.0555	16.7221	16.8050

Table 3

Predictive ability under different choices of g-priors examined using log marginal likelihood (LML)

		Original simulated data			Contaminated simulated data		
S/N	g-structure	$k=10$	$k=30$	$k=60$	$k=10$	$k=30$	$k=60$
	Model space ($2^{k}$)	1024	$1.07\times10^{9}$	$1.15\times10^{18}$	1024	$1.07\times10^{9}$	$1.15\times10^{18}$
1	$g=n$	$-648.1121$	$-645.7366$	$-624.7236$	$-698.199$	$-680.626$	$-677.404$
2 and 3	$g=\max(n,K^{2})$ and $g=K^{2}$	$-648.1121$	$-645.7366$	$-658.5168$	$-698.625$	$-686.818$	$-711.803$
4	$g=\frac{1}{n}$	$-648.1453$	$-648.0972$	$-648.0827$	$-711.951$	$-711.888$	$-712.009$
5	$g=\frac{k}{n}$	$-647.6254$	$-645.781$	$-643.134$	$-709.838$	$-704.825$	$-702.686$
6	$g=\sqrt{\frac{1}{n}}$	$-647.8079$	$-647.54$	$-647.4508$	$-710.625$	$-710.234$	$-693.923$
7	$g=\sqrt{\frac{k}{n}}$	$-647.2727$	$-645.7201$	$-643.8106$	$-708.059$	$-703.874$	$-704.276$
8	$g=\ln(n^{3})$	$-647.6242$	$-644.663$	$-631.848$	$-698.205$	$-684.640$	$-685.996$
9	$g=\frac{1}{\ln(n^{3})}$	$-647.9682$	$-647.777$	$-647.7264$	$-711.277$	$-711.100$	$-711.426$
10	$g=\frac{\ln(k+1)}{\ln(n)}$	$-648.1321$	$-645.6123$	$-644.0461$	$-707.192$	$-703.668$	$-704.601$
11	$g=\frac{1}{k^{2}}$	$-648.19$	$-648.2153$	$-648.219$	$-712.115$	$-712.212$	$-712.224$
12	$g=\frac{n}{\sqrt{k}}$	$-647.6655$	$-644.461$	$-635.6674$	$-698.084$	$-680.252$	$-673.113$

3. Simulation results

The effects of the set of g-priors were examined using simulated data sets drawn from multivariate normal distributions; the simulated data were subsequently contaminated using a chi-square distribution with 2 degrees of freedom (level of contamination $=$ 20%). Three groups of data sets (11-variate, 31-variate and 61-variate random variables) were simulated. The Bayesian analysis was carried out using the Bayesian model averaging package “BMS”, available in the statistical software R, for both kinds of data studied (i.e. simulated data from the normal distribution and contaminated simulated data). The simulation study was based on $n=40$ observations with $k=10$, 30 and 60 regressors or predictor variables (identified as $X1,X2,\ldots,X60$), with a common response variable $Y$. The results of the assessment of the different g-priors based on the predictive criteria are shown in Tables 2 and 3.
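A data-generating scheme of this kind can be sketched as follows. The text does not state the covariance structure, the true coefficients or exactly how the contamination was applied, so the sketch makes three explicit assumptions: independent standard normal regressors, a hypothetical coefficient vector with three non-zero entries, and contamination by replacing 20% of the rows with chi-square(2) draws. It illustrates the design and is not a reproduction of the study's simulation.

```python
import random

def simulate(n=40, k=10, contam=0.20, seed=1):
    """Return (clean, dirty, bad): two n x (k+1) data sets, response first."""
    rng = random.Random(seed)
    beta = [1.0 if j < 3 else 0.0 for j in range(k)]  # hypothetical true coefficients
    clean = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        y = sum(b * v for b, v in zip(beta, x)) + rng.gauss(0.0, 1.0)
        clean.append([y] + x)  # (y, x) is jointly (k+1)-variate normal
    # Assumed contamination rule: overwrite 20% of the rows with chi-square draws.
    bad = sorted(rng.sample(range(n), round(contam * n)))
    dirty = [row[:] for row in clean]
    for i in bad:
        # A chi-square variate with 2 degrees of freedom is Gamma(shape=1, scale=2).
        dirty[i] = [rng.gammavariate(1.0, 2.0) for _ in range(k + 1)]
    return clean, dirty, bad

clean, dirty, bad = simulate()
```

Calling `simulate(k=30)` or `simulate(k=60)` gives the 31-variate and 61-variate analogues under the same assumptions.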

An overview of the results in Table 2 shows that the twelfth g-prior structure, $g=\frac{n}{\sqrt{k}}$, attains the lowest LPS in all three model spaces for the contaminated simulated data, and remains among the lower scores for the original simulated data.

Similarly, Table 3 shows that the twelfth g-prior structure attains the highest LML in all three model spaces for the contaminated simulated data; for the original simulated data its LML is the highest at $k=30$ and among the highest at $k=10$ and $k=60$.

Table 4
Comparing the predicted values of the 72nd observation (dependent variable) with its actual value using log predictive scores (LPS) across parameter g-priors

S/N	g-prior	Actual value	Predicted value	LPS
1	$g=n$	0.0046	0.0013	$-3.716$
2	$g=\max(n,k^{2})$	0.0046	0.0021	$-3.649$
3	$g=k^{2}$	0.0046	0.0021	$-3.649$
4	$g=\frac{1}{n}$	0.0046	0.021	$-2.603$
5	$g=\frac{k}{n}$	0.0046	0.016	$-2.917$
6	$g=\sqrt{\frac{1}{n}}$	0.0046	0.02	$-2.683$
7	$g=\sqrt{\frac{k}{n}}$	0.0046	0.015	$-2.981$
8	$g=\ln({n^{3}})$	0.0046	0.031	$-3.633$
9	$g=\frac{1}{\ln({n^{3}})}$	0.0046	0.021	$-2.653$
10	$g=\frac{\ln({k+1})}{\ln(n)}$	0.0046	0.0145	$-3.019$
11	$g=\frac{1}{k^{2}}$	0.0046	0.0214	$-2.592$
12	$g=\frac{n}{\sqrt{k}}$	0.0046	0.0033	$-3.614$

4. Results of predictive performance of g-prior structures investigated

4.1 Results using data provided by FLS

The effects of the set of g-priors were examined using the data set provided by FLS (Fernandez et al., 2001a), which is prominent in the BMA literature. The analysis was based on $n=72$ observations with $k=41$ regressors or candidate variables. To analyse these data, the uniform model prior was applied over the model space for every g-prior structure investigated. Given the model space of $2^{41}=2.2\times10^{12}$ models (over two trillion) and a fairly large number of draws (5 million), the Markov Chain Monte Carlo Model Composition (MC$^{3}$) sampler was applied to adequately identify the high posterior probability models. The study compares the predictive abilities of the different g-priors identified from the literature (Table 1) and six new g-prior structures investigated. Table 4 above shows the results for the eleven g-prior structures (Table 1) and the most reliable and consistent g-prior structure among the six new structures investigated.
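With more than two trillion models, full enumeration is impractical, which is why a sampler is needed. The following is a minimal sketch of the idea behind an MC$^{3}$-type sampler, assuming a user-supplied function that returns the log of the unnormalised posterior model probability; the names `mc3` and `toy_log_post` are hypothetical, and this is not the implementation used by the BMS package.

```python
import math
import random

def mc3(log_post, k, draws, seed=0):
    """Random walk over inclusion vectors: propose flipping one regressor in or
    out, and accept with probability min(1, posterior ratio)."""
    rng = random.Random(seed)
    current = tuple([0] * k)          # start from the null model
    cur_lp = log_post(current)
    visits = {}
    for _ in range(draws):
        j = rng.randrange(k)          # regressor to add or drop
        proposal = current[:j] + (1 - current[j],) + current[j + 1:]
        prop_lp = log_post(proposal)
        if rng.random() < math.exp(min(0.0, prop_lp - cur_lp)):
            current, cur_lp = proposal, prop_lp
        visits[current] = visits.get(current, 0) + 1
    return visits

def toy_log_post(model):
    """Made-up score: rewards regressor 0, penalises every other regressor."""
    return 2.0 * model[0] - 1.0 * sum(model[1:])

visits = mc3(toy_log_post, k=3, draws=5000)
best = max(visits, key=visits.get)
```

In an application, `log_post` would be the log marginal likelihood of equation (9) plus the log model prior, and the visit frequencies (or the analytical likelihoods of the visited models) would approximate the posterior model probabilities.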

The results in Table 4 above show that the actual value of the dependent variable for the 72nd observation is best predicted by the new g-prior, $g=\frac{n}{\sqrt{k}}$: its predicted value (0.0033) is the closest to the actual value (0.0046), and its LPS ($-3.614$) is among the lowest, preceded only by the g-priors with serial numbers (S/N) 1, 2, 3 and 8.

4.2 Published real life data

The real-life data used to assess the sensitivity and predictive performance of the identified g-priors and the proposed new g-prior structures were obtained from two sources:

a) The National Bureau of Statistics (NBS), Nigeria; 2012 Annual Reports on all official statistics on socio-economic and macro-economic indicators, and on the various machinery and tools that have been brought to bear in improving the efficiency and reliability of official statistics. The data frame comprises $n=72$ observations with $k=41$ regressors or predictor variables, giving $2.20\times10^{12}$ models (i.e. trillions of models). The unemployment rate, as the response variable, was regressed on the 41 regressors.
b) Bulletin of Statistics, Statistics South Africa, Vol. 43. 2; 2009 Annual Reports on national accounts, including gross domestic product (GDP), percentage change in quarterly GDP by industry and other economic indicators. The data frame comprises $n=42$ observations with $k=86$ regressors or predictor variables, giving $7.74\times10^{25}$ models (i.e. tens of septillions of models). GDP, as the response variable, was regressed on the 86 regressors.

Table 5
Predictive ability under different choices of g-priors examined using both log predictive scores (LPS) and log marginal likelihood (LML) for the normalised real life data (a) and (b)

		Real-life data (a)		Real-life data (b)	
S/N	g-structure	LPS	LML	LPS	LML
1	$g=n$	$-130.7$	373.9	81.5	$-480.71$
2 & 3	$g=K^{2}$	$-135.7$	374	175.58	$-510.169$
4	$g=\frac{1}{n}$	$-119.7$	329.3	28.18	$-503.2847$
5	$g=\frac{k}{n}$	$-125.1$	327.9	38.83	$-497.6256$
6	$g=\sqrt{\frac{1}{n}}$	$-120.8$	328.8	28.93	$-502.7608$
7	$g=\sqrt{\frac{k}{n}}$	$-126.4$	327.8	36.15	$-499.2061$
8	$g=\ln(n^{3})$	$-140.2$	344.5	10.86	$-488.9581$
9	$g=\frac{1}{\ln(n^{3})}$	$-120.4$	327.8	28.55	$-502.9019$
10	$g=\frac{\ln(k+1)}{\ln(n)}$	$-127.1$	329.3	34.88	$-498.8216$
11	$g=\frac{1}{k^{2}}$	$-119.5$	342.7	28.04	$-503.3916$
12	$g=\frac{n}{\sqrt{k}}$	$-140.1$	383.9	10.51	$-480.684$

Table 6
Predictive ability under different choices of g-priors examined using both log predictive scores (LPS) and log marginal likelihood (LML) for the un-normalised real life data (a) and (b)

		Real-life data (a)		Real-life data (b)	
S/N	g-structure	LPS	LML	LPS	LML
1	$g=n$	$-137.5$	363.9	92.52	$-488.31$
2 & 3	$g=K^{2}$	$-115.8$	354	255.18	$-550.169$
4	$g=\frac{1}{n}$	$-129.9$	349.3	38.68	$-515.285$
5	$g=\frac{k}{n}$	$-135.2$	317.9	38.93	$-507.626$
6	$g=\sqrt{\frac{1}{n}}$	$-130.4$	328.8	28.13	$-522.761$
7	$g=\sqrt{\frac{k}{n}}$	$-136.6$	337.8	56.75	$-509.206$
8	$g=\ln(n^{3})$	$-150.7$	354.5	13.16	$-508.958$
9	$g=\frac{1}{\ln(n^{3})}$	$-125.8$	377.8	32.15	$-512.902$
10	$g=\frac{\ln(k+1)}{\ln(n)}$	$-137.7$	309.3	46.08	$-499.122$
11	$g=\frac{1}{k^{2}}$	$-129.7$	312.7	30.34	$-513.3916$
12	$g=\frac{n}{\sqrt{k}}$	$-150.3$	393.9	12.51	$-470.684$

4.2.1 Results of BMA analysis of real life data (a) and real life data (b)

Normalizing transformations were applied to the data sets to bring them closer to multivariate normality; this was achieved by standardizing the data and removing influential and extreme observations. For real-life data (a) and (b), the predictive abilities under the different choices of g-priors were examined and compared using log predictive scores (LPS) and log marginal likelihood (LML) (see Tables 5 and 6).
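The two normalizing steps can be sketched as follows. The text does not specify the rule used to identify extreme observations, so the cutoff of three standard deviations below is purely an assumption, as are the function names and the toy data.

```python
import statistics

def standardize(rows):
    """Centre each column at 0 and scale it to unit sample standard deviation."""
    cols = list(zip(*rows))
    mus = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def drop_extreme(z_rows, cutoff=3.0):
    """Keep rows whose standardized values all lie within +/- cutoff (assumed rule)."""
    return [row for row in z_rows if all(abs(v) <= cutoff for v in row)]

# Toy data: 19 regular rows plus one row with an extreme first entry
rows = [[float(i), float(2 * i + 1)] for i in range(19)] + [[1000.0, 5.0]]
z = standardize(rows)
kept = drop_extreme(z)
```

In this toy example the single extreme row has a standardized value above 4 in its first column and is the only row removed.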

An overview of the results in Table 5 shows that, for the normalized real-life data, the twelfth g-prior structure attains the highest LML for both data sets (a) and (b) and the lowest LPS for data (b); for data (a) its LPS ($-140.1$) is within 0.1 of the lowest value ($-140.2$, obtained under $g=\ln(n^{3})$).

An overview of the results in Table 6 shows a similar pattern for the un-normalized real-life data: the twelfth g-prior structure attains the highest LML for both data sets (a) and (b) and the lowest LPS for data (b), while for data (a) its LPS ($-150.3$) is second only to that of $g=\ln(n^{3})$ ($-150.7$).

5. Conclusion

The study demonstrated that fixing $g$ at arbitrary values may have unintended consequences for posterior model probabilities and, subsequently, for the predictive ability of the g-prior. Given a huge model space, whether with more observations than regressors or with fewer observations than regressors, reliable results were obtained using a new g-prior structure. This study complements the contributions of Fernandez et al. (2001a, 2001b), Eicher et al. (2007) and Liang et al. (2008), as it establishes a new robust g-prior structure that exhibits consistent, competitive and reliable predictive performance compared with other g-priors suggested in the literature, and it further provides closed-form representations for posterior quantities under BMA analysis.

As reported in Sections 3 and 4, the study assessed g-prior predictive performance in BMA under different conditions, using simulated data that follow the normal distribution or otherwise and real-life data that follow the normal distribution or otherwise. The empirical results tend to favour g-specifications that ascribe values between 0 and 100 to $g$ and that effectively select the right models. Thus, the results demonstrate that fixing $g$ at a given value runs the risk of overstating or understating the importance of some variables (i.e. the posterior inclusion probabilities of the regressors). In conclusion, the new g-prior, $g=\frac{n}{\sqrt{k}}$, offers a sound, fully Bayesian approach that features the virtues of prior input and predictive gains that minimise the risk of misspecification.

References

Adeleke, I. A., & Ogundeji, R. K. (2009). Bayesian estimation of the proportion of subscribers to Nigeria’s National Health Insurance Scheme. Journal of Modern Mathematics and Statistics, 3(3), 56-59.

Agliari, & Parisetti (1988). A-g reference informative prior: A note on Zellner’s g-prior. Journal of the Royal Statistical Society, Series D (The Statistician), 37(3), 271-275.

Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791-799.

Chipman, George, & McCulloch (2001). The practical implementation of Bayesian model selection. IMS Lecture Notes-Monograph Series, 38, 65-134.

Clyde, & George, E. I. (2004). Model uncertainty. Statistical Science, 19, 81-94.

Davison, A. C. (2008). Statistical Models. Cambridge University Press, New York.

Eicher, Papageorgiou, & Raftery (2007). Determining growth determinants: Default priors and predictive performance in Bayesian model averaging. Center for Statistics and the Social Sciences, University of Washington. Working Paper no. 76.

Eicher, Papageorgiou, & Raftery (2011). Default priors and predictive performance in Bayesian model averaging, with application to growth determinants. Journal of Applied Econometrics, 26, 30-55.

Fernandez, Ley, & Steel (2001a). Model uncertainty in cross-country growth regressions. Journal of Applied Econometrics, 16, 563-576.

Fernandez, Ley, & Steel (2001b). Benchmark priors for Bayesian model averaging. Journal of Econometrics, 100, 381-427.

Feldkircher (2012). Forecast combination and Bayesian model averaging: A prior sensitivity analysis. Journal of Forecasting, 31(4), 361-376.

Foster, D. P., & George, E. I. (1994). The risk inflation criterion for multiple regression. Annals of Statistics, 22, 1947-1975.

García-Donato, & Martínez-Beneito, M. A. (2013). On sampling strategies in Bayesian variable selection problems with large model spaces. Journal of the American Statistical Association, 108(501), 340-352.

Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B (Methodological), 41(2), 190-195.

Hanson, T. E., Branscum, A. J., & Johnson, W. O. (2014). Informative g-priors for logistic regression. Bayesian Analysis, 9(3), 597-612.

Hoeting, J. A., Madigan, Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial (with discussion). Statistical Science, 14, 382-401. Corrected version at http://www.stat.washington.edu/www/research/online/hoeting1999.pdf.

Kadane, J. B., & Lazar, N. A. (2004). Methods and criteria for model selection. Journal of the American Statistical Association, 99(65), 279-290.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.

Kass, & Wasserman (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91(435), 1343-1370.

Lee, P. M. (2004). Bayesian Statistics: An Introduction. Oxford University Press, New York, Third Edition.

Li, & Clyde (2015). Mixtures of g-priors in generalised linear models. Cornell University Library, arXiv, 1503.06913.

Liang, Paulo, Molina, Clyde, & Berger (2008). Mixtures of g-priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410-423.

Montgomery, & Nyhan (2010). Bayesian model averaging: Theoretical developments and practical applications. Oxford Journal: Political Analysis, 18(2), 245-270.

Nielsen, Christensen, Cemgil, & Jensen (2014). Bayesian model comparison with the g-prior. IEEE Transactions on Signal Processing. DOI: 10.1109/TSP.2013.2286776.

Okafor, R. (1999). Using an empirical Bayes model to estimate currency exchange rates. Journal of Applied Statistics, 26(8), 973-983.

Raftery, A. E., Karny, & Ettler (2010). Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill. Technometrics, 52, 52-66.

Raftery, A. E., Madigan, & Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92, 179-191.

Rossi, P. E., Allenby, G. M., & McCulloch (2005). Bayesian Statistics and Marketing. Wiley Series in Probability and Statistics.

Zellner (1983). Applications of Bayesian analysis in econometrics. Journal of the Royal Statistical Society, Proceedings of the 1982 IOS Annual Conference on Practical Bayesian Statistics, Series D (The Statistician), 32(2), 23-34.

Zellner (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. The American Statistician, 49, 327-335.
