Abstract
It has recently been suggested that financial speculation is now playing an important role in daily price movements of global oil prices. This raises the question: what are important drivers of price changes given this new speculative regime? We identify new factors of the oil market related to speculation by fitting subset vector autoregression models with exogenous variables (SubVARX) and rank them by importance. Further, to account for model uncertainty and to obtain robust parameter estimation in this study, we apply a bootstrap model selection procedure. We find that certain speculative factors explain a large portion of the variation in oil price for the given data set.
1. Introduction
Increased volatility and price surges in the global oil market over the last 10 years have sparked a renewed interest in understanding the contributing factors to oil price dynamics. In particular, it has been noted that financial institutions have had a greater influence on the price of oil due to the allocation of assets to commodity index trading strategies increasing from 13 billion dollars in 2004 to 260 billion dollars as of March 2008. Commodities behave differently to stocks and bonds (Roache, 2008) providing investors with portfolio diversification opportunities (Chen and Pinsky, 2003; Jennings, 2012) and a hedge against inflation (Froot, 1995). More importantly, it has also been suggested that price speculation is now playing an important role in daily price movements (Juvenal and Petrella, 2011; Kilian and Murphy, 2011). This raises a number of questions. First, oil price models have traditionally relied on oil inventories as a factor of global oil prices due to the way they reflect the demand and supply imbalances that drive fluctuations in the oil price (He et al., 2010; Kilian and Murphy, 2011; Manera et al., 2007; Merino and Ortiz, 2005; Pang et al., 2011; Ye et al., 2002, 2005, 2006). However, given this new ‘speculative regime’ are such models still appropriate? Second, if we are in a new regime, are there new and relevant factors that one should incorporate into a model?
Answering these questions is important due to their numerous policy implications. It is now well understood that periods of high oil prices have a number of significant effects on the global economy. As most industrialised countries are net oil importers, periods of high oil prices can have significant negative effects on their economic growth and may even cause recessions (Gisser and Goodwin, 1986; Hamilton, 1983). In essence, higher prices transfer wealth from oil-importing economies to oil-exporting economies (Sachs et al., 1981). Higher oil prices have economy-wide implications including increasing the cost of production, increasing the cost of services, and increasing the cost of agriculture. Persisting high oil prices may lead to inflation (Cunado and Perez de Gracia, 2005) resulting in reduced consumer confidence as higher oil prices suppress consumption (Uri and Boyd, 1997). Oil prices can also have an impact on financial markets. Historically, the oil price shocks in 1973, 1979, and 1990 were followed by stock market downturns and high oil prices can also affect the financial performance and cash flows of firms. In turn, this affects shareholders due to reduced dividend payments and equity prices (Huang et al., 1996). Given the significant role the world oil market has in almost all aspects of the global economy, it is imperative that economic planners understand the factors that drive this market.
Our study identifies new factors related to financial speculation that are not yet well recognised in the oil markets. Our data-driven approach starts with the application of an unrestricted vector autoregression with exogenous variables (VARX). This approach is well known to be useful for determining relationships between economic variables and for conducting evidence-based policy. We have chosen a set of economic and financial variables through consultation with commodities strategists involved in the asset allocation decision process. While it has been identified that over-fitting can be an issue with such studies (Anderson and Burnham, 2003; Cox and Reid, 2000), market participants seeking to identify unrecognised investment opportunities do require such analysis. By only attempting to fit models with already well understood and well recognised parameters, market participants are unable to identify changes to market structure and any new relationships that may have emerged.
Most studies of this type face the problem of data-dredging or data-snooping bias. One way to circumvent this is to apply a statistical methodology that takes model uncertainty into account. Model uncertainty typically occurs as one only has a single realisation of the data generating process to analyse, thus any model fitted may only be capturing characteristics specific to the single sample path and may not generalise to the population. This can lead to a number of problems such as noise variables being frequently identified as true factors, true factors being excluded from the model, small p-values, and coefficients biased away from zero; see Anderson et al. (2001), Austin (2008), Bernanke et al. (2004), Clyde (2000), and Clyde and George (2004). This can lead to overestimation of the importance of the variables themselves.
Our study is the first to address the problem of model selection uncertainty in the context of commodity modelling. We apply non-parametric bootstrap techniques in conjunction with an automated general-to-specific (GetS) model selection algorithm to estimate coefficients for VARX models via ordinary least squares. The bootstrap provides the relative importance of variables as well as the relative importance of combinations of variables. This allows us to more accurately assess the importance of factors under study when compared to traditional methods. Whilst recent studies have investigated the role speculation as a whole has played in the oil market (e.g. Juvenal and Petrella, 2011), our study specifically investigates the importance of each of the factors under study. That is, previous studies have asked the question ‘How much do speculative factors influence the oil market?’ whereas we ask the question ‘Which speculative factors affect the oil market and which ones are the most important?’ Further, combined with the application of VARX, our study is also the first to explicitly identify the time it takes for changes to these factors to flow through to the oil market.
2. Data
We base our study on a seven-dimensional vector time series
Description and sources of data.
The endogenous (Endo.) and exogenous (Exo.) variables considered in our study. Business confidence is considered a leading indicator of economic activity and industrial production. To investors, a possible increase in economic activity would lead to greater demand for oil, therefore representing possible growth in the asset class and a good investment (Aastveity et al., 2012). Bonds and cash are seen as ‘safe’ alternative asset classes and provide a haven during volatile periods and therefore are considered as part of any asset allocation decision. Oil is often argued to provide inflation hedging properties and so the change to CPI has also been included (Chen and Pinsky, 2003; Frankel, 1986, 2008; Frankel and Rose, 2010).
While some series are only reported at low frequencies (e.g. monthly), our model must allow for signals from data series that are high frequency. Our model must also allow for signals that are low frequency but may not be released/reported at regular intervals. Given the critical time value of information, it is important to reflect changes to these series as they happen in our model. Therefore, we implemented our model based on a daily frequency and converted any data that is reported on a monthly basis to a daily frequency by carrying forward the last observation.
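The frequency conversion described above can be illustrated with a minimal sketch. The function name `carry_forward`, the dates, and the values are hypothetical and are not taken from the study's data set; the sketch only assumes that each low-frequency release is held constant until the next release.

```python
from datetime import date, timedelta

def carry_forward(daily_dates, releases):
    """Place irregular low-frequency releases on a daily grid by
    carrying the last observation forward (None before the first release)."""
    ordered = sorted(releases.items())
    out, idx, last = [], 0, None
    for d in daily_dates:
        # absorb every release dated on or before the current day
        while idx < len(ordered) and ordered[idx][0] <= d:
            last = ordered[idx][1]
            idx += 1
        out.append(last)
    return out

# Illustrative values only: two releases of a monthly-style indicator
days = [date(2010, 1, 1) + timedelta(days=i) for i in range(6)]
releases = {date(2010, 1, 1): 98.0, date(2010, 1, 4): 101.5}
daily = carry_forward(days, releases)

# Daily changes are zero except on the day a new value is released
changes = [b - a for a, b in zip(daily, daily[1:])]
```

One consequence of this design is that the daily change series of a carried-forward variable is non-zero only on release days, so any signal it carries enters the model at the moment the information becomes available.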
Over the sample period of March 1995 to August 2010 (consisting of 4022 data points), we consider daily changes
where
Given the daily changes vector zt, we also introduce the lags
3. Model and methodology
To robustly identify if speculation plays a role in the determination of world oil prices we have applied a methodology that we shall now describe.
3.1 The vector autoregression model
We begin the construction of the oil price model with a full-order unrestricted VARX of the form
where
3.2 Capturing dynamic relationships
In (1), all coefficients of the matrices
Starting with a model of the form (1) and changing notation slightly to reflect that quantities defined previously may also depend on the model order and some lags, the multivariate SubVARX
This approach of first identifying the full model in the VARX avoids over-identifying restrictions and accidentally excluding important variables especially given the lack of strong economic theory on how restrictions should be imposed. Although economic theory may help to determine what the restrictions should be, we take a data-driven approach given our limited a priori knowledge.
3.3 Model selection
The structure of (2) gives a large number of possible models that could be fitted. In fact,
Autometrics is a multiple search path strategy and revolves around a tree search that considers every possible model given the initial unrestricted model known as the general unrestricted model (GUM), that is, the VARX model (1). The true data generating process is assumed to be a subset of the GUM, that is, a particular choice of (2). The Autometrics algorithm estimates single equations using ordinary least squares as opposed to system estimation. While it could be argued that a system approach would be preferable in cases where the variance covariance matrix of the system is non-diagonal, studies such as Krolzig (2000) and Hendry and Krolzig (2004) demonstrate that the parameters selected by Autometrics can successfully describe the true underlying data generating process and provide good size to power trade-offs, and that the single-equation approach is computationally more efficient than full system estimation methods. Additionally, the GetS procedure can be applied equation-by-equation without loss of asymptotic efficiency (Krolzig, 2003).
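To make the general-to-specific idea concrete, the following is a deliberately simplified sketch and not the Autometrics algorithm: it follows a single search path (dropping the least significant regressor at each step) rather than a multi-path tree search, it omits diagnostic testing, and it uses a normal approximation for the 1% critical value. The function names and the simulated data are assumptions made for illustration.

```python
import numpy as np

def ols_t_values(X, y):
    """OLS coefficients and their t-values for a single equation."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def backward_eliminate(X, y, t_crit=2.576):
    """Start from the full model and delete the weakest regressor
    until every remaining |t| exceeds t_crit (approx. two-sided 1%)."""
    keep = list(range(X.shape[1]))
    while keep:
        beta, t = ols_t_values(X[:, keep], y)
        weakest = int(np.argmin(np.abs(t)))
        if abs(t[weakest]) >= t_crit:
            return keep, beta
        del keep[weakest]
    return keep, np.array([])

# Simulated example: only columns 0 and 3 truly enter the equation
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.standard_normal(500)
selected, coefs = backward_eliminate(X, y)
```

A single path can discard a relevant variable early and never revisit it, which is one motivation for the multiple-path search described above.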
To identify an optimal model, we selected a maximum lag q that ensures the true order (maximum lag of the endogenous and exogenous variables) of the model
3.4 Model selection uncertainty
Identifying factors that influence world oil prices based on a single time series sample path is not robust against the problem of model selection uncertainty. This can lead to a number of problems such as certain irrelevant variables being identified as true factors, true factors being excluded from the model, small p-values, and coefficients biased away from zero; see Anderson et al. (2001), Austin (2008), Bernanke et al. (2004), Clyde (2000), and Clyde and George (2004). This can lead to overestimation of the importance of the factors themselves.
A number of approaches have been developed that attempt to deal with the model selection problem. Among them are bootstrap model averaging (Buckland et al., 1997), Bayesian model averaging (Clyde, 2000), bootstrap estimation of coefficients and confidence intervals (Austin, 2008), and multi-model inference (Anderson and Burnham, 2003). Each approach has its strengths and weaknesses. For example, some of these methods rely on distributional assumptions and a well-defined set of candidate models (Alfaro and Huelsenbeck, 2006). For an excellent outline of the model selection problem, see Anderson and Burnham (2004).
We applied a model-based resampling method called the non-parametric bootstrap (Efron and Tibshirani, 1994; Srinivas and Srinivasan, 2001). This approach builds a sampling distribution for the coefficient estimates by generating a large number of replicates of the original time series using a model-based resampling procedure. The only assumption required about the underlying data generating process is that it is based on i.i.d. innovations. This allows us to assess model uncertainty and identify possible model structures without the need to make distributional assumptions (Hinkley and Davison, 2006) and is particularly relevant in our setting as the residuals are not normally distributed; see Figure 1.

Normality test for the residuals
The residuals
3.5 Robustly identifying factors
To robustly identify factors, we combine the steps explained above to obtain L possible coefficient estimates
1. Generate the first set of approximately i.i.d. residuals
2. Bootstrap the residuals by sampling with replacement from the components of
3. Generate L bootstrapped replicate time series using the model
for
4. For each bootstrapped replicate time series
5. Store bootstrapped coefficient estimates
Although this methodology is computationally intensive, it has two main advantages: an approximate sampling distribution is established for each of the coefficient estimates, and the non-parametric bootstrap procedure (steps 2 and 3) allows us to estimate the sampling distribution of a statistic empirically without making assumptions about the form of the population. There are two sources of error when implementing the bootstrap: first, the error resulting from using a particular sample to represent the population, and second, the sampling error. We can mitigate sampling error by generating a large number of samples L.
In step 1, we fitted the model (1) that includes all available factors with all lags. By doing so, we are as far as possible removing any structure or dependence from the resultant observed residuals. Ideally, we would like to initially fit ‘all possible’ factors, so that we can be sure that the real model (assuming one exists) is nested within the full model fit. Of course, fitting all possible factors is not feasible in reality. Nevertheless, this is a necessary assumption that the model-based bootstrap procedure rests on.
We whitened the residuals before bootstrapping and then recoloured them to form the bootstrap replicates. Our approach involved first fitting a very large number of predictors to help ensure as much as possible that the residuals did not contain any dependence structure. Further, analysis of the pre-whitened residuals indicated that there was no dependence remaining that required additional fitting (e.g. block bootstrap).
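The procedure of this section can be sketched in a reduced setting. The example below is an assumption-laden illustration, not the study's implementation: it uses a univariate first-order autoregression with one exogenous variable instead of the full VARX, it refits by plain OLS on each replicate instead of rerunning the model selection algorithm, and all names (`fit_arx1`, `residual_bootstrap`) and the simulated data are hypothetical.

```python
import numpy as np

def fit_arx1(y, x):
    """OLS fit of y_t = a*y_{t-1} + b*x_t + e_t; returns (coefs, residuals)."""
    Z = np.column_stack([y[:-1], x[1:]])
    coefs, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    return coefs, y[1:] - Z @ coefs

def residual_bootstrap(y, x, n_reps, rng):
    # Fit the full model once and keep its (centred) residuals
    coefs, resid = fit_arx1(y, x)
    resid = resid - resid.mean()
    draws = np.empty((n_reps, 2))
    for r in range(n_reps):
        # Resample residuals with replacement (i.i.d. assumption)
        e_star = rng.choice(resid, size=len(resid), replace=True)
        # Rebuild a replicate series recursively from the fitted model
        y_star = np.empty_like(y)
        y_star[0] = y[0]
        for t in range(1, len(y)):
            y_star[t] = coefs[0] * y_star[t - 1] + coefs[1] * x[t] + e_star[t - 1]
        # Refit on the replicate and store the coefficient estimates
        draws[r] = fit_arx1(y_star, x)[0]
    return coefs, draws

# Simulated data with heavy-tailed (non-normal) innovations
rng = np.random.default_rng(1)
n = 300
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + rng.standard_t(5)

coefs, draws = residual_bootstrap(y, x, 200, rng)
boot_se = draws.std(axis=0, ddof=1)  # bootstrap standard errors
```

Because the innovations are resampled from the empirical residuals, no distributional form is imposed; the only requirement is that the fitted model has removed the dependence structure from them.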
3.6 Relative importance of factors
As previously discussed, simply identifying important factors from the application of a model selection routine to the original time series may not give the researcher a deep enough appreciation of the importance of particular factors. For example, particular variables or combinations of variables may be selected, but not necessarily selected as part of the most frequently occurring models. That is, one might falsely conclude that a factor has no relationship to the response variable, simply because the most frequently occurring models excluded this variable as part of the model selection process. However, by tallying the number of times that any particular model is selected and dividing through by the number of bootstrap simulations L, we are able to calculate model selection probabilities or weights. In turn, this allows us to identify the relative importance of factors and combinations of factors.
To assess the relative importance of any particular factor we follow the approach of Anderson and Burnham (2003) and find the sum of the bootstrapped model selection weights for all the models that include the factor of interest. The higher the sum of the weights, the higher the relative importance of the factor. By following the same approach with pairs, triplets, and quadruplets, we are able to gain a better understanding of the importance of factor groupings. Care is needed in interpreting the weights. They should not be interpreted in absolute terms and should not be compared to the importance weights derived from studies where different model structures and/or variables are considered.
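As a minimal sketch of this calculation, the code below tallies selected models, converts tallies to selection weights, and sums the weights over all models containing a given factor or group of factors. The four toy models are illustrative only and are not results from the study; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def selection_weights(selected_models):
    """Model selection weight = tally of a model structure / number of replicates."""
    tally = Counter(frozenset(m) for m in selected_models)
    n = len(selected_models)
    return {model: count / n for model, count in tally.items()}

def group_importance(weights, size):
    """Sum the weights of every model that contains each factor group."""
    scores = Counter()
    for model, w in weights.items():
        for group in combinations(sorted(model), size):
            scores[group] += w
    return dict(scores)

# Toy output of a model selection routine over four bootstrap replicates
models = [
    {"US10GB", "USInfl_11", "USInfl"},
    {"US10GB", "USInfl_11", "USInfl"},
    {"US10GB", "USInfl_11", "USBC_8"},
    {"US10GB", "USInfl_11", "USInfl", "USBC_8"},
]
weights = selection_weights(models)
singles = group_importance(weights, 1)
pairs = group_importance(weights, 2)
```

In this toy example a factor can have high individual importance even though no single model structure is selected in more than half of the replicates, which is the situation the weights are designed to reveal.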
4. Empirical results
4.1 Initial model selected
We begin by examining the SubVARX model (2) identified by the automatic model selection algorithm applied to the original (non-bootstrapped) time series data with a significance level of 0.01, see Table 5. The factors for the oil supply equation St and the oil price equation Ot with the associated coefficient determined are given in Table 2. Also given are the standard errors, the t-values, and the t-probabilities for the coefficients. The results suggest that the important factors in the oil spot price equation Ot are the US 10 year government bond yield at lags 0 and 13 (US10GB and US10GB_13), US business confidence at lags 8 and 11 (USBC_8 and USBC_11), and the US inflation rate at lag 0 (USInfl). The important factor in the oil supply equation St is the US inflation rate at lag 11 (USInfl_11).
Initial models selected by the automatic model selection algorithm.
The initial models for the oil price Ot and oil supply St selected by the automatic model selection algorithm Autometrics using a significance level of 0.01. The oil equation model Ot is shown on the left and the oil supply equation St is shown on the right. Conditional coefficient estimates (Coefficient) and conditional standard errors (Std error) are given and are calculated directly from the fitted model. The standard error reported estimates the variability conditional on the chosen model being selected and is determined as in (7). Also shown are the test statistics (t-value) and the p-values associated with each test statistic (t-prob). Significant t-probabilities (i.e. factors) are indicated with a star (*) and are calculated directly from the fitted model, see Appendix.
Some economists have previously suggested that a relationship between the government bond yields and oil prices exists whereby higher oil prices increase world net savings and then, subsequently, saved petrodollars are used to purchase securities. They suggest that higher oil prices would produce downward pressure on US government security yields. Some evidence of this relationship was found in the 12-month period through May 2005; see International Monetary Fund (2006) and Warnock and Warnock (2009). However, when industrial production was included in their regressions, this link became statistically insignificant. Interestingly, our initial model points to a new relationship: increasing government bond yields leads to increasing oil prices. One could argue that this relationship may be linked to government securities being sold from portfolios (hence, increasing bond yields) to purchase oil securities.
Identification of US business confidence at lags 8 and 11 as a significant factor indicates that it takes around one and a half to two trading weeks for shocks to business confidence to flow through to world oil prices. Business confidence is an important leading indicator of economic performance and an increase in economic activity would lead to greater demand for oil. One would typically argue that business confidence is a proxy for demand and higher demand would drive oil prices higher if it is not matched by sufficient oil supply. However, this does not rule out speculative play. Traders may use business confidence as an indicator of possible growth in the asset class and a sudden increase could signal a good time to invest (Aastveity et al., 2012).
The relationship between oil prices and inflation has been extensively discussed in the literature. The standard view is that higher oil prices increase inflation. However, much of the research on the relationship between oil prices and inflation focuses only on the relationship between either oil price levels (rather than returns) and inflation or the relationship between oil price changes and changes to CPI over a comparable period (e.g. daily correlations, monthly correlations, year-on-year changes). Our study however allows for the analysis of inter-temporal relationships.
Surprisingly, there are no significant lags for the oil supply factor and oil price factor appearing in either the oil equation or the oil supply equation. It would seem reasonable to expect that oil supply should impact on the price of oil to at least some degree. So we re-ran the algorithm with different settings for the p-values for the diagnostics and factor significance levels. We relaxed the significance level from 0.01 to 0.10 with no significant change. These results appear to suggest that the supply levels themselves are not important. Relative inventories have historically been the important driver of oil markets rather than actual supply levels.
One should be careful in interpreting these results as factors are sometimes included simply because the corresponding variable in the other equation is significant according to its p-value. As a result, an insignificant factor will sometimes appear in an equation even if it is deemed to have no explanatory power according to its p-value. For example, the US inflation rate at lag 11 (USInfl_11) is the only one of two factors deemed important in the oil supply equation St and, although it is not significant in the oil equation Ot, it still appears as part of that equation. When we calculate the bootstrap coefficient and standard error estimates, it would have been more accurate to set these insignificant coefficients to zero. However, there will only be a small component of variation contributed by these variables, which will not materially affect our results.
4.2 Ranking models and factors
For each of the L replicate time series
Ranking models and factors that explain price changes.
The top 20 models based on model tally (column 2) against the top 10 factor/lags related to financial speculation and the top 10 ‘classic’ factor/lags (i.e. related to supply and demand). The table ranks the models (model importance weight) and the factor/lags based on their frequency among the L models. We colour the table cell grey if the factor/lag is included in the specific model and white if it is not. For each factor/lag we give its importance (1.0 means the factor is included in 100% of all models) in the top row of the table. A standard error estimate for each factor can be found in Table 6.
Relative importance of factor groups and their lags.
The relative importance of not only single factors, but pairs, triplets, and quadruplets. As US10GB and USInfl_11 are factors present in 100% of all L models, in this table we treat them as one factor.
Automated model selection algorithm versus the bootstrap.
Once we have our L bootstrapped replications, we can estimate the standard error of the parameter estimates in our model as the standard deviation of the coefficient estimates found over each of the bootstrapped replications. We can also estimate the population coefficients as the average of the bootstrapped fitted values. Conditional estimates do not account for model selection uncertainty and are therefore biased: coefficient estimates are pushed away from zero and standard error estimates downwards. Unconditional estimates set parameter estimates to zero wherever the model selected (for a particular bootstrap replication) does not contain that factor. The difference between the unconditional and conditional estimates quantifies the model uncertainty component as described previously. According to the results of the automatic model selection algorithm the important factors in the oil spot price equation y1 (t) are the US 10 year government bond yield (lags 0 and 13), the US business confidence (lags 8 and 11), and the US inflation rate (lag 0). The important factor in the oil supply equation y2 (t) is the US inflation rate (lag 11). There are no significant lags for the oil supply and oil price variables appearing in either the oil equation or the oil supply equation.
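The contrast between the two kinds of estimate can be sketched as follows. This is an illustration under stated assumptions: the conditional summary is taken over only those replicates in which the factor was selected, non-selection is encoded as NaN, and the eight coefficient draws are invented for the example.

```python
import numpy as np

def conditional_and_unconditional(draws):
    """Summarise one coefficient across bootstrap replicates.
    `draws` holds NaN wherever the factor was not selected."""
    selected = ~np.isnan(draws)
    cond = draws[selected]                  # only replicates containing the factor
    uncond = np.where(selected, draws, 0.0)  # zero where the factor was dropped
    return {
        "weight": selected.mean(),
        "cond_mean": cond.mean(),
        "cond_se": cond.std(ddof=1),
        "uncond_mean": uncond.mean(),
        "uncond_se": uncond.std(ddof=1),
    }

# Invented draws: the factor is selected in 5 of 8 replicates
draws = np.array([0.8, np.nan, 1.1, 0.9, np.nan, 1.2, np.nan, 1.0])
summary = conditional_and_unconditional(draws)
```

In this toy case the unconditional mean is pulled towards zero and the unconditional standard error is larger than the conditional one, and the gap between the two shrinks as the selection frequency approaches 100%.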
Each model structure is also shown against 20 factor/lags. After calculating relative importance of each factor, we classified these factors into two types: the factors related to financial decision-making and the ‘classic’ supply-and-demand factors used for oil modelling. Within each of these types, our table shows the top 10 factor/lags based on the relative importance of that factor/lag. The relative importance is determined by the percentage of times the factor/lag appeared within the L models. An importance weight of 1.00 means that the particular factor/lag was included in 100%
Comparison of Model 1 in Table 3 to the initial model selected for Ot (see Table 2) shows that the top three factors (US10GB, USInfl_11, USInfl) were also selected by the initial model selection. However, US business confidence at lag 11 (USBC_11) with an importance weighting of 0.6904 and US 10 year government bond yield at lag 4 (US10GB_4) with an importance weighting of 0.5517 were not selected by the initial model selection. Comparing with Model 2, which includes the speculative factor/lags (US10GB, USInfl, US10GB_13, US10GB_4, USInfl_7) and the classic factor/lags (USBC_8, USBC_1), we see that USInfl_7 with an importance weight of 0.4214 is not selected by the initial model selection. Interestingly, Model 2 is more similar to the initial model selected than Model 1. However, overall, there is a lot of variability between the top 20 models so comparing models on a case-by-case basis is futile. Therefore, to better understand our results, in Table 3 we decided to split the factors into two categories: ‘speculative’ and ‘classic’. This categorisation is based on the view that the speculative factors could be considered variables that a trader may consider as a way to pre-empt increases in oil prices whilst classic factors could be considered as proxies for the supply-and-demand dynamics of oil prices. By making this categorisation and by colouring cells grey if the factor is included in the model and white otherwise, we observe that speculative factors have a higher presence in the top 20 models compared to classic factors.
The 10 year government bond yield at lag 0 was selected in 100% of all models and at lags 4 and 13 at least 50% of the time. From an asset-allocation perspective bonds are seen as a ‘quality’ safe asset and in times of financial market turmoil a flight to quality out of assets declining in value is often observed. If one asset class within a portfolio allocation strategy is experiencing significant volatility, the safe haven of bonds may appear attractive. In such cases yields are bid lower and prices higher. This can explain the positive and important relationship between decreasing oil prices and decreasing yields on bonds.
The 10 year government bond yield at lag 0 and the inflation rate at lag 11 (around the two-trading-week mark) are the most important factors given that they were chosen 100% of the time. Next, inflation at lag 0 was selected 84% of the time, followed by business confidence at lag 14, business confidence at lag 8 and then the 10 year government bond yield at lag 4.
Inflation was also selected in 100% of all models. Interestingly, when considering the coefficient estimates (see Table 6) we find that news of a positive shock to inflation causes the market to react negatively to oil on a (very) short-term basis, but this relationship is reversed over the longer period (lag 11, approximately two trading weeks). The reversal at the two-week mark (lag 11) is, however, the most important variable (selected 100% of the time). It lends support to the idea that oil is often considered an investable asset class that acts as an inflation hedge (Chen and Pinsky, 2003; Frankel, 1986, 2008; Frankel and Rose, 2010), and inflation is therefore considered a speculative factor rather than a supply/demand factor.
Most important factors by bootstrap weight.
This table summarises the top 20 factors by bootstrap weight for the oil equation y1 and the oil supply equation y2. These results suggest that the 10 year government bond yield at lag 0 and the inflation rate at lag 11 (around the two-trading-week mark) are the most important factors given that they were chosen 100% of the time in 10,000 simulations. Next, inflation at lag 0 was selected 84% of the time, followed by business confidence at lag 14, business confidence at lag 8, and then the 10 year government bond yield at lag 4.
By incorporating model uncertainty, we have obtained a more robust view of the driving factors of world oil prices. If we took the approach of only viewing factors as important if they were chosen as part of the initial model selection, we may have mistakenly concluded that certain factors are unimportant.
Due to the low tally for each model structure, it seems pointless to argue that one of the top 20 model structures is the ‘true model’. Apart from the speculative factors (US10GB, USInfl_11, USInfl), and to some extent the classic factor USBC_11, there is quite a high variability as to whether a factor is included or not. Given the high degree of variability, one should apply a grouping method to capture the basic structure (if any) that occurs most frequently. This could be achieved using clustering methods. However, we found that the high degree of variability limited the usefulness of clustering analysis in our study. Another method of identifying important groups of variables (and therefore possible model structures) is to calculate the relative importance weights of factor/lag groups.
4.3 Factor groupings
To understand the importance of certain factor/lag groups found by our procedure, we calculated the frequency of pairs, triplets, and quadruplets appearing in the L models found by our procedure, see Table 4.
These results indicate that, at a minimum, any model must include the factors US 10 year government bond yield (lag 0) and US inflation rate (lag 11). We see from the ‘singles’ and ‘pairs’ results that both these variables were selected 100% of the time and are important when it comes to explaining the global oil market.
Correlation between factors must also be considered here, as it can affect the interpretation of the relative importance weights. Multicollinearity arises when two or more explanatory variables for the same dependent variable are correlated with one another. It makes it difficult to determine the relative importance of each factor for the dependent variable (Smith et al., 2009).
If we turn our attention to the ‘quadruplets’ rankings (which incidentally are by default quintuplets), we see that the second row contains the exact combination of these top five individually ranked factors. The ranking for this combination of factors is only 0.404. This tells us that this particular combination of factors was selected less than half the time. If the results of this investigation were intended to be used to build a model intended for predictive purposes, simply constructing a model based on single factor frequency alone is possibly not the best strategy.
Now consider the relative importance of the highest ranked triplets, see Table 7. We see that the combination of US 10 year government bond yield (lag 0), US inflation rate (lag 11) and US inflation rate (lag 0) yields a relative importance weighting of 0.843. Almost 85% of the time, these three variables turned up together as a result of the model selection process. We therefore recommend that any further analysis with respect to building models for the oil market should take this combination into consideration. We also recommend examining subsets of sizes two, three, and four from this set of variables for use in multi-model inferences, in particular model averaging.
Relative importance of different factors and lags for the oil equation y1.
This table shows the relative importance of different factors for the oil equation y1. The US 10 year government bond yield (lag 13) factor has a relative importance weighting of 0.687. As the frequency with which a factor is selected increases, the disparity between the conditional and unconditional estimates will decrease (as evidenced by the conditional and unconditional estimates for the US 10 year government bond yield (lag 0) factor). The percentage difference between the bootstrap unconditional and conditional standard errors for this factor is 164%.
These bootstrap weights can be used in a couple of ways. For example, the weights used here could be used to make weighted average predictions out of sample as part of a confirmatory study. Alternatively, the bootstrap model weights can be used to determine a set of candidate models a priori, ranked according to their bootstrap relative importance. Out of sample weighted average predictions using Bayesian model averaging techniques or Akaike weights could then be performed. Multi-model inferences are shown to have better predictive performance than those made from a single model. See Eicher et al. (2011) for a discussion on improved predictive performance and Lavoué and Droz (2009) for practical implementations of multi-model inferencing using bootstrap and Bayesian model averaging. Anderson and Burnham (2003, 2004) explain a multi-model inferencing approach that utilises Akaike weights after first identifying a small a priori set of candidate models. In an exploratory context, the bootstrap weights can be used to help identify a set of a priori candidate models when researchers are seeking to identify relationships between variables that are not already well recognised by the market.
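As an illustration of the Akaike-weight approach mentioned above, the following sketch computes weights from a set of AIC values and forms a weighted average prediction. It assumes the standard definition, in which each weight is proportional to exp(-Δ/2) with Δ the difference between a model's AIC and the smallest AIC in the candidate set. The AIC values and the predictions are hypothetical and serve only to show the mechanics.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: exp(-delta / 2) normalised to sum to one, where
    delta is each model's AIC minus the smallest AIC in the set."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical AIC values and one-step-ahead predictions for three
# candidate models (illustrative numbers only).
aic_values = [100.0, 102.0, 110.0]
predictions = [1.0, 2.0, 4.0]

weights = akaike_weights(aic_values)
averaged_prediction = sum(w * p for w, p in zip(weights, predictions))
```

The same weighted average could be formed with bootstrap selection frequencies in place of the Akaike weights, provided the frequencies are normalised over the candidate set.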
Initial results suggest that the combination of variables in Table 8 with a relative importance weighting of 0.590 is the most important.
Oil price variation attributed to top speculative exogenous factors.
We measure the partial R2 of the speculative exogenous factors US10GB, USInfl, and USInfl_11, which have a relative importance weight of approximately 85% (i.e. the three predictors were chosen together as a group 85% of the time). We find that the partial R2 is 23.84% for US10GB, while USInfl and USInfl_11 have partial R2 of 5.54% and 2.03%, respectively. In total, these three speculative factors explain 31.41% of the variation in the price of oil for the given data set.
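The arithmetic behind the total can be checked with a short sketch. The helper `partial_r2` is hypothetical: it assumes one common definition of partial R2, namely the proportional reduction in the residual sum of squares when a factor is added to a model that excludes it. The text does not state which definition was used, so the helper illustrates the concept only, and the residual sums of squares passed to it are invented.

```python
def partial_r2(sse_reduced, sse_full):
    """Proportional reduction in the residual sum of squares when a factor
    is added to the reduced model (assumed definition, see lead-in)."""
    if sse_reduced <= 0:
        raise ValueError("sse_reduced must be positive")
    return (sse_reduced - sse_full) / sse_reduced


# Hypothetical residual sums of squares, to show the mechanics only.
example = partial_r2(sse_reduced=200.0, sse_full=150.0)

# Partial R2 values as reported in the text, in percent.
reported = {"US10GB": 23.84, "USInfl": 5.54, "USInfl_11": 2.03}
total = round(sum(reported.values()), 2)
```

The reported values sum to the 31.41% stated in the text.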
While the analysis above assists us in developing a better understanding of which variables and combinations of variables are ‘important’ in terms of frequency of selection, it does not give any indication of the relative impact that each variable has on the oil market given a unit change in those variables. A more complete understanding of the factors driving the oil market will come with an analysis of the relative size of coefficients and standard errors. However, we remind readers that this analysis is exploratory in nature; we caution against making inferences about the size of the coefficients based on the results presented. All results should be interpreted as possible hypotheses to be tested in a more confirmatory analysis. As such we do not interpret the size of the estimated coefficients in this study.
5. Conclusion
Identifying speculation in the world oil markets is an important topic because of its policy implications; however, one must be careful because of the presence of model uncertainty. Our paper is the first study of this kind to address the presence of model uncertainty and is therefore robust to model mis-specification. This allows us to provide further evidence for the hypothesis that speculation is playing a role in the determination of prices.
The oil market has evolved from one dominated by pure commercial interests into an asset class providing inflation hedging opportunities, diversification, and alpha resulting from speculative bets. This motivates our interest in identifying possible new factors of oil prices beyond the supply and demand imbalances driving relative inventories.
Coefficient estimates and their standard errors have traditionally been determined by conditioning on a model form assumed to be correct. However, the model form is a source of variation in itself and adds to the uncertainty in our coefficient estimates. Traditional model selection techniques underestimate standard errors, overestimate coefficient estimates, select noise variables, and exclude true factors from the model because they only account for within-model uncertainty and ignore between-model uncertainty. We use a model-based resampling technique known as the bootstrap to more accurately identify important new speculative drivers of the global oil market. Using the bootstrap we are able to quantify the variability associated with model uncertainty as well as rank models and variables according to their relative importance. We propose a set of five candidate factors, chosen for their possible speculative relationships with the oil market: industrial production, US fed funds target rate, inflation, 10 year government bond yield, and US business (manufacturing) confidence indicator. Our results explicitly identify a set of new important speculative factors, rank the factors by relative importance, and identify the time value of shocks to these factors.
Our results suggest that the 10 year government bond yield at lag 0 and the inflation rate at lag 11 (around the two-trading-week mark) are the most important factors, given that they were chosen 100% of the time. Next, inflation at lag 0 was selected 84% of the time, followed by business confidence at lag 14, business confidence at lag 8, and then the 10 year government bond yield at lag 4. Further, we see that the combination of US 10 year government bond yield (lag 0), US inflation rate (lag 11), and US inflation rate (lag 0) yields a relative importance weighting of 0.843. That is, almost 85% of the time these three variables were selected together by the model selection process. In Table 8, we measure the partial R2 of the speculative exogenous factors US10GB, USInfl, and USInfl_11. We find that these factors explain 31.41% of the variation in oil prices for the given data set. This further supports the claim that speculative factors are now intimately tied to the dynamics of oil prices.
We believe our results lend support to the notion that asset allocation decisions, inflation concerns, portfolio diversification considerations, and speculation are influencing the oil market. In particular, the compelling evidence for the importance of the inflation rate at lag 11 lends support to the idea that oil is often considered an investable asset class that acts as an inflation hedge. Similarly, the importance of the 10 year government bond yield supports the hypothesis that alternative asset classes and their relationship to the oil market are now part of the asset allocation decision. The importance of the business confidence factor implies that speculation about oil prices, whilst less important, is still a major consideration in the determination of oil prices. These results lend support to the hypothesis that financial market participants are indeed playing a role in the determination of oil prices.
Appendix
The conditional bootstrap coefficient estimate for
Here, we find the average coefficient estimate over the set of models that report an estimate for
The conditional standard error estimate for
where Il is 1 if the parameter
where SV is the standard error of the estimate: the standard deviation of the residuals. This is the standard error calculated directly from the fitted model.
We use (8) to determine the unconditional bootstrap coefficient estimate for
Equation (9) is the unconditional standard error estimate for
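The difference between the conditional and unconditional coefficient estimates can be sketched as follows. The sketch assumes that the conditional estimate averages a coefficient over only those bootstrap replicates in which the factor was selected, while the unconditional estimate averages over all replicates and counts the coefficient as zero whenever the factor was not selected. The function name and the five draws are hypothetical. The standard error formulas are not reproduced here.

```python
def bootstrap_estimates(coefs):
    """coefs holds one entry per bootstrap replicate: the fitted coefficient,
    or None when the factor was not selected in that replicate."""
    selected = [c for c in coefs if c is not None]
    n = len(coefs)
    frequency = len(selected) / n
    # Conditional: average over replicates that include the factor.
    conditional = sum(selected) / len(selected) if selected else float("nan")
    # Unconditional: average over all replicates, zero where not selected
    # (an assumption, see lead-in).
    unconditional = sum(selected) / n
    return frequency, conditional, unconditional


# Hypothetical coefficient draws from five bootstrap replicates.
draws = [0.5, None, 0.7, 0.6, None]
frequency, conditional, unconditional = bootstrap_estimates(draws)
```

Under these assumptions the unconditional estimate equals the selection frequency multiplied by the conditional estimate, so the two estimates converge as the selection frequency approaches one.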
Final transcript accepted 21 March 2014 by Hodaka Morita (AE Economics)
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
