Abstract
This paper analyzes and verifies the formula for computing the number of degrees of freedom of the combined model obtained by averaging across a set of regression models. The formula was proposed by Moiseev (2017) but has not been thoroughly analyzed. Its key feature is that it is applicable to any averaging method, which dramatically widens its scope of application. We note that the exact number of degrees of freedom for the combined model cannot be computed because the variance-covariance matrix of the submodels’ errors is unknown. However, a simulation study shows that even using an unbiased estimator of this matrix yields reliable confidence intervals. The formula therefore appears to be crucial when computing interval forecasts by model averaging methods.
Introduction
When analyzing and modeling economic data, researchers very often resort to constructing regression models, which nearly always requires model specification and model selection procedures. Model selection can be performed using different model efficiency criteria, e.g. mean-squared forecast error, F-statistics, mean-squared bootstrapped error and various information criteria. One of them is the Bayesian information criterion (BIC), introduced by Schwarz (1978). Since then, many papers have addressed its application in econometrics, including Raftery et al. (1997), Hoeting et al. (1999), Brock and Durlauf (2001), Brock et al. (2003), Fernandez and Steel (2001), Garratt et al. (2003) and Sala-i-Martin et al. (2004). Another selection method is the Mallows criterion, which was introduced by Mallows (1973) and resembles the information criteria of Akaike (1973) and Shibata (1980), whose asymptotic optimality was studied by Shibata (1980, 1981, 1983), Lee and Karagrigoriou (2001), Ing (2003, 2004, 2007) and Ing and Wei (2003, 2005). The focused information criterion (FIC) was first proposed by Claeskens and Hjort (2003) and is also used when choosing among a set of models.
However, recent literature focuses more on averaging across the set of models than on picking just one of them. Combination of forecasts was pioneered by Bates and Granger (1969) and further developed in Granger and Ramanathan (1984). Since then, its synergetic effect of decreasing the forecast error has been affirmed by a number of econometricians, and its efficiency is no longer in doubt; see, for example, Granger (1989), Clemen (1989), Hendry and Clements (2002), Timmermann (2006) and Stock and Watson (2006). However, there is still no consensus on how to select forecast weights. The most recent papers focus on five major methods of weight selection: simple averaging, Bayesian (BMA) and Akaike model averaging, Mallows model averaging (MMA), focused model averaging (FMA) and mean-squared forecast error (MSFE) model averaging.
Simple averaging works quite well when the submodels are properly specified. However, if a poor model is included in the set of models for averaging, the simple average is penalized. For simple averaging to be considered a complete method, it should therefore incorporate only submodels that are very close in their characteristics, which imposes quite strict restrictions on submodel selection. The Bayesian information criterion was proposed for use in model averaging by Min and Zellner (1993), and its efficiency has since been explicitly demonstrated by Stock and Watson (1999, 2004, 2005) and Wright (2003a, b). Using the exponential Akaike information criterion (AIC) to compute model weights was introduced by Akaike (1979) and further developed in Burnham and Anderson (2002). Mallows model averaging and closely associated averaging techniques were thoroughly studied by Hansen (2007, 2008, 2014), Hansen and Racine (2012) and Cheng and Hansen (2015), where the authors provided strong empirical evidence of their efficiency compared to the most common averaging methods. Focused model averaging is based on the focused information criterion (FIC); it was elaborated by Hjort and Claeskens (2006) and further developed in Claeskens and Hjort (2008). MSFE model averaging was proposed in Zubakin et al. (2015) and further elaborated by Moiseev (2016, 2017); it focuses on minimizing an unbiased estimator of the mean-squared forecast error of the combined model.
As numerous econometricians have noted, no existing method outperforms all the others in all types of initial settings. Averaging methods can therefore coexist, each yielding satisfactory results when applied in a suitable situation. However, when forecasting economic processes it is equally (if not more) important to obtain a reliable confidence interval for the constructed forecast, as this is crucial for risk management, investment, strategic planning, etc. Unfortunately, most papers devoted to model averaging techniques omit this issue, except for Zubakin et al. (2015) and Moiseev (2016, 2017), where the authors show that one can obtain an interval forecast by applying t-distribution quantiles with the number of degrees of freedom computed by a formula also provided in their research. In this paper we focus on analyzing the properties of the degrees of freedom formula and provide simulation and empirical testing to demonstrate its applicability in forecasting economic processes by combination of models. We also show that obtaining reliable confidence intervals does not depend on the chosen model averaging method, which significantly widens the field of its application.
The paper has the following structure. Section 2 reviews the principles of model averaging, in particular MSFE model averaging, and discusses the analytical expression of the number of degrees of freedom for the combined model. Section 3 analyzes the properties of the number of degrees of freedom and presents the results of out-of-sample simulation testing. In Section 4 we sum up the key points of the paper and emphasize the main characteristics of the proposed method.
Review of MSFE model averaging principles
Let
where
where
Here we notice that
where
and
We also suppose that each of
Additionally to traditional OLS prerequisites we assume the following:
It is worth noting that Assumption 6 holds almost automatically if Assumption 5 holds.
For MSFE model averaging, the goal is to adjust the weights so that they yield the lowest expected mean-squared forecast error while satisfying the constraints shown in Eq. (6). It is well known that the MSFE of a linear model consists of the error term variance plus the variance of the regression line; see, for example, Mood et al. (1974). Thus, this model averaging method aims to minimize the expression below:
As shown in Moiseev (2016), the explicit formula for the MSFE is as follows:
where
where
Weight selection can be implemented by solving the following optimization problem:
Due to imposed constraints analytical solution for Eq. (10) is not available for
Moving forward to interval forecast, we know that in order to compute a confidence interval for a linear regression we use t-distribution with
where
In case of combined model to obtain confidence intervals for the point forecast at
where
Hence the problem of confidence interval definition converges to the computation of the number of degrees of freedom
then we can infer, that
The explicit formula for the number of degrees of freedom was derived in Moiseev (2017) and looks as follows:
where
where
where
Here we note that computing the exact number of degrees of freedom requires the variance-covariance matrix of the true errors of the considered submodels, but since it is not available, we replace matrix COV, displayed below, with matrix UCOV.
To summarize the MSFE model averaging procedure, we provide a stepwise algorithm for combined model computation.
1. Specify a set of submodels satisfying Assumptions 1–6, computed either on one or on different data frames.
2. Compute a weight vector by solving the optimization problem Eq. (10).
3. Implement combination of submodels by Eq. (4) using the weight vector from Step 2.
4. Calculate a point forecast using the combined model from Step 3.
5. Compute an interval forecast by Eq. (11) with the number of degrees of freedom computed by Eq. (13).
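The point-forecast part of the procedure above can be illustrated by a minimal sketch. It assumes that the combined model is a weighted sum of the submodels with non-negative weights summing to one. The data, the single-regressor submodels, the helper names `fit_ols` and `combine_forecasts`, and the weights are all hypothetical; in particular, the weights are fixed placeholders and not the solution of Eq. (10), and the interval forecast of Step 5 is not reproduced here.

```python
import random


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def combine_forecasts(forecasts, weights):
    """Weighted point forecast; weights must be non-negative and sum to one."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * f for w, f in zip(weights, forecasts))


rng = random.Random(0)
n_long, n_short = 100, 20  # hypothetical data frames of different length
x1 = [rng.gauss(0, 1) for _ in range(n_long)]
x2 = [rng.gauss(0, 1) for _ in range(n_long)]
y = [1.0 + 2.0 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Step 1: two submodels, one on the full frame and one on the last 20 observations.
model_1 = fit_ols(x1, y)
model_2 = fit_ols(x2[-n_short:], y[-n_short:])

# Step 2 (placeholder): illustrative weights, NOT obtained from Eq. (10).
weights = [0.7, 0.3]

# Steps 3-4: combine the submodel forecasts at a new (hypothetical) point.
x1_new, x2_new = 0.4, -1.2
forecasts = [
    model_1[0] + model_1[1] * x1_new,
    model_2[0] + model_2[1] * x2_new,
]
point_forecast = combine_forecasts(forecasts, weights)
print("combined point forecast:", point_forecast)
```

Because the weights lie on the unit simplex, the combined point forecast always falls between the smallest and the largest submodel forecast.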
As previously noted, the number of degrees of freedom for the combined model calculated from Eq. (13) is not exact and depends to some extent on the unbiased estimates of the variances and covariances of the true errors of the submodels under consideration. Let us show this dependence graphically by means of a simulation experiment. Suppose we average across two regression models, which model
As it can be seen from Fig. 1a–d, the distribution of the number of degrees of freedom for the combined model has a positive skewness for
a. Probability density of the number of degrees of freedom given 
a. Probability density of the number of degrees of freedom given 
It should also be noted that even if the variance-covariance matrix of true errors of weighted submodels is known, the number of degrees of freedom for the combined model would still not be a constant, but in some way it would depend on the elements
Figure 2a and b show the probability densities of the number of degrees of freedom for constant covariances and variances of errors and changing
As can be seen from Fig. 2a and b with the known variance-covariance matrix, the number of degrees of freedom for the combined model is not subject to such volatility as in the case of unknown variances and covariances of true errors. The distribution density in this case has almost a symmetrical form, regardless of the values of the weight coefficients.
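The source of the additional volatility can be illustrated separately from the degrees of freedom formula itself. The sketch below only shows how strongly an unbiased covariance estimate of two error series fluctuates on a short data frame compared with a long one. The sample sizes, the true covariance `rho`, the unit variances, and the helper names are assumptions made for illustration, and raw errors are used instead of regression residuals.

```python
import random
import statistics


def sample_cov(e1, e2):
    """Unbiased sample covariance of two equally long error series."""
    n = len(e1)
    m1, m2 = sum(e1) / n, sum(e2) / n
    return sum((a - m1) * (b - m2) for a, b in zip(e1, e2)) / (n - 1)


def covariance_estimates(n, rho, reps, rng):
    """Return `reps` covariance estimates, each from n correlated N(0, 1) error pairs."""
    estimates = []
    for _ in range(reps):
        e1, e2 = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            e1.append(z1)
            # Mix z1 and z2 so that e2 has unit variance and covariance rho with e1.
            e2.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        estimates.append(sample_cov(e1, e2))
    return estimates


rng = random.Random(42)
rho = 0.5  # hypothetical true covariance of the submodels' errors
short = covariance_estimates(20, rho, 2000, rng)
long_ = covariance_estimates(100, rho, 2000, rng)

spread_short = statistics.stdev(short)
spread_long = statistics.stdev(long_)
print("std. dev. of covariance estimate, n = 20: ", spread_short)
print("std. dev. of covariance estimate, n = 100:", spread_long)
```

Both sets of estimates are centred on the true covariance, but the estimates from the short frame are noticeably more dispersed, and any quantity computed from them inherits this dispersion.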
Thus, under uncertainty conditions concerning the variance-covariance matrix of true errors, an accurate interval forecast for the combined model can be calculated using Bayesian statistics. The idea is to obtain a marginal distribution of the probability density of the true errors for the combined model, taking into account the probability distributions of the variances and the covariances of the true errors of the submodels. This marginal distribution can be represented as shown below:
where simplex
As mentioned above, the variance-covariance matrix of the true errors of the submodels can be modeled using the Wishart distribution. However, the problem in this case is that, when models are combined, each element of this matrix has its own number of degrees of freedom, which makes the known functional form of the distribution inapplicable. Technically, it is possible to derive an analytic expression for the probability distribution of the variance-covariance matrix with a different number of degrees of freedom for each element, but this derivation is omitted here because it does not contribute significantly to the accuracy and speed of the calculations performed. To calculate the confidence interval for the forecasts of the combined model, we recommend using a numerical method. Below we present a step-by-step algorithm for calculating the interval forecast using this method.
Perform the Cholesky decomposition of an unbiased estimate of the variance-covariance matrix of submodels’ errors
FOR
  Set
  WHILE
    a) Generate a realization of matrix COV with one degree of freedom according to the following formula:
    where
    Repeat Step a) until
    In case
    Go to the next
  ENDDO.
  Sum the obtained realizations of
ENDDO.
The desired probability distribution for the predicted value will be a simple average of scaled t-distributions with
a. Empirical probability of 
Fig. 4. Dependence of the number of degrees of freedom for the combined model on the weight coefficients for two submodels.
Fig. 5. Dependence of the number of degrees of freedom for the combined model on the weight coefficients for three submodels.
However, it should be noted that even when using a simple calculation of the number of degrees of freedom for the combined model by Eq. (13), the interval forecast turns out to be sufficiently reliable. To verify this statement, we will perform a simulation experiment. As in the previous case, suppose that two regression models that model
Figure 3a–d show the results of the simulation experiment, in particular the empirical probabilities of predicted value
As can be seen from Fig. 3a–d, the use of
Next, we study the form of the dependence of the number of degrees of freedom for combined model on the values of the weight coefficients. First, we consider a two-dimensional case, when averaging occurs only across two submodels. Then the weight coefficient
In Fig. 4 we present a graph of dependence of the number of degrees of freedom for combined model on
In Fig. 4 one can trace a distinct nonlinear asymmetric logistic function. Hence we note that for
Next, consider the three-dimensional case, when three submodels are averaged. The weight coefficients
The graph in Fig. 5 confirms the earlier statement about the greater “priority” of a smaller number of degrees of freedom over a larger one, since it has a distinct asymmetric shape. For the number of degrees of freedom to approach
This paper has revealed the properties of the formula for the number of degrees of freedom of the combined model constructed by an arbitrary averaging method. We note that the number of degrees of freedom depends not only on the variance-covariance matrix of the true errors of the submodels but also on the values of the factors involved, which we illustrated in several graphs. However, despite the uncertainty about the error variances and covariances, the analyzed formula yields reliable and consistent results, as confirmed by the simulation study. We also investigated the dependence of the number of degrees of freedom on the value of a weight coefficient, which appears to give more priority to the number of degrees of freedom of the submodel constructed on the shortest data frame. The analyzed formula is therefore needed in many real-life situations: even if a researcher averages across data frames of lengths twenty and one hundred, simply using the normal distribution to obtain a confidence interval will return a significantly underestimated interval forecast. The same underestimation also occurs when a simple average of the degrees of freedom of the averaged submodels is used, although this approach is a significant improvement over using the normal distribution. Thus, we recommend using the analyzed formula when the shortest data frame considered has fewer than thirty observations, and we encourage econometricians to compute the number of degrees of freedom correctly when dealing with model averaging methods.
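The underestimation caused by normal quantiles on short data frames can be checked with a small Monte Carlo experiment. The sketch below is a simplified stand-in and not the combined-model setting of this paper: it assumes a single intercept-only model fitted on a deliberately short frame of five observations, for which the forecast error scaled by its estimated standard deviation follows a t-distribution with four degrees of freedom. The function name `coverage` and all numerical settings are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def coverage(n, quantile, reps, rng):
    """Share of new observations inside forecast +/- quantile * s * sqrt(1 + 1/n)."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        centre = mean(sample)  # point forecast of an intercept-only model
        half_width = quantile * stdev(sample) * (1 + 1 / n) ** 0.5
        new_obs = rng.gauss(0, 1)
        hits += abs(new_obs - centre) <= half_width
    return hits / reps


rng = random.Random(1)
n = 5                                # deliberately short data frame (assumption)
z_975 = NormalDist().inv_cdf(0.975)  # normal quantile, about 1.96
t_975_df4 = 2.776                    # tabulated t quantile for n - 1 = 4 degrees of freedom

cov_normal = coverage(n, z_975, 10000, rng)
cov_t = coverage(n, t_975_df4, 10000, rng)
print("empirical coverage with normal quantile:", cov_normal)
print("empirical coverage with t quantile:     ", cov_t)
```

With the nominal level set to 95%, the interval built from the normal quantile covers the new observation noticeably less often than intended, while the interval built from the t quantile stays close to the nominal level.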
Acknowledgments
This research was funded by Plekhanov Russian University of Economics.
Conflict of interest
The authors declare that they have no conflict of interest.
