On Bayesian model selection for INGARCH models via trans-dimensional Markov chain Monte Carlo methods

Abstract

There is an increasing interest in models for discrete valued time series. Among them, the integer autoregressive conditional heteroscedastic (INGARCH) model has found several applications. In the present article, we study the problem of model selection for this family of models. Namely, we assume that an observation, conditional on the past, follows a Poisson distribution whose mean depends on its own past values and on past observations. We consider both linear and log-linear models. Our purpose is to select the most appropriate order of such models, using a trans-dimensional Bayesian approach that allows jumps between competing models. A small simulation experiment supports the use of the method. We apply the methodology to real datasets to illustrate the potential of the approach.

Keywords

INGARCH models; Observation-driven model; trans-dimensional Markov chain Monte Carlo; pseudopriors

1 Introduction

Discrete valued time series data occur in several disciplines, for example finance, criminology, epidemiology and sports, to name a few. Interest in models for such data is increasing; see, for example, the recent book of Weiß (2018).

Models for univariate count time series can be split into two main categories. The first consists of the parameter-driven models introduced by Zeger (1988), where the temporal autocorrelation comes from an underlying latent process for the mean of the discrete process. While such approaches connect the models to the well-known time series models for continuous data, the latent process is hard to estimate and hence the application of such models, despite their flexibility, is limited. The second category consists of the so-called observation-driven models, where the current observation is related to past observations. This allows for easier construction and estimation of the model, but the discreteness and non-negativity of the counts require special treatment. Popular models in this category are the INteger AutoRegressive (INAR) models of Al-Osh and Alzaid (1987) and their variants. Within the observation-driven category, the INteger AutoRegressive Conditional Heteroscedastic (INGARCH) model plays an important role. The model was introduced by Ferland et al. (2006) and developed further in Fokianos et al. (2009). It has a feedback mechanism in which the mean process depends deterministically on its own past values and on past observations. INGARCH models and their extended versions have proven very successful in modelling the volatility of count time series.

In this article, we consider the INGARCH models proposed by Ferland et al. (2006), Fokianos et al. (2009) and Fokianos and Tjøstheim (2011). In all of them, the conditional distribution of the observed value at time $t$ , denoted as $Y_{t}$ , given the past values is a Poisson distribution. The model has been extended to other conditional distributions for $Y_{t}$ , namely a negative binomial, a generalized Poisson, a COM-Poisson and a zero-inflated Poisson distribution, by Zhu (2011), Zhu (2012a), Zhu (2012b) and Zhu (2012c), respectively. While INGARCH-type models have gained interest, the problem of selecting the order of the terms in the model specification has not been developed. Note that, while we treat only the Poisson case here, extension to other INGARCH models is straightforward.

We aim to contribute in this direction by proposing a model selection approach for INGARCH models. We consider both linear and log-linear specifications of the mean. We propose a Bayesian approach based on trans-dimensional MCMC. At the same time, we describe Bayesian estimation for INGARCH models.

The remainder of the article is organized as follows. In Section 2, INGARCH models are briefly introduced. MCMC methodologies for the reversible jump and other methods are discussed in Section 3. In Section 4, we present a simulation for a number of linear and log-linear INGARCH models, to examine how well the proposed approach can identify the correct model. Finally, in Section 5, we illustrate the method using real datasets. Concluding remarks can be found in Section 6.

2 INGARCH models

2.1 Linear INGARCH models

In general, we consider observations $\{Y_{t}\}$ taking values in $\{0, 1, \dots\}$ and let $\{λ_{t}\}$ , $t \in ℤ$ , be the corresponding mean process. We first assume that the conditional distribution of $Y_{t}$ given the past values is a Poisson distribution with mean $λ_{t}$ , which is linked linearly to its own past values and to past observations. Namely, we assume that

Y_{t} ∣ F_{t - 1} \sim Poisson (λ_{t}),

(2.1)

where $F_{t - 1}$ is the $σ$-field generated by $\{Y_{s} : s \leq t - 1\}$ and

λ_{t} = b_{0} + \sum_{i = 1}^{p} α_{i} λ_{t - i} + \sum_{j = 1}^{q} b_{j} y_{t - j},

(2.2)

where $b_{0}$ , $α_{i}$ and $b_{j}$ are assumed to be positive for $i = 1, \dots, p$ , $j = 1, \dots, q$ and the initial values $Y_{0}$ and $λ_{0}$ are fixed. Here $p$ and $q$ are positive integers that determine the order of the model. We denote the model as INGARCH( $p, q$ ). In addition, since the conditional distribution is Poisson, $E [Y_{t} ∣ F_{t - 1}] = Var [Y_{t} ∣ F_{t - 1}] = λ_{t}$ .

A necessary and sufficient condition for stationarity is that

\sum_{i = 1}^{p} α_{i} + \sum_{j = 1}^{q} b_{j} < 1, \quad \text{with } α_{i}, b_{j} > 0 \text{ for } i = 1, \dots, p, \; j = 1, \dots, q,

(2.3)

given by Ferland et al. (2006).
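As an illustration of (2.1)-(2.3), the following minimal Python sketch simulates a linear INGARCH( $p, q$ ) series and checks the stationarity condition. It is a sketch under stated assumptions rather than the implementation used in this article: the function names, the start-up values `y0` and `lam0` (used for every pre-sample lag) and the simple Poisson sampler are hypothetical choices made here for illustration.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def is_stationary_linear(alpha, b):
    """Condition (2.3): positive coefficients whose total sum is below one."""
    return all(a > 0 for a in alpha) and all(x > 0 for x in b) and sum(alpha) + sum(b) < 1


def simulate_linear_ingarch(b0, alpha, b, T, y0=0, lam0=1.0, seed=0):
    """Simulate (2.1)-(2.2); pre-sample values are fixed at y0 and lam0 (assumption)."""
    rng = random.Random(seed)
    p, q = len(alpha), len(b)
    lam, y = [], []
    for t in range(T):
        mean = b0
        for i in range(1, p + 1):
            mean += alpha[i - 1] * (lam[t - i] if t - i >= 0 else lam0)
        for j in range(1, q + 1):
            mean += b[j - 1] * (y[t - j] if t - j >= 0 else y0)
        lam.append(mean)
        y.append(poisson_draw(mean, rng))
    return y, lam
```

For instance, `simulate_linear_ingarch(0.5, [0.2], [0.3], 200)` returns 200 simulated counts together with the corresponding means $λ_{t}$ for an INGARCH(1,1) model.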

Maximum likelihood estimation for this model has been discussed by Ferland et al. (2006) and Fokianos et al. (2009). The conditional likelihood function of the observed data $Y_{t}$ , $t = 1, \dots, T$ , under the assumption that $Y_{t} ∣ F_{t - 1} \sim Poisson (λ_{t})$ , is given by

L (y ∣ θ) = \prod_{t = 1}^{T} \frac{e^{- λ_{t}} λ_{t}^{y_{t}}}{y_{t}!}

(2.4)

where $λ_{t}$ is expressed via (2.2) and $θ = (θ_{1}, θ_{2}, \dots, θ_{p + q + 1}) = (α_{1}, \dots, α_{p}, b_{0}, b_{1}, \dots, b_{q})$ is the vector of the unknown parameters and $y_{t}$ the observed value at time $t$ . We also assume that $y_{0}$ and $λ_{0}$ are known values.
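In practice, one would typically work with the logarithm of (2.4) to avoid numerical underflow. The sketch below evaluates the recursion (2.2) and the resulting Poisson log-likelihood for a parameter vector ordered as $θ$ above. The function names and the default start-up values `y0` and `lam0` are assumptions introduced here for illustration.

```python
import math


def linear_intensities(theta, y, p, q, y0=0, lam0=1.0):
    """Recursion (2.2) with theta = (alpha_1..alpha_p, b_0, b_1..b_q)."""
    alpha, b0, b = theta[:p], theta[p], theta[p + 1:p + 1 + q]
    lam = []
    for t in range(len(y)):
        mean = b0
        for i in range(1, p + 1):
            mean += alpha[i - 1] * (lam[t - i] if t - i >= 0 else lam0)
        for j in range(1, q + 1):
            mean += b[j - 1] * (y[t - j] if t - j >= 0 else y0)
        lam.append(mean)
    return lam


def poisson_loglik(y, lam):
    """Logarithm of (2.4): sum over t of -lam_t + y_t * log(lam_t) - log(y_t!)."""
    return sum(-l + yt * math.log(l) - math.lgamma(yt + 1) for yt, l in zip(y, lam))
```

The term $\log (y_{t}!)$ does not depend on $θ$ , so it cancels in ratios of likelihoods evaluated on the same data.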

2.2 Log-linear INGARCH models

Since $λ_{t}$ has to be positive, the linear model constrains the parameters in order to achieve this. A modification of the linear INGARCH model is the log-linear INGARCH model introduced by Fokianos and Tjøstheim (2011), assuming

Y_{t} ∣ F_{t - 1} \sim Poisson (λ_{t}), \quad v_{t} = log (λ_{t}),

(2.5)

where

v_{t} = b_{0} + \sum_{i = 1}^{p} α_{i} v_{t - i} + \sum_{j = 1}^{q} b_{j} log (y_{t - j} + 1) .

(2.6)

In this model, the parameters $b_{0}$ , $α_{i}$ , $b_{j}$ take values in $ℝ$ and both negative and positive correlations can occur. Certainly, other specifications are possible. In particular, one may consider a model for the log-mean process by introducing $log (Y_{t - j} + u)$ , where $u$ is a constant. Fokianos and Tjøstheim (2011) presented results of a data analysis that do not indicate any gross deviations in terms of the mean square error of residuals for values of $u$ varying from 1 to 10 with a step equal to 0.5. In our study, we set $u = 1$ , and the parameter $λ_{t}$ is defined through (2.5) and (2.6). This model is more computationally demanding because the values of $λ_{t}$ can increase or decrease quickly depending on the values of the parameters $α_{i}$ and $b_{j}$ .

Ergodic properties of this model have been studied in Fokianos and Tjøstheim (2011). Stationarity conditions have been examined by Fokianos and Tjøstheim (2011), and they have been used by Liboschik et al. (2015). Based on this work, we take the stationarity conditions to be

| α_{i} |, | b_{j} | < 1, \quad | \sum_{i = 1}^{p} α_{i} + \sum_{j = 1}^{q} b_{j} | < 1, \quad i = 1, \dots, p, \; j = 1, \dots, q .

(2.7)
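The log-linear recursion (2.6) and the conditions (2.7) can be sketched in the same way. The code below works on the log scale $v_{t}$ and exponentiates only at the end. The function names and the start-up values `y0` and `nu0` are hypothetical choices made here, not taken from the article.

```python
import math


def is_stationary_loglinear(alpha, b):
    """Condition (2.7): each coefficient and their total sum lie in (-1, 1)."""
    coeffs = list(alpha) + list(b)
    return all(abs(c) < 1 for c in coeffs) and abs(sum(coeffs)) < 1


def loglinear_intensities(b0, alpha, b, y, y0=0, nu0=0.0):
    """Recursion (2.6) for nu_t = log(lambda_t); returns the pair (nu, lam)."""
    p, q = len(alpha), len(b)
    nu = []
    for t in range(len(y)):
        value = b0
        for i in range(1, p + 1):
            value += alpha[i - 1] * (nu[t - i] if t - i >= 0 else nu0)
        for j in range(1, q + 1):
            value += b[j - 1] * math.log((y[t - j] if t - j >= 0 else y0) + 1)
        nu.append(value)
    return nu, [math.exp(v) for v in nu]
```

Because $λ_{t} = exp (v_{t})$ is positive for any real $v_{t}$ , negative coefficients are admissible here, in contrast to the linear model.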

2.3 Bayesian formulation for INGARCH( $p, q$ ) model

Denote the available data up to time point $T$ as $y$ , that is, $y = (y_{1}, \dots, y_{T})$ . Using Bayes' theorem, the posterior density $p (θ ∣ y)$ of the parameter vector $θ = (θ_{1}, \dots, θ_{p + q + 1})$ can be derived. We have as usual that

\begin{matrix} p (θ ∣ y) & \propto & L (y ∣ θ) p (θ) \\ & \propto & L (y ∣ θ) p (α_{1}) \dots p (α_{p}) p (b_{0}) \dots p (b_{q}), \end{matrix}

where $L (y ∣ θ)$ is the likelihood function and $p (θ)$ is the prior distribution. Here, we assume independent priors for all the parameters, and we use MCMC to obtain samples from the posterior distribution. Under the assumption of stationarity, we choose truncated normal prior distributions for the parameters $α_{i}$ and $b_{j}$ , restricted to $0 < \sum_{i = 1}^{p} α_{i} + \sum_{j = 1}^{q} b_{j} < 1$ in the linear case and to $- 1 < α_{i}, b_{j} < 1$ with $- 1 < \sum_{i = 1}^{p} α_{i} + \sum_{j = 1}^{q} b_{j} < 1$ in the log-linear case, and a Gamma(1,2) prior distribution for the parameter $b_{0}$ . After each Metropolis-Hastings step, we reject values that do not satisfy the stationarity conditions.

Given that $Y_{t} ∣ F_{t - 1} \sim Poisson (λ_{t})$ , where $λ_{t}$ is expressed via equations (2.2) or (2.6), the posterior density of the parameter vector $θ = (α_{1}, \dots, α_{p}, b_{0}, b_{1}, \dots, b_{q})$ can be expressed as

\begin{matrix} p (θ ∣ y) & \propto & \prod_{t = 1}^{T} e^{- λ_{t}} λ_{t}^{y_{t}} \times e^{\frac{- 1}{2 σ_{α_{1}}^{2}} (α_{1} - μ_{α_{1}})^{2}} \times \dots \times e^{\frac{- 1}{2 σ_{α_{p}}^{2}} (α_{p} - μ_{α_{p}})^{2}} \times \\ & & e^{\frac{- 1}{2 σ_{b_{1}}^{2}} (b_{1} - μ_{b_{1}})^{2}} \times \dots \times e^{\frac{- 1}{2 σ_{b_{q}}^{2}} (b_{q} - μ_{b_{q}})^{2}} \times b_{0}^{γ - 1} e^{- \frac{b_{0}}{β}} \times I (θ \in S), \end{matrix}

(2.8)

where

S = \{α_{1}, \dots, α_{p}, b_{1}, \dots, b_{q}\} : = \{\begin{matrix} \sum_{i = 1}^{p} α_{i} + \sum_{j = 1}^{q} b_{j} \in (0, 1) \\ | α_{i} |, | b_{j} | < 1, | \sum_{i = 1}^{p} α_{i} + \sum_{j = 1}^{q} b_{j} | < 1, \end{matrix}

for the linear and the log-linear cases accordingly.
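The prior factors and the indicator in (2.8) can be evaluated on the log scale as sketched below. Several simplifications are assumptions of this sketch, not statements about the article's implementation: a common prior mean `mu` and standard deviation `sigma` are used for all $α_{i}$ and $b_{j}$ , the Gamma prior on $b_{0}$ is read as having shape $γ$ and scale $β$ , and in the linear case each coefficient is additionally required to be positive, as in (2.3).

```python
import math


def in_stationary_region(alpha, b, loglinear=False):
    """Membership in the set S, for the linear or the log-linear model."""
    coeffs = list(alpha) + list(b)
    total = sum(coeffs)
    if loglinear:
        return all(abs(c) < 1 for c in coeffs) and abs(total) < 1
    # Linear case: positivity of each coefficient is an extra assumption, cf. (2.3).
    return all(c > 0 for c in coeffs) and total < 1


def log_prior_kernel(alpha, b0, b, mu=0.0, sigma=1.0, shape=1.0, scale=2.0, loglinear=False):
    """Log of the prior factors in (2.8), up to an additive constant."""
    if b0 <= 0 or not in_stationary_region(alpha, b, loglinear):
        return -math.inf  # the indicator I(theta in S) is zero
    normal_part = sum(-((c - mu) ** 2) / (2 * sigma ** 2) for c in list(alpha) + list(b))
    gamma_part = (shape - 1) * math.log(b0) - b0 / scale
    return normal_part + gamma_part
```

Adding the Poisson log-likelihood to this quantity gives the logarithm of (2.8) up to a constant, which is all that a Metropolis step requires.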

We use $p + q + 1$ independent Metropolis steps (Tierney, 1994; Chib and Greenberg, 1995), one for each parameter $θ_{i}$ , and as proposal distribution $q (θ_{i}^{(can)} ∣ θ_{1}, θ_{2}, \dots, θ_{p + q + 1})$ we use $N (θ_{i}, σ_{i}^{2})$ distributions. The choice of the variance $σ_{i}^{2}$ is important for the convergence of the chain. Considering the conditions of stationarity, the admissible interval for the parameters in our work is $(0, 1)$ and $(- 1, 1)$ for the linear and the log-linear cases respectively. Fokianos and Tjøstheim (2011) discussed the restrictions on the parameters, because they lead to different correlation structures and they are decisive for the calculation of the maximum likelihood estimators. The acceptance probability at the $j$ th iteration is calculated by

α = \min \{1, \frac{\prod_{t = 1}^{T} e^{- λ_{t}^{(can)}} {(λ_{t}^{(can)})}^{y_{t}} p (θ_{i}^{(can)})}{\prod_{t = 1}^{T} e^{- λ_{t}^{(j)}} {(λ_{t}^{(j)})}^{y_{t}} p (θ_{i}^{(j)})}\},

(2.9)

where $λ^{(can)}$ indicates that $λ$ has been calculated from the candidate value of the parameter vector. The marginal posterior means and posterior variances are used in the trans-dimensional MCMC method to define the pseudopriors. A detailed description of the MCMC algorithm can be found in Appendix A.
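A single sweep of such component-wise random-walk Metropolis updates might look as follows. This is a generic sketch, not the authors' code: `log_post` stands for any user-supplied function returning the logarithm of (2.8) (and minus infinity outside the stationarity region), and the toy target at the bottom is purely hypothetical and is used only to make the example runnable.

```python
import math
import random


def metropolis_sweep(theta, log_post, step_sd, rng):
    """One sweep of single-component random-walk Metropolis updates."""
    theta = list(theta)
    current = log_post(theta)
    accepted = 0
    for i in range(len(theta)):
        candidate = list(theta)
        candidate[i] = rng.gauss(theta[i], step_sd[i])
        proposed = log_post(candidate)
        # Accept with probability min(1, exp(proposed - current)), cf. (2.9).
        if math.log(1.0 - rng.random()) < proposed - current:
            theta, current = candidate, proposed
            accepted += 1
    return theta, accepted


def toy_log_post(theta):
    """Hypothetical target: independent Beta(2, 2) kernels on (0, 1)."""
    if any(t <= 0 or t >= 1 for t in theta):
        return -math.inf
    return sum(math.log(t) + math.log(1 - t) for t in theta)


rng = random.Random(1)
state = [0.5, 0.5]  # the chain must start inside the admissible region
draws = []
for _ in range(2000):
    state, n_accepted = metropolis_sweep(state, toy_log_post, [0.3, 0.3], rng)
    draws.append(state)
```

Candidates outside the admissible region receive log-posterior minus infinity and are therefore always rejected, which mirrors the rejection of non-stationary values described above.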

2.4 Model selection for discrete valued time series

While the literature on discrete valued time series is growing very fast, very few papers consider the problem of model selection, namely that of selecting the order of the model. In simple INAR( $p$ ) models, this is merely the order of the autoregressive terms; in our INGARCH setting, it relates to the orders $p$ and $q$ to be used. An efficient reversible jump Markov chain Monte Carlo (RJMCMC) algorithm has been constructed by Enciso-Mora et al. (2009) for moving between INARMA processes of different orders. See also Bu and McCabe (2008) for model selection in INAR( $p$ ) models.

In addition, Alzahrani et al. (2018) studied the model selection problem among INAR and linear INGARCH models by using a particle filtering MCMC method. Wang et al. (2020) used a penalized conditional maximum likelihood to estimate the parameters of an INGARCH model. Note that, while this approach can help to identify useful terms in the model, it does not select between competing models.

2.5 Prediction and Model averaging

Predictions of the time-varying volatility in linear and log-linear INGARCH models can be obtained from the output of the MCMC. McCabe and Martin (2005) studied methods for prediction in the INAR(1) model. Raftery et al. (1997) discussed model averaging and the choice of models for Bayesian model averaging. A $p$ -step ahead predictive pmf is defined as

p (y_{T + p} ∣ y) = \sum_{m \in M} p (y_{T + p} ∣ y, m) p (m ∣ y),

(2.10)

where $M$ denotes the countable set of candidate models, $p (y_{T + p} ∣ y, m)$ is the $p$ -step ahead predictive pmf from model $m$ and $p (m ∣ y)$ is the posterior probability of model $m$ calculated in trans-dimensional MCMC. So in our work

p (λ_{T + 1} ∣ y) = \sum_{m \in M} p (λ_{T + 1} ∣ y, m) p (m ∣ y),

(2.11)

which is an average of the posterior predictive distributions under each model, weighted by the posterior model probabilities. At each iteration, we calculate $λ_{T + 1}$ from the current parameter vector. Each sampled point from model $m$ should be taken with probability $p (m ∣ y)$ . We then obtain a sample from $p (λ_{T + 1} ∣ y)$ by weighting all samples from $p (λ_{T + 1} ∣ m, y)$ by the corresponding $p (m ∣ y)$ .
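The mixture in (2.11) can be sampled by first choosing a model according to its posterior probability and then choosing one of that model's posterior draws of $λ_{T + 1}$ . The sketch below assumes that the per-model draws and the posterior model probabilities are already available; the function names and the dictionary layout are illustrative assumptions.

```python
import random


def bma_sample(samples_by_model, post_prob, n_draws, seed=0):
    """Draw from the mixture (2.11): pick a model with probability p(m | y),
    then pick one of that model's posterior draws of lambda_{T+1}."""
    rng = random.Random(seed)
    models = list(samples_by_model)
    weights = [post_prob[m] for m in models]
    chosen = rng.choices(models, weights=weights, k=n_draws)
    return [rng.choice(samples_by_model[m]) for m in chosen]


def bma_mean(samples_by_model, post_prob):
    """Posterior mean of lambda_{T+1} under (2.11), with normalised weights."""
    total = sum(post_prob[m] for m in samples_by_model)
    weighted = sum(post_prob[m] * sum(s) / len(s) for m, s in samples_by_model.items())
    return weighted / total
```

Normalising the weights inside `bma_mean` keeps the average well defined when only a subset of the candidate models is included, for example the few models with the highest posterior probabilities.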

3 Trans-dimensional MCMC

Assume that we have a countable set of models, denoted by $M$ , for a given set of data $y$ . Consider a model indexed by $m \in M$ and a vector of unknown parameters for model $m$ denoted as $θ_{m} \in Θ_{m}$ . Two models, say $m_{1}$ and $m_{2}$ , can have different dimensions. Let $p (m)$ be the prior probability of model $m$ .

Then the posterior probability for model $m \in M$ is given by

p (m ∣ y) = \frac{p (m) L (y ∣ m)}{\sum_{m^{'} \in M} p (m^{'}) L (y ∣ m^{'})}

(3.1)

where $L (y ∣ m)$ is the marginal likelihood for model $m$ , calculated by the integral $\int L (y ∣ θ_{m}) p (θ_{m} ∣ m) d θ_{m}$ , and $p (θ_{m} ∣ m)$ is the prior distribution of the parameter vector $θ_{m}$ . Many approaches to the model selection problem use the Bayes factor of model $m_{i}$ against model $m_{j}$ . There are two interpretations of the Bayes factor. First, the Bayes factor is the ratio of two marginal likelihoods, representing how well the data are fitted under each model. More precisely,

B_{ij} = \frac{p (m_{i} ∣ y) p (m_{j})}{p (m_{j} ∣ y) p (m_{i})} = \frac{\int_{Θ_{m_{i}}} p (y ∣ m_{i}, θ_{m_{i}}) p (θ_{m_{i}} ∣ m_{i}) d θ_{m_{i}}}{\int_{Θ_{m_{j}}} p (y ∣ m_{j}, θ_{m_{j}}) p (θ_{m_{j}} ∣ m_{j}) d θ_{m_{j}}} .

(3.2)

Second, the posterior model odds can be represented as

\frac{p (m_{i} ∣ y)}{p (m_{j} ∣ y)} = B_{ij} \times \frac{p (m_{i})}{p (m_{j})} .

(3.3)
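Relations (3.2) and (3.3) mean that, once posterior model probabilities have been estimated, a Bayes factor follows by dividing the posterior odds by the prior odds. A minimal sketch, with a hypothetical function name and made-up input values:

```python
def bayes_factor(post_i, post_j, prior_i, prior_j):
    """Bayes factor B_ij from (3.2): posterior odds divided by prior odds."""
    return (post_i / post_j) / (prior_i / prior_j)


# Illustrative numbers only: with equal prior probabilities, the Bayes factor
# reduces to the ratio of the two posterior model probabilities.
example = bayes_factor(0.6, 0.2, 0.5, 0.5)
```

With equal prior model probabilities the prior odds equal one, so $B_{ij}$ coincides with the posterior odds.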

The integrals required for the Bayes factor are analytically intractable. Consequently, many methods have been proposed to approximate it. The methods introduced by Green (1995), Carlin and Chib (1995) and Dellaportas et al. (2002) generate observations from the joint posterior distribution of $(m, θ_{m})$ for estimating $p (θ_{m} ∣ m)$ .

RJMCMC is more complicated in our case because we have a certain number of models, so an appropriate transformation is needed each time to define the parameter spaces in order to jump between models. We do not pursue this in the present article. The method introduced by Carlin and Chib (1995) proposes jumps between all models. The full conditional posterior density for each model is given by

p (m ∣ θ_{m}, y) = \frac{A_{m}}{\sum_{m^{'} \in M} A_{m^{'}}},

(3.4)

where the numerator is given by

A_{m} = p (y ∣ θ_{m}, m) \{\prod_{m^{'} \in M} p (θ_{m^{'}} ∣ m)\} p (m) .

(3.5)
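Because each $A_{m}$ is a product of many small terms, the normalisation in (3.4) is more safely carried out on the log scale. The sketch below assumes that the values $\log A_{m}$ have already been computed for every model; the function names are hypothetical.

```python
import math
import random


def model_probabilities(log_A):
    """Normalise (3.4) from the logarithms of the terms A_m in (3.5)."""
    top = max(log_A.values())  # subtract the maximum to avoid underflow
    weights = {m: math.exp(v - top) for m, v in log_A.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def sample_model(log_A, rng):
    """Draw the next model indicator from its full conditional distribution."""
    probs = model_probabilities(log_A)
    models = list(probs)
    return rng.choices(models, weights=[probs[m] for m in models], k=1)[0]
```

Subtracting the largest $\log A_{m}$ before exponentiating leaves the probabilities unchanged while keeping at least one weight equal to one.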

There are alternative constructions proposed by Bayarri et al. (2012) for the case where the models are nested. The Metropolized Carlin and Chib method proposes a move from one model to another and then accepts or rejects this proposal. Lodewyckx et al. (2011) suggested the product space method for both nested and non-nested models.

For example, let us consider three models, namely INGARCH(1,1), INGARCH(1,2) and INGARCH(2,1), denoted as $m_{1}, m_{2}$ and $m_{3}$ . The parameter vectors are $θ_{m_{1}} = (α_{1}, b_{0}, b_{1})$ , $θ_{m_{2}} = (α_{1}, b_{0}, b_{1}, b_{2})$ and $θ_{m_{3}} = (α_{1}, α_{2}, b_{0}, b_{1})$ . The three parameter vectors are combined in one parameter vector $θ = (θ_{m_{1}}, θ_{m_{2}}, θ_{m_{3}})$ which takes values in the Cartesian product of the three models’ parameter spaces $Θ_{m_{1}} \times Θ_{m_{2}} \times Θ_{m_{3}}$ . For this method, and also for the method introduced by Dellaportas et al. (2002), there are no linked densities and the calculation of the Jacobian is not needed. These methods require proposal densities of higher dimensions, called pseudopriors.

The crucial problem here is the matching of the dimensions of the three models. To do so, we consider that, if model $m_{1}$ is visited, the corresponding parameter vector $θ_{m_{1}}$ is connected to the model likelihood, so this parameter vector can be updated from its posterior distribution. If $m_{1}$ is not visited, the parameter vector is disconnected from the likelihood and is generated from the pseudopriors. A similar reasoning can be followed for the models $m_{2}$ and $m_{3}$ . For example, for model $m_{1}$ the joint distribution is defined by

\begin{matrix} p (y, θ ∣ m_{1}) & = & p (y ∣ θ, m_{1}) p (θ ∣ m_{1}) \\ & = & p (y ∣ θ, m_{1}) p (θ_{m_{1}} ∣ m_{1}) p (θ_{m_{2}} ∣ m_{1}) p (θ_{m_{3}} ∣ m_{1}), \end{matrix}

where $p (θ_{m_{2}} ∣ m_{1}), p (θ_{m_{3}} ∣ m_{1})$ are proper distributions integrating to 1. Then for model $m_{i}$ , $i = 1, 2, 3$ we sample from

p (m_{i} ∣ θ, y) \propto \{\begin{matrix} L (y ∣ θ, m_{1}) p (θ_{m_{1}} ∣ m_{1}) p (θ_{m_{2}} ∣ m_{1}) p (θ_{m_{3}} ∣ m_{1}) p (m_{1}), & for i = 1 \\ L (y ∣ θ, m_{2}) p (θ_{m_{2}} ∣ m_{2}) p (θ_{m_{1}} ∣ m_{2}) p (θ_{m_{3}} ∣ m_{2}) p (m_{2}), & for i = 2 \\ L (y ∣ θ, m_{3}) p (θ_{m_{3}} ∣ m_{3}) p (θ_{m_{2}} ∣ m_{3}) p (θ_{m_{1}} ∣ m_{3}) p (m_{3}), & for i = 3 \end{matrix}

(3.6)

where

p (θ_{m_{1}} ∣ m_{i}) \propto \{\begin{matrix} p (y ∣ θ_{m_{1}}, m_{1}) p (θ_{m_{1}} ∣ m_{i}) & if i = 1 \\ p (θ_{m_{1}} ∣ m_{i}) & if i = 2 \\ p (θ_{m_{1}} ∣ m_{i}) & if i = 3 \end{matrix}

(3.7)

and $p (m_{1}), p (m_{2}), p (m_{3})$ are prior model probabilities. These choices largely determine how often the algorithm visits each model and how long it stays there. Finally, the posterior model probabilities for model $m \in M$ are estimated by

\hat{p} (m ∣ y) = \frac{\sum_{i = 1}^{B} I (m^{(i)} = m)}{B}

(3.8)

where $B$ is the total number of iterations and $m^{(i)}$ denotes the model we visit in the $i$ th iteration. We will apply this trans-dimensional approach to jump between different INGARCH models. A detailed description of the trans-dimensional MCMC can be found in Appendix B.
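The estimator (3.8) is simply the relative frequency with which each model is visited. A minimal sketch, assuming the chain of visited model labels has been stored in a list (the function name and the optional `burn_in` argument are illustrative additions):

```python
from collections import Counter


def posterior_model_probs(visited, burn_in=0):
    """Estimate (3.8): share of retained iterations spent in each model."""
    kept = visited[burn_in:]
    counts = Counter(kept)
    return {m: c / len(kept) for m, c in counts.items()}
```

Models that are never visited do not appear in the returned dictionary; their estimated posterior probability is zero.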

4 Simulation study

In this section, we present results from a small simulation experiment that examines whether our approach can identify the correct structure of the time series that generated the data. We use sample size $n = 200$ and generate data from 12 models: 10 competing models from the linear and log-linear INGARCH families and 2 models of a different type. In our simulation study, we consider two criteria proposed by Liang et al. (2008). The first is a general criterion and the second concerns model selection consistency. Namely:

Criterion 1: Each conditional prior must be proper (integrating to one) and cannot be arbitrarily vague in the sense of almost all of its mass being outside any believable compact set.

Criterion 2: (Model selection consistency) If data $y$ have been generated from model $m_{i}$ , then the posterior model probability of model $m_{i}$ should converge to 1 as the sample size $n \to \infty$ .

If Criterion 2 is not satisfied, a different choice of prior distributions is necessary. Recently, effort has been devoted to the choice of model priors. Bayarri et al. (2012) and Consonni et al. (2018) proposed different choices of priors in objective Bayesian analysis. In our case, to satisfy Criterion 1, we use as pseudopriors normal densities whose means and variances are obtained from ‘pilot runs’ of MCMC for each model; these pseudopriors are proper. More specifically, we generate 100 datasets of size $n = 200$ with randomly chosen parameter values for each model, and we run a trans-dimensional Markov chain of length 10 000. According to Criterion 2, the posterior probability of the model from which we generated the data must be close to 1, while the probabilities of the other models must be small. Furthermore, the choice of prior model probabilities is decisive for the posterior model probabilities. When there are only two models, a bisection algorithm is one way to choose and optimize the prior model probabilities (Lodewyckx et al., 2011).
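The construction of a normal pseudoprior from a pilot run can be sketched as follows. The code assumes that the pilot-run draws of a single parameter are available as a list; the function names are hypothetical and the sketch treats each parameter separately.

```python
import math
import statistics


def fit_pseudoprior(pilot_draws):
    """Return (mean, sd) of the pilot-run posterior draws for one parameter."""
    return statistics.fmean(pilot_draws), statistics.stdev(pilot_draws)


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2); a proper density, so it integrates to one."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)
```

When a model is not the current one, its parameters would be drawn from the fitted normal densities, and `log_normal_pdf` supplies the corresponding pseudoprior terms in the model-indicator update.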

We present averages based on 100 replications for each model. The columns correspond to the selected model out of 10 models, while the rows are the true models that generated the data. Note that, in addition to the 10 INGARCH models, we generate data from two models that are not of INGARCH type. The first is an INAR(2) model of the form $Y_{t} = α_{1} \circ Y_{t - 1} + α_{2} \circ Y_{t - 2} + R_{t}$ , $t = 1, \dots, T$ , where $\circ$ is the binomial thinning operator and $R_{t}$ are innovation terms that follow a Poisson distribution, while the second is an INAR(1) model of the form $Y_{t} = α_{1} \circ Y_{t - 1} + R_{t}$ , $t = 1, \dots, T$ , where now the innovation term depends on a covariate, namely $R_{t} \sim Poisson (exp (γ z_{t}))$ with $Z_{t} \sim Ber (ζ)$ , where $γ = 2$ and $ζ = 0.2$ . For example, when the data were generated from model $M_{1}$ , this model received an average posterior probability of 91.5%.
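For readers unfamiliar with binomial thinning, the INAR(2) data-generating model can be simulated as sketched below. The start-up values, the constant innovation mean `innov_mean`, the simple Poisson sampler and all function names are assumptions of this sketch; the parameter values used in the experiment are not reproduced here.

```python
import math
import random


def thin(alpha, count, rng):
    """Binomial thinning: alpha o count is a Binomial(count, alpha) variate."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar2(alpha1, alpha2, innov_mean, T, seed=0):
    """Simulate Y_t = alpha1 o Y_{t-1} + alpha2 o Y_{t-2} + R_t, R_t ~ Poisson."""
    rng = random.Random(seed)
    y = [0, 0]  # assumed start-up values, dropped from the output
    for _ in range(T):
        survivors = thin(alpha1, y[-1], rng) + thin(alpha2, y[-2], rng)
        y.append(survivors + poisson_draw(innov_mean, rng))
    return y[2:]
```

Each of the $Y_{t - 1}$ counts survives independently with probability $α_{1}$ , which keeps the series integer valued, unlike multiplication by $α_{1}$ .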

Looking at the results in Table 1, we can see that, even for a small sample size, we can identify the correct structure with great success for most of the models. As expected, in most cases where we do not identify the model that generated the data, the preferred model is a close one (in the sense of the parameter set-up), which is reasonable.

Overall, the experiment shows that the method can identify the underlying model with good success.

Table 1

Averages of posterior model probabilities (multiplied by 100) over 100 simulated datasets from each model, where $M_{i}$ denotes the linear and ${LM}_{i}$ the log-linear INGARCH models, respectively. Our approach correctly detects the model with high probability

Model	$M_{1}$	$M_{2}$	$M_{3}$	$M_{4}$	$M_{5}$	${LM}_{1}$	${LM}_{2}$	${LM}_{3}$	${LM}_{4}$	${LM}_{5}$
$M_{1}$ = INGARCH(0,1)	91.50	4.18	0.19	2.16	0.03	1.75	0.10	0.06	0.01	0.01
$M_{2}$ = INGARCH(1,1)	0.10	78.73	0.39	18.91	0.15	0.79	0.45	0.21	0.21	0.07
$M_{3}$ = INGARCH(1,2)	0.01	0.56	98.27	0.09	0.22	0.01	0.08	0.36	0.05	0.36
$M_{4}$ = INGARCH(2,1)	0.00	2.82	0.60	92.80	0.47	2.50	0.42	0.07	0.26	0.05
$M_{5}$ = INGARCH(2,2)	0.00	1.19	3.22	0.27	82.98	0.26	10.24	1.08	0.18	0.56
${LM}_{1}$ = INGARCH(0,1)	0.00	0.00	0.00	0.00	0.00	99.05	0.41	0.28	0.15	0.11
${LM}_{2}$ = INGARCH(1,1)	0.00	0.01	0.01	0.00	0.00	2.81	91.23	2.19	2.77	0.97
${LM}_{3}$ = INGARCH(1,2)	0.00	0.01	0.00	0.00	0.00	0.00	1.92	89.13	7.63	1.29
${LM}_{4}$ = INGARCH(2,1)	0.03	0.09	0.01	0.04	0.01	0.05	9.46	8.17	72.88	9.27
${LM}_{5}$ = INGARCH(2,2)	0.00	0.00	0.01	0.00	0.04	0.02	0.77	1.67	9.95	87.54
IN2=INAR(2)	4.67	31.96	4.20	13.19	1.42	25.24	14.30	1.71	3.12	0.19
INexp=Exponential INAR(1)	0.15	41.50	3.10	13.01	1.32	8.05	26.81	1.33	4.54	0.19

5 Applications

5.1 Polio data

The data consist of monthly counts of poliomyelitis cases in the United States from 1970 to 1983 (168 observations), reported by the Centres for Disease Control and discussed in Zeger (1988) among others. Figure 1 presents the original series and its autocorrelation function. In the trans-dimensional method, we consider jumps between five linear and five log-linear INGARCH models.

Figure 1

Series of polio dataset (left) and its ACF (right)

Table 2 presents estimates for each of the five linear and five log-linear INGARCH( $p, q$ ) models. The proposal densities are normal distributions whose variance is very small and was selected to achieve an acceptance rate between 25% and 30%; namely, $0.01 < σ^{2} < 0.025$ for the parameters $α_{1}, \dots, α_{p}, b_{1}, \dots, b_{q}$ , and $σ^{2} = 0.04$ for the parameter $b_{0}$ in the linear case and $0.01$ in the log-linear case. The chain was run for 10 000 to 20 000 iterations depending on the model. A burn-in of 2 500 iterations was discarded and the resulting samples were checked for convergence using the test proposed by Geweke (1992). The z-scores are presented in Table 3 and indicate that convergence has been achieved.
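The idea behind the convergence z-scores can be illustrated with a deliberately simplified sketch. Note the substitution: Geweke (1992) estimates the variances of the two segment means from spectral densities, whereas the code below uses the plain sample variance and therefore ignores autocorrelation. It should be read as an approximation for illustration only; the function name and the default 10% and 50% segment fractions are assumptions of this sketch.

```python
import math
import statistics


def geweke_z_naive(chain, first=0.1, last=0.5):
    """Z-score comparing the mean of an early and a late segment of a chain.

    Simplification: the variance of each segment mean is the sample variance
    divided by the segment length, which ignores autocorrelation.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a + var_b)
```

Values of the z-score well inside the interval $(- 2, 2)$ give no indication of a difference between the early and late parts of the chain.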

Table 2

Estimated posterior means and standard deviations (sd) for the competing models

Model		linear					log-linear
Parameters		$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$	$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$
INGARCH(0,1)	mean	0.86	$-$	0.37	$-$	$-$	0.04	$-$	0.47	$-$	$-$
	sd	0.09	$-$	0.07	$-$	$-$	0.03	$-$	0.07	$-$	$-$
INGARCH(1,1)	mean	0.62	0.20	0.35	$-$	$-$	0.04	0.04	0.44	$-$	$-$
	sd	0.15	0.12	0.07	$-$	$-$	0.03	0.15	0.08	$-$	$-$
INGARCH(1,2)	mean	0.67	0.11	0.34	$-$	0.08	0.05	$-0.47$	0.39	$-$	0.35
	sd	0.12	0.09	0.07	$-$	0.06	0.05	0.32	0.08	$-$	0.19
INGARCH(2,1)	mean	0.54	0.19	0.33	0.09	$-$	0.07	0.06	0.54	$-0.52$	$-$
	sd	0.15	0.11	0.07	0.08	$-$	0.07	0.14	0.08	0.11	$-$
INGARCH(2,2)	mean	0.56	0.11	0.31	0.09	0.08	0.08	$-0.49$	0.47	$-0.54$	0.42
	sd	0.14	0.09	0.07	0.05	0.08	0.08	0.22	0.09	0.12	0.16

Table 3

Geweke's convergence z-scores for the parameters of five linear and five log-linear INGARCH models

Model	linear					log-linear
Parameters	$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$	$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$
INGARCH(0,1)	0.454	$-$	0.128	$-$	$-$	$-0.081$	$-$	$-0.444$	$-$	$-$
INGARCH(1,1)	$-0.677$	0.737	$-0.842$	$-$	$-$	$-0.455$	$-0.129$	$-0.481$	$-$	$-$
INGARCH(1,2)	0.182	$-0.117$	$-0.854$	$-$	1.178	1.292	$-0.109$	$-0.192$	$-$	0.191
INGARCH(2,1)	1.644	0.642	$-1.281$	$-0.848$	$-$	$-1.163$	$-0.394$	0.057	0.478	$-$
INGARCH(2,2)	1.724	$-0.849$	$-0.404$	0.283	$-0.733$	0.644	0.708	0.857	0.870	0.825

For the models whose parameters satisfy the stationarity conditions, we apply the trans-dimensional MCMC method of Lodewyckx et al. (2011); the posterior probabilities are presented in Table 4. The INGARCH(1,1) model is the one most often visited, which indicates that this is the selected model. Note that the 4 best models are of the linear type, while the log-linear models have much smaller posterior probabilities.

Bayes factors are also reported in Table 4. They have been calculated with respect to the last (and most complicated) model. They also support simple models.

Table 4

Posterior probabilities and Bayes factor for the competing models

Model	Posterior probability	Bayes Factor
INGARCH(1,1)	0.4564	3245.066
INGARCH(0,1)	0.3741	2659.801
INGARCH(1,2)	0.0641	455.838
INGARCH(2,1)	0.0588	418.188
LINGARCH(0,1)	0.0341	242.644
INGARCH(2,2)	0.0085	60.462
LINGARCH(1,2)	0.0023	16.199
LINGARCH(2,1)	0.0016	11.527
LINGARCH(2,2)	0.0001	1.000

Finally, given the posterior model probabilities for all models, we apply a Bayesian model-averaging (BMA) procedure. Based on the posterior model probabilities, as estimated via trans-dimensional MCMC, we derive the density of the predictive volatilities. That is, we calculate $p (λ_{T + 1} ∣ m, y)$ , the posterior density of the mean of the next unseen observation under model $m$ , and weight it by the posterior model probability. We present the densities for the four linear models with the highest posterior model probabilities in Figure 2. As expected, the BMA estimate summarizes the individual densities.

Figure 2

Posterior density for $λ_{T + 1}$ for polio data

5.2 Campylobacterosis data

The data (see, e.g., Liboschik et al. (2015)) consist of the number of campylobacterosis cases in the North of Québec in Canada. Figure 3 presents the original series and its autocorrelation function. As for the polio dataset, we present parameter estimates and z-scores for checking convergence. Considering the same stationarity conditions as for the polio dataset, we present in Table 7 the results of the trans-dimensional MCMC algorithm. In this case, we observe that the 4 best models are not exclusively linear and that they have a more complicated structure. Furthermore, the simple model INGARCH(0,1) is the least preferred. Tables 5 and 6 report the posterior means and standard deviations as well as Geweke's convergence diagnostics for the competing models.

Figure 3

Series of campylobacterosis dataset (left) and its ACF (right)

Table 5

Estimated posterior means and standard deviations (sd) for the competing models

Model		linear					log-linear
Parameters		$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$	$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$
INGARCH(0,1)	mean	2.81	$-$	0.67	$-$	$-$	0.68	$-$	0.71	$-$	$-$
	sd	0.44	$-$	0.66	$-$	$-$	0.21	$-$	0.09	$-$	$-$
INGARCH(1,1)	mean	1.82	0.32	0.52	$-$	$-$	0.41	0.23	0.59	$-$	$-$
	sd	0.43	0.08	0.07	$-$	$-$	0.12	0.08	0.07	$-$	$-$
INGARCH(1,2)	mean	1.87	0.27	0.52	$-$	0.05	0.41	0.24	0.60	$-$	$-0.02$
	sd	0.45	0.10	0.05	$-$	0.07	0.22	0.21	0.07	$-$	0.30
INGARCH(2,1)	mean	1.64	0.13	0.51	0.23	$-$	0.37	0.15	0.58	0.11	$-$
	sd	0.44	0.08	0.07	0.08	$-$	0.12	0.11	0.07	0.09	$-$
INGARCH(2,2)	mean	1.69	0.10	0.51	0.20	0.05	0.42	$-0.01$	0.59	0.13	0.09
	sd	0.41	0.08	0.07	0.09	0.04	0.17	0.30	0.07	0.10	0.20

Table 6

Geweke's convergence z-scores for the parameters of five linear and five log-linear INGARCH models

Model	linear					log-linear
Parameters	$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$	$b_{0}$	$α_{1}$	$b_{1}$	$α_{2}$	$b_{2}$
INGARCH(0,1)	0.037	$-$	$-0.099$	$-$	$-$	0.856	$-$	$-1.101$	$-$	$-$
INGARCH(1,1)	0.592	$-0.990$	1.318	$-$	$-$	$-0.241$	$-0.987$	1.480	$-$	$-$
INGARCH(1,2)	$-0.057$	0.544	$-0.335$	$-$	$-0.962$	1.722	1.409	$-0.765$	$-$	$-0.487$
INGARCH(2,1)	$-0.373$	$-1.256$	$-1.607$	1.749	$-$	$-0.909$	0.592	$-0.839$	0.147	$-$
INGARCH(2,2)	$-0.823$	0.431	0.745	$-0.719$	$-0.294$	1.235	0.316	$-1.543$	$-1.478$	0.746

Table 7

Posterior probabilities and Bayes factor for the competing models

Model	Posterior probability	Bayes Factor
INGARCH(2,1)	0.5595	576.0052
INGARCH(1,1)	0.2784	286.5797
LINGARCH(1,1)	0.0515	52.9751
INGARCH(2,2)	0.0339	34.8693
LINGARCH(1,2)	0.0222	22.8036
LINGARCH(0,1)	0.0177	18.2094
INGARCH(1,2)	0.0170	17.5422
LINGARCH(2,1)	0.0125	12.9080
LINGARCH(2,2)	0.0064	6.5652
INGARCH(0,1)	0.0009	1.0000

Figure 4

Posterior density for $λ_{T + 1}$ for Campylobacterosis data

Finally, we have again calculated $p (λ_{T + 1} ∣ y)$ for the next unseen observation, weighted by the posterior model probabilities, as depicted in Figure 4.

6 Conclusion

We have developed a trans-dimensional MCMC approach for model selection within a family of Poisson INGARCH-type models that includes linear and log-linear models. This allows one to fit the models and select the one that best fits the data. Model selection for discrete valued time series has received little research attention, and we believe that we contribute in this direction. Our approach is based on satisfying the stationarity conditions, that is, on assuming that the series are stationary. Allowing for combinations of parameters beyond stationarity is possible, but the interpretation of the models is then more difficult.

The method discussed by Carlin and Chib (1995) is the most flexible method in our case, where we have a number of nested models. This is an important issue because, until now, there have been only limited studies of trans-dimensional methods for count data.

Furthermore, we note that, although we examined the methodology only for linear and log-linear INGARCH models with all parameters positive, the analysis can be extended to negative parameters in the case of log-linear INGARCH models. In addition, other functions for the mean process $λ_{t}$, such as those presented by Fokianos et al. (2009) and Fokianos and Tjøstheim (2011), could be examined for computational efficacy.

Finally, in the present article we treated only Poisson INGARCH models. However, the method can be easily extended to models of the same type with other conditional distributions that have appeared in the literature.

Appendix A: MCMC for the general INGARCH model

Denote by $θ = (α_{1}, \dots, α_{p}, b_{0}, b_{1}, \dots, b_{q})$ the vector of all parameters. We describe the MCMC algorithm in a general form applicable to both linear and log-linear models. The key distinction between the two lies in the conditions that each model must satisfy to ensure stationarity.

We use $p + q + 1$ independent Metropolis steps, one for each parameter $θ_{i}$ , $i = 1, \dots, p + q + 1$ . The algorithm runs as follows:

1. Initialize $θ^{(0)} = (α_{1}^{(0)}, \dots, α_{p}^{(0)}, b_{1}^{(0)}, \dots, b_{q}^{(0)}, b_{0}^{(0)})$ from truncated normal distributions for the $α$ ’s and $b_{1}, \dots, b_{q}$ and a Gamma distribution for $b_{0}$ , so as to satisfy the stationarity conditions of the particular model.

2. For the ( $r + 1$ )th iteration of the algorithm, for each $θ_{i}$ , $i = 1, \dots, p + q + 1$ :

– Generate a candidate value from the proposal density, $θ_{i}^{(can)} \sim N (θ_{i}^{(r)}, σ_{i}^{2})$ , where the tuning parameter $σ_{i}^{2}$ was selected so as to achieve an acceptance rate of about 25%.

– Update $θ_{i}^{(r)}$ to $θ_{i}^{(r + 1)}$ with probability given by (2.9), provided that the candidate value does not violate the stationarity conditions of the model.

– Iterate this updating procedure.

3. Go to the next iteration or stop if the chain has converged.
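The steps above can be sketched for a linear INGARCH(1,1) model as follows. This is a minimal illustration under several assumptions that are not taken from the article: the mean process is written as $λ_{t} = b_{0} + α_{1} λ_{t - 1} + b_{1} y_{t - 1}$ , the stationarity region is taken to be $b_{0} > 0$ , $α_{1}, b_{1} \geq 0$ , $α_{1} + b_{1} < 1$ , the prior is flat on that region (so the acceptance ratio reduces to a likelihood ratio rather than the expression in (2.9)), the start-up value $λ_{1}$ is set to the sample mean, and the step sizes are hand-picked placeholders.

```python
import numpy as np

def log_lik(theta, y):
    """Poisson log-likelihood (up to a constant) of a linear INGARCH(1,1)."""
    b0, a1, b1 = theta
    lam = np.empty(len(y))
    lam[0] = y.mean()  # assumed start-up value for the mean process
    for t in range(1, len(y)):
        lam[t] = b0 + a1 * lam[t - 1] + b1 * y[t - 1]
    return np.sum(y * np.log(lam) - lam)

def admissible(theta):
    """Assumed positivity and stationarity constraints of the linear model."""
    b0, a1, b1 = theta
    return b0 > 0 and a1 >= 0 and b1 >= 0 and a1 + b1 < 1

def metropolis(y, theta0, step, n_iter, rng):
    """Component-wise random-walk Metropolis with a flat prior (assumption)."""
    theta = np.array(theta0, dtype=float)
    current = log_lik(theta, y)
    draws = np.empty((n_iter, len(theta)))
    accepted = np.zeros(len(theta))
    for r in range(n_iter):
        for i in range(len(theta)):
            cand = theta.copy()
            cand[i] += step[i] * rng.normal()
            if admissible(cand):  # reject candidates outside the region
                cand_ll = log_lik(cand, y)
                if np.log(rng.uniform()) < cand_ll - current:
                    theta, current = cand, cand_ll
                    accepted[i] += 1
        draws[r] = theta
    return draws, accepted / n_iter

def simulate(theta, n, rng):
    """Simulate counts from the assumed linear INGARCH(1,1) recursion."""
    b0, a1, b1 = theta
    y = np.empty(n)
    lam = b0 / (1 - a1 - b1)  # start at the stationary mean
    for t in range(n):
        y[t] = rng.poisson(lam)
        lam = b0 + a1 * lam + b1 * y[t]
    return y

rng = np.random.default_rng(1)
y = simulate((1.0, 0.3, 0.4), 300, rng)
draws, rates = metropolis(y, (0.5, 0.2, 0.2), step=(0.3, 0.1, 0.08),
                          n_iter=1500, rng=rng)
```

The returned `rates` give the empirical acceptance rate of each component; in practice the step sizes would be adjusted until these are close to the 25% target mentioned above.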

Appendix B: The transdimensional MCMC

Consider $s$ candidate models, namely $m_{1}, \dots, m_{s}$ . We denote the parameter vector for model $m$ as $θ_{m}$ . Because the models have different numbers of parameters, these vectors may have different dimensions. Also denote the set of indices excluding $j$ as $S_{- j} = \{1, 2, \dots, j - 1, j + 1, \dots, s\}$ . From the MCMC runs for all the models we have obtained the mean ${\tilde{μ}}_{θ_{ij}}$ and the variance ${\tilde{σ}}^{2}_{θ_{ij}}$ of parameter $θ_{ij}$ , which is the $i$ th parameter of the $j$ th model. Then the algorithm runs as follows:

For a number of iterations, say $B$ :

1. For model $m_{k}$ , generate the parameter vector for this model, say $θ_{m_{k}}$ , from the conditional posterior distribution, which is proportional to $L (y ∣ θ_{m_{k}}, m_{k}) p (θ_{m_{k}} ∣ m_{k})$ and given by (2.8), and the other parameters $θ_{m_{j}}$ from proper pseudopriors $p (θ_{m_{j}} ∣ m_{j}) = \prod_{i = 1}^{p_{j}} N ({\tilde{μ}}_{θ_{ij}}, {\tilde{σ}}_{θ_{ij}}^{2})$ for $j \in S_{- k}$ , where $p_{k}$ denotes the number of parameters for model $k$ .

2. Repeat this for each candidate model, that is, for all $k = 1, \dots, s$ .

3. Calculate the posterior probability for model $k$ , defined by

$$\Pr_{k} = p (m_{k} ∣ θ, y) = \frac{L (y ∣ θ_{m_{k}}, m_{k}) \, p (θ_{m_{k}} ∣ m_{k}) \, p (m_{k}) \prod_{j \in S_{- k}} \prod_{i = 1}^{p_{j}} \frac{1}{\sqrt{2 π} \, {\tilde{σ}}_{θ_{ij}}} \exp \left( - \frac{(θ_{ij} - {\tilde{μ}}_{θ_{ij}})^{2}}{2 {\tilde{σ}}_{θ_{ij}}^{2}} \right)}{\sum_{w = 1}^{s} L (y ∣ θ_{m_{w}}, m_{w}) \, p (θ_{m_{w}} ∣ m_{w}) \, p (m_{w}) \prod_{j \in S_{- w}} \prod_{i = 1}^{p_{j}} \frac{1}{\sqrt{2 π} \, {\tilde{σ}}_{θ_{ij}}} \exp \left( - \frac{(θ_{ij} - {\tilde{μ}}_{θ_{ij}})^{2}}{2 {\tilde{σ}}_{θ_{ij}}^{2}} \right)}$$

4. Select the model with the largest value of $\Pr_{k}$ . This is the model selected at the current iteration.

5. Repeat steps 1–4 $B$ times.

6. At the end, calculate the $s$ posterior model probabilities given by (3.8).
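The probability $\Pr_{k}$ multiplies many small densities, so in practice it is convenient to work on the log scale. The following minimal sketch computes the $s$ probabilities for one iteration from log-likelihoods, log-priors and normal pseudopriors. All function names and numerical inputs are hypothetical placeholders and do not correspond to any model fitted in the article.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    """Elementwise log density of N(mean, sd^2)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def model_probabilities(log_lik, log_prior, log_model_prior,
                        theta, pseudo_mean, pseudo_sd):
    """Compute Pr_k for k = 1..s on the log scale.

    theta, pseudo_mean, pseudo_sd: lists of s arrays (one per model),
    possibly of different lengths.
    """
    s = len(theta)
    # Log pseudoprior density of each model's current parameter vector.
    log_pseudo = np.array([
        log_normal_pdf(theta[j], pseudo_mean[j], pseudo_sd[j]).sum()
        for j in range(s)
    ])
    # Numerator for model k: its own likelihood and prior,
    # times the pseudopriors of every other model j != k.
    log_num = (np.asarray(log_lik) + np.asarray(log_prior)
               + np.asarray(log_model_prior)
               + (log_pseudo.sum() - log_pseudo))
    log_num = log_num - log_num.max()  # guard against underflow
    weights = np.exp(log_num)
    return weights / weights.sum()

# Hypothetical inputs for three candidate models.
theta = [np.array([1.2, 0.4]),
         np.array([1.0, 0.3, 0.35]),
         np.array([0.9, 0.2, 0.3, 0.1])]
pseudo_mean = [np.array([1.1, 0.45]),
               np.array([1.05, 0.28, 0.33]),
               np.array([1.0, 0.25, 0.3, 0.12])]
pseudo_sd = [np.array([0.3, 0.08]),
             np.array([0.3, 0.08, 0.07]),
             np.array([0.3, 0.08, 0.07, 0.05])]
log_lik = np.array([-410.0, -402.5, -403.0])
log_prior = np.zeros(3)                  # flat priors, for illustration only
log_model_prior = np.log(np.full(3, 1 / 3))

probs = model_probabilities(log_lik, log_prior, log_model_prior,
                            theta, pseudo_mean, pseudo_sd)
selected = int(np.argmax(probs))  # step 4: model with the largest Pr_k
```

Counting how often each model is selected over the $B$ iterations then gives one natural estimate of the posterior model probabilities.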

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Acknowledgments

This research is co-financed by Greece and the European Union (European Social Fund [ESF]) through the Operational Programme ‘Human Resources Development, Education and Lifelong Learning’ in the context of the project ‘Strengthening Human Resources Research Potential via Doctorate Research’ (MIS-5000432), implemented by the State Scholarships Foundation (IKY). The second author Dimitris Karlis received a grant from the Research Center of the Athens University of Economics and Business under the project ‘Innovative Publications’.

References

Al-Osh and Alzaid (1987) First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis, 8, 261–75.

Alzahrani, Neal, Spencer, McKinley and Touloupou (2018) Model selection for time series of count data. Computational Statistics & Data Analysis, 122, 33–44.

Bayarri, Berger, Forte and García-Donato (2012) Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics, 40, 1550–77.

Bu and McCabe (2008) Model selection, estimation and forecasting in INAR(p) models: A likelihood-based Markov chain approach. International Journal of Forecasting, 24, 151–62.

Carlin and Chib (1995) Bayesian model choice via Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Methodological), 57, 473–84.

Chib and Greenberg (1995) Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327–35.

Consonni, Fouskakis, Liseo and Ntzoufras (2018) Prior distributions for objective Bayesian analysis. Bayesian Analysis, 13, 627–79.

Dellaportas, Forster and Ntzoufras (2002) On Bayesian model and variable selection using MCMC. Statistics and Computing, 12, 27–36.

Enciso-Mora, Neal and Subba Rao (2009) Efficient order selection algorithms for integer-valued ARMA processes. Journal of Time Series Analysis, 30, 1–18.

Ferland, Latour and Oraichi (2006) Integer-valued GARCH process. Journal of Time Series Analysis, 27, 923–42.

Fokianos, Rahbek and Tjøstheim (2009) Poisson autoregression. Journal of the American Statistical Association, 104, 1430–39.

Fokianos and Tjøstheim (2011) Log-linear Poisson autoregression. Journal of Multivariate Analysis, 102, 563–78.

Geweke (1992) Evaluating the accuracy of sampling-based approaches to the calculations of posterior moments. Bayesian Statistics, 4, 641–49.

Green (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–32.

Liang, Paulo, Molina, Clyde and Berger (2008) Mixtures of g-priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410–23.

Liboschik, Fokianos and Fried (2015) tscount: An R Package for Analysis of Count Time Series Following Generalized Linear Models. Dortmund, Germany: Universitätsbibliothek Dortmund.

Lodewyckx, Kim, Lee, Tuerlinckx, Kuppens and Wagenmakers E-J (2011) A tutorial on Bayes factor estimation with the product space method. Journal of Mathematical Psychology, 55, 331–47.

McCabe and Martin (2005) Bayesian predictions of low count time series. International Journal of Forecasting, 21, 315–30.

Raftery, Madigan and Hoeting (1997) Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92, 179–91.

Tierney (1994) Markov chains for exploring posterior distributions. The Annals of Statistics, 22, 1701–28.

Wang, Wang and Zhang (2020) Poisson autoregressive process modeling via the penalized conditional maximum likelihood procedure. Statistical Papers, 61, 245–60.

Weiß (2018) An Introduction to Discrete-Valued Time Series. Hoboken, NJ: John Wiley & Sons.

Zeger (1988) A regression model for time series of counts. Biometrika, 75, 621–29.

Zhu (2011) A negative binomial integer-valued GARCH model. Journal of Time Series Analysis, 32, 54–67.

Zhu (2012a) Modeling time series of counts with CoM-Poisson INGARCH models. Mathematical and Computer Modelling, 56, 191–203.

Zhu (2012b) Modeling overdispersed or underdispersed count data with generalized Poisson integer-valued GARCH models. Journal of Mathematical Analysis and Applications, 389, 58–71.

Zhu (2012c) Zero-inflated Poisson and negative binomial integer-valued GARCH models. Journal of Statistical Planning and Inference, 142, 826–39.
