Abstract
This paper deals with the problem of modelling time series data in which structural breaks occur at multiple time points and may change the order of the model at every break. A flexible and generalized class of autoregressive (AR) models with multiple structural breaks is proposed for such situations. Estimation of the model parameters is discussed in both classical and Bayesian frameworks. Since the joint posterior of the parameters is not analytically tractable, we employ a Markov chain Monte Carlo method, Gibbs sampling, to simulate posterior samples. To verify the order change, a hypothesis test is constructed using posterior probabilities, and the order change model is compared with the model without breaks. The proposed methodologies are illustrated by means of a simulation study and a real data analysis.
Introduction
In a fast-growing world, data observed in chronological order often show marked growth dynamics following the implementation of new government policies, new technologies and other interventions. Such phenomena often exhibit structural changes over time. At a structural break, changes may occur in the process mean, the error variance and the trend. This field has been studied extensively. Perron (1990), Perron and Vogelsang (1992), Zivot and Andrews (1992), Chib (1998), Lee and Strazicich (2003), Maddala and Kim (2003), Chaturvedi and Kumar (2007) and Shao and Zhang (2010), among others, addressed the problems of testing, detection and estimation under structural breaks in the mean or trend. They demonstrated applications to oil prices, stock markets and GNP growth rates, among others, using classical and Bayesian methods of estimation. The problem of modelling structural breaks with varying error variance has been addressed by Inclan (1993), Kim et al. (2002), Cook (2002) and Kumar et al. (2012). In the literature cited above, structural breaks in the mean and error variance are addressed for both univariate and multivariate setups (see Bai, 2010; Meligkotsidou et al., 2011 and Eo, 2016). Meligkotsidou et al. (2017) and Slama and Saggou (2017) also considered multiple break points in the autoregressive coefficients and applied a Bayesian significance test to detect changes in the parameters. Most authors considered the AR(1) model for detecting structural breaks in association with a unit root; see Perron (1990), Perron and Vogelsang (1992), Zivot and Andrews (1992), Inclan (1993) and Chib (1998). However, Wang and Zivot (2000), Meligkotsidou et al. (2017), Vosseler (2016) and Slama and Saggou (2017) considered the problem of estimation and testing under single or multiple breaks for the general AR(p) model.
In the above studies, the order of the AR model is assumed to be predetermined at each structural break, and only changes in the model parameters are considered. This seems unrealistic for real-life situations, since the characteristics of a random phenomenon do not remain constant and are not known a priori because of inherent uncertainty. A process may also change from one order (lag) to another because of a structural break; this is known as an order change process. For example, in economic and financial series, oil price stocks suffered a severe depression owing to sudden changes in economic policy and stock market trading, which shifted the structure of the series. Therefore, this article proposes a flexible and generalized AR(p) model in which the order of the model is assumed to be unknown and is estimated from the given time series data. The proposed method thus has the potential to model situations where the AR process changes its order at each break point.
The problem of Bayesian estimation for autoregressive processes under structural breaks has been discussed by many authors. Broemeling (1972) discussed the change point problem with known variance and constructed Bayesian estimators of the parameters under non-informative priors. Smith (1975) discussed the change point problem based on the posterior probability of the possible change. A Bayesian significance test for stationarity of a regression model was proposed by Kim (1991). Barbieri and Conigliani (1998) derived a Bayes factor for identifying a change in the mean of a stationary autoregressive model. Kezim and Abdelli (2004) presented a Bayesian analysis of an AR(1) model subject to one change in both the error variance and the autocorrelation coefficient. Meligkotsidou et al. (2011) considered structural change in the level, error variance and autoregressive coefficient at an unknown break point and estimated the model parameters using the marginal likelihood. Kumar et al. (2012) investigated the impact of a structural break in the error variance in a Bayesian framework and applied it to export data of selected ASEAN countries for an AR(1) time series model.
The objective of the paper is threefold. The first objective is to introduce an order change autoregressive (O-AR) model that also allows for structural breaks in the other parameters of the series. The second is to develop maximum likelihood and Bayesian estimation methods for the unknown parameters of the model. Finally, we examine the impact of such a change on the regression parameters as well as on the order. The rest of the manuscript is organized as follows. The proposed model is stated in Section 2, where four different models with order change are given by considering changes in mean, variance and trend, and maximum likelihood is used to estimate the parameters. Bayesian estimation is discussed in Section 3. Considering conjugate prior distributions, Bayes estimation is performed under two symmetric loss functions, squared error loss and absolute loss. Posterior probability is used for detecting an arbitrary change in the order as well as breaks in the mean and error variance of an autoregressive time series model, under the assumption that the order of the model at each break is known. To evaluate the performance of the proposed model, a simulation study and an empirical analysis of the natural gas series of the United States are carried out in Sections 4 and 5, respectively. Conclusions are stated in Section 6.
Model specification
In this section, we consider a univariate autoregressive time series model of order p which is generated through the stochastic process
where
Structural breaks mainly take place when the parameters of the model shift or change permanently from one period to another for a given series. In many cases, breaks affect not only the parameters but also the order p of the autoregressive time series model. If a change in the order of the AR process is not accounted for alongside the change in the autoregressive coefficients, the model may be termed misidentified. It is therefore equally important to consider structural breaks with an order change. The main motivation of the present study is to explore an AR model that allows breaks in all parameters together with a change in order. One may also be interested in models where some parameters are not affected by the structural break. Therefore, we formulate the following models for the various break situations:
M1: Autoregressive model with a change in order but no change in mean and variance.
M2: Autoregressive model with a change in order and mean but no change in variance.
M3: Autoregressive model with a change in order and error variance but no change in mean.
M4: Autoregressive model with a change in order, mean and error variance.
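To make the four model conditions concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes Gaussian errors, zero pre-sample values and an intercept form for the mean; the function name simulate_order_change_ar and all of its arguments are hypothetical. Keeping the intercepts and error standard deviations equal across regimes corresponds to M1, varying only the intercepts to M2, varying only the standard deviations to M3, and varying both to M4.

```python
import numpy as np

def simulate_order_change_ar(T, break_points, coefs, intercepts, sigmas, seed=0):
    """Simulate an AR series whose order and parameters change at each break.

    break_points: increasing time indices at which a new regime starts.
    coefs: one coefficient sequence per regime; its length is that regime's order.
    intercepts, sigmas: one intercept and one error std. deviation per regime.
    Gaussian errors and zero pre-sample values are assumptions of this sketch.
    """
    if not (len(coefs) == len(intercepts) == len(sigmas) == len(break_points) + 1):
        raise ValueError("need one parameter set per regime")
    rng = np.random.default_rng(seed)
    max_p = max(len(c) for c in coefs)
    y = np.zeros(T + max_p)  # the first max_p zeros act as pre-sample values
    for t in range(max_p, T + max_p):
        # regime index of the current observation (0 before the first break)
        regime = int(np.searchsorted(break_points, t - max_p, side="right"))
        phi = np.asarray(coefs[regime], dtype=float)
        lags = y[t - len(phi):t][::-1]  # y[t-1], ..., y[t-p] for this regime
        y[t] = intercepts[regime] + phi @ lags + sigmas[regime] * rng.standard_normal()
    return y[max_p:]

# M4-type example: AR(1) before t = 100, AR(3) afterwards, with mean and variance shifts
series = simulate_order_change_ar(
    T=200, break_points=[100],
    coefs=[[0.5], [0.3, 0.2, 0.1]], intercepts=[0.0, 1.0], sigmas=[1.0, 2.0],
)
```

Passing more than one break point and a matching number of parameter sets gives the multiple break case.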
Let
For mathematical manipulation, we may write the model in Eq. (2) in matrix form,
where
Similarly, under model condition M2, the model in Eq. (1) may be written as
and, under model condition M3, the model in Eq. (1) is expressed as
The models in Eq. (3) to Eq. (5) allow breaks in the parameters one at a time. However, our main interest is in the model that allows breaks in all parameters. Under model condition M4, we get the model
Note that models M1–M3 are particular cases of model M4. In the classical approach, the most commonly used estimation method is maximum likelihood, in which the estimates are obtained by maximizing the likelihood or, equivalently, the log-likelihood function. The likelihood functions for models M1–M4 are respectively obtained by
where,
The general forms of the maximum likelihood estimators of the parameters can be obtained as,
where,
All model conditions M1–M4 allow breaks in the respective parameters, and one may be interested in testing whether or not there is a break. This can be stated through the following hypotheses. Under model M1:
Bayesian estimation
Bayesian inference combines the information in the data, through the likelihood function, with the information about the parameters, through the prior distribution, to produce the joint posterior distribution defined by,
The joint posterior distribution of
The
For the change point parameter, all possible locations of the break
The joint posterior distributions for models M1 to M4 are respectively given by
For hypothesis testing in the Bayesian framework, we first obtain the marginal posterior probability of each model and compare it with those of the other models. The marginal probability is obtained by integrating over the specified range of the parameters. The marginal posterior probabilities for M1 to M4 are given by
where
Using the posterior probabilities, we select the most appropriate model, namely the one with the maximum posterior probability, and then estimate its parameters. Sometimes standard numerical methods cannot carry out the estimation because the posterior does not take the form of a standard distribution. In such cases, the Markov chain Monte Carlo (MCMC) method is very useful for simulating samples from the posterior distribution. Geman and Geman (1984) described a procedure, known as the Gibbs sampler, for generating values based on the full conditional posterior distributions. The conditional posterior distributions of the autoregressive coefficients, intercept and error variance under all models are derived, and their means and variances are presented in Table A1 of the Appendix.
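The Gibbs step for one regime can be sketched as follows. This is an illustrative sampler under assumed independent priors (a normal prior with variance v_phi on each coefficient, a normal prior with variance v_mu on the intercept, and an inverse gamma prior with shape a and scale b on the error variance), and it is not a transcription of the conditional moments in Table A1. The names lag_matrix and gibbs_ar_segment and all default hyperparameters are hypothetical. The three full conditionals have the multivariate normal, normal and inverse gamma forms named in the Appendix.

```python
import numpy as np

def lag_matrix(y, p):
    """Return the response y[p:] and a design matrix with columns y[t-1], ..., y[t-p]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[p - k:len(y) - k] for k in range(1, p + 1)])
    return y[p:], X

def gibbs_ar_segment(y, p, n_iter=2000, burn_in=500,
                     v_phi=10.0, v_mu=10.0, a=2.0, b=1.0, seed=0):
    """Gibbs sampler for one AR(p) regime with intercept (assumed priors, see lead-in).

    Returns draws with columns phi_1, ..., phi_p, intercept, error variance.
    """
    rng = np.random.default_rng(seed)
    z, X = lag_matrix(y, p)
    n = len(z)
    mu, sigma2 = 0.0, 1.0  # arbitrary starting values
    draws = np.empty((n_iter, p + 2))
    for i in range(n_iter):
        # AR coefficients | intercept, variance: multivariate normal
        prec = X.T @ X / sigma2 + np.eye(p) / v_phi
        cov = np.linalg.inv(prec)
        cov = (cov + cov.T) / 2.0  # remove numerical asymmetry
        mean = cov @ (X.T @ (z - mu)) / sigma2
        phi = rng.multivariate_normal(mean, cov)
        # intercept | coefficients, variance: normal
        resid = z - X @ phi
        var_mu = 1.0 / (n / sigma2 + 1.0 / v_mu)
        mu = rng.normal(var_mu * resid.sum() / sigma2, np.sqrt(var_mu))
        # error variance | coefficients, intercept: inverse gamma,
        # drawn as the reciprocal of a gamma variate
        ssr = np.sum((resid - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(a + n / 2.0, 1.0 / (b + ssr / 2.0))
        draws[i] = np.concatenate([phi, [mu, sigma2]])
    return draws[burn_in:]
```

In an order change model, a sampler of this kind would be run for each regime given the current break location, with parameters shared across regimes under M1–M3 pooled accordingly.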
In order to obtain the conditional posterior distribution of the locations of the break points, we first identify the number of break(s) at which an order change takes place. In the present study, we consider the number of breaks (
where,
We also obtain the distribution of
Simulation study
In this section, we summarize the simulation results and compare the performance of the Bayes estimators obtained under different loss functions with that of the maximum likelihood estimator. We first generate a series from the model M4 proposed in the previous section, with known order, for selected initial values of the parameters. To compute the posterior probabilities and Bayes estimates, we consider conjugate prior distributions for the model parameters and derive the posterior distribution. Since the conditional posterior distribution of each parameter takes the form of a standard distribution, we use the Gibbs sampler to compute the Bayes estimates of the parameters from simulated posterior samples and then obtain the average estimates and highest posterior density (HPD) intervals. The efficiency of the estimators is compared using the average absolute bias (AB) and the mean squared error (MSE).
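The summaries used in this comparison can be computed from posterior draws as in the sketch below. It relies on the standard results that the posterior mean is the Bayes estimate under squared error loss and the posterior median is the Bayes estimate under absolute loss. The HPD interval is taken as the shortest interval containing the requested share of the sorted draws, and AB is read here as the mean absolute deviation of the estimates from the true value across replications, which is an assumption about the definition. All function names are hypothetical.

```python
import numpy as np

def bayes_estimates(draws):
    """Bayes estimates from posterior draws: mean (squared error loss), median (absolute loss)."""
    draws = np.asarray(draws, dtype=float)
    return {"self": draws.mean(axis=0), "abs": np.median(draws, axis=0)}

def hpd_interval(draws, level=0.95):
    """Shortest interval that contains a share `level` of the sorted draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = len(x)
    m = int(np.ceil(level * n))         # number of draws inside the interval
    widths = x[m - 1:] - x[:n - m + 1]  # widths of all candidate intervals
    j = int(np.argmin(widths))
    return float(x[j]), float(x[j + m - 1])

def bias_and_mse(estimates, true_value):
    """Average absolute bias and mean squared error over replications."""
    err = np.asarray(estimates, dtype=float) - true_value
    return float(np.mean(np.abs(err))), float(np.mean(err ** 2))
```

For a symmetric, unimodal posterior the two Bayes estimates nearly coincide, so similar performance of the two loss functions would not be surprising in that case.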
For numerical simplicity, we first explore the model with a single break in the autoregressive process, which changes the order from
with
The model is fitted for three different break positions, where the order change occurs at T/4, T/2 and 3T/4. The posterior probability of the break point, computed under the assumption that an order change may occur at each point, is sketched in Fig. 1 for the M4 model. In Fig. 1, one can easily see that the posterior probability is maximum at 25
Average posterior model probability when data generated from the model M4
Simulated series and posterior probability of the order change point with different value of 
As discussed, the model is identified with the help of the posterior probabilities, which are recorded in Table 1. From Table 1, it is observed that the Bayesian model probability accurately identifies the data generating process (DGP) for the M4 model. The posterior probability of M4 increases with the size of the series, i.e.,
Posterior mean (standard deviation) of M4 model parameters when
Confidence interval and HPD interval of M4 model parameters when
Average absolute bias and mean squared error of M4 model parameters when
Studying the results, one can observe that the AB and MSE of all estimators, for the different break points, decrease as the size of the time series increases. The Bayes estimators perform better than the MLE, while the Bayes estimators obtained under the squared error loss function (SELF) and the absolute loss (ABS) show similar performance. The simulation study was also carried out for the other models M1–M3 and for different series sizes; because of page limits, these results are not included in the main manuscript and are available on request.
Real data analysis
In this section, we illustrate the proposed methodology with an empirical application to time series data on the import and export prices of U.S. natural gas. Natural gas is regarded as an ideal transition fuel between the traditional fossil fuels, coal and oil, and the renewable energy of the future. With increasing concerns about environmental and climate change as a consequence of greenhouse gas emissions, natural gas is heralded as a less harmful energy source, so every nation wants to develop it as an alternative resource. It is popular as a source of heating and cooking power in private residences as well as for business purposes in most countries. The average price of imported and exported natural gas was recorded in dollars per thousand cubic feet. Because the natural gas price varies, the structure of the time series may change, and this variation may be explained by our proposed model.
Natural gas price of import series
For the analysis, we use the monthly import price series of natural gas from 2001:1 to 2017:10, obtained from the U.S. Energy Information Administration (EIA), which bears structural breaks in all parameters, including the order. To identify the order change points, 10% of the data is trimmed. The posterior probability helps to identify the break point at which the order changes. The original series, with the posterior probability corresponding to each point, is shown in Fig. 2.
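The scan over candidate break points after trimming can be sketched as follows. This is a simplified stand-in for the posterior probability used in the paper: it places a uniform prior on the candidate locations and replaces the marginal likelihood by the maximized conditional Gaussian likelihood of the two segments, so the resulting weights are only an approximation. Trimming the fraction trim from each end of the series is also an assumption, since the text does not say how the 10% is split. The names ar_loglik and break_point_weights are hypothetical.

```python
import numpy as np

def ar_loglik(y, p):
    """Maximised conditional Gaussian log-likelihood of an AR(p) model with intercept."""
    y = np.asarray(y, dtype=float)
    z = y[p:]
    cols = [np.ones(len(z))] + [y[p - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)  # least squares = conditional ML
    resid = z - X @ beta
    sigma2 = resid @ resid / len(z)
    return -0.5 * len(z) * (np.log(2.0 * np.pi * sigma2) + 1.0)

def break_point_weights(y, p1, p2, trim=0.10):
    """Normalised weights over candidate break points for an AR(p1) -> AR(p2) change.

    Uniform prior over candidates; profile likelihood in place of the marginal
    likelihood (an approximation, see lead-in). `trim` is cut from each end.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    lo, hi = int(np.floor(trim * T)), int(np.ceil((1.0 - trim) * T))
    candidates = np.arange(lo, hi)
    ll = np.array([ar_loglik(y[:k], p1) + ar_loglik(y[k:], p2) for k in candidates])
    w = np.exp(ll - ll.max())  # subtract the maximum for numerical stability
    return candidates, w / w.sum()
```

Plotting the weights against the candidate dates gives a picture comparable in spirit to the posterior probability curve of the break point, with clusters of high weight marking plausible order change regions.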
In Fig. 2, we observe two different regions where the posterior probability of the break point forms a cluster. Taking a more general view, we consider each break point of a cluster in turn as the order change point and then fit the model. For simplicity, we consider a maximum AR lag of five and get twenty combinations. In Fig. 2, the highest probability is recorded at the break points Feb 2003 and Jan 2015, which lies between 1
Model coefficients of the best order change model for the import series considering a single break
We next consider both order change points jointly and search for the best autoregressive model in each region. Constructing the order combinations for the three regions, we get 125 different combinations. For each combination, we calculate the model probability for all four models and the AIC values. The M2 model has the maximum probability compared with the other three models, and the recorded combination has the minimum AIC value for
Model coefficients of the best order change model for the import series considering two breaks
The import series
Spot price of the imported natural gas series with posterior probability of the breaks.
Fitted series of the imported natural gas series when considering two breaks at Feb 2003 and Jan 2015.
Similarly, we model the export series and identify the order change points using the posterior probability. The export time series and the posterior probability are shown in Fig. 4. After trimming 10% of the series, we see that the first quarter of 2014 has the largest posterior probability compared with the rest of the points, as shown in Fig. 4, which indicates that an order change takes place in this period.
Spot price of the export natural gas series with posterior probability of breaks.
All three months of the first quarter are considered as order change points, and models are fitted. The best model is selected with the help of the information criterion and the posterior probability. The model posterior probabilities of the export series at the change points Jan 2014, Feb 2014 and March 2014 attain maximum values of 0.992, 0.996 and 0.992, respectively. Among all four models, the M1 model records the maximum probability. Thus, the model with only an order change is also suitable for the export series. The estimated coefficients, MSE and AIC values of the M1 model for all three possible break points are recorded in Table 7; the MSE and AIC values are minimum when March 2014 is taken as the order change point with order (2, 5). The export time series and the fitted series are plotted in Fig. 5, where the SELF and ABS estimates give a better fit than the MLE.
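The order search used for the import and export series can be illustrated by a small grid search over regime orders with AIC as the criterion. The sketch fits every regime separately by conditional least squares with its own intercept and variance, which is closest to M4; shared parameters under M1–M3 would reduce the parameter count. The first max_lag observations of each regime are held out as initial values so that all order combinations are scored on the same responses. Requiring adjacent orders to differ gives 5 x 4 = 20 combinations for one break, and leaving the orders unrestricted gives 5^3 = 125 for two breaks, which agrees with the counts quoted in the text, although this reading of the counts is an assumption. The function names are hypothetical.

```python
import itertools
import numpy as np

def ar_fit_loglik(y, p, max_lag):
    """Conditional Gaussian log-likelihood of an AR(p) fit, scored on y[max_lag:]."""
    y = np.asarray(y, dtype=float)
    z = y[max_lag:]
    cols = [np.ones(len(z))] + [y[max_lag - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / len(z)
    return -0.5 * len(z) * (np.log(2.0 * np.pi * sigma2) + 1.0)

def best_orders_by_aic(y, break_points, max_lag=5, require_change=False):
    """Search all order combinations across regimes; return ((aic, orders), n_tried)."""
    segments = np.split(np.asarray(y, dtype=float), break_points)
    results = []
    for orders in itertools.product(range(1, max_lag + 1), repeat=len(segments)):
        # optionally skip combinations in which adjacent regimes share an order
        if require_change and any(a == b for a, b in zip(orders, orders[1:])):
            continue
        loglik = sum(ar_fit_loglik(s, p, max_lag) for s, p in zip(segments, orders))
        n_params = sum(p + 2 for p in orders)  # coefficients, intercept, variance
        results.append((2 * n_params - 2 * loglik, orders))
    return min(results), len(results)
```

A call such as best_orders_by_aic(y, [k], require_change=True) would be repeated for each candidate break point k in a high-probability cluster, and the break point and orders with the smallest AIC retained.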
Model coefficients of the best order change model for the export series
Fitted series of the export natural gas series when considering a break at March 2014.
Conclusion
Time series modelling is a special approach to modelling in which present states depend on their past. In this article, a time series model is proposed that allows breaks in the autoregressive coefficients, the error variance and the order of dependence of an autoregressive time series. Maximum likelihood estimation and Bayesian estimation under squared error and absolute loss functions are discussed for estimating the parameters of the proposed model. A simulation study is carried out to compare the performance of the estimators. As an empirical example, the proposed model is also fitted to the U.S. natural gas import and export price series. The study records two breaks in the import series, at Feb 2003 and Jan 2015, whereas in the export series the order change occurs in the first quarter of 2014. The best model is identified by the minimum AIC as well as the maximum posterior probability. As the model has been little explored, it may also be assessed for model selection based on forecasting accuracy. In future work, the present study may be extended to multivariate time series and to time series models with non-normal errors.
Appendix
Posterior mean and variance of the parameters of models M1–M4
Model   Distribution          Mean   Variance
M1      Multivariate Normal
        Normal
        Inverse Gamma
M2      Multivariate Normal
        Normal
        Inverse Gamma
M3      Multivariate Normal
        Normal
        Inverse Gamma
M4      Multivariate Normal
        Normal
        Inverse Gamma
