Abstract
Very often time series are subject to abrupt changes in the level, which are generally represented by Markov Switching (MS) models, assuming that the level is constant within a certain state (regime). This is not a realistic framework because in the same regime the level could change with minor jumps with respect to a change of state; this is a typical situation in many economic time series such as the Gross Domestic Product (GDP) or the volatility of financial markets. We propose to make the state flexible, introducing a very general model which provides oscillations of the level of the time series within each state of the MS model; these movements are driven by a forcing variable. The new model allows for consideration of extreme jumps in a parsimonious way, without the adoption of a large number of regimes (in our examples the two-state MS models are used). Moreover, this model increases the interpretability and in particular the out-of-sample performance with respect to the most used alternative models. This approach can be applied in several fields, also using unobservable data. We show its advantages in three distinct applications, extending particular MS models, which involve macroeconomic variables, volatilities of financial markets and conditional correlations.
Introduction
The nonlinear behaviour of many economic time series, characterized by abrupt level changes, has been the object of several studies in the last decades, favouring the development of switching regime models; see Hamilton (2016) for a review of this kind of models in a macroeconomic framework, Franses and van Dijk (2000) for financial time series. In particular, the level of the time series changes in unknown time points; this characteristic has favoured the great success of models with two or more regimes that are able to detect the unknown change points. Among them, we recall the Markov Switching Autoregressive (MS–AR) model of Hamilton (1989); Hamilton (1990),
The MS–AR model was originally proposed by Lindgren (1978), but its diffusion and success in economics are due to the work of Hamilton.
The simple form of the MS model, its clear interpretation and the possibility of making inference on the regime with the Hamilton (1990) filter justify its enormous success in the statistical and econometric literature and its application in several fields of economics. In general, the MS model fits the data well by capturing abrupt changes, but keeping the coefficients fixed within each state is a rigid constraint. This problem could be solved with a larger number of states, which raises computational problems (long computation times, convergence, identification, Markov chains with zero elements). Moreover, choosing the number of states is an open problem; practitioners very often adopt a two-state MS model to avoid the loss of efficiency in estimation caused by the small number of observations that may fall in a given state. Similarly, a higher number of states might imply the absence of transitions from one state to another (there are no cases in which the state changes from regime
A graphical example is helpful to understand the main motivations of this work. Let us observe the first differences of the quarterly US GDP series (in logs) in Figure 1. We can notice that the series is characterized by frequent oscillations, in particular in its first half, until 1984, with a few cases in which the series level is more than 4. These points characterize the boom in 1950 after the Second World War and the following quarters: Q1 71, Q2 78, Q4 80 and Q1 81. An MS model with two states can identify these dates, assigning them to the regime with the highest mean, which will be constant within the same regime. But it is evident that the level of these points changes.

It is likely that a better fit of the data could be obtained, providing the possibility to change the parameters within the regimes. In practice, in Figure 1, a certain flexibility could be obtained if we allow a change in the level within the regimes, with the chance to isolate in a single state the highest peaks of the series. For this purpose we propose a new model, called Flexible State Markov Switching (FSMS) model, considering time-varying coefficients within each state. The within-state dynamics of the coefficients is driven by forcing variables, which can also be non-observable. We have developed a very general framework, extending different MS models and providing examples of their applications.
This article is structured in the following way: in the next section we will describe the new model proposed, pointing out how the estimation procedure can follow the steps proposed in Hamilton (1990). Section 3 illustrates three examples of application of the FSMS model, dealing with the US GDP, the volatility of the NASDAQ 100 index and the conditional correlations of the Dow Jones Industrial Average index (DJ henceforth) components; in this section we will develop specific FSMS models and will compare them with MS models and other alternatives by indices of in-sample and out-of-sample performance. Some final remarks will conclude the article.
Let
We assume that, within each regime, the model can vary in a range of models which differ for the values of the switching parameters (representing the level of the series
The estimation of Equation 2.2 can be obtained extending the procedure proposed by Hamilton (1990), including Fix Iterate steps 2–5 from
In practice the density of The first smoothed probability From the same Hamilton filter derive the probabilities
Iterate steps 2–4 from
The inference on the regime is obtained from the smoothed probabilities, assigning the observations with
In the following section we will propose three different specifications of Equation (2.2), which extend three well-known MS models to include the flexible structure within the regimes.
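The filtering and smoothing steps referred to above can be illustrated with a minimal sketch for a plain MS model with a switching mean only. This is not the FSMS estimation procedure itself: the function name `hamilton_filter`, the Gaussian density with a common standard deviation, the ergodic initialization and the backward smoothing recursion are all assumptions made for illustration, and the smoother may differ from the one used in the article.

```python
import numpy as np

def hamilton_filter(y, mu, sigma, P):
    """Filtered and smoothed regime probabilities for a Gaussian MS mean model.

    y     : (T,) observations
    mu    : (K,) regime means (illustrative parameterization)
    sigma : common standard deviation
    P     : (K, K) transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i)
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    P = np.asarray(P, dtype=float)
    T, K = y.size, mu.size

    # Ergodic distribution of the chain: solve pi' P = pi' with sum(pi) = 1.
    A = np.vstack([np.eye(K) - P.T, np.ones(K)])
    b = np.append(np.zeros(K), 1.0)
    prev = np.linalg.lstsq(A, b, rcond=None)[0]

    predicted = np.zeros((T, K))
    filtered = np.zeros((T, K))
    loglik = 0.0
    for t in range(T):
        pred = prev @ P  # Pr(s_t | y_1, ..., y_{t-1})
        dens = np.exp(-0.5 * ((y[t] - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        joint = pred * dens
        lik = joint.sum()  # conditional density of y_t given the past
        predicted[t] = pred
        filtered[t] = joint / lik
        loglik += np.log(lik)
        prev = filtered[t]

    # Backward recursion for the smoothed probabilities Pr(s_t | y_1, ..., y_T).
    smoothed = np.zeros((T, K))
    smoothed[-1] = filtered[-1]
    for t in range(T - 2, -1, -1):
        ratio = smoothed[t + 1] / predicted[t + 1]
        smoothed[t] = filtered[t] * (P @ ratio)
    return filtered, smoothed, loglik

# Toy series with two high observations in the middle (made-up data).
y_demo = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 1.0, 0.8])
filt, smooth, ll = hamilton_filter(y_demo, [1.0, 5.0], 0.5, [[0.9, 0.1], [0.2, 0.8]])
regime = smooth.argmax(axis=1)  # assign each observation to its most probable state
```

Assigning each observation to the state with the largest smoothed probability is one common convention; the log-likelihood returned by the filter is the quantity that would be maximized over the parameters.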
As mentioned before, Equation (2.2) is a very general form of the FSMS model and can include the extension of all MS models. In this section we propose the extension of three MS models developed in different fields of application. In the first case, we extend the analysis of the US GDP by an MS–AR model, similar to the one adopted by Hamilton (1990); in the second case, we analyse the volatility of the NASDAQ 100 US index by extending the Markov Switching Generalized AutoRegressive Conditional Heteroskedasticity (MS-GARCH) model (see, for example, Dueker, 1997); finally, we analyse the correlations of the assets composing the DJ index by extending the Regime Switching Dynamic Correlation (RSDC) model proposed by Pelletier (2006). Note that in the first case the variable analyzed (GDP) is observed, whereas in the last two applications it is estimated by the model (i.e., the conditional variance in the second case and the conditional correlations in the third one). We will adopt as forcing variables two different quantities: the estimated lagged level of GDP in the first application and the filtered probabilities of state
A similar solution was adopted by Otranto (2008) in developing MS models with transition probabilities driven by latent variables.
In all the examples, the FSMS model is compared with four competitors among the most used models in each framework;
In the first two examples, devoted to the AR and GARCH models and their extensions, we have also tried to estimate smooth transition models (Teräsvirta, 1994, 2009) and threshold models (Tong, 1990), but their estimation provides results similar to the linear case, as they are not able to detect different regimes. For this reason we have not included them in the comparison.
We thank the Associate Editor who suggested this approach.
Let us suppose that the variable of interest
Let us consider the MS model Equation (2.1), adopting an AR(2) specification for the components
The only switching coefficient is the mean of the process
In terms of the notation used in section 2, we put
We also consider four competitive models:
The AR–KC (Known Change-point) model: It is a simple switching AR(2) model with known change-points; in practice, we use the official dating of the business cycle proposed by the NBER (available at http://www.nber.org/cycles/cyclesmain.html).
The linear AR model: In this case all the parameters are constant;
The time-varying parameter (TVP) model: We used a state-space representation (see, e.g., Harvey, 1990) with the coefficient
the MS–AR model Equation (3.1).
Parameter estimates (standard errors in parentheses) and evaluation criteria of alternative models for the US GDP variations (in-sample set: Q1 1948–Q4 2012; out-of-sample set: Q1 2013–Q4 2014)
Table 1 shows the estimation of the parameters of the alternative models,
Most of the parameters of the TVP model (3.4) are different from the parameters of the four AR(2) models shown in Table 1, so we do not show them in the same table. They are (standard errors in parentheses):
The estimation of the unconditional mean is equal to 1.40 in the linear AR model; it assumes value 1.47 in regime 0 and 4.35 in regime 1, if we adopt model MS, whereas using model FSMS, it varies in the ranges

Figure 2 shows the two estimated wsd functions; both functions are approximately constant, around 0.43 and 0 respectively, with abrupt jumps that reach the maximum value 1 at the highest peaks of GDP. This result implies that when the state is 0, the estimated mean is around 1.36 with small changes, whereas near to 4.78 (the estimated value of

The second part of Table 1 compares the models by residual autocorrelation and forecasting performance. It is interesting to notice that only MS and FSMS models are not affected by residual autocorrelation, if we base the analysis on a Ljung–Box test at a
The AR–KC model possesses the highest log-likelihood, but, as mentioned before, this information has to be considered with due care. More interesting are the results deriving from the MSE evaluation; in-sample, the best performance is shown by the AR–KC model and FSMS is the second best. However, comparing the squared errors by the well-known Diebold–Mariano (DM) test (Diebold and Mariano, 1995) with the correction proposed by Harvey et al. (1997), we obtain that the predictions of the two models are not significantly different at a
In practice, the comparison of the five models for the US GDP dataset seems to favour the FSMS model not only in terms of interpretability of the regimes but also by goodness of fit, diagnostic checking and forecasting performance.
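The DM comparison of squared errors with the small-sample correction of Harvey et al. (1997) can be sketched as follows. The function name `dm_hln_statistic`, the rectangular-kernel long-run variance and the forecast horizon argument `h` are assumptions made for illustration; the statistic would be compared with a Student t distribution with T − 1 degrees of freedom.

```python
import numpy as np

def dm_hln_statistic(e1, e2, h=1):
    """Diebold-Mariano statistic on squared errors with the Harvey et al. correction.

    e1, e2 : forecast errors of the two competing models (same length)
    h      : forecast horizon; autocovariances up to lag h - 1 enter the variance
    A positive value means model 1 has the larger average squared error.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    d = e1 ** 2 - e2 ** 2  # loss differential under squared-error loss
    T = d.size
    d_bar = d.mean()
    dc = d - d_bar

    # Long-run variance of d: lag-0 autocovariance plus twice lags 1..h-1.
    lrv = dc @ dc / T
    for k in range(1, h):
        lrv += 2.0 * (dc[k:] @ dc[:-k]) / T

    dm = d_bar / np.sqrt(lrv / T)
    # Small-sample correction factor of Harvey, Leybourne and Newbold (1997).
    correction = np.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    return correction * dm

# Made-up errors: model 1 is noisier than model 2.
rng = np.random.default_rng(0)
stat = dm_hln_statistic(2.0 * rng.normal(size=200), rng.normal(size=200))
```

For one-step-ahead forecasts (h = 1) the corrected statistic coincides with the ordinary t-statistic for a zero mean of the loss differential.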
A recent field of application of MS models concerns the analysis of the volatility of financial time series, in particular by inserting Markovian dynamics in the coefficients of the GARCH model (Bollerslev, 1986).

Let us consider the series of the returns
In financial analysis the return at time
The dates on the
We have estimated, in both the MS and FSMS cases, the usual GARCH(1,1) model, but the coefficient of the lagged return was estimated equal to zero. The asymmetric effects refer to the increase in the volatility level when the most recent return is negative; see Glosten et al. (1993).
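The asymmetric effect can be illustrated with a variance recursion in the spirit of Glosten et al. (1993), without any regime switching. This is only a sketch: the function name `gjr_variance`, the parameter names `omega`, `alpha`, `gamma`, `beta` and the initialization are assumptions and do not reproduce the article's notation. Setting `alpha = 0` mimics the case in which the symmetric coefficient is estimated equal to zero, so that only negative returns raise the variance.

```python
import numpy as np

def gjr_variance(returns, omega, alpha, gamma, beta, h0=None):
    """Conditional variance path of an asymmetric GARCH(1,1) recursion.

    h[t] = omega + (alpha + gamma * 1{r[t-1] < 0}) * r[t-1]**2 + beta * h[t-1]
    h0 is the starting variance; the sample variance is used if it is omitted
    (an illustrative choice).
    """
    r = np.asarray(returns, dtype=float)
    h = np.empty(r.size)
    h[0] = r.var() if h0 is None else h0
    for t in range(1, r.size):
        negative = 1.0 if r[t - 1] < 0 else 0.0  # asymmetry indicator
        h[t] = omega + (alpha + gamma * negative) * r[t - 1] ** 2 + beta * h[t - 1]
    return h

# The same shock size raises the variance only when the return is negative.
h_down = gjr_variance([1.0, -1.0, 1.0], omega=0.1, alpha=0.0, gamma=0.2, beta=0.8, h0=1.0)
h_up = gjr_variance([1.0, 1.0, 1.0], omega=0.1, alpha=0.0, gamma=0.2, beta=0.8, h0=1.0)
```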
In summary, the parameters of the FSMS–GARCH model are:
In MS–GARCH models (and, as a consequence, in FSMS models) a path dependence problem arises due to the non-observability of the volatility and the dependence of the state
Kim (1994), using real data, and Gallo and Otranto (2015), via simulations in an asymmetric multiplicative error model framework, show that the Kim approximation provides good results with decreasing bias when the sample size increases. This approximation can also be used for the FSMS model.
In addition to the FSMS–GARCH model, we have estimated the following four models on the NASDAQ series:
We also tried to use a Time Varying Normalized Variance Stabilisation (TV–NoVaS) method (see Politis, 2015), a model-free forecasting approach which is robust against structural breaks. In particular, we used a rolling window of 250 observations, the exponential scheme to detect the weights, and the component of the conditional variance to forecast the squared returns (for details see Politis and Thomakos, 2012). Unfortunately the results show a relatively poor performance when compared with the other models, so we will not describe them.
Exponential smoother variance (ESV) model: It is known as the RiskMetrics variance, established in 1989 by the J.P. Morgan holding company and widely used among practitioners. The dynamics of the
Asymmetric GARCH model: We adopt the linear relationship:
Exponential GARCH (EGARCH) model: This model, proposed by Nelson (1991), is an alternative way to consider asymmetries in the volatility. Its logarithmical expression is given by:
the MS–GARCH model: We adopt Equation (2.1), where
We consider the full available span for the in-sample evaluation. For the out-of-sample evaluation we do not consider the last part of the series because it is a quiet period for the markets with small variations in prices (see the behaviour of the returns in Figure 4); as a consequence, the alternative models will estimate low levels of volatility and the results will be very similar. More interesting is the out-of-sample evaluation in turbulent periods such as the recent world crisis. For this reason we have considered 2008 as the out-of-sample span, estimating first the model using the data until 2007, forecasting the volatility for the first quarter of 2008; then by updating the data with the observations relative to the first quarter of 2008, we estimate the model and forecast volatility for the second quarter, and so on. In practice, we estimate each model four times and consider 246 out-of-sample forecasts.
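The out-of-sample scheme described above (re-estimate at the start of each quarter on all earlier data, then forecast within the quarter) can be sketched generically. The helper names `expanding_window_forecasts`, `esv_path`, `fit_esv` and `forecast_esv` are hypothetical, the exponential smoother with a grid search on the smoothing constant is only a stand-in for the models of the article, and keeping the parameters fixed within each block while producing one-step-ahead forecasts is an assumption about how the forecasts are built.

```python
import numpy as np

def expanding_window_forecasts(series, block_starts, fit, forecast):
    """Re-estimate at each block start on all earlier data, then forecast the block.

    block_starts : increasing indices where a new out-of-sample block begins
    fit          : callable(history) -> params
    forecast     : callable(params, history) -> one-step-ahead forecast
    """
    series = np.asarray(series, dtype=float)
    bounds = list(block_starts) + [series.size]
    out = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        params = fit(series[:start])  # one estimation per block
        for t in range(start, stop):
            out.append(forecast(params, series[:t]))  # uses data up to t - 1 only
    return np.array(out)

def esv_path(r, lam):
    """Exponentially smoothed variances; the last element is the next forecast."""
    h = np.empty(r.size + 1)
    h[0] = np.mean(r ** 2)
    for t in range(r.size):
        h[t + 1] = lam * h[t] + (1.0 - lam) * r[t] ** 2
    return h

def fit_esv(history):
    """Pick the smoothing constant on a grid by in-sample squared error."""
    grid = np.linspace(0.80, 0.99, 20)
    losses = [np.mean((history ** 2 - esv_path(history, lam)[:-1]) ** 2) for lam in grid]
    return grid[int(np.argmin(losses))]

def forecast_esv(lam, history):
    return esv_path(history, lam)[-1]

# Made-up returns: 200 in-sample observations and four blocks of 25 forecasts.
rng = np.random.default_rng(0)
returns = rng.normal(size=300)
vol_forecasts = expanding_window_forecasts(returns, [200, 225, 250, 275], fit_esv, forecast_esv)
```

With four block starts the model is estimated four times, mirroring the quarterly updating described in the text; the block lengths here are arbitrary.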
In Table 2 the estimates of the common coefficients of the models belonging to the GARCH family are shown. The MS and FSMS models have similar coefficients, excluding the
Parameter estimates (standard errors in parentheses) and evaluation criteria of alternative models for the volatility of the NASDAQ 100 index (in-sample set 2 January 2004–5 May 2015; out-of-sample set 2 January 2008–31 December 2008)
In Figure 5 we show the behaviour of both wsd functions with the inference on the regime obtained from the FSMS model. The wsd function
In the bottom part of Table 2 it is possible to observe that, for all five estimated models, the hypothesis of no residual autocorrelation is not rejected. Their comparison by loss functions cannot be conducted as in the previous example because the volatility
In Table 2 we can notice that the FSMS model shows the best in-sample and out-of-sample performance, but, using a DM test at a nominal size of


A recent strand of the literature is devoted to the analysis of the time-varying conditional correlation of a set of financial time series; this is due to the empirical evidence that financial time series are subject to co-movements and this relationship is more evident in the periods of high volatility (see, e.g., Longin and Solnik, 1995). Several models have been developed to represent the dynamic conditional correlation, adopting some re-parameterization to avoid the so-called curse of dimensionality: generally a large set of assets (or financial indices) are considered and the number of parameters is explosive. Feasible models were proposed by Engle (2002), Tse and Tsui (TT) (2002) and Silvennoinen and Teräsvirta (2015), to cite just a few. The increase in correlation for high volatility regimes can be developed inserting an MS dynamics in the conditional correlation models; this idea was developed, for example, by Billio and Caporin (2005), Pelletier (2006) and Otranto (2010). In particular, Pelletier (2006) proposed the simple RSDC model, where the full correlation matrix can switch from one regime to another, without path dependence problems; this model was extended by Bauwens and Otranto (2016), who provided the dependence of the state on some measure of volatility of a financial market. Referring to the notation in Equation (2.1),
The wsd functions are represented by smooth transition functions, using the filtered probability of the state
The parameters included in the vector
The dataset we have considered is formed by 28 of the 30 assets composing the DJ index from 2 January 2002 to 31 December 2015 (3525 daily returns for each series; source: Yahoo Finance).
We have excluded the series relative to the assets Nike, which starts in December 2002, and Visa, which starts in March 2008.
Bauwens and Otranto (2016) have proposed a wide class of conditional correlation models, depending on the volatility of the financial markets (they call this class volatility dependent conditional correlation (VDCC) models) and showed their good properties with respect to the other conditional correlation models in both in-sample and out-of-sample terms. They classify their models into three categories—dynamic VDCC, smooth transition VDCC, Markov switching VDCC. For the purpose of comparison, we select, within each category and among the models without volatility dependence, the model which obtained the best performance in the application of Bauwens and Otranto (2016) concerning the DJ components.
The application proposed by Bauwens and Otranto (2016) concerns a different time span and different assets because the DJ basket has been recently updated. In the new basket the assets Apple, Goldman Sachs, Nike and Visa replace the assets Alcoa, AT&T, Bank of America and Hewlett-Packard, present in the dataset of Bauwens and Otranto (2016).
The dynamic conditional correlation model by TT (2002): The conditional correlation is expressed by:
The TT Time-Varying Volatility dependent (TT-TVV) model: It is a TT model with a time–varying coefficient, given by the following set of equations:
The STCC–V model: It is a smooth transition conditional correlation model, proposed by Silvennoinen and Teräsvirta (2015):
The TV–RSDC model: It is a RSDC model with time-varying probabilities, depending on the volatility measure:
VXD is a proxy of the volatility of the DJ index, constructed in the same way as the well-known VIX, which refers to the Standard & Poor's 500 index.
Parameter estimates (standard errors in parentheses) and evaluation criteria of dynamic conditional correlation models for the DJ dataset (in-sample set 2 January 2002–31 December 2014; out-of-sample set 2 January 2015–31 December 2015)
In Table 3 we show the estimation results relative to the
Of course, in this application it does not make sense to evaluate the performance of the models by MSE because the true correlation is unknown and there are no logical proxies. When dealing with financial markets, it is frequent to compare the performance of alternative models by minimum variance portfolio rather than statistical criteria. Following Engle and Colacito (2006), we consider a set of
See Engle and Colacito (2006) for the methodological and practical tools to apply this test in this framework.
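The idea of comparing models through minimum variance portfolios can be illustrated with a small sketch. It assumes the standard problem of minimizing the portfolio variance subject to a required return, whose solution is w = μ0 H⁻¹μ / (μ'H⁻¹μ); the function names, the vector of expected returns `mu` and the required return `target` are assumptions for illustration, not the article's exact setup.

```python
import numpy as np

def min_variance_weights(H, mu, target=1.0):
    """Weights minimizing w' H w subject to w' mu = target."""
    H = np.asarray(H, dtype=float)
    mu = np.asarray(mu, dtype=float)
    x = np.linalg.solve(H, mu)  # H^{-1} mu without forming the inverse
    return target * x / (mu @ x)

def portfolio_returns(H_forecasts, returns, mu, target=1.0):
    """Realized returns of the portfolios built from each covariance forecast."""
    out = []
    for H, r in zip(H_forecasts, returns):
        w = min_variance_weights(H, mu, target)
        out.append(w @ np.asarray(r, dtype=float))
    return np.array(out)

# Two uncorrelated assets with variances 1 and 4 and equal expected returns.
H_demo = np.diag([1.0, 4.0])
w_demo = min_variance_weights(H_demo, [1.0, 1.0])
```

Under this setup, the model whose covariance forecasts produce portfolio returns with the smallest sample variance would be preferred.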
We have proposed an extension of the MS model to allow the switching coefficients to vary within the states, increasing the flexibility of the model;
The GAUSS codes and the data for performing estimation in all three examples are available on request or can be downloaded at the website: https://sites.google.com/site/edoardootranto/home/publications.
In our applications we have considered as switching the coefficients that affect the levels of the time series (in particular, the mean in the GDP example, the constant parameter of the volatility in the NASDAQ example and the scale coefficients of the conditional correlation matrix in the DJ dataset). This is done because the analysis of the levels can be made in graphical terms and provides a better appreciation of the interpretability of the new model; anyway, it is possible to extend the analysis also considering the other coefficients of the models as switching and flexible, or splitting the set of switching coefficients in two subsets: a set of flexible switching parameters and a set of fixed switching parameters. In fact, these cases are consistent with Equation (2.2), representing the general FSMS model.
We have considered only 2-state FSMS models; the formal extension to a larger number of states is not difficult, but, as in the MS case, it involves some computational problems, in particular, when a few observations belong to a single state. Anyway, the flexibility added to the MS model, with the possibility to change the value of the switching coefficients within the regimes, is sufficient to capture extreme jumps that, in a classical MS model, would belong to a further regime.
The comparison with other models was made by performing in-sample and out-of-sample experiments for each application. In general, using the DM test, we have noticed that the FSMS model belongs to a restricted class of models showing the best in-sample performance, whereas in out-of-sample terms, in all the applications it shows the lowest loss functions, significantly better than those of the alternative models.
In FSMS models the changes within the states are driven by selected forcing variables; a pleasing characteristic is that these variables are not necessarily observed, but they could be derived from the estimation process. For example, in the GARCH and RSDC cases we have used the filtered probabilities, derived from the Hamilton filter, to drive the wsd of the switching coefficients. This idea is particularly appealing because the filtered probabilities are strictly correlated to the movements within the states; moreover, it is the only practical solution when observed forcing variables are not available.
Acknowledgments
I would like to thank an associate editor and an anonymous referee for their detailed reading of the article and many suggestions which led to the clarification of the focus of the article. All remaining errors are mine.
This research was supported by University of Messina (Research & Mobility 2015 Project; project code RES_AND_MOB_2015_ DISTASO).
