A New Constant Memory Recursion for Hidden Markov Models

Abstract

We develop the recursion for hidden Markov (HM) models proposed by Bartolucci and Besag (2002), and we show how it may be used to implement an estimation algorithm for these models that requires an amount of memory not depending on the length of the observed series of data. This recursion allows us to obtain the conditional distribution of the latent state at every occasion, given the previous state and the observed data. With respect to the estimation algorithm based on the well-known Baum-Welch recursions, which requires an amount of memory that increases with the sample size, the proposed algorithm also has the advantage of not requiring dummy renormalizations to avoid numerical problems. Moreover, it directly allows us to perform global decoding of the latent sequence of states, without the need of a Viterbi method and with a consistent reduction of the memory requirement with respect to the latter. The proposed approach is compared, in terms of computing time and memory requirement, with the algorithm based on the Baum-Welch recursions and with the so-called linear memory algorithm of Churbanov and Winters-Hilt. The comparison is also based on a series of simulations involving an HM model for continuous time-series data.

1. Introduction

Hidden Markov (HM) models have become a popular statistical tool for the analysis of time-series data; for an up-to-date review, see Zucchini and MacDonald (2009). These models are also of great interest for the analysis of longitudinal data, where independent short time series are observed for many statistical units; for a review, see Bartolucci et al. (2013). HM models are based on the assumption that the observable random variables, corresponding to consecutive time occasions, are conditionally independent given a not directly observable (or latent) process, which follows a Markov chain. Usually, this Markov chain is time homogeneous and of first order, so that the latent state at a given occasion only depends on the previous state and the transition probabilities are time invariant.

A fundamental tool of inference for HM models is represented by the forward-backward recursions of Baum and Welch (Baum et al., 1970; Welch, 2003). For a first-order HM model, these recursions allow us to compute the manifest probability (or density) of the observed sequence of data and to obtain the posterior distribution of every latent state and of every pair of consecutive latent states given these data. Based on these recursions, it is then possible to implement an Expectation-Maximization (EM) algorithm (Baum et al., 1970; Dempster et al., 1977) for maximum likelihood estimation of the parameters and to perform local decoding (Juang and Rabiner, 1991), which consists of finding the most likely state at every occasion, given the observed data. Moreover, the Viterbi algorithm (Viterbi, 1967) is usually used to perform global decoding that, unlike local decoding, in which the states are found for each time occasion separately, is aimed at finding the most likely sequence of latent states corresponding to the observed sequence of data.

Despite its popularity, the Baum-Welch recursions may suffer from numerical problems due to the fact that certain probabilities become negligible. This problem may be solved by using dummy renormalizations; see Scott (2002) for further comments and Lystig and Hughes (2002) for an alternative solution in dealing with the manifest distribution of the observed data. A more severe limitation is that the Baum-Welch approach requires an amount of memory that linearly increases with the length of the series of data, and this may limit its applicability in certain biological and engineering contexts where huge series are observed; see Miklós and Meyer (2005) and Churbanov and Winters-Hilt (2008) for further comments.

Alternative recursions have been proposed in the literature in order to reduce the amount of memory required by the Baum-Welch algorithm, or to make the memory use independent of the sequence length. In particular, the checkpoint algorithm (Grice et al., 1997; Tarnas and Hughey, 1998; Wheeler and Hughey, 2000) was proposed to reduce the memory complexity, which, however, still depends on the length of the series of data. Further refinements of this algorithm, as described by Miklós and Meyer (2005), were aimed at making the memory demand independent of the observed sequence length, based only on a forward recursion. Moreover, the linear memory implementation of the Baum-Welch algorithm proposed by Churbanov and Winters-Hilt (2008) reversed the originally forward-only algorithm of Miklós and Meyer (2005) by using backward probabilities of state occupation, so as to accelerate the estimation of the prior state probability and to decrease the memory requirement. However, this is achieved with an increase of the computing time. Finally, Khreich et al. (2010) proposed a variation of the Baum-Welch algorithm whose memory requirement does not depend on the length of the time series and whose time complexity is of the same order as the standard Baum-Welch algorithm.

In a rather recent article, Bartolucci and Besag (2002) proposed a probabilistic result to obtain the marginal distribution of a random variable in Markov random field models and showed that this result may be also used to deal with first-order and second-order HM models. In the present article, we use this result to implement a recursion and an estimation algorithm for first-order HM models, whose memory demand and computing time are directly comparable with those required by the algorithm proposed by Churbanov and Winters-Hilt (2008). In particular, the proposed recursion allows us to obtain the posterior distribution of every latent state given the previous state and the observed data, on the basis of only backward steps. With respect to the Baum-Welch recursions, our recursion has the advantage of not requiring dummy renormalizations. More importantly, it also allows us to directly perform global decoding, so as to obtain the predicted sequence of latent states on the basis of the estimated posterior probabilities. In more detail, this alternative approach for global decoding only involves quantities that are obtained from the proposed backward recursion, without requiring the implementation of the Viterbi algorithm, which involves both forward and backward steps. This implies a strong reduction of the amount of memory required for global decoding. Moreover, if implemented using only forward steps, the proposed method allows us to perform on-line decoding (Langley, 1995; Polikar et al., 2001), which is useful when the data are collected by a continuous flow of information, as may happen in engineering or biology; see Khreich et al. (2012) for a review of the main techniques suitable for this type of decoding.

The remainder of the article is organized as follows. In the following section, we briefly review HM models of first order and their maximum likelihood estimation through the EM algorithm. In Section 3, we review the main recursions proposed in the literature for estimation and global decoding, that is, the Baum-Welch recursions, its linear memory implementation, and the Viterbi algorithm. The proposed recursion is illustrated in Section 4, together with its use for maximum likelihood estimation and the alternative approach proposed for global decoding. In Section 5, we illustrate a simulation study aimed at comparing the performance of the proposed estimation algorithm, which is based on the Bartolucci-Besag recursion, with the algorithms based on the Baum-Welch recursions and the linear memory recursion. We also make a comparison between the proposed method for global decoding and the Viterbi algorithm. Finally, Section 6 contains the main conclusions.

The algorithms and recursions illustrated in this article have been implemented in a series of R functions based on Fortran compiled codes that are available to the reader upon request.

2. Preliminaries

In this section, we first illustrate the basic assumptions of the HM models for time series data, and then we briefly review maximum likelihood estimation of certain models of this type through the EM algorithm.

2.1. Hidden Markov models

Consider a sequence of T manifest random variables \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document} that are collected in the vector Y . An HM model assumes that these random variables are conditionally independent given the sequence of latent random variables \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_T$$\end{document} . The latter follows a Markov chain with k states, indexed from 1 to k. We consider, in particular, Markov chains that are homogeneous and of first order, so that V_t is conditionally independent of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_{t - 2}$$\end{document} given V_t₋₁ for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 3 , \ldots , T$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}p ( V_t = v \mid V_{t - 1} = u ) = \pi_{v \mid u} , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k ,\end{align*}\end{document}

with π_v_|u being a transition probability independent of t. The Markov chain is also characterized by the initial probabilities \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\pi_v = p ( V_1 = v ) , \quad v = 1 , \ldots , k.\end{align*}\end{document}

Another basic assumption of HM models is that every response Y_t depends on the latent process only through V_t, and then by f_t(y|v) we denote the probability mass (or density) function of this distribution. In particular, as in Churbanov and Winters-Hilt (2008), we explicitly consider the case in which Y_t given V_t has a normal distribution with mean and variance depending on V_t. Consequently, we have that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f_t ( y \mid v ) = \frac { 1 } { \sqrt { 2 \pi \sigma_v^2 } } \exp \left[ - \frac { 1 } { 2 } \left( \frac { y - \mu_v } { \sigma_v } \right) ^2 \right] , \tag { 1 } \end{align*}\end{document}

where μ_v and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2 , v = 1 , \ldots , k$$\end{document} , are parameters to be estimated. Note that, in certain formulations, the means of the different states are constrained to be 0. In dealing with financial data, an HM version of the stochastic volatility (SV) model (Taylor, 2005) results in this way; see also Bartolucci and De Luca (2001) and Langrock et al. (2012). On the other hand, when the means are free parameters, we constrain the variances to be equal to each other in order to avoid degenerate solutions, which may also happen with a finite mixture model of normal distributions (see chapter 3.8 in McLachlan and Peel, 2000). The resulting model, which is referred to as HM homoscedastic model, is considered within the simulation study described in Section 5.

The other case we consider explicitly here is that of categorical response variables with c categories, labeled from 1 to c. In this case, we have that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f_t ( y \mid v ) = p ( Y_t = y \mid V_t = v ) = \phi_{y \mid v} , \quad v = 1 , \ldots , k , y = 1 , \ldots , c , \tag{2}\end{align*}\end{document}

with φ_y_|v being parameters to be estimated in place of μ_v and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2$$\end{document} . This formulation may be of interest in genetics, for the analysis of biological sequences (Bishop and Thompson, 1986), such as DNA and protein sequences. In particular, Churchill (1989) introduced a method to describe DNA sequences using an HM model. Moreover, this model has been used for exploring structural similarities of families of genes (Churchill, 1992) and for producing multiple sequence alignments (Krogh et al., 1994). Furthermore, sequence analysis using HM models is a standard approach in bioinformatics (Durbin et al., 1998) and is a fundamental component of many gene-finding algorithms that identify and delineate genes in the human and other genomes (De Fonzo et al., 2007).

It has to be clear that the same modeling framework as above may be adopted in dealing with longitudinal data, when we observe short sequences of data for n sample units, which are usually assumed to be independent of each other. However, we do not explicitly consider the case of longitudinal data here, because the theory that will be developed may be also easily applied to this case. For more details and examples referring to HM formulations for longitudinal data, see Bartolucci et al. (2013). Moreover, we can also consider HM models based on a Markov chain of higher order and models that include covariates. The results developed here may be easily adapted to deal also with these cases.

2.2. Maximum likelihood estimation

Given a sequence of observations \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$y_1 , \ldots , y_T$$\end{document} collected in the vector y , which is a realization of Y , the model log-likelihood is simply \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\ell ( { \bf\theta} ) = \log f ( \textbf{\textit{y}} ),\end{align*}\end{document}

where θ is the vector collecting all model parameters. The structure of θ depends on the specific parameterization that is adopted for the conditional response distribution f_t(y|v). For instance, for the HM-SV model based on assumption (1) with means equal to 0, θ includes the initial probabilities \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi_v , v = 1 , \ldots , k$$\end{document} , the transition probabilities \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi_{v \mid u} , u , v = 1 , \ldots , k$$\end{document} , and the conditional variances \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2 , v = 1 , \ldots , k$$\end{document} . Moreover, f ( y ) is the marginal probability (or density) of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document} computed at y , depending on θ . This function is defined as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf{\textit{y}} ) = \sum_{\textbf{\textit{v}}}f ( \textbf{\textit{y}} \mid \textbf{\textit{v}} ) p ( \textbf{\textit{v}} ) ,\tag{3}\end{align*}\end{document}

where ∑_v means the sum over all the possible latent variable configurations \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}} = ( v_1 , \ldots , v_T )$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf{\textit{y}} \mid \textbf{\textit{v}} ) & = \prod_{t = 1}^Tf_t ( y_t \mid v_t ) , \\ p ( \textbf{\textit{v}} ) & = \pi_{v_1} \prod_{t = 2}^T \pi_{v_t \mid v_{t - 1}}.\end{align*}\end{document}

Computing f ( y ) by the sum in (3) is infeasible in typical applications; then recursions of the type that will be illustrated in Section 3.1 are necessary.

In order to maximize ℓ( θ ), we use an EM algorithm that follows the same principle as that illustrated by Baum et al. (1970); see also Dempster et al. (1977). In particular, this algorithm is based on the complete data log-likelihood \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\ell^* ( { \bf \theta} ) = \sum_{t = 1}^T \sum_{v = 1}^k z_t ( v ) \log f_t ( y_t \mid v ) + \sum_{v = 1}^k z_1 ( v ) \log \pi_v + \sum_{t = 2}^T \sum_{u = 1}^k \sum_{v = 1}^k z_t ( u , v ) \log \pi_{v \mid u} , \tag{4}\end{align*}\end{document}

where z_t(v) is a dummy variable equal to 1 if the latent state at occasion t is v and to 0 otherwise, and z_t(u, v) = z_t₋₁(u)z_t(v) is a dummy variable equal to 1 if there is a transition from state u to v at occasion t and to 0 otherwise.

At the E-step of the EM algorithm, we need to compute the posterior expected values of the above dummy variables given the observed data and the current value of the parameters. In particular, these expected values correspond to the following quantities \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \hat{z}_t ( v ) & = p ( V_t = v \mid \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad t = 1 , \ldots , T , v = 1 , \ldots , k , \\ \hat{z}_t ( u , v ) & = p ( V_{t - 1} = u , V_t = v \mid \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k.\end{align*}\end{document}

As usual, the M-step of the EM algorithm consists of maximizing \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\ell^*$$\end{document} ( θ ), once the dummy variables in (4) have been substituted by the corresponding expected values obtained as above and, in this way, the parameters in θ are updated. In particular, for the normal model, based on assumption (1), if the conditional means and variances are unconstrained, these parameters are updated as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \mu_v & = \frac {a_1 ( v )} {a_0 ( v )} , \quad v = 1 , \ldots , k , \\ \sigma_v^2 & = \frac { a_2 ( v ) } { a_0 ( v ) } - \mu_v^2 , \quad v = 1 , {\ldots , k , } \tag { 5 } \end{align*}\end{document}

where, in general, we define \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_r ( v ) = \sum_{t = 1}^T \hat{z}_t ( v ) g_r ( y_t ) , \tag{6}\end{align*}\end{document}

However, if the means are constrained to 0 as in the HM-SV model, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\mu_v = 0 , v = 1 , \ldots , k$$\end{document} , then the variances are simply updated as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\sigma_v^2 = \frac { a_2 ( v ) } { a_0 ( v ) } .\end{align*}\end{document}

Moreover, if the means are unconstrained but the variances are constrained to be equal, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2 = \sigma^2 , v = 1 , \ldots , k$$\end{document} , then the means are updated as in (5) and the common variance is updated as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\sigma^2 = \frac { 1 } { T } \sum_ { v = 1 } ^k \left[ a_2 ( v ) - a_0 ( v ) \mu_v^2 \right] .\end{align*}\end{document}

In the case of the model for categorical responses, which is based on assumption (2), we update the conditional response probabilities as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\phi_ { y \mid v } = \frac { a_y ( v ) } { a_0 ( v ) } , \quad v = 1 , \ldots , k , y = 1 , \ldots , c ,\end{align*}\end{document}

where in this case \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}g_r ( y ) = \begin{cases}1 \quad \quad\quad \hbox{if} \ r = 0 , \\ I ( y = r ) \quad \hbox{if} \ r = 1 , \cdots , c , \end{cases} \tag{8}\end{align*}\end{document}

with I(·) denoting the indicator function equal to 1 if its argument is true and to 0 otherwise. Regarding the other parameters, that is, the initial and transition probabilities, they are updated at the M-step in the following way: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \pi_v & = p ( V_1 = v \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) , \quad v = 1 , \ldots , k , \\ \pi_ { v \mid u } & = \frac { b ( u , v ) } { \sum_ { h = 1 } ^k b ( u , h ) } , \quad u , v = 1 , \ldots , k , \end{align*}\end{document}

where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b ( u , v ) = \sum_{t = 2}^T \hat{z}_t ( u , v ) . \tag{9}\end{align*}\end{document}

Even the quantities a_r(v) and b(u, v) involved in the update of the model parameters can be computed only through suitable recursions that are illustrated in the following.

3. Recursions for Estimation and Decoding

In the following, we review the Baum-Welch recursions (Baum et al., 1970; Welch, 2003) and the linear memory recursion proposed by Churbanov and Winters-Hilt (2008) to compute quantities that are used within the EM algorithm. We also briefly review alternative recursions aimed at addressing the problem of reducing the memory complexity of the Baum-Welch recursions; see also Khreich et al. (2010). Finally, we illustrate the Viterbi algorithm, which is the most used algorithm for global decoding, that is, to find the most likely sequence of latent states corresponding to the observed sequence of data.

3.1. Forward-backward recursions

In order to efficiently compute the probability (or the density) of the observed sequence of T observations collected in y , Baum and Welch (Baum et al., 1970; Welch, 2003) proposed the following forward recursion for a first-order HM model: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\alpha_t ( v ) = \sum_{u = 1}^k \alpha_{t - 1} ( u ) \pi_{v \mid u}\ f_t ( y_t \mid v ) , \quad t = 2 , \ldots , T , v = 1 , \ldots , k , \tag{10}\end{align*}\end{document}

where α_t(v) corresponds to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$f ( V_t = v , Y_1 = y_1 , \ldots , Y_t = y_t )$$\end{document} . This recursion is initialized with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\alpha_1 ( v ) = \pi_v\ f_1 ( y_1 \mid v ) , v = 1 , \ldots , k$$\end{document} . In the end, we obtain the manifest probability (or density) function of y as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf{\textit{y}} ) = \sum_{v = 1}^k \alpha_T ( v ) .\end{align*}\end{document}

Moreover, Baum and Welch introduced the backward recursion \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\beta_t ( v ) = \sum_{w = 1}^k \beta_{t + 1} ( w ) \pi_{w \mid v}\ f_{t + 1} ( y_{t + 1} \mid w ) , \quad t = 1 , \ldots , T - 1 , v = 1 , \ldots , k , \tag{11}\end{align*}\end{document}

where, in general, β_t(v) corresponds to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$f ( Y_{t + 1} = y_{t + 1} , \ldots , Y_T = y_T \mid V_t = v )$$\end{document} . The recursion is initialized with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\beta_T ( v ) = 1 , v = 1 , \ldots , k$$\end{document} . Using this recursion, we can obtain the posterior probability of every latent state given the observed data. In particular, we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _t ( v ) = \frac { \alpha_t ( v ) \beta_t ( v ) } { f ( \textbf { \textit { y } } ) } , \quad t = 1 , \ldots , T , v = 1 , \ldots , k ,\end{align*}\end{document}

whereas for the posterior probability of every pair of consecutive states we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _t ( u , v ) = \frac { \alpha_ { t - 1 } ( u ) \pi_ { v \mid u } \ f_t ( y_t \mid v ) \beta_ { t } ( v ) } { f ( \textbf { \textit { y } } ) } , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k.\end{align*}\end{document}

The number of operations involved in the above recursions is of order O(Tk²), and then it linearly, rather than exponentially, increases with the number of observations. This is the main advantage of the approach. However, as already mentioned, the Baum-Welch recursions may suffer from numerical instability due to the fact that, as t increases, the probabilities in (10) and (11) become negligible. The problem, which is evident when T is large, may be solved by using suitable renormalizations that are rather simple to implement; see Scott (2002) for a deeper description. A more severe limitation concerns the memory requirement, which is of order O(kT); in fact, we need to store all the forward probabilities α_t(v) for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T$$\end{document} . This memory requirement can become excessive, especially in the presence of very long series of data. We will discuss this point in more detail in the following.

3.2. Linear memory recursion

The linear memory procedure, proposed by Churbanov and Winters-Hilt (2008), is aimed at overcoming the above limitation in terms of memory requirement. It represents an alternative version of the Baum-Welch recursions and an improvement of the forward-only algorithm introduced by Miklós and Meyer (2005). The algorithm is based on backward steps with scaling, so as to only involve calculation of α₁(v) for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document} . Then all calculations are based on the backward probabilities β_t(v) that are properly scaled in order to avoid numerical problems for long time series.

In more detail, the backward recursion is initialized for the last time occasion, t = T, with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\beta_T ( v ) = 1 , v = 1 , \ldots , k$$\end{document} , and the following scaled backward probabilities are computed: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{ \beta}_T ( v ) = D_T \beta_T ( v ) , \quad v = 1 , \ldots , k ,\end{align*}\end{document}

with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$D_T = 1/ k$$\end{document} . Then, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document} , it is possible to use the results of the backward recursion as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{ \beta}_t ( v ) = d_t \bar{ \beta}_t ( v ) , \quad v = 1 , \ldots , k ,\end{align*}\end{document}

Finally, for t = 1 we also compute α₁(v) = π_v f₁(y₁|v) for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document} .

Note that, in order to efficiently estimate the model parameters, a new set of functions is introduced, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{a}_{rt} ( u , v ) , r = 0 , \ldots , R , u , v = 1 , \ldots , k ,$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{b}_t ( u , v , w ) , u , v , w = 1 , \ldots , k$$\end{document} , where R = 2 for the model for continuous outcomes [see definition (7)] and R = c for the model with categorical outcomes [see definition (8)]. These functions are computed by a backward recursion that is initialized for the last time occasion with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{a}_{rT} ( u , v ) = \tilde{ \beta}_T ( u ) g_r ( y_T )$$\end{document} when u = v, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{a}_{rT} ( u , v ) = 0$$\end{document} otherwise, and with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{b}_T ( u , v , w ) = 0$$\end{document} . Then the backward recursion is implemented by computing, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document} , \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \tilde{a}_{rt} ( u , v ) & = d_t \bar{a}_{rt} ( u , v ) , \quad r = 0 , \ldots , R , u , v = 1 , \ldots , k , \\ \tilde{b}_t ( u , v , w ) & = d_t \bar{b}_t ( u , v , w ) , \quad u , v , w = 1 , \ldots , k , \end{align*}\end{document}

and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\bar{b}_t ( u , v , w ) = \sum_{h = 1}^k \pi_{h \mid w} \tilde{b}_{t + 1} ( u , v , h ) f_{t + 1} ( y_{t + 1} \mid h ) + \tilde{ \beta}_{t + 1} ( v ) \pi_{w \mid v}\ f_{t + 1} ( y_{t + 1} \mid v ) I ( u = w ) .\end{align*}\end{document}

When t = 1, we compute the following quantities: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{a}_r ( u ) = \sum_{v = 1}^k \tilde{a}_{r1} ( u , v ) \pi_v \ f_1 ( y_1 \mid v ) , \quad r = 0 , \ldots , R , u = 1 , \ldots , k ,\end{align*}\end{document}

and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{b} ( u , v ) = \sum_{w = 1}^k \tilde{b}_1 ( u , v , w ) \pi_w \ f_1 ( y_1 \mid w ) , \quad u , v = 1 \ldots , k ,\end{align*}\end{document}

which can be used to update the initial and transition probabilities according to the following formulas: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \pi_v & = \frac { \alpha_1 ( v ) \tilde { \beta } _1 ( v ) } { \sum_ { u = 1 } ^k \alpha_1 ( u ) \tilde { \beta } _1 ( u ) } , \quad v = 1 , \ldots , k , \\ \pi_ { v \mid u } & = \frac { \tilde { b } ( u , v ) } { \sum_ { h = 1 } ^k \tilde { b } ( u , h ) } , \quad u , v = 1 , \ldots , k; \end{align*}\end{document}

see Churbanov and Winters-Hilt (2008) for details. Moreover, for the normal HM model based on assumption (1), the conditional means and variances are updated as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \mu_v & = \frac { \tilde { a } _1 ( v ) } { \tilde { a } _0 ( v ) } , \quad v = 1 , \ldots , k , \\ \sigma_v^2 & = \frac { \tilde { a } _2 ( v ) } { \tilde { a } _0 ( v ) } - \mu_v^2 , \quad v = 1 , \ldots , k. \end{align*}\end{document}

For the model for categorical responses, which is based on assumption (2), we update the conditional response probabilities as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\phi_ { y \mid v } = \frac { \tilde { a } _y ( v ) } { \tilde { a } _0 ( v ) } , \quad v = 1 , \ldots , k , y = 1 , \ldots , c.\end{align*}\end{document}

Note that the above expressions are similar to those introduced in Section 2.2.

For the linear memory recursion, the number of operations is of order O(k⁴T), whereas the memory requirement is of order O(k³).

3.3. Alternative recursions

In a recent article, Khreich et al. (2010) proposed an algorithm for maximum likelihood estimation of HM models, named Efficient Forward Filtering Backward Smoothing (EFFBS) algorithm, which may be seen as an improved version of the Forward Filtering Backward Smoothing (FFBS) algorithm proposed by Ott (1967), Raviv (1967), Lindgren (1978), and Askar and Derin (1981). The FFBS algorithm represents a numerical stable alternative of the Baum-Welch algorithm, which, however, involves a number of operations and a memory requirement of the same order as the latter. This algorithm is based on a forward recursion to compute the so-called predictive and filtered probabilities, which are defined as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{t \mid t - 1} ( v ) = p ( V_t = v \mid Y_1 = y_1 , \ldots , Y_{t - 1} = y_{t - 1} )$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{t \mid t} ( v ) = p ( V_t = v \mid Y_1 = y_1 , \ldots , Y_t = y_t )$$\end{document} , respectively, with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T$$\end{document} .

The forward recursion is initialized with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{1 \mid 0} ( v ) = \pi_v , v = 1 , \ldots , k$$\end{document} , and then, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document} , it computes \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\gamma_ { t \mid t } ( v ) = \frac { \gamma_ { t \mid t - 1 } ( v ) f_t ( y_t \mid v ) } { \sum_ { u = 1 } ^k \gamma_ { t \mid t - 1 } ( u ) f_t ( y_t \mid u ) } , \quad v = 1 , \ldots , k , \tag { 12 } \end{align*}\end{document} \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\gamma_{t + 1 \mid t} ( v ) = \sum_{u = 1}^k \gamma_{t \mid t} ( u ) \pi_{v \mid u} , \quad v = 1 , \ldots , k. \tag{13}\end{align*}\end{document}

In the end, we obtain the posterior probabilities of the last latent state as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _T ( v ) = \gamma_ { T \mid T } ( v ) = \frac { \gamma_ { T \mid T - 1 } ( v ) f_T ( y_T \mid v ) } { \sum_ { u = 1 } ^k \gamma_ { T \mid T - 1 } ( u ) f_T ( y_T \mid u ) } , \quad v = 1 , \ldots , k.\end{align*}\end{document}

The backward recursion consists of computing for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document} (in reverse order) the remaining posterior probabilities as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _ { t + 1 } ( u , v ) = \frac { \gamma_ { t \mid t } ( u ) \pi_ { v \mid u } \gamma_ { t + 1 \mid T } ( v ) } { \sum_ { h = 1 } ^k \gamma_ { t \mid t } ( h ) \pi_ { v \mid h } } , \quad u , v = 1 , \ldots , k , \tag { 14 } \end{align*}\end{document} \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat{z}_t ( u ) = \sum_{v = 1}^k \hat{z}_{t + 1} ( u , v ) , \quad u = 1 , \ldots , k. \tag{15}\end{align*}\end{document}

Supposing that T is larger than k, as normally happens, the number of operations required by the algorithm is of order O(k²T), whereas the required amount of memory is of order O(kT), because we need to store all the filtered probabilities from the forward recursion.

The improved version of the FFBS algorithm proposed by Khreich et al. (2010) aims at computing the exact posterior probabilities of the latent states given the observed data with a memory complexity that is independent of T, and without increasing the time complexity. The key point of the EFFBS algorithm is that equation (12) may be inverted as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\gamma_ { t \mid t - 1 } ( v ) = \frac { \gamma_ { t \mid t } ( v ) / f_t ( y_t \mid v ) } { \sum_ { u = 1 } ^k \gamma_ { t \mid t } ( u ) / f_t ( y_t \mid u ) } , \quad v = 1 , \ldots , k. \tag { 16 } \end{align*}\end{document}

In a similar way, letting γ _t|t−1 be the column vector with elements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{t \mid t - 1} ( v ) , v = 1 , \ldots , k$$\end{document} , with γ _t|t being defined accordingly, we have that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}{ \bf \gamma}_{t \mid t} = ( { \bf \Pi}^{ \prime} ) ^{ - 1}{ \bf \gamma}_{t + 1 \mid t} , \tag{17}\end{align*}\end{document}

where Π is the transition probability matrix with elements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi_{v \mid u} , u , v = 1 , \ldots , k$$\end{document} , with the index u running by row and v by column. This is because equation (13) may be simply expressed in matrix notation as γ _t+1|t = Π′ γ _t|t.

Based on the above results, the EFFBS algorithm relies on the same forward recursion of the FFBS algorithm, but it only requires storing of the last predictive and posterior probabilities, that is, γ_T_|T−1(v) and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{T \mid T} ( v ) , v = 1 , \ldots , k$$\end{document} . Then, once the inverse of Π has been computed, the backward step consists of computing, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document} (in reverse order), the probabilities γ_t_|t(v) by (17) and the probabilities γ_t_|t-1(v) by (16). Then we obtain the target posterior probabilities \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\hat{z}_t ( u , v )$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\hat{z}_t ( u )$$\end{document} at each step t through equations (14) and (15), respectively. As the EFFBS needs only to store one set of predictive and filtering probabilities at the end of the forward recursion, then the amount of memory required by this algorithm strongly decreases and is of order O(k²), while having the same complexity in terms of number of operations as the FFBS algorithm, that is, O(k²T).

Although this alternative version seems promising and the relationships used in the implementation of the recursion are proved from a mathematical point of view, we found numerical problems, arising when the algorithm is performed. In particular, at a certain point of the backward recursion, some filtering probabilities become negative and, obviously, this result is not correct. We also note that the difference with the posterior probabilities obtained by the FFBS algorithm increases during the backward recursion, leading to the conclusion that there is an accumulation of numerical errors that also affects the posterior probabilities. These errors are likely due to the way in which the filtering probabilities are obtained, through the inverse of the transposed transition probability matrix, which normally contains negative elements. For a detailed description see Bartolucci and Pandolfi (2013). This problem systematically occurs with a different set of parameters, length of the series, and number of latent states.

Finally, it is also worth mentioning that another alternative to the Baum-Welch algorithm is represented by the checkpoint algorithm (Grice et al., 1997; Tarnas and Hughey, 1998; Wheeler and Hughey, 2000), which stores only some reference columns or checkpoints, instead of storing the whole matrix of forward probabilities. Then, in the backward step, the remaining forward values are sequentially recomputed, beginning with their corresponding checkpoints, so as to update the parameter estimates. As mentioned above, the implementation of this algorithm reduces the memory complexity, but it still depends on the length of the time series, being of order \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( k \sqrt{T} )$$\end{document} . However, this algorithm is outperformed by the linear memory algorithm described above and then is not illustrated in detail.

3.4. Viterbi algorithm

In many applications, such as in speech recognition, it is usually of interest to reconstruct the sequence of the latent states given the observed sequence of data. This problem is known as global decoding and consists of selecting the sequence of states of the hidden Markov chain that is the most likely (under the fitted model) to have given rise to the observed sequence; this sequence of latent states is denoted by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^* = ( v_1^* , \ldots , v_T^* )$$\end{document} . This problem may be solved by using an efficient algorithm known as the Viterbi algorithm (Viterbi, 1967; Juang and Rabiner, 1991).

Let ψ₁(v) = f (V₁ = v, Y₁ = y₁) and, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T$$\end{document} , let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\psi_t ( v ) = \max_{v_1 , \ldots , v_{t - 1}}f ( V_1 = v_1 , \ldots , V_{t - 1} = v_{t - 1} , V_t = v , Y_1 = y_1 , \ldots , Y_t = y_t )$$\end{document} . The Viterbi algorithm performs a forward recursion to compute the above quantities. Moreover, a backward recursion is used for path prediction, in order to find the most likely latent sequence.

In particular, the forward recursion consists of computing \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\psi_t ( v ) = f_t ( y_t \mid v ) \max_u [ \psi_{t - 1} ( u ) \pi_{v \mid u} ] , \quad t = 2 , \ldots , T , v = 1 , \ldots , k.\end{align*}\end{document}

This recursion is initialized, for t = 1, with ψ₁(v) = π_v f₁(y₁|v). At the end of the forward recursion, it is possible to obtain the optimal state \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_T = {\rm argmax}_{u}\, \psi_T ( u ) .\end{align*}\end{document}

Finally, the predicted sequence of latent states can be determined by a backward recursion as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_t = {\rm argmax}_{u}\,[ \psi_t ( u ) \pi_{v^*_{t + 1} \mid u} ], \quad t = 1 , \ldots , T - 1.\end{align*}\end{document}

The Viterbi algorithm has a time complexity of order O(k²T) and a memory requirement of order O(kT). Churbanov and Winters-Hilt (2008) presented a modification of this algorithm in which the memory requirement is smaller, without affecting the execution time. This alternative version requires an amount of memory of order O(T); see Churbanov and Winters-Hilt (2008) for details.

4. Proposed Algorithm

Developing a result obtained by Bartolucci and Besag (2002) for Markov random fields, in this section we show a backward recursion to directly compute the posterior probabilities \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_t ( u , v ) = p ( V_t = v \mid V_{t - 1} = u , \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k , \tag{18}\end{align*}\end{document}

and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_1 ( v ) = p ( V_1 = v \mid \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad v = 1 , \ldots , k. \tag{19}\end{align*}\end{document}

The first is the conditional probability of a certain realization of V_t, given V_t₋₁ and the observed response configuration y , whereas the second is the probability of a certain realization of V₁ given y . Note that, to simplify the notation, the argument y is removed from δ_t(u,v) as similarly done for the recursions in Section 3.1.

That the aforementioned result may be applied to deal with HM models was already noted by Bartolucci and Besag (2002), who showed it in detail for the case of first-order and second-order HM models. However, in this article we use this result in a different way, so as to obtain an algorithm whose memory requirement does not depend on T. In the following, this recursion is described in detail together with its use for maximum likelihood estimation. We also show that, from the results of this recursion, it is possible to directly obtain the predicted sequence of latent states, without requiring the implementation of the Viterbi algorithm for global decoding.

4.1. Proposed backward recursion

For the last time occasion, that is, when t = T, the probability in (18) may be simply computed as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_T ( u , v ) = \frac { f_T ( y_T \mid v ) \pi_ { v \mid u } } { \sum_ { h = 1 } ^kf_T ( y_T \mid h ) \pi_ { h \mid u } } , \tag { 20 } \end{align*}\end{document}

whereas, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document} , Theorem 1 of Bartolucci and Besag (2002) implies that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_t ( u , v ) = \left[ \sum_ { w = 1 } ^k \frac { \delta_ { t + 1 } ( v , w ) } { p ( V_t = v \mid V_ { t - 1 } = u , V_ { t + 1 } = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) } \right]^ { - 1 } . \tag { 21 } \end{align*}\end{document}

In order to prove this result, consider that the assumption that the HM process is of first order implies that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_{t + 1} ( v , w ) = p ( V_{t + 1} = w \mid V_t = v , \textbf{\textit{Y}} = \textbf{\textit{y}} ) = p ( V_{t + 1} = w \mid V_{t - 1} = u , V_t = v , \textbf{\textit{Y}} = \textbf{\textit{y}} )\end{align*}\end{document}

and then \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \frac { \delta_ { t + 1 } ( v , w ) } { p ( V_t = v \mid V_ { t - 1 } = u , V_ { t + 1 } = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) } = \frac { p ( V_ { t - 1 } = u , V_ { t + 1 } = w \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } { p ( V_ { t - 1 } = u , V_t = v \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } .\end{align*}\end{document}

Consequently, the sum in (21) is equal to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \frac { p ( V_ { t - 1 } = u \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } { p ( V_ { t - 1 } = u , V_t = v \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } \end{align*}\end{document}

and the result holds. Also note that the conditional probability in (21) is obtained as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}p ( V_t = v \mid V_ { t - 1 } = u , V_ { t + 1 } = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) = \frac { \pi_ { v \mid u } \pi_ { w \mid v } \ f_t ( y_t \mid v ) } { \sum_ { h = 1 } ^k \pi_ { h \mid u } \pi_ { w \mid h } \ f_t ( y_t \mid h ) } . \tag { 22 } \end{align*}\end{document}

Finally, for t = 1, the recursion is completed by computing \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_1 ( v ) = \left[ \sum_ { w = 1 } ^k \frac { \delta_2 ( v , w ) } { p ( V_1 = v \mid V_2 = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) } \right] ^ { - 1 } , \tag { 23 } \end{align*}\end{document}

where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}p ( V_1 = v \mid V_2 = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) = \frac { \pi_ { v } \pi_ { w \mid v } \ f_1 ( y_1 \mid v ) } { \sum_ { h = 1 } ^k \pi_h \pi_ { w \mid h } \ f_1 ( y_1 \mid h ) } .\end{align*}\end{document}

This rule is again based on Theorem 1 in Bartolucci and Besag (2002).

4.2. Use for maximum likelihood estimation

First of all, for any sequence of latent states \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_1 , \ldots , v_T$$\end{document} collected in v , we have that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf { \textit { y } } ) = \frac { \pi_ { v_1 } \ f_1 ( y_1 \mid v_1 ) \prod_ { t = 2 } ^T \pi_ { v_t \mid v_ { t - 1 } } \ f_t ( y_t \mid v_t ) } { p ( V_1 = v_1 \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) \prod_ { t = 2 } ^T \delta_t ( v_ { t - 1 } , v_t ) } ,\end{align*}\end{document}

where the numerator corresponds to the joint distribution of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_T$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document} and the denominator corresponds to the conditional distribution of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_T$$\end{document} given \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document} . Consequently, given an arbitrary sequence v , we compute the model log-likelihood as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\ell ( { \bf \theta } ) = \log \frac { \pi_ { v_1 } \ f_1 ( y_1 \mid v_1 ) } { \delta_1 ( v_1 ) } + \sum_ { t = 2 } ^T \log \frac { \pi_ { v_t \mid v_ { t - 1 } } \ f_t ( y_t \mid v_t ) } { \delta_t ( v_ { t - 1 } , v_t ) } , \tag { 24 } \end{align*}\end{document}

on the basis of the proposed recursion. In this way, we do not need to use any renormalization that is instead necessary for the Baum-Welch recursions. It is also worth noting that, although the sequence of latent states v is arbitrary, its choice may be driven by simplicity and numerical stability. For instance, we can take the sequence \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_t = v^* , t = 1 , \ldots , T$$\end{document} , where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v^*$$\end{document} is the latent state corresponding to the highest persistence probability, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} v^* = {{\rm argmax}\atop {v}} \quad \; \ \pi_{v \mid v}.\end{align*}\end{document}

We now show how the quantities required to perform the M-step of the EM algorithm can be obtained from the recursion developed in Section 4.1, using an amount of memory that does not depend on T. In particular, we recall that in order to update the parameters, the EM algorithm requires at every step the quantities a_r(v) defined in (6), for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$r = 0 , \ldots , R$$\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document} , where R = 2 for the normal model for continuous outcomes [see definition (7)] and R = c for the model for categorical outcomes [see definition (8)]. Maximum likelihood estimation also requires the quantities \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$b ( u , v ) , u , v = 1 , \ldots , k$$\end{document} , which are defined in (9).

In order to efficiently compute a_r(v), we introduce the functions \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_{rt} ( u , v ) = \sum_{s = t}^Tp ( V_s = v \mid V_{t - 1} = u , \textbf{\textit{Y}} = \textbf{\textit{y}} ) g_r(y_t), \quad r = 0 , \ldots , R , u , v = 1 , \ldots , k ,\end{align*}\end{document}

so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_r ( v ) = \sum_{u = 1}^k \delta_1 ( u ) a_{r2} ( u , v ) + \delta_1 ( v ) g_r(y_1). \tag{25}\end{align*}\end{document}

For t = T, we have that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_{rT} ( u , v ) = \delta_T ( u , v ) g_r ( y_T ) , \quad u , v = 1 , \ldots , k. \tag{26}\end{align*}\end{document}

Moreover, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document} , this function may be obtained as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_{rt} ( u , v ) = \sum_{h = 1}^ka_{r , t + 1} ( h , v ) \delta_t ( u , h ) + \delta_t ( u , v ) g_r ( y_t ) , \quad u , v = 1 , \ldots , k ,\tag{27}\end{align*}\end{document}

with δ_t(u, v) provided by the backward recursion in Section 4.1. Similarly, in order to efficiently compute b(u, v), we introduce the function \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b_t ( u , v , w ) = \sum_{s = t}^{T - 1}p ( V_s = v , V_{s + 1} = w \mid V_{t - 1} = u , \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad u , v , w = 1 , \ldots , k , \tag{28}\end{align*}\end{document}

so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b ( u , v ) = \sum_{h = 1}^k \delta_1 ( h ) b_2 ( h , u , v ) + \delta_1 ( u ) \delta_2 ( u , v ) . \tag{29}\end{align*}\end{document}

Even in this case, we use the results from the backward recursion for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document} as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b_t ( u , v , w ) = \sum_{h = 1}^kb_{t + 1} ( h , v , w ) \delta_t ( u , h ) + \delta_t ( u , v ) \delta_{t + 1} ( v , w ) , \quad u , v , w = 1 , \ldots , k ,\end{align*}\end{document}

It is important to stress that all the above quantities may be computed by using only the results from every backward step of the proposed recursion, without the need to store the results of all steps, as happens for the Baum-Welch recursions.

4.3. Implementation of the algorithm

In this section, we clarify some aspects about the implementation of the algorithm and provide an efficient pseudo-code, which is reported in Table 1. A Fortran implementation of this code is available to the reader upon request.

Table 1.

Pseudo-Code for the Backward Algorithm

1	Initialization
2	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u , v = 1 , \ldots , k$$\end{document}
3	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\delta_T^* ( u , v ) = \pi_{v\mid u}\ f_T ( y_T \mid v )$$\end{document}
4	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u = 1 , \ldots , k$$\end{document}
5	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$d_T ( u ) = \sum_{v = 1}^k \delta_T^* ( u , v )$$\end{document}
6	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
7	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\delta_T ( u , t ) = \delta_T^* ( u , v ) / d_T ( u )$$\end{document}
8	ℓ = log d_T ( \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u^*$$\end{document} )
9	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u , v = 1 , \ldots , k$$\end{document}
10	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$r = 0 , \ldots , R$$\end{document}
11	a_rT (u, v) = δ_T (u, v)g_r (y_T)
12	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
13	b_T (u, v, w) = 0
14	Recurrence
15	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = T - 1 , \ldots , 2$$\end{document}
16	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$w = 1 , \ldots , k$$\end{document}
17	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u = 1 , \ldots , k$$\end{document}
18	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$q_{1 , t + 1} ( u , w ) = \sum_{h = 1}^h \pi_{h \mid u} \pi_{w \mid h}\ f_t ( y_t \mid h )$$\end{document}
19	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
20	q_2, _t+1(v, w) = δ_t₊₁(v, w) / π_w_\|v
21	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u , v = 1 , \ldots , k$$\end{document}
22	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\delta_t ( u , v ) = \pi_{v \mid u}\ f_t ( y_t \mid v ) \left/ \left[ \sum_{w = 1}^kq_{1 , t + 1} ( u , w ) q_{2 , t + 1} ( v , w ) \right] \right.$$\end{document}
23	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\ell = \ell + \log [ \pi_{v^* \mid v^}f_t ( y_t \mid v^ ) / \delta_t ( v^* , v^* ) ]$$\end{document}
24	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u , v = 1 , \ldots , k$$\end{document}
25	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$r = 0 , \ldots , R$$\end{document}
26	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$a_{rt} ( u , v ) = \sum_{h = 1}^ka_{r , t + 1} ( h , v ) \delta_t ( u , h ) + \delta_t ( u , v ) g_r ( y_t )$$\end{document}
27	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$w = 1 , \ldots , k$$\end{document}
28	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$b_t ( u , v , w ) = \sum_{h = 1}^k b_{t + 1} ( h , v , w ) \delta_t ( u , h ) + \delta_t ( u , v ) \delta_{t + 1} ( v , w )$$\end{document}
29	Termination
30	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$w = 1 , \ldots , k$$\end{document}
31	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$q_{12} ( w ) = \sum_{h = 1}^k \pi_h \pi_{w \mid h}\ f_1 ( y_1 \mid h )$$\end{document}
32	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
33	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$w = 1 , \ldots , k$$\end{document}
34	q₂₂(v, w) = δ₂(v, w) / π_w_\|v
35	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\delta_1 ( v ) = \pi_v\ f_1 ( y_1 \mid v ) \left/ \left[ \sum_{w = 1}^k q_{12} ( w ) q_{22} ( v , w ) \right] \right.$$\end{document}
36	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\ell = \ell + \log [ \pi_{v^}f_1 ( y_1 \mid v^ ) / \delta_1 ( v^* ) ]$$\end{document}
37	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
38	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$r = 0 , \ldots , R$$\end{document}
39	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$a_r ( v ) = \sum_{h = 1}^k \delta_1 ( h ) a_{r2} ( h , v ) + \delta_1 ( v ) g_r ( y_1 )$$\end{document}
40	for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
41	\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$b ( u , v ) = \sum_{h = 1}^k \delta_1 ( h ) b_2 ( h , u , v ) + \delta_1 ( u ) \delta_2 ( u , v )$$\end{document}

First of all, the algorithm is initialized by computing \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\delta_T ( u , v ) , u , v = 1 , \ldots , k$$\end{document} , through equation (20). Moreover, according to (24), the corresponding component of the log-likelihood is computed as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\log \frac { \pi_ { v_T \mid v_ { T - 1 } } \ f_T ( y_T \mid v_T ) } { \delta_T ( v_ { T - 1 } , v_T ) } ,\end{align*}\end{document}

with v_T₋₁ and v_T suitably chosen, and the functions a_rT (u, v) and b_T (u, v, w) are initialized as in (26) and (30), respectively.

Then, for t from T − 1 to 2, the algorithm computes δ_t(u, v) by the backward step in (21), and then it updates a_rt(u, v) and b_t(u, v, w) by (27) and (28), respectively. Note that in order to efficiently compute δ_t(u, v), considering (22) we can reformulate equation (21) as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_t ( u , v ) = \pi_ { v \mid u } \ f_t ( y_t \mid v ) \left[ \sum_ { w = 1 } ^k \frac { \delta_ { t + 1 } ( v , w ) } { \pi_ { w \mid v } } \sum_ { h = 1 } ^k \pi_ { h \mid u } \pi_ { w \mid h } \ f_t ( y_t \mid h ) \right] ^ { - 1 } ,\tag { 31 } \end{align*}\end{document}

and then it is first convenient to compute δ_t₊₁(u, v)/π_v_|u for all \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u , v = 1 , \ldots , k$$\end{document} and then plug the result in (31). At every cycle, we also update the value of the log-likelihood by adding \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\log \frac { \pi_ { v_t \mid v_ { t - 1 } } \ f _t( y_t \mid v_t ) } { \delta_t ( v_ { t - 1 } , v_t ) } ,\end{align*}\end{document}

with v_t₋₁ and v_t suitably chosen.

In the end, the algorithm provides δ₁(v) by (23), which is efficiently computed as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_1(v) = \pi_v\ f_1 ( y_1 \mid v ) \left[ \sum_ { w = 1 } ^k \frac { \delta_2 ( v , w ) } { \pi_ { w \mid v } } \sum_ { h = 1 } ^k \pi_h \pi_ { h \mid w } \ f_1 ( y_1 \mid h ) \right] ^ { - 1 } ,\end{align*}\end{document}

and then we compute a_r(v) and b(u, v) by (25) and (29), respectively. Moreover, the final value of the log-likelihood is obtained by adding to the previous value the quantity: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\log \frac { \pi_ { v_1 } \ f_1 ( y_1 \mid v_1 ) } { \delta_1 ( v_1 ) } .\end{align*}\end{document}

A crucial point is related to the numerical complexity of the recursion proposed in this section, especially in comparison with the Baum-Welch recursions. It may be easily shown that the number of operations of the proposed recursion is of order O(k⁴T). Therefore, as for the Baum-Welch recursions, the number of operations of our recursion linearly increases with the length of the series of data (T); however, this number increases faster with the number of latent states (k). On the other hand, the memory requirement is of order O(k³) because, as mentioned above, we do not need to store the result of the proposed recursion at every step. This order is comparable to that required by the approach of Churbanov and Winters-Hilt (2008). We also have to recall that our recursion does not require dummy renormalizations, but this is a minor advantage because implementing these renormalizations to solve the numerical instability of the Baum-Welch recursions is not difficult.

4.4. Global decoding

As illustrated in Section 3.4, the most used algorithm to perform global decoding is represented by the Viterbi algorithm, which, however, requires a specific implementation distinct from that of the estimation algorithm. In this section, we illustrate how to directly obtain the most likely sequence of latent states on the basis of the posterior probabilities in (18) and (19), as an additional result coming from the proposed algorithm.

First of all, consider the following quantity: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\eta ( \textbf{\textit{v}} ) = \log \delta_1 ( v_1 ) + \sum_{s = 2}^T \log \delta_s ( v_{s - 1} , v_s ) ,\end{align*}\end{document}

which corresponds to the posterior probability of a certain sequence of latent states v . Its maximum, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda = \max \limits_\textbf{\textit{v}} \eta ( \textbf{\textit{v}} ) ,\end{align*}\end{document}

corresponds to the predicted sequence of latent states \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^*$$\end{document} according to the global decoding method.

In order to efficiently obtain the above prediction on the basis of the results of the proposed algorithm, we define \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\eta_t ( u , v_t , \ldots , v_T ) = \begin{cases} \log \delta_T ( u , v_T ) , \qquad \qquad \qquad \qquad \qquad \, t = T , \\\\ \log \delta_t ( u , v_t ) + \sum_{s = t + 1}^T \log \delta_s ( v_{s - 1} , v_s ) , \ t = 2 , \ldots , T-1 , \end{cases}\end{align*}\end{document}

and the corresponding maximum \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda_t ( u ) = \max \limits_{v_t , \ldots , v_T} \eta_t ( u , v_t , \ldots , v_T ) , \quad u = 1 , \ldots , k , t = 2 , \ldots , T.\end{align*}\end{document}

First of all, it is worth noting that λ_t(u) may be obtained by the backward recursion \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda_t ( u ) = \max \limits_h \left[ \lambda_{t + 1} ( h ) + \log \delta_t ( u , h ) \right] , \quad t = 2 , \ldots , T - 1 , u = 1 , \ldots , k ,\end{align*}\end{document}

which is initialized with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda_T ( u ) = \max \limits_{v_T} \log \delta_T ( u , v_T ) , \quad u = 1 , \ldots , k.\end{align*}\end{document}

Then, we obtain the maximum of the posterior probability as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda = \max \limits_u [ \log \delta_1 ( u ) + \lambda_2 ( u ) ] .\end{align*}\end{document}

In order to obtain the sequence of states corresponding to λ, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^*$$\end{document} , we introduce the indicator \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u ) , t = 2 , \ldots , T , s = 0 , \ldots , T - t$$\end{document} , for the latent state at time t + s of the most likely sequence if, at time t − 1, the chain was in the state u; in symbols, we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}( v_{t - 1 , t}^* ( u ) , \ldots , v_{t - 1 , T}^* ( u ) ) = {{\rm argmax}\atop_{v_t , \ldots, v_T}}\, \eta_t ( u , v_t , \ldots , v_T ) , \quad t = 1 , \ldots , T - 1.\end{align*}\end{document}

Even this sequence may be obtained by the following backward recursion: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v_{t - 1 , t}^* ( u ) & = {\rm argmax}_{h}\, \left[ \lambda_{t + 1} ( h ) + \log \delta_t ( u , h ) \right] , \quad u = 1 , \ldots , k , \\ v^*_{t - 1 , t + s} ( u ) & = v^*_{t , t + s} ( v^*_{t - 1 , t} ( u ) ) , \quad s = 1 , \ldots , T - t , u = 1 , \ldots , k , & ( 32 ) \end{align*}\end{document}

which is performed for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document} and is initialized by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_{T - 1 , T} ( u ) = {\rm argmax}_{v_T}\, \log \delta_T ( u , v_T ) , \quad u = 1 , \ldots , k.\end{align*}\end{document}

In the end, the predicted sequence of latent states has elements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_1 & = {\rm argmax}_{u}\, [ \log \delta_1 ( u ) + \lambda_2 ( u ) ] , \\ v^*_t & = v^*_{1 , t} ( v^*_1 ) , \quad t = 2 , \ldots , T.\end{align*}\end{document}

As already mentioned, the proposed method for global decoding only needs the results of the proposed backward recursion for maximum likelihood estimation, without requiring the implementation of the Viterbi algorithm, which involves both forward and backward steps. Then, after convergence of the EM algorithm, we can use this decoding method, which may be implemented in the same code of the backward recursion used for estimation. Moreover, in order to efficiently implement this method, consider that we do not need to store all the predicted states \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document} because \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document} is constant in u for s larger than a certain threshold, denoted by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s_t^*$$\end{document} , which is typically equal to a small value; in particular, in our simulations we noted that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s_t^*$$\end{document} is always smaller than 10. In fact, the dependence of the predicted state on a previous latent state rapidly decreases with their distance. Then operation (32) must be performed only for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s = 1 , \ldots , s^*_t - 1$$\end{document} , where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s_t^*$$\end{document} is found as the first value of s such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document} is constant in u. Also note that, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s = s_t^* , \ldots , T - t , v_{t - 1 , t + s}^* ( u )$$\end{document} directly corresponds to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v^*_{t + s}$$\end{document} for any u, and then global decoding may be partially performed at any step and certain \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document} may then be discarded from the memory. This implies that the additional memory requirement of the proposed algorithm for global decoding, with respect to the algorithm illustrated in the previous section, is of order \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O(ks^*)$$\end{document} , where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s^* = \max_t s^*_t$$\end{document} ; in this computation, we do not consider the memory to store \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^*$$\end{document} , which is in any case of order O(T), the same order of the memory requirement to store the observed sequence of data y . On the other hand, the additional requirement in terms of number of operations is of order O(k²T); however, if the global decoding is performed separately from the estimation algorithm, the time complexity is of order O(k³T).

5. Simulation Study

In this section, we illustrate the results of a simulation study in which we compare the performance of the proposed algorithm for maximum likelihood estimation of HM models with two alternative estimation algorithms. In particular, we set up a comparison, in terms of memory requirement and computing time, with the Baum-Welch algorithm (Baum et al., 1970; Welch, 2003) and the linear memory algorithm of Churbanov and Winters-Hilt (2008). As already outlined, the latter represents the most efficient alternative of the Baum-Welch algorithm. Moreover, we also set up a comparison, again in terms of memory requirement and computing time, between the original Viterbi algorithm and the proposed method for global decoding. All the algorithms have been implemented by means of a series of R functions with the recursions implemented in Fortran (gfortran compiler) and were run on an iMac with Intel Core i7 processor at 2.8 GHz with 8 gigabytes of RAM.

In more detail, we simulated 100 samples from the homoscedastic HM model described in Section 2.1 for different values of T (length of the time series), k (number of latent states), and σ² (conditional variance of the response variables that is common to all states), and for different structures of the transition matrix. In particular, the assumed transition matrix has elements parameterized through an index ρ as follows: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\pi_ { v \mid u } = \frac { \rho^ { \mid v - u \mid } } { \sum_ { h = 1 } ^k \rho^ { \mid h - u \mid } } , \quad u , v = 1 , \ldots , k ,\end{align*}\end{document}

so that high values of ρ correspond to a low probability of persistence of the series and vice-versa. Also note that σ² allows us to control the degree of separation between the components: an increase in the value of the variance corresponds to a lower degree of separation. This is because we assumed a sequence of k equally-spaced values for μ_v, from −(k − 1) to k − 1.

The values of T, k, σ², and ρ adopted in the simulation are chosen as follows:

• T = 1,000, 10,000, 100,000;

• k = 3, 5, 10;

• σ² = 0.1, 0.2, 0.5;

• ρ = 0.1, 0.2, 0.5.

In this way, we considered 81 different scenarios. Moreover, we also assumed uniform initial probabilities, that is, π_v = 1/k.

For each of the 100 samples drawn under each scenario described above, we applied the EM algorithm to estimate the model parameters and recorded the memory demand, in kilobytes, and the computing time, in seconds, using the three recursions under comparison. We also recorded the computing time and the amount of memory required for global decoding by the proposed algorithm, together with those required by the Viterbi algorithm. The adopted criterion for stopping the EM algorithm was based on the relative difference in terms of log-likelihood between two consecutive steps. In particular, the EM algorithm is stopped when \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \frac { \mid \ell ( { \bf \theta } ^ { ( m ) } ) - \ell ( { \bf \theta } ^ { ( m - 1 ) } ) \mid } { \mid \ell ( { \bf \theta } ^ { ( m - 1 ) } ) \mid } \leq 10^ { - 10 } ,\end{align*}\end{document}

with m denoting the m-th step of the EM algorithm and θ^(m) being the parameter value at the end of this step. Furthermore, in order to avoid numerical instability, the initial and transition probabilities are raised to the minimum value of 10⁻¹⁰⁰ during the estimation algorithm; the same criteria are applied for the conditional normal densities. Finally, for a meaningful comparison between the algorithms, we did not take into account the amount of memory required to store the observed time series. When comparing the memory complexity of the algorithms for global decoding, we did not consider the amount of memory required to store the predicted sequence of latent states. Finally, in the proposed global decoding, we set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s^*$$\end{document} = 10 as the maximum threshold for memory requirement; we verified that this threshold has never been exceeded in all the scenarios under consideration.

In Table 2, we report the amount of memory (in kilobytes) required by every recursion; this amount is computed on the basis of the results from the 100 samples simulated under each scenario. Note that we report the memory requirement for every combination of T and k, and not also for every combination of ρ and σ², because this memory requirement is not affected by the latter two parameters.

Table 2.

Memory Requirement in 100 Simulated Samples

	T = 1,000			T = 10,000			T = 100,000
Algorithm	k = 3	k = 5	k = 10	k = 3	k = 5	k = 10	k = 3	k = 5	k = 10
BW	26.7	42.6	83.3	237.6	394.1	786.4	2,347.0	3,909.8	7,817.6
lin	4.8	5.2	7.8	4.8	5.2	7.8	4.8	5.2	7.8
BB	4.8	5.2	7.7	4.8	5.2	7.7	4.8	5.2	7.7
Viterbi	12.8	20.8	41.1	118.3	196.5	392.7	1,173.0	1,954.4	3,908.3
BB-glob	1.6	1.8	2.8	1.6	1.8	2.8	1.6	1.8	2.8

The table shows the median of the memory requirement (in Kb) in 100 simulated samples for the recursion algorithms under comparison, for different values of T and k.

BW, the algorithm based on the Baum-Welch recursions; lin, linear memory algorithm; BB, the proposed algorithm based on the Bartolucci-Besag recursion; BB-glob, the proposed algorithm for global decoding; Viterbi, the Viterbi algorithm.

As may be expected, the memory requirement of the recursion proposed by Churbanov and Winters-Hilt (2008) and that of our recursion do not depend on the length of the time series. Moreover, we note that the two results are very close, and are moderately affected by the increase in the number of latent states because there is a group of objects of fixed dimension that are stored in any case. On the other hand, the results confirm that the memory complexity of the Baum-Welch recursions linearly increases with the length of the observed series of data and also increases with the number of states. This makes difficult or impossible its application for very long series that can be encountered in certain applications, as mentioned in Section 1. About the algorithms for global decoding, we observe that the memory requirement of the proposed algorithm is significantly smaller than that of the original Viterbi algorithm, and, unlike the latter, it does not depend on the length of the time series.

Table 3 shows the median, over the simulated samples, of the computing time (in seconds) required by the three estimation algorithms of interest, together with the time complexity of the proposed algorithm for global decoding. In this case, the results are affected by the values of ρ and σ². This is because the degree of persistence of the series and the degree of separation of the components affect the complexity of the model to be estimated and, in turn, the number of iterations required by the EM algorithm to reach the convergence. In more detail, for all the recursions under comparison, high values of the variance correspond to components that are not well separated and to a model that is more difficult to estimate. Then, the computing time required to run the algorithms increases. Furthermore, the computing time required by the algorithms also increases as the probability of persistence of the series decreases.

Table 3.

Median of Time Requirement in 100 Simulated Samples

		T = 1,000			T = 10,000			T = 100,000
	Algorithm	k = 3	k = 5	k = 10	k = 3	k = 5	k = 10	k = 3	k = 5	k = 10
	BW	0.004	0.006	0.026	0.022	0.051	0.176	0.223	0.481	1.688
ρ = 0.1	lin	0.004	0.015	0.314	0.027	0.128	1.753	0.295	1.254	14.875
σ² = 0.1	BB	0.005	0.017	0.325	0.038	0.158	1.977	0.379	1.576	17.807
	Viterbi	0.000	0.001	0.002	0.003	0.006	0.022	0.028	0.062	0.217
	BB-glob	0.001	0.001	0.004	0.004	0.009	0.038	0.036	0.090	0.383
	BW	0.005	0.010	0.065	0.030	0.066	0.453	0.285	0.670	4.050
ρ = 0.1	lin	0.006	0.021	0.807	0.039	0.168	4.394	0.370	1.675	31.075
σ² = 0.2	BB	0.006	0.027	0.767	0.048	0.212	4.876	0.480	2.099	38.691
	Viterbi	0.000	0.001	0.002	0.003	0.006	0.022	0.027	0.062	0.214
	BB-glob	0.000	0.001	0.004	0.004	0.010	0.038	0.036	0.091	0.378
	BW	0.010	0.028	0.122	0.053	0.168	2.766	0.555	1.274	9.794
ρ = 0.1	lin	0.011	0.058	1.035	0.068	0.404	23.277	0.686	3.113	70.347
σ² = 0.5	BB	0.012	0.070	1.184	0.084	0.513	28.799	0.873	3.969	89.729
	Viterbi	0.001	0.000	0.002	0.003	0.006	0.021	0.028	0.062	0.215
	BB-glob	0.001	0.001	0.004	0.004	0.009	0.038	0.037	0.091	0.378
	BW	0.004	0.009	0.031	0.027	0.063	0.233	0.250	0.599	2.102
ρ = 0.2	lin	0.005	0.020	0.285	0.034	0.158	1.890	0.339	1.544	15.755
σ² = 0.1	BB	0.005	0.023	0.314	0.043	0.196	2.304	0.432	1.931	20.051
	Viterbi	0.000	0.001	0.002	0.003	0.006	0.022	0.029	0.063	0.217
	BB-glob	0.000	0.001	0.004	0.004	0.009	0.039	0.036	0.090	0.381
	BW	0.006	0.013	0.072	0.039	0.091	0.380	0.361	0.851	3.015
ρ = 0.2	lin	0.007	0.026	0.720	0.049	0.220	2.853	0.486	2.108	22.180
σ² = 0.2	BB	0.008	0.035	0.757	0.062	0.282	3.668	0.617	2.684	28.311
	Viterbi	0.000	0.001	0.002	0.003	0.006	0.022	0.027	0.064	0.216
	BB-glob	0.000	0.001	0.004	0.004	0.009	0.038	0.036	0.091	0.378
	BW	0.013	0.038	0.248	0.080	0.188	1.423	0.763	1.875	9.980
ρ = 0.2	lin	0.014	0.075	1.757	0.101	0.446	10.361	0.974	4.601	70.865
σ² = 0.5	BB	0.017	0.091	2.143	0.128	0.557	13.129	1.222	5.763	91.009
	Viterbi	0.000	0.001	0.002	0.003	0.006	0.022	0.028	0.064	0.216
	BB-glob	0.000	0.001	0.004	0.004	0.009	0.038	0.036	0.091	0.381
	BW	0.008	0.023	0.125	0.055	0.162	0.653	0.533	1.589	6.219
ρ = 0.5	lin	0.010	0.047	0.826	0.068	0.402	4.623	0.713	3.929	44.631
σ² = 0.1	BB	0.010	0.056	1.048	0.089	0.486	5.916	0.897	4.914	56.948
	Viterbi	0.000	0.000	0.002	0.003	0.006	0.022	0.028	0.064	0.217
	BB-glob	0.000	0.001	0.004	0.003	0.009	0.038	0.035	0.091	0.379
	BW	0.013	0.032	0.339	0.076	0.224	0.902	0.764	2.232	7.987
ρ = 0.5	lin	0.013	0.068	2.136	0.097	0.533	6.332	0.954	5.401	58.127
σ² = 0.2	BB	0.015	0.081	2.711	0.119	0.668	8.155	1.229	6.709	74.166
	Viterbi	0.000	0.001	0.002	0.003	0.006	0.022	0.028	0.064	0.215
	BB-glob	0.000	0.001	0.004	0.004	0.009	0.038	0.036	0.091	0.378
	BW	0.030	0.086	0.979	0.184	0.565	2.463	1.801	5.743	20.621
ρ = 0.5	lin	0.032	0.173	6.322	0.221	1.322	17.426	2.276	13.920	148.659
σ² = 0.5	BB	0.036	0.208	8.076	0.277	1.663	22.317	2.918	17.396	190.062
	Viterbi	0.000	0.001	0.002	0.003	0.006	0.021	0.029	0.063	0.212
	BB-glob	0.001	0.001	0.004	0.004	0.009	0.039	0.036	0.091	0.377

The table shows the median of the time requirement, in seconds, in 100 simulated samples for the recursion algorithms under comparison, for different values of T, k, ρ, and σ².

BW, the algorithm based on the Baum-Welch recursions; lin, linear memory algorithm; BB the proposed algorithm based on the Bartolucci-Besag recursion; BB-glob, the proposed algorithm for global decoding; and Viterbi, the Viterbi algorithm.

Obviously, for all the algorithms, the computing time is also affected by the length of the series (T) and by the number of latent states (k). It is clear from Table 3 that the number of operations increases faster in T and k for the linear memory recursion and for the proposed recursion with respect to the Baum-Welch recursions. Moreover, the computing time required to run our recursion is moderately higher than that required by the linear memory implementation, but the ratio between the two computing times appears to be constant under each scenario.

Finally, Table 3 shows that the time requirement of both algorithms for global decoding is very low, even if the computing time required to run the proposed algorithm is relatively higher than that to run the Viterbi algorithm, and is not affected by the values of ρ and σ². For instance, for the longest series (T = 100,000) and the highest number of states (k = 10), the Viterbi algorithm requires around 0.22 seconds, whereas our global decoding method requires around 0.38 seconds.

6. Conclusions

In this article, we propose a constant memory recursion for HM models, which is based on a result of Bartolucci and Besag (2002) for Markov random fields. This recursion allows us to obtain the posterior distribution of the latent state at every occasion, given the previous state and the observed data. We show that the resulting estimation algorithm requires an amount of memory that is independent of the length of the observed series of data. This is one of the main advantages with respect to the well-known Baum-Welch recursions (Baum et al., 1970; Welch, 2003). Moreover, we also show that the computing time and the memory requirement of the proposed recursion are directly comparable with those required by the so-called linear memory algorithm proposed by Churbanov and Winters-Hilt (2008).

Most importantly, we illustrate how to directly perform global decoding, in order to obtain the predicted sequence of latent states. This represents a clear advantage of the proposed approach, because it allows us to exploit the results of every step of the backward recursion to directly compute the quantities necessary for the global decoding procedure. Then the implementation of the Viterbi algorithm, which involves both forward and backward steps, is not required, with a strong reduction of the memory complexity.

We assessed the performance of the proposed approach through a simulation study. The results of this study confirm that the proposed algorithm has performance, in terms of memory requirement and computing time, very close to the performance of the algorithm of Churbanov and Winters-Hilt (2008). Moreover, the simulations confirm that global decoding may be performed in a very fast way by the approach we propose here with an amount of memory that is independent of the length of the series and significantly smaller than that required by the original Viterbi algorithm.

Finally, it is important to recall that the results proposed in this article may be rather easily extended to the case of HM models including time-varying covariates and to the case of HM models of higher order. In the first case, we only need to use a more complex structure of the g_r(y) functions, defined in Section 2.2, when these covariates directly affect the conditional distribution of every response variable given the corresponding latent state. Moreover, the higher order extension may be implemented by a suitable recoding of the latent states, similar to the one adopted by Bartolucci and Solis-Trapala (2010) in an application involving longitudinal data. Another possibility for this extension is based on deriving at each step of the backward recursion the conditional distribution of every latent state given the previous o latent states and the observed data, where o is the order of the HM model of interest. How to obtain these probabilities has been also shown by Bartolucci and Besag (2002).

As a possible future development, our algorithm for global decoding might be used for incremental learning of new data sequence (Khreich et al., 2012). In particular, if implemented using only forward steps, we think that it could allow us to perform on-line decoding, which is useful when the data are collected by a continuous flow of information.

Footnotes

Acknowledgment

F. Bartolucci acknowledges the financial support from the grant FIRB (“Futuro in ricerca”) 2012 on “Mixture and latent variable models for causal inference and analysis of socio-economic data” which is funded by the Italian Government (RBFR12SHVV).

Disclosure Statement

The authors declare that no competing financial interests exist.

References

Askar

, and Derin

1981. A recursive algorithm for the Bayes solution of the smoothing problem. IEEE Tran. Automat. Contr., 26, 558–561.

Bartolucci

, and Besag

2002. A recursive algorithm for Markov random fields. Biometrika, 89, 724–730.

Bartolucci

, and De Luca

2001. Maximum likelihood estimation of a latent variable time-series model. Appl. Stochastic Models Business Ind., 17, 5–17.

Bartolucci

, and Solis-Trapala

2010. Multidimensional latent Markov models in a developmental study of inhibitory control and attentional flexibility in early childhood. Psychometrika, 75, 725–743.

Bartolucci

, Farcomeni

, and Pennoni

2013. Latent Markov Models for Longitudinal Data. Chapman and Hall/CRC Press, Boca Raton, FL.

Bartolucci

, and Pandolfi

2013. Comment on the paper “On the memory complexity of the forward-backward algorithm.”. Pattern Recognition Letters, in press.

Baum

L.E.

, Petrie

, Soules

, and Weiss

1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat., 41, 164–171.

Bishop

M.J.

, and Thompson

E.A.

1986. Maximum likelihood alignment of DNA sequences. J. Mol. Biol., 190, 159–165.

Churbanov

, and Winters-Hilt

2008. Implementing EM and Viterbi algorithms for hidden Markov model in linear memory. BMC Bioinformatics, 9, 224.

10.

Churchill

G.A.

1989. Stochastic models for heterogeneous DNA sequences. Bull. Math. Biol., 51, 79–94.

11.

Churchill

G.A.

1992. Hidden Markov chains and the analysis of genome structure. Comput. Chem., 16, 107–115.

12.

De Fonzo

, Aluffi-Pentini

, and Parisi

2007. Hidden Markov models in bioinformatics. Curr. Bioinform., 2, 49–61.

13.

Dempster

A.P.

, Laird

N.M.

, and Rubin

D.B.

1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Series B Stat. Methodol., 39, 1–38.

14.

Durbin

, Eddy

S.R.

, Krogh

, and Mitchison

1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK.

15.

Grice

J.A.

, Hughey

, and Speck

1997. Reduced space sequence alignment. Comput. Appl. Biosci., 13, 45–53.

16.

Juang

B.H.

, and Rabiner

L.R.

1991. Hidden Markov models for speech recognition. Technometrics, 33, 251–272.

17.

Khreich

, Granger

, Miri

, and Sabourin

2010. On the memory complexity of the forward–backward algorithm. Pattern Recognit. Lett., 31, 91–99.

18.

Khreich

, Granger

, Miri

, and Sabourin

2012. A survey of techniques for incremental learning of HMM parameters. Inf. Sci., 197, 105–130.

19.

Krogh

, Brown

, Mian

I.S.

, et al. 1994. Hidden Markov models in computational biology: applications to protein modeling. J. Mol. Biol., 235, 1501–1531.

20.

Langley

1995. Order effects in incremental learning. In Reimann

, and Spada

, ed. Learning in Humans and Machines: Towards an Interdisciplinary Learning Science. Elsevier, Oxford, UK.

21.

Langrock

, MacDonald

I.L.

, and Zucchini

2012. Some nonstandard stochastic volatility models and their estimation using structured hidden Markov models. J. Empir. Finance, 19, 147–161.

22.

Lindgren

1978. Markov regime models for mixed distributions and switching regressions. Scand. J. Stat., 5, 81–91.

23.

Lystig

T.C.

, and Hughes

J.P.

2002. Exact computation of the observed information matrix for hidden Markov models. J. Comput. Graph. Stat., 11, 678–689.

24.

McLachlan

, and Peel

2000. Finite Mixture Models. Wiley, Hoboken, NJ.

25.

Miklós

, and Meyer

I.M.

2005. A linear memory algorithm for Baum–Welch training. BMC Bioinformatics, 6, 231.

26.

Ott

1967. Compact encoding of stationary markov sources. IEEE Trans. Inf. Theory, 13, 82–86.

27.

Polikar

, Udpa

S.S.

, and Honavar

2001. Learn ++: an incremental learning algorithm for supervised neural networks. IEEE Trans. Syst. Man Cybern. C Appl. Rev., 31, 497–508.

28.

Raviv

1967. Decision making in Markov chains applied to the problem of pattern recognition. IEEE Trans. Inf. Theory, 13, 536–551.

29.

Scott

S.L.

2002. Bayesian methods for hidden Markov models: recursive computing in the 21st century. J. Am. Stat. Assoc., 97, 337–351.

30.

Tarnas

, and Hughey

1998. Reduced space hidden Markov model training. Bioinformatics, 14, 401–406.

31.

Taylor

S.J.

2005. Asset Price Dynamics, Volatility, and Prediction. Princeton University Press, Princeton, NJ.

32.

Viterbi

A.J.

1967. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Trans. Inf. Theory, 13, 260–269.

33.

Welch

L.R.

2003. Hidden Markov models and the Baum–Welch algorithm. IEEE Inf. Theory Soc. Newsl., 53, 1–14.

34.

Wheeler

, and Hughey

2000. Optimizing reduced-space sequence analysis. Bioinformatics, 16, 1082–1090.

35.

Zucchini

, and MacDonald

I.L.

2009. Hidden Markov and Other Models for Time Series: An Introduction Using R. Chapman and Hall/CRC Press, Boca Raton, FL.