1. Introduction
Hidden Markov (HM) models have become a popular statistical tool for the analysis of time-series data; for an up-to-date review, see Zucchini and MacDonald (2009). These models are also of great interest for the analysis of longitudinal data, where independent short time series are observed for many statistical units; for a review, see Bartolucci et al. (2013). HM models are based on the assumption that the observable random variables, corresponding to consecutive time occasions, are conditionally independent given a not directly observable (or latent) process, which follows a Markov chain. Usually, this Markov chain is time homogeneous and of first order, so that the latent state at a given occasion only depends on the previous state and the transition probabilities are time invariant.
A fundamental tool of inference for HM models is represented by the forward-backward recursions of Baum and Welch (Baum et al., 1970; Welch, 2003). For a first-order HM model, these recursions allow us to compute the manifest probability (or density) of the observed sequence of data and to obtain the posterior distribution of every latent state and of every pair of consecutive latent states given these data. Based on these recursions, it is then possible to implement an Expectation-Maximization (EM) algorithm (Baum et al., 1970; Dempster et al., 1977) for maximum likelihood estimation of the parameters and to perform local decoding (Juang and Rabiner, 1991), which consists of finding the most likely state at every occasion, given the observed data. Moreover, the Viterbi algorithm (Viterbi, 1967) is usually used to perform global decoding that, unlike local decoding, in which the states are found for each time occasion separately, is aimed at finding the most likely sequence of latent states corresponding to the observed sequence of data.
Despite its popularity, the Baum-Welch recursions may suffer from numerical problems due to the fact that certain probabilities become negligible. This problem may be solved by using dummy renormalizations; see Scott (2002) for further comments and Lystig and Hughes (2002) for an alternative solution in dealing with the manifest distribution of the observed data. A more severe limitation is that the Baum-Welch approach requires an amount of memory that linearly increases with the length of the series of data, and this may limit its applicability in certain biological and engineering contexts where huge series are observed; see Miklós and Meyer (2005) and Churbanov and Winters-Hilt (2008) for further comments.
Alternative recursions have been proposed in the literature in order to reduce the amount of memory required by the Baum-Welch algorithm, or to make the memory use independent of the sequence length. In particular, the checkpoint algorithm (Grice et al., 1997; Tarnas and Hughey, 1998; Wheeler and Hughey, 2000) was proposed to reduce the memory complexity, which, however, still depends on the length of the series of data. Further refinements of this algorithm, as described by Miklós and Meyer (2005), were aimed at making the memory demand independent of the observed sequence length, based only on a forward recursion. Moreover, the linear memory implementation of the Baum-Welch algorithm proposed by Churbanov and Winters-Hilt (2008) reversed the originally forward-only algorithm of Miklós and Meyer (2005) by using backward probabilities of state occupation, so as to accelerate the estimation of the prior state probability and to decrease the memory requirement. However, this is achieved with an increase of the computing time. Finally, Khreich et al. (2010) proposed a variation of the Baum-Welch algorithm whose memory requirement does not depend on the length of the time series and whose time complexity is of the same order as the standard Baum-Welch algorithm.
In a rather recent article, Bartolucci and Besag (2002) proposed a probabilistic result to obtain the marginal distribution of a random variable in Markov random field models and showed that this result may be also used to deal with first-order and second-order HM models. In the present article, we use this result to implement a recursion and an estimation algorithm for first-order HM models, whose memory demand and computing time are directly comparable with those required by the algorithm proposed by Churbanov and Winters-Hilt (2008). In particular, the proposed recursion allows us to obtain the posterior distribution of every latent state given the previous state and the observed data, on the basis of only backward steps. With respect to the Baum-Welch recursions, our recursion has the advantage of not requiring dummy renormalizations. More importantly, it also allows us to directly perform global decoding, so as to obtain the predicted sequence of latent states on the basis of the estimated posterior probabilities. In more detail, this alternative approach for global decoding only involves quantities that are obtained from the proposed backward recursion, without requiring the implementation of the Viterbi algorithm, which involves both forward and backward steps. This implies a strong reduction of the amount of memory required for global decoding. Moreover, if implemented using only forward steps, the proposed method allows us to perform on-line decoding (Langley, 1995; Polikar et al., 2001), which is useful when the data are collected by a continuous flow of information, as may happen in engineering or biology; see Khreich et al. (2012) for a review of the main techniques suitable for this type of decoding.
The remainder of the article is organized as follows. In the following section, we briefly review HM models of first order and their maximum likelihood estimation through the EM algorithm. In Section 3, we review the main recursions proposed in the literature for estimation and global decoding, that is, the Baum-Welch recursions, its linear memory implementation, and the Viterbi algorithm. The proposed recursion is illustrated in Section 4, together with its use for maximum likelihood estimation and the alternative approach proposed for global decoding. In Section 5, we illustrate a simulation study aimed at comparing the performance of the proposed estimation algorithm, which is based on the Bartolucci-Besag recursion, with the algorithms based on the Baum-Welch recursions and the linear memory recursion. We also make a comparison between the proposed method for global decoding and the Viterbi algorithm. Finally, Section 6 contains the main conclusions.
The algorithms and recursions illustrated in this article have been implemented in a series of R functions based on Fortran compiled codes that are available to the reader upon request.
2. Preliminaries
In this section, we first illustrate the basic assumptions of the HM models for time series data, and then we briefly review maximum likelihood estimation of certain models of this type through the EM algorithm.
2.1. Hidden Markov models
Consider a sequence of T manifest random variables
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document}
that are collected in the vector
Y
. An HM model assumes that these random variables are conditionally independent given the sequence of latent random variables
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_T$$\end{document}
. The latter follows a Markov chain with k states, indexed from 1 to k. We consider, in particular, Markov chains that are homogeneous and of first order, so that Vt is conditionally independent of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_{t - 2}$$\end{document}
given Vt−1 for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 3 , \ldots , T$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}p ( V_t = v \mid V_{t - 1} = u ) = \pi_{v \mid u} , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k ,\end{align*}\end{document}
with πv|u being a transition probability independent of t. The Markov chain is also characterized by the initial probabilities
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\pi_v = p ( V_1 = v ) , \quad v = 1 , \ldots , k.\end{align*}\end{document}
Another basic assumption of HM models is that every response Yt depends on the latent process only through Vt, and then by ft(y|v) we denote the probability mass (or density) function of this distribution. In particular, as in Churbanov and Winters-Hilt (2008), we explicitly consider the case in which Yt given Vt has a normal distribution with mean and variance depending on Vt. Consequently, we have that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f_t ( y \mid v ) = \frac { 1 } { \sqrt { 2 \pi \sigma_v^2 } } \exp \left[ - \frac { 1 } { 2 } \left( \frac { y - \mu_v } { \sigma_v } \right) ^2 \right] , \tag { 1 } \end{align*}\end{document}
where μv and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2 , v = 1 , \ldots , k$$\end{document}
, are parameters to be estimated. Note that, in certain formulations, the means of the different states are constrained to be 0. In dealing with financial data, an HM version of the stochastic volatility (SV) model (Taylor, 2005) results in this way; see also Bartolucci and De Luca (2001) and Langrock et al. (2012). On the other hand, when the means are free parameters, we constrain the variances to be equal to each other in order to avoid degenerate solutions, which may also happen with a finite mixture model of normal distributions (see chapter 3.8 in McLachlan and Peel, 2000). The resulting model, which is referred to as HM homoscedastic model, is considered within the simulation study described in Section 5.
The other case we consider explicitly here is that of categorical response variables with c categories, labeled from 1 to c. In this case, we have that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f_t ( y \mid v ) = p ( Y_t = y \mid V_t = v ) = \phi_{y \mid v} , \quad v = 1 , \ldots , k , y = 1 , \ldots , c , \tag{2}\end{align*}\end{document}
with φy|v being parameters to be estimated in place of μv and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2$$\end{document}
. This formulation may be of interest in genetics, for the analysis of biological sequences (Bishop and Thompson, 1986), such as DNA and protein sequences. In particular, Churchill (1989) introduced a method to describe DNA sequences using an HM model. Moreover, this model has been used for exploring structural similarities of families of genes (Churchill, 1992) and for producing multiple sequence alignments (Krogh et al., 1994). Furthermore, sequence analysis using HM models is a standard approach in bioinformatics (Durbin et al., 1998) and is a fundamental component of many gene-finding algorithms that identify and delineate genes in the human and other genomes (De Fonzo et al., 2007).
It has to be clear that the same modeling framework as above may be adopted in dealing with longitudinal data, when we observe short sequences of data for n sample units, which are usually assumed to be independent of each other. However, we do not explicitly consider the case of longitudinal data here, because the theory that will be developed may be also easily applied to this case. For more details and examples referring to HM formulations for longitudinal data, see Bartolucci et al. (2013). Moreover, we can also consider HM models based on a Markov chain of higher order and models that include covariates. The results developed here may be easily adapted to deal also with these cases.
2.2. Maximum likelihood estimation
Given a sequence of observations
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$y_1 , \ldots , y_T$$\end{document}
collected in the vector
y
, which is a realization of
Y
, the model log-likelihood is simply
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\ell ( { \bf\theta} ) = \log f ( \textbf{\textit{y}} ),\end{align*}\end{document}
where
θ
is the vector collecting all model parameters. The structure of
θ
depends on the specific parameterization that is adopted for the conditional response distribution ft(y|v). For instance, for the HM-SV model based on assumption (1) with means equal to 0,
θ
includes the initial probabilities
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi_v , v = 1 , \ldots , k$$\end{document}
, the transition probabilities
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi_{v \mid u} , u , v = 1 , \ldots , k$$\end{document}
, and the conditional variances
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2 , v = 1 , \ldots , k$$\end{document}
. Moreover, f (
y
) is the marginal probability (or density) of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document}
computed at
y
, depending on
θ
. This function is defined as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf{\textit{y}} ) = \sum_{\textbf{\textit{v}}}f ( \textbf{\textit{y}} \mid \textbf{\textit{v}} ) p ( \textbf{\textit{v}} ) ,\tag{3}\end{align*}\end{document}
where ∑
v
means the sum over all the possible latent variable configurations
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}} = ( v_1 , \ldots , v_T )$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf{\textit{y}} \mid \textbf{\textit{v}} ) & = \prod_{t = 1}^Tf_t ( y_t \mid v_t ) , \\ p ( \textbf{\textit{v}} ) & = \pi_{v_1} \prod_{t = 2}^T \pi_{v_t \mid v_{t - 1}}.\end{align*}\end{document}
Computing f (
y
) by the sum in (3) is infeasible in typical applications; then recursions of the type that will be illustrated in Section 3.1 are necessary.
In order to maximize ℓ(
θ
), we use an EM algorithm that follows the same principle as that illustrated by Baum et al. (1970); see also Dempster et al. (1977). In particular, this algorithm is based on the complete data log-likelihood
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\ell^* ( { \bf \theta} ) = \sum_{t = 1}^T \sum_{v =
1}^k z_t ( v ) \log f_t ( y_t \mid v ) + \sum_{v = 1}^k z_1 ( v )
\log \pi_v + \sum_{t = 2}^T \sum_{u = 1}^k \sum_{v = 1}^k z_t ( u
, v ) \log \pi_{v \mid u} , \tag{4}\end{align*}\end{document}
where zt(v) is a dummy variable equal to 1 if the latent state at occasion t is v and to 0 otherwise, and zt(u, v) = zt−1(u)zt(v) is a dummy variable equal to 1 if there is a transition from state u to v at occasion t and to 0 otherwise.
At the E-step of the EM algorithm, we need to compute the posterior expected values of the above dummy variables given the observed data and the current value of the parameters. In particular, these expected values correspond to the following quantities
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \hat{z}_t ( v ) & = p ( V_t = v \mid \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad t = 1 , \ldots , T , v = 1 , \ldots , k , \\ \hat{z}_t ( u , v ) & = p ( V_{t - 1} = u , V_t = v \mid \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k.\end{align*}\end{document}
As usual, the M-step of the EM algorithm consists of maximizing
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\ell^*$$\end{document}
(
θ
), once the dummy variables in (4) have been substituted by the corresponding expected values obtained as above and, in this way, the parameters in
θ
are updated. In particular, for the normal model, based on assumption (1), if the conditional means and variances are unconstrained, these parameters are updated as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}
\mu_v & = \frac {a_1 ( v )} {a_0 ( v )} , \quad v = 1 , \ldots , k
, \\ \sigma_v^2 & = \frac { a_2 ( v ) } { a_0 ( v ) } - \mu_v^2 ,
\quad v = 1 , {\ldots , k , } \tag { 5 }
\end{align*}\end{document}
where, in general, we define
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_r ( v ) = \sum_{t = 1}^T \hat{z}_t ( v ) g_r ( y_t ) , \tag{6}\end{align*}\end{document}
with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}g_r ( y ) = \begin{cases}1 & \hbox{if} \it{ r} = 0 , \\ y & \hbox{if } \it{ r} = 1 , \\ y^2 & \hbox{if } \it{ r} = 2.\end{cases} \tag{7}\end{align*}\end{document}
However, if the means are constrained to 0 as in the HM-SV model, that is,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\mu_v = 0 , v = 1 , \ldots , k$$\end{document}
, then the variances are simply updated as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\sigma_v^2 = \frac { a_2 ( v ) } { a_0 ( v ) } .\end{align*}\end{document}
Moreover, if the means are unconstrained but the variances are constrained to be equal, that is,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\sigma_v^2 = \sigma^2 , v = 1 , \ldots , k$$\end{document}
, then the means are updated as in (5) and the common variance is updated as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\sigma^2 = \frac { 1 } { T } \sum_ { v = 1 } ^k \left[ a_2 ( v ) - a_0 ( v ) \mu_v^2 \right] .\end{align*}\end{document}
In the case of the model for categorical responses, which is based on assumption (2), we update the conditional response probabilities as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\phi_ { y \mid v } = \frac { a_y ( v ) } { a_0 ( v ) } , \quad v = 1 , \ldots , k , y = 1 , \ldots , c ,\end{align*}\end{document}
where in this case
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}g_r ( y ) = \begin{cases}1 \quad \quad\quad \hbox{if} \ r = 0 , \\ I ( y = r ) \quad \hbox{if} \ r = 1 , \cdots , c , \end{cases} \tag{8}\end{align*}\end{document}
with I(·) denoting the indicator function equal to 1 if its argument is true and to 0 otherwise. Regarding the other parameters, that is, the initial and transition probabilities, they are updated at the M-step in the following way:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \pi_v & = p ( V_1 = v \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) , \quad v = 1 , \ldots , k , \\ \pi_ { v \mid u } & = \frac { b ( u , v ) } { \sum_ { h = 1 } ^k b ( u , h ) } , \quad u , v = 1 , \ldots , k , \end{align*}\end{document}
where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b ( u , v ) = \sum_{t = 2}^T \hat{z}_t ( u , v ) . \tag{9}\end{align*}\end{document}
Even the quantities ar(v) and b(u, v) involved in the update of the model parameters can be computed only through suitable recursions that are illustrated in the following.
3. Recursions for Estimation and Decoding
In the following, we review the Baum-Welch recursions (Baum et al., 1970; Welch, 2003) and the linear memory recursion proposed by Churbanov and Winters-Hilt (2008) to compute quantities that are used within the EM algorithm. We also briefly review alternative recursions aimed at addressing the problem of reducing the memory complexity of the Baum-Welch recursions; see also Khreich et al. (2010). Finally, we illustrate the Viterbi algorithm, which is the most used algorithm for global decoding, that is, to find the most likely sequence of latent states corresponding to the observed sequence of data.
3.1. Forward-backward recursions
In order to efficiently compute the probability (or the density) of the observed sequence of T observations collected in
y
, Baum and Welch (Baum et al., 1970; Welch, 2003) proposed the following forward recursion for a first-order HM model:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\alpha_t ( v ) = \sum_{u = 1}^k \alpha_{t - 1} ( u ) \pi_{v \mid u}\ f_t ( y_t \mid v ) , \quad t = 2 , \ldots , T , v = 1 , \ldots , k , \tag{10}\end{align*}\end{document}
where αt(v) corresponds to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$f ( V_t = v , Y_1 = y_1 , \ldots , Y_t = y_t )$$\end{document}
. This recursion is initialized with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\alpha_1 ( v ) = \pi_v\ f_1 ( y_1 \mid v ) , v = 1 , \ldots , k$$\end{document}
. In the end, we obtain the manifest probability (or density) function of
y
as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf{\textit{y}} ) = \sum_{v = 1}^k \alpha_T ( v ) .\end{align*}\end{document}
Moreover, Baum and Welch introduced the backward recursion
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\beta_t ( v ) = \sum_{w = 1}^k \beta_{t + 1} ( w ) \pi_{w \mid v}\ f_{t + 1} ( y_{t + 1} \mid w ) , \quad t = 1 , \ldots , T - 1 , v = 1 , \ldots , k , \tag{11}\end{align*}\end{document}
where, in general, βt(v) corresponds to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$f ( Y_{t + 1} = y_{t + 1} , \ldots , Y_T = y_T \mid V_t = v )$$\end{document}
. The recursion is initialized with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\beta_T ( v ) = 1 , v = 1 , \ldots , k$$\end{document}
. Using this recursion, we can obtain the posterior probability of every latent state given the observed data. In particular, we have
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _t ( v ) = \frac { \alpha_t ( v ) \beta_t ( v ) } { f ( \textbf { \textit { y } } ) } , \quad t = 1 , \ldots , T , v = 1 , \ldots , k ,\end{align*}\end{document}
whereas for the posterior probability of every pair of consecutive states we have
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _t ( u , v ) = \frac { \alpha_ { t - 1 } ( u ) \pi_ { v \mid u } \ f_t ( y_t \mid v ) \beta_ { t } ( v ) } { f ( \textbf { \textit { y } } ) } , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k.\end{align*}\end{document}
The number of operations involved in the above recursions is of order O(Tk2), and then it linearly, rather than exponentially, increases with the number of observations. This is the main advantage of the approach. However, as already mentioned, the Baum-Welch recursions may suffer from numerical instability due to the fact that, as t increases, the probabilities in (10) and (11) become negligible. The problem, which is evident when T is large, may be solved by using suitable renormalizations that are rather simple to implement; see Scott (2002) for a deeper description. A more severe limitation concerns the memory requirement, which is of order O(kT); in fact, we need to store all the forward probabilities αt(v) for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T$$\end{document}
. This memory requirement can become excessive, especially in the presence of very long series of data. We will discuss this point in more detail in the following.
3.2. Linear memory recursion
The linear memory procedure, proposed by Churbanov and Winters-Hilt (2008), is aimed at overcoming the above limitation in terms of memory requirement. It represents an alternative version of the Baum-Welch recursions and an improvement of the forward-only algorithm introduced by Miklós and Meyer (2005). The algorithm is based on backward steps with scaling, so as to only involve calculation of α1(v) for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
. Then all calculations are based on the backward probabilities βt(v) that are properly scaled in order to avoid numerical problems for long time series.
In more detail, the backward recursion is initialized for the last time occasion, t = T, with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\beta_T ( v ) = 1 , v = 1 , \ldots , k$$\end{document}
, and the following scaled backward probabilities are computed:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{ \beta}_T ( v ) = D_T \beta_T ( v ) , \quad v = 1 , \ldots , k ,\end{align*}\end{document}
with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$D_T = 1/ k$$\end{document}
. Then, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document}
, it is possible to use the results of the backward recursion as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{ \beta}_t ( v ) = d_t \bar{ \beta}_t ( v ) , \quad v = 1 , \ldots , k ,\end{align*}\end{document}
with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \bar{ \beta}_t ( v ) & = \sum_{u = 1}^k \tilde{ \beta}_{t + 1} ( w )\pi_{w \mid v} \ f_{t + 1} ( y_{t + 1} \mid w ) , \quad v = 1 , \ldots , k , \\ d_t & = \left[ \sum_{u = 1}^k \bar{ \beta}_t ( u ) \right] ^{ - 1}.\end{align*}\end{document}
Finally, for t = 1 we also compute α1(v) = πv f1(y1|v) for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
.
Note that, in order to efficiently estimate the model parameters, a new set of functions is introduced, that is,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{a}_{rt} ( u , v ) , r = 0 , \ldots , R , u , v = 1 , \ldots , k ,$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{b}_t ( u , v , w ) , u , v , w = 1 , \ldots , k$$\end{document}
, where R = 2 for the model for continuous outcomes [see definition (7)] and R = c for the model with categorical outcomes [see definition (8)]. These functions are computed by a backward recursion that is initialized for the last time occasion with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{a}_{rT} ( u , v ) = \tilde{ \beta}_T ( u ) g_r ( y_T )$$\end{document}
when u = v, and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{a}_{rT} ( u , v ) = 0$$\end{document}
otherwise, and with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tilde{b}_T ( u , v , w ) = 0$$\end{document}
. Then the backward recursion is implemented by computing, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \tilde{a}_{rt} ( u , v ) & = d_t \bar{a}_{rt} ( u , v ) , \quad r = 0 , \ldots , R , u , v = 1 , \ldots , k , \\ \tilde{b}_t ( u , v , w ) & = d_t \bar{b}_t ( u , v , w ) , \quad u , v , w = 1 , \ldots , k , \end{align*}\end{document}
with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \bar{a}_{rt} ( u , v ) = \sum_{h = 1}^k f_{t + 1} ( y_{t + 1} \mid h ) \pi_{h \mid v} \tilde{a}_{r , t + 1} ( u , h ) + \bar{ \beta}_t ( v ) g_r ( y_t ) I ( u = v ) , \end{align*}\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\bar{b}_t ( u , v , w ) = \sum_{h = 1}^k \pi_{h \mid w} \tilde{b}_{t + 1} ( u , v , h ) f_{t + 1} ( y_{t + 1} \mid h ) + \tilde{ \beta}_{t + 1} ( v ) \pi_{w \mid v}\ f_{t + 1} ( y_{t + 1} \mid v ) I ( u = w ) .\end{align*}\end{document}
When t = 1, we compute the following quantities:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{a}_r ( u ) = \sum_{v = 1}^k \tilde{a}_{r1} ( u , v ) \pi_v \ f_1 ( y_1 \mid v ) , \quad r = 0 , \ldots , R , u = 1 , \ldots , k ,\end{align*}\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\tilde{b} ( u , v ) = \sum_{w = 1}^k \tilde{b}_1 ( u , v , w ) \pi_w \ f_1 ( y_1 \mid w ) , \quad u , v = 1 \ldots , k ,\end{align*}\end{document}
which can be used to update the initial and transition probabilities according to the following formulas:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \pi_v & = \frac { \alpha_1 ( v ) \tilde { \beta } _1 ( v ) } { \sum_ { u = 1 } ^k \alpha_1 ( u ) \tilde { \beta } _1 ( u ) } , \quad v = 1 , \ldots , k , \\ \pi_ { v \mid u } & = \frac { \tilde { b } ( u , v ) } { \sum_ { h = 1 } ^k \tilde { b } ( u , h ) } , \quad u , v = 1 , \ldots , k; \end{align*}\end{document}
see Churbanov and Winters-Hilt (2008) for details. Moreover, for the normal HM model based on assumption (1), the conditional means and variances are updated as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \mu_v & = \frac { \tilde { a } _1 ( v ) } { \tilde { a } _0 ( v ) } , \quad v = 1 , \ldots , k , \\ \sigma_v^2 & = \frac { \tilde { a } _2 ( v ) } { \tilde { a } _0 ( v ) } - \mu_v^2 , \quad v = 1 , \ldots , k. \end{align*}\end{document}
For the model for categorical responses, which is based on assumption (2), we update the conditional response probabilities as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\phi_ { y \mid v } = \frac { \tilde { a } _y ( v ) } { \tilde { a } _0 ( v ) } , \quad v = 1 , \ldots , k , y = 1 , \ldots , c.\end{align*}\end{document}
Note that the above expressions are similar to those introduced in Section 2.2.
For the linear memory recursion, the number of operations is of order O(k4T), whereas the memory requirement is of order O(k3).
3.3. Alternative recursions
In a recent article, Khreich et al. (2010) proposed an algorithm for maximum likelihood estimation of HM models, named Efficient Forward Filtering Backward Smoothing (EFFBS) algorithm, which may be seen as an improved version of the Forward Filtering Backward Smoothing (FFBS) algorithm proposed by Ott (1967), Raviv (1967), Lindgren (1978), and Askar and Derin (1981). The FFBS algorithm represents a numerical stable alternative of the Baum-Welch algorithm, which, however, involves a number of operations and a memory requirement of the same order as the latter. This algorithm is based on a forward recursion to compute the so-called predictive and filtered probabilities, which are defined as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{t \mid t - 1} ( v ) = p ( V_t = v \mid Y_1 = y_1 , \ldots , Y_{t - 1} = y_{t - 1} )$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{t \mid t} ( v ) = p ( V_t = v \mid Y_1 = y_1 , \ldots , Y_t = y_t )$$\end{document}
, respectively, with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T$$\end{document}
.
The forward recursion is initialized with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{1 \mid 0} ( v ) = \pi_v , v = 1 , \ldots , k$$\end{document}
, and then, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document}
, it computes
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\gamma_ { t \mid t } ( v ) = \frac { \gamma_ { t \mid t - 1 } ( v ) f_t ( y_t \mid v ) } { \sum_ { u = 1 } ^k \gamma_ { t \mid t - 1 } ( u ) f_t ( y_t \mid u ) } , \quad v = 1 , \ldots , k , \tag { 12 } \end{align*}\end{document}
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\gamma_{t + 1 \mid t} ( v ) = \sum_{u = 1}^k \gamma_{t \mid t} ( u ) \pi_{v \mid u} , \quad v = 1 , \ldots , k. \tag{13}\end{align*}\end{document}
In the end, we obtain the posterior probabilities of the last latent state as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _T ( v ) = \gamma_ { T \mid T } ( v ) = \frac { \gamma_ { T \mid T - 1 } ( v ) f_T ( y_T \mid v ) } { \sum_ { u = 1 } ^k \gamma_ { T \mid T - 1 } ( u ) f_T ( y_T \mid u ) } , \quad v = 1 , \ldots , k.\end{align*}\end{document}
The backward recursion consists of computing for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document}
(in reverse order) the remaining posterior probabilities as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat { z } _ { t + 1 } ( u , v ) = \frac { \gamma_ { t \mid t } ( u ) \pi_ { v \mid u } \gamma_ { t + 1 \mid T } ( v ) } { \sum_ { h = 1 } ^k \gamma_ { t \mid t } ( h ) \pi_ { v \mid h } } , \quad u , v = 1 , \ldots , k , \tag { 14 } \end{align*}\end{document}
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\hat{z}_t ( u ) = \sum_{v = 1}^k \hat{z}_{t + 1} ( u , v ) , \quad u = 1 , \ldots , k. \tag{15}\end{align*}\end{document}
Supposing that T is larger than k, as normally happens, the number of operations required by the algorithm is of order O(k2T), whereas the required amount of memory is of order O(kT), because we need to store all the filtered probabilities from the forward recursion.
The improved version of the FFBS algorithm proposed by Khreich et al. (2010) aims at computing the exact posterior probabilities of the latent states given the observed data with a memory complexity that is independent of T, and without increasing the time complexity. The key point of the EFFBS algorithm is that equation (12) may be inverted as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\gamma_ { t \mid t - 1 } ( v ) = \frac { \gamma_ { t \mid t } ( v ) / f_t ( y_t \mid v ) } { \sum_ { u = 1 } ^k \gamma_ { t \mid t } ( u ) / f_t ( y_t \mid u ) } , \quad v = 1 , \ldots , k. \tag { 16 } \end{align*}\end{document}
In a similar way, letting
γ
t|t−1 be the column vector with elements
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{t \mid t - 1} ( v ) , v = 1 , \ldots , k$$\end{document}
, with
γ
t|t being defined accordingly, we have that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}{ \bf \gamma}_{t \mid t} = ( { \bf \Pi}^{ \prime} ) ^{ - 1}{ \bf \gamma}_{t + 1 \mid t} , \tag{17}\end{align*}\end{document}
where Π is the transition probability matrix with elements
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi_{v \mid u} , u , v = 1 , \ldots , k$$\end{document}
, with the index u running by row and v by column. This is because equation (13) may be simply expressed in matrix notation as
γ
t+1|t = Π′
γ
t|t.
Based on the above results, the EFFBS algorithm relies on the same forward recursion of the FFBS algorithm, but it only requires storing of the last predictive and posterior probabilities, that is, γT|T−1(v) and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\gamma_{T \mid T} ( v ) , v = 1 , \ldots , k$$\end{document}
. Then, once the inverse of Π has been computed, the backward step consists of computing, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 1 , \ldots , T - 1$$\end{document}
(in reverse order), the probabilities γt|t(v) by (17) and the probabilities γt|t-1(v) by (16). Then we obtain the target posterior probabilities
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\hat{z}_t ( u , v )$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\hat{z}_t ( u )$$\end{document}
at each step t through equations (14) and (15), respectively. As the EFFBS needs only to store one set of predictive and filtering probabilities at the end of the forward recursion, then the amount of memory required by this algorithm strongly decreases and is of order O(k2), while having the same complexity in terms of number of operations as the FFBS algorithm, that is, O(k2T).
Although this alternative version seems promising and the relationships used in the implementation of the recursion are proved from a mathematical point of view, we found numerical problems, arising when the algorithm is performed. In particular, at a certain point of the backward recursion, some filtering probabilities become negative and, obviously, this result is not correct. We also note that the difference with the posterior probabilities obtained by the FFBS algorithm increases during the backward recursion, leading to the conclusion that there is an accumulation of numerical errors that also affects the posterior probabilities. These errors are likely due to the way in which the filtering probabilities are obtained, through the inverse of the transposed transition probability matrix, which normally contains negative elements. For a detailed description see Bartolucci and Pandolfi (2013). This problem systematically occurs with a different set of parameters, length of the series, and number of latent states.
Finally, it is also worth mentioning that another alternative to the Baum-Welch algorithm is represented by the checkpoint algorithm (Grice et al., 1997; Tarnas and Hughey, 1998; Wheeler and Hughey, 2000), which stores only some reference columns or checkpoints, instead of storing the whole matrix of forward probabilities. Then, in the backward step, the remaining forward values are sequentially recomputed, beginning with their corresponding checkpoints, so as to update the parameter estimates. As mentioned above, the implementation of this algorithm reduces the memory complexity, but it still depends on the length of the time series, being of order
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( k \sqrt{T} )$$\end{document}
. However, this algorithm is outperformed by the linear memory algorithm described above and then is not illustrated in detail.
3.4. Viterbi algorithm
In many applications, such as in speech recognition, it is usually of interest to reconstruct the sequence of the latent states given the observed sequence of data. This problem is known as global decoding and consists of selecting the sequence of states of the hidden Markov chain that is the most likely (under the fitted model) to have given rise to the observed sequence; this sequence of latent states is denoted by
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^* = ( v_1^* , \ldots , v_T^* )$$\end{document}
. This problem may be solved by using an efficient algorithm known as the Viterbi algorithm (Viterbi, 1967; Juang and Rabiner, 1991).
Let ψ1(v) = f (V1 = v, Y1 = y1) and, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T$$\end{document}
, let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\psi_t ( v ) = \max_{v_1 , \ldots , v_{t - 1}}f ( V_1 = v_1 , \ldots , V_{t - 1} = v_{t - 1} , V_t = v , Y_1 = y_1 , \ldots , Y_t = y_t )$$\end{document}
. The Viterbi algorithm performs a forward recursion to compute the above quantities. Moreover, a backward recursion is used for path prediction, in order to find the most likely latent sequence.
In particular, the forward recursion consists of computing
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\psi_t ( v ) = f_t ( y_t \mid v ) \max_u [ \psi_{t - 1} ( u ) \pi_{v \mid u} ] , \quad t = 2 , \ldots , T , v = 1 , \ldots , k.\end{align*}\end{document}
This recursion is initialized, for t = 1, with ψ1(v) = πv f1(y1|v). At the end of the forward recursion, it is possible to obtain the optimal state
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_T = {\rm argmax}_{u}\, \psi_T ( u ) .\end{align*}\end{document}
Finally, the predicted sequence of latent states can be determined by a backward recursion as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_t = {\rm argmax}_{u}\,[ \psi_t ( u ) \pi_{v^*_{t + 1} \mid u} ], \quad t = 1 , \ldots , T - 1.\end{align*}\end{document}
The Viterbi algorithm has a time complexity of order O(k2T) and a memory requirement of order O(kT). Churbanov and Winters-Hilt (2008) presented a modification of this algorithm in which the memory requirement is smaller, without affecting the execution time. This alternative version requires an amount of memory of order O(T); see Churbanov and Winters-Hilt (2008) for details.
4. Proposed Algorithm
Developing a result obtained by Bartolucci and Besag (2002) for Markov random fields, in this section we show a backward recursion to directly compute the posterior probabilities
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_t ( u , v ) = p ( V_t = v \mid V_{t - 1} = u , \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad t = 2 , \ldots , T , u , v = 1 , \ldots , k , \tag{18}\end{align*}\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_1 ( v ) = p ( V_1 = v \mid \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad v = 1 , \ldots , k. \tag{19}\end{align*}\end{document}
The first is the conditional probability of a certain realization of Vt, given Vt−1 and the observed response configuration
y
, whereas the second is the probability of a certain realization of V1 given
y
. Note that, to simplify the notation, the argument
y
is removed from δt(u,v) as similarly done for the recursions in Section 3.1.
That the aforementioned result may be applied to deal with HM models was already noted by Bartolucci and Besag (2002), who showed it in detail for the case of first-order and second-order HM models. However, in this article we use this result in a different way, so as to obtain an algorithm whose memory requirement does not depend on T. In the following, this recursion is described in detail together with its use for maximum likelihood estimation. We also show that, from the results of this recursion, it is possible to directly obtain the predicted sequence of latent states, without requiring the implementation of the Viterbi algorithm for global decoding.
4.1. Proposed backward recursion
For the last time occasion, that is, when t = T, the probability in (18) may be simply computed as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_T ( u , v ) = \frac { f_T ( y_T \mid v ) \pi_ { v \mid u } } { \sum_ { h = 1 } ^kf_T ( y_T \mid h ) \pi_ { h \mid u } } , \tag { 20 } \end{align*}\end{document}
whereas, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document}
, Theorem 1 of Bartolucci and Besag (2002) implies that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_t ( u , v ) = \left[ \sum_ { w = 1 } ^k \frac { \delta_ { t + 1 } ( v , w ) } { p ( V_t = v \mid V_ { t - 1 } = u , V_ { t + 1 } = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) } \right]^ { - 1 } . \tag { 21 } \end{align*}\end{document}
In order to prove this result, consider that the assumption that the HM process is of first order implies that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_{t + 1} ( v , w ) = p ( V_{t + 1} = w \mid V_t = v , \textbf{\textit{Y}} = \textbf{\textit{y}} ) = p ( V_{t + 1} = w \mid V_{t - 1} = u , V_t = v , \textbf{\textit{Y}} = \textbf{\textit{y}} )\end{align*}\end{document}
and then
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \frac { \delta_ { t + 1 } ( v , w ) } { p ( V_t = v \mid V_ { t - 1 } = u , V_ { t + 1 } = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) } = \frac { p ( V_ { t - 1 } = u , V_ { t + 1 } = w \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } { p ( V_ { t - 1 } = u , V_t = v \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } .\end{align*}\end{document}
Consequently, the sum in (21) is equal to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \frac { p ( V_ { t - 1 } = u \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } { p ( V_ { t - 1 } = u , V_t = v \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) } \end{align*}\end{document}
and the result holds. Also note that the conditional probability in (21) is obtained as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}p ( V_t = v \mid V_ { t - 1 } = u , V_ { t + 1 } = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) = \frac { \pi_ { v \mid u } \pi_ { w \mid v } \ f_t ( y_t \mid v ) } { \sum_ { h = 1 } ^k \pi_ { h \mid u } \pi_ { w \mid h } \ f_t ( y_t \mid h ) } . \tag { 22 } \end{align*}\end{document}
Finally, for t = 1, the recursion is completed by computing
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_1 ( v ) = \left[ \sum_ { w = 1 } ^k \frac { \delta_2 ( v , w ) } { p ( V_1 = v \mid V_2 = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) } \right] ^ { - 1 } , \tag { 23 } \end{align*}\end{document}
where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}p ( V_1 = v \mid V_2 = w , \textbf { \textit { Y } } = \textbf { \textit { y } } ) = \frac { \pi_ { v } \pi_ { w \mid v } \ f_1 ( y_1 \mid v ) } { \sum_ { h = 1 } ^k \pi_h \pi_ { w \mid h } \ f_1 ( y_1 \mid h ) } .\end{align*}\end{document}
This rule is again based on Theorem 1 in Bartolucci and Besag (2002).
4.2. Use for maximum likelihood estimation
First of all, for any sequence of latent states
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_1 , \ldots , v_T$$\end{document}
collected in
v
, we have that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}f ( \textbf { \textit { y } } ) = \frac { \pi_ { v_1 } \ f_1 ( y_1 \mid v_1 ) \prod_ { t = 2 } ^T \pi_ { v_t \mid v_ { t - 1 } } \ f_t ( y_t \mid v_t ) } { p ( V_1 = v_1 \mid \textbf { \textit { Y } } = \textbf { \textit { y } } ) \prod_ { t = 2 } ^T \delta_t ( v_ { t - 1 } , v_t ) } ,\end{align*}\end{document}
where the numerator corresponds to the joint distribution of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_T$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document}
and the denominator corresponds to the conditional distribution of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$V_1 , \ldots , V_T$$\end{document}
given
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y_1 , \ldots , Y_T$$\end{document}
. Consequently, given an arbitrary sequence
v
, we compute the model log-likelihood as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\ell ( { \bf \theta } ) = \log \frac { \pi_ { v_1 }
\ f_1 ( y_1 \mid v_1 ) } { \delta_1 ( v_1 ) } + \sum_ { t = 2 }
^T \log \frac { \pi_ { v_t \mid v_ { t - 1 } } \ f_t ( y_t \mid
v_t ) } { \delta_t ( v_ { t - 1 } , v_t ) } , \tag { 24 }
\end{align*}\end{document}
on the basis of the proposed recursion. In this way, we do not need to use any renormalization that is instead necessary for the Baum-Welch recursions. It is also worth noting that, although the sequence of latent states
v
is arbitrary, its choice may be driven by simplicity and numerical stability. For instance, we can take the sequence
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_t = v^* , t = 1 , \ldots , T$$\end{document}
, where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v^*$$\end{document}
is the latent state corresponding to the highest persistence probability, that is,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} v^* = {{\rm argmax}\atop {v}} \quad \; \ \pi_{v \mid v}.\end{align*}\end{document}
We now show how the quantities required to perform the M-step of the EM algorithm can be obtained from the recursion developed in Section 4.1, using an amount of memory that does not depend on T. In particular, we recall that in order to update the parameters, the EM algorithm requires at every step the quantities ar(v) defined in (6), for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$r = 0 , \ldots , R$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v = 1 , \ldots , k$$\end{document}
, where R = 2 for the normal model for continuous outcomes [see definition (7)] and R = c for the model for categorical outcomes [see definition (8)]. Maximum likelihood estimation also requires the quantities
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$b ( u , v ) , u , v = 1 , \ldots , k$$\end{document}
, which are defined in (9).
In order to efficiently compute ar(v), we introduce the functions
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_{rt} ( u , v ) = \sum_{s = t}^Tp ( V_s = v \mid V_{t - 1} = u , \textbf{\textit{Y}} = \textbf{\textit{y}} ) g_r(y_t), \quad r = 0 , \ldots , R , u , v = 1 , \ldots , k ,\end{align*}\end{document}
so that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_r ( v ) = \sum_{u = 1}^k \delta_1 ( u ) a_{r2} ( u , v ) + \delta_1 ( v ) g_r(y_1). \tag{25}\end{align*}\end{document}
For t = T, we have that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_{rT} ( u , v ) = \delta_T ( u , v ) g_r ( y_T ) , \quad u , v = 1 , \ldots , k. \tag{26}\end{align*}\end{document}
Moreover, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document}
, this function may be obtained as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}a_{rt} ( u , v ) = \sum_{h = 1}^ka_{r , t + 1} ( h , v ) \delta_t ( u , h ) + \delta_t ( u , v ) g_r ( y_t ) , \quad u , v = 1 , \ldots , k ,\tag{27}\end{align*}\end{document}
with δt(u, v) provided by the backward recursion in Section 4.1. Similarly, in order to efficiently compute b(u, v), we introduce the function
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b_t ( u , v , w ) = \sum_{s = t}^{T - 1}p ( V_s = v , V_{s + 1} = w \mid V_{t - 1} = u , \textbf{\textit{Y}} = \textbf{\textit{y}} ) , \quad u , v , w = 1 , \ldots , k , \tag{28}\end{align*}\end{document}
so that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b ( u , v ) = \sum_{h = 1}^k \delta_1 ( h ) b_2 ( h , u , v ) + \delta_1 ( u ) \delta_2 ( u , v ) . \tag{29}\end{align*}\end{document}
Even in this case, we use the results from the backward recursion for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document}
as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b_t ( u , v , w ) = \sum_{h = 1}^kb_{t + 1} ( h , v , w ) \delta_t ( u , h ) + \delta_t ( u , v ) \delta_{t + 1} ( v , w ) , \quad u , v , w = 1 , \ldots , k ,\end{align*}\end{document}
with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}b_T ( u , v , w ) = 0 , \quad u , v , w = 1 , \ldots , k. \tag{30}\end{align*}\end{document}
It is important to stress that all the above quantities may be computed by using only the results from every backward step of the proposed recursion, without the need to store the results of all steps, as happens for the Baum-Welch recursions.
4.3. Implementation of the algorithm
In this section, we clarify some aspects about the implementation of the algorithm and provide an efficient pseudo-code, which is reported in Table 1. A Fortran implementation of this code is available to the reader upon request.
First of all, the algorithm is initialized by computing
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\delta_T ( u , v ) , u , v = 1 , \ldots , k$$\end{document}
, through equation (20). Moreover, according to (24), the corresponding component of the log-likelihood is computed as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\log \frac { \pi_ { v_T \mid v_ { T - 1 } } \ f_T ( y_T \mid v_T ) } { \delta_T ( v_ { T - 1 } , v_T ) } ,\end{align*}\end{document}
with vT−1 and vT suitably chosen, and the functions arT (u, v) and bT (u, v, w) are initialized as in (26) and (30), respectively.
Then, for t from T − 1 to 2, the algorithm computes δt(u, v) by the backward step in (21), and then it updates art(u, v) and bt(u, v, w) by (27) and (28), respectively. Note that in order to efficiently compute δt(u, v), considering (22) we can reformulate equation (21) as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_t ( u , v ) = \pi_ { v \mid u } \ f_t ( y_t \mid v ) \left[ \sum_ { w = 1 } ^k \frac { \delta_ { t + 1 } ( v , w ) } { \pi_ { w \mid v } } \sum_ { h = 1 } ^k \pi_ { h \mid u } \pi_ { w \mid h } \ f_t ( y_t \mid h ) \right] ^ { - 1 } ,\tag { 31 } \end{align*}\end{document}
and then it is first convenient to compute δt+1(u, v)/πv|u for all
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$u , v = 1 , \ldots , k$$\end{document}
and then plug the result in (31). At every cycle, we also update the value of the log-likelihood by adding
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\log \frac { \pi_ { v_t \mid v_ { t - 1 } } \ f _t( y_t \mid v_t ) } { \delta_t ( v_ { t - 1 } , v_t ) } ,\end{align*}\end{document}
with vt−1 and vt suitably chosen.
In the end, the algorithm provides δ1(v) by (23), which is efficiently computed as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\delta_1(v) = \pi_v\ f_1 ( y_1 \mid v ) \left[ \sum_ { w = 1 } ^k \frac { \delta_2 ( v , w ) } { \pi_ { w \mid v } } \sum_ { h = 1 } ^k \pi_h \pi_ { h \mid w } \ f_1 ( y_1 \mid h ) \right] ^ { - 1 } ,\end{align*}\end{document}
and then we compute ar(v) and b(u, v) by (25) and (29), respectively. Moreover, the final value of the log-likelihood is obtained by adding to the previous value the quantity:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\log \frac { \pi_ { v_1 } \ f_1 ( y_1 \mid v_1 ) } { \delta_1 ( v_1 ) } .\end{align*}\end{document}
A crucial point is related to the numerical complexity of the recursion proposed in this section, especially in comparison with the Baum-Welch recursions. It may be easily shown that the number of operations of the proposed recursion is of order O(k4T). Therefore, as for the Baum-Welch recursions, the number of operations of our recursion linearly increases with the length of the series of data (T); however, this number increases faster with the number of latent states (k). On the other hand, the memory requirement is of order O(k3) because, as mentioned above, we do not need to store the result of the proposed recursion at every step. This order is comparable to that required by the approach of Churbanov and Winters-Hilt (2008). We also have to recall that our recursion does not require dummy renormalizations, but this is a minor advantage because implementing these renormalizations to solve the numerical instability of the Baum-Welch recursions is not difficult.
4.4. Global decoding
As illustrated in Section 3.4, the most used algorithm to perform global decoding is represented by the Viterbi algorithm, which, however, requires a specific implementation distinct from that of the estimation algorithm. In this section, we illustrate how to directly obtain the most likely sequence of latent states on the basis of the posterior probabilities in (18) and (19), as an additional result coming from the proposed algorithm.
First of all, consider the following quantity:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\eta ( \textbf{\textit{v}} ) = \log \delta_1 ( v_1 ) + \sum_{s = 2}^T \log \delta_s ( v_{s - 1} , v_s ) ,\end{align*}\end{document}
which corresponds to the posterior probability of a certain sequence of latent states
v
. Its maximum, that is,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda = \max \limits_\textbf{\textit{v}} \eta ( \textbf{\textit{v}} ) ,\end{align*}\end{document}
corresponds to the predicted sequence of latent states
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^*$$\end{document}
according to the global decoding method.
In order to efficiently obtain the above prediction on the basis of the results of the proposed algorithm, we define
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\eta_t ( u , v_t , \ldots , v_T ) = \begin{cases} \log \delta_T ( u , v_T ) , \qquad \qquad \qquad \qquad \qquad \, t = T , \\\\ \log \delta_t ( u , v_t ) + \sum_{s = t + 1}^T \log \delta_s ( v_{s - 1} , v_s ) , \ t = 2 , \ldots , T-1 , \end{cases}\end{align*}\end{document}
and the corresponding maximum
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda_t ( u ) = \max \limits_{v_t , \ldots , v_T} \eta_t ( u , v_t , \ldots , v_T ) , \quad u = 1 , \ldots , k , t = 2 , \ldots , T.\end{align*}\end{document}
First of all, it is worth noting that λt(u) may be obtained by the backward recursion
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda_t ( u ) = \max \limits_h \left[ \lambda_{t + 1} ( h ) + \log \delta_t ( u , h ) \right] , \quad t = 2 , \ldots , T - 1 , u = 1 , \ldots , k ,\end{align*}\end{document}
which is initialized with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda_T ( u ) = \max \limits_{v_T} \log \delta_T ( u , v_T ) , \quad u = 1 , \ldots , k.\end{align*}\end{document}
Then, we obtain the maximum of the posterior probability as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\lambda = \max \limits_u [ \log \delta_1 ( u ) + \lambda_2 ( u ) ] .\end{align*}\end{document}
In order to obtain the sequence of states corresponding to λ, that is,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^*$$\end{document}
, we introduce the indicator
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u ) , t = 2 , \ldots , T , s = 0 , \ldots , T - t$$\end{document}
, for the latent state at time t + s of the most likely sequence if, at time t − 1, the chain was in the state u; in symbols, we have
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}( v_{t - 1 , t}^* ( u ) , \ldots , v_{t - 1 , T}^*
( u ) ) = {{\rm argmax}\atop_{v_t , \ldots, v_T}}\, \eta_t ( u ,
v_t , \ldots , v_T ) , \quad t = 1 , \ldots , T - 1.\end{align*}\end{document}
Even this sequence may be obtained by the following backward recursion:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v_{t - 1 , t}^* ( u ) & = {\rm argmax}_{h}\, \left[ \lambda_{t + 1} ( h ) + \log \delta_t ( u , h ) \right] , \quad u = 1 , \ldots , k , \\ v^*_{t - 1 , t + s} ( u ) & = v^*_{t , t + s} ( v^*_{t - 1 , t} ( u ) ) , \quad s = 1 , \ldots , T - t , u = 1 , \ldots , k , & ( 32 ) \end{align*}\end{document}
which is performed for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = 2 , \ldots , T - 1$$\end{document}
and is initialized by
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_{T - 1 , T} ( u ) = {\rm argmax}_{v_T}\, \log \delta_T ( u , v_T ) , \quad u = 1 , \ldots , k.\end{align*}\end{document}
In the end, the predicted sequence of latent states has elements
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}v^*_1 & = {\rm argmax}_{u}\, [ \log \delta_1 ( u ) + \lambda_2 ( u ) ] , \\ v^*_t & = v^*_{1 , t} ( v^*_1 ) , \quad t = 2 , \ldots , T.\end{align*}\end{document}
As already mentioned, the proposed method for global decoding only needs the results of the proposed backward recursion for maximum likelihood estimation, without requiring the implementation of the Viterbi algorithm, which involves both forward and backward steps. Then, after convergence of the EM algorithm, we can use this decoding method, which may be implemented in the same code of the backward recursion used for estimation. Moreover, in order to efficiently implement this method, consider that we do not need to store all the predicted states
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document}
because
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document}
is constant in u for s larger than a certain threshold, denoted by
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s_t^*$$\end{document}
, which is typically equal to a small value; in particular, in our simulations we noted that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s_t^*$$\end{document}
is always smaller than 10. In fact, the dependence of the predicted state on a previous latent state rapidly decreases with their distance. Then operation (32) must be performed only for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s = 1 , \ldots , s^*_t - 1$$\end{document}
, where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s_t^*$$\end{document}
is found as the first value of s such that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document}
is constant in u. Also note that, for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s = s_t^* , \ldots , T - t , v_{t - 1 , t + s}^* ( u )$$\end{document}
directly corresponds to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v^*_{t + s}$$\end{document}
for any u, and then global decoding may be partially performed at any step and certain
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$v_{t - 1 , t + s}^* ( u )$$\end{document}
may then be discarded from the memory. This implies that the additional memory requirement of the proposed algorithm for global decoding, with respect to the algorithm illustrated in the previous section, is of order
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O(ks^*)$$\end{document}
, where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s^* = \max_t s^*_t$$\end{document}
; in this computation, we do not consider the memory to store
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\textbf{\textit{v}}^*$$\end{document}
, which is in any case of order O(T), the same order of the memory requirement to store the observed sequence of data
y
. On the other hand, the additional requirement in terms of number of operations is of order O(k2T); however, if the global decoding is performed separately from the estimation algorithm, the time complexity is of order O(k3T).
5. Simulation Study
In this section, we illustrate the results of a simulation study in which we compare the performance of the proposed algorithm for maximum likelihood estimation of HM models with two alternative estimation algorithms. In particular, we set up a comparison, in terms of memory requirement and computing time, with the Baum-Welch algorithm (Baum et al., 1970; Welch, 2003) and the linear memory algorithm of Churbanov and Winters-Hilt (2008). As already outlined, the latter represents the most efficient alternative of the Baum-Welch algorithm. Moreover, we also set up a comparison, again in terms of memory requirement and computing time, between the original Viterbi algorithm and the proposed method for global decoding. All the algorithms have been implemented by means of a series of R functions with the recursions implemented in Fortran (gfortran compiler) and were run on an iMac with Intel Core i7 processor at 2.8 GHz with 8 gigabytes of RAM.
In more detail, we simulated 100 samples from the homoscedastic HM model described in Section 2.1 for different values of T (length of the time series), k (number of latent states), and σ2 (conditional variance of the response variables that is common to all states), and for different structures of the transition matrix. In particular, the assumed transition matrix has elements parameterized through an index ρ as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\pi_ { v \mid u } = \frac { \rho^ { \mid v - u \mid
} } { \sum_ { h = 1 } ^k \rho^ { \mid h - u \mid } } , \quad u ,
v = 1 , \ldots , k ,\end{align*}\end{document}
so that high values of ρ correspond to a low probability of persistence of the series and vice-versa. Also note that σ2 allows us to control the degree of separation between the components: an increase in the value of the variance corresponds to a lower degree of separation. This is because we assumed a sequence of k equally-spaced values for μv, from −(k − 1) to k − 1.
The values of T, k, σ2, and ρ adopted in the simulation are chosen as follows:
• T = 1,000, 10,000, 100,000;
• k = 3, 5, 10;
• σ2 = 0.1, 0.2, 0.5;
• ρ = 0.1, 0.2, 0.5.
In this way, we considered 81 different scenarios. Moreover, we also assumed uniform initial probabilities, that is, πv = 1/k.
For each of the 100 samples drawn under each scenario described above, we applied the EM algorithm to estimate the model parameters and recorded the memory demand, in kilobytes, and the computing time, in seconds, using the three recursions under comparison. We also recorded the computing time and the amount of memory required for global decoding by the proposed algorithm, together with those required by the Viterbi algorithm. The adopted criterion for stopping the EM algorithm was based on the relative difference in terms of log-likelihood between two consecutive steps. In particular, the EM algorithm is stopped when
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \frac { \mid \ell ( { \bf \theta } ^ { ( m ) } ) -
\ell ( { \bf \theta } ^ { ( m - 1 ) } ) \mid } { \mid \ell ( {
\bf \theta } ^ { ( m - 1 ) } ) \mid } \leq 10^ { - 10 }
,\end{align*}\end{document}
with m denoting the m-th step of the EM algorithm and θ(m) being the parameter value at the end of this step. Furthermore, in order to avoid numerical instability, the initial and transition probabilities are raised to the minimum value of 10−100 during the estimation algorithm; the same criteria are applied for the conditional normal densities. Finally, for a meaningful comparison between the algorithms, we did not take into account the amount of memory required to store the observed time series. When comparing the memory complexity of the algorithms for global decoding, we did not consider the amount of memory required to store the predicted sequence of latent states. Finally, in the proposed global decoding, we set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$s^*$$\end{document}
= 10 as the maximum threshold for memory requirement; we verified that this threshold has never been exceeded in all the scenarios under consideration.
In Table 2, we report the amount of memory (in kilobytes) required by every recursion; this amount is computed on the basis of the results from the 100 samples simulated under each scenario. Note that we report the memory requirement for every combination of T and k, and not also for every combination of ρ and σ2, because this memory requirement is not affected by the latter two parameters.
The table shows the median of the memory requirement (in Kb) in 100 simulated samples for the recursion algorithms under comparison, for different values of T and k.
BW, the algorithm based on the Baum-Welch recursions; lin, linear memory algorithm; BB, the proposed algorithm based on the Bartolucci-Besag recursion; BB-glob, the proposed algorithm for global decoding; Viterbi, the Viterbi algorithm.
As may be expected, the memory requirement of the recursion proposed by Churbanov and Winters-Hilt (2008) and that of our recursion do not depend on the length of the time series. Moreover, we note that the two results are very close, and are moderately affected by the increase in the number of latent states because there is a group of objects of fixed dimension that are stored in any case. On the other hand, the results confirm that the memory complexity of the Baum-Welch recursions linearly increases with the length of the observed series of data and also increases with the number of states. This makes difficult or impossible its application for very long series that can be encountered in certain applications, as mentioned in Section 1. About the algorithms for global decoding, we observe that the memory requirement of the proposed algorithm is significantly smaller than that of the original Viterbi algorithm, and, unlike the latter, it does not depend on the length of the time series.
Table 3 shows the median, over the simulated samples, of the computing time (in seconds) required by the three estimation algorithms of interest, together with the time complexity of the proposed algorithm for global decoding. In this case, the results are affected by the values of ρ and σ2. This is because the degree of persistence of the series and the degree of separation of the components affect the complexity of the model to be estimated and, in turn, the number of iterations required by the EM algorithm to reach the convergence. In more detail, for all the recursions under comparison, high values of the variance correspond to components that are not well separated and to a model that is more difficult to estimate. Then, the computing time required to run the algorithms increases. Furthermore, the computing time required by the algorithms also increases as the probability of persistence of the series decreases.
The table shows the median of the time requirement, in seconds, in 100 simulated samples for the recursion algorithms under comparison, for different values of T, k, ρ, and σ2.
BW, the algorithm based on the Baum-Welch recursions; lin, linear memory algorithm; BB the proposed algorithm based on the Bartolucci-Besag recursion; BB-glob, the proposed algorithm for global decoding; and Viterbi, the Viterbi algorithm.
Obviously, for all the algorithms, the computing time is also affected by the length of the series (T) and by the number of latent states (k). It is clear from Table 3 that the number of operations increases faster in T and k for the linear memory recursion and for the proposed recursion with respect to the Baum-Welch recursions. Moreover, the computing time required to run our recursion is moderately higher than that required by the linear memory implementation, but the ratio between the two computing times appears to be constant under each scenario.
Finally, Table 3 shows that the time requirement of both algorithms for global decoding is very low, even if the computing time required to run the proposed algorithm is relatively higher than that to run the Viterbi algorithm, and is not affected by the values of ρ and σ2. For instance, for the longest series (T = 100,000) and the highest number of states (k = 10), the Viterbi algorithm requires around 0.22 seconds, whereas our global decoding method requires around 0.38 seconds.