Abstract
We derive a multivariate latent Markov model in which the number of latent states can change at each time point. We model both the manifest and latent distributions conditionally on explanatory variables. Bayesian inference is based on a transdimensional Markov Chain Monte Carlo approach, where Reversible Jump is performed separately for each time occasion. In a simulation study, we show that our approach can recover the true underlying sequence of latent states with high probability, and that it has lower bias than competitors. We conclude with an analysis of the well-being of 100 nations, as expressed by the dimensions of the Human Development Index, for six time points spanning a period of 22 years. R code with an implementation is available as supplementary material, together with files for reproducing the data analysis.
Keywords
Introduction
Latent Markov (LM) models (Zucchini et al., 2016; Bartolucci et al., 2013, 2014) provide a general and flexible framework for modelling univariate and multivariate panel data. They are based on local independence assumptions, where each outcome is independent of the past and other outcomes conditionally on covariates; and on an unobserved discrete latent variable
Our motivating application involves analysing the progress of nations’ development. In this analysis nations’ progress is not only evaluated by Gross Domestic Product (GDP), but it also involves health and education (United Nations Development Programme, 1990). The two main questions we address in this work are how many clubs of nations (number of latent states) one can identify at each measurement occasion, and what are the determinants of mobility between different levels of development. The first question was addressed also in Anderson et al. (2019a, b), where more details about the motivating economic theory and background are given. Here we use updated data up to 2019, and unlike previous works, we provide a formal statement about the posterior distribution for the sequence of latent states.
In our motivating example, the latent states identify clubs of nations with a similar human development profile. It is natural to wonder how many clubs there are, and if and how these change over time. The theory of convergence in economics (Johnson and Papageorgiou, 2020) postulates that the number of clubs should decrease over time, and finally converge to a single club. Rectangular LM models can also be useful in several other settings, since there are many applications in which the number of latent states possibly changes over time. Our example is from macroeconomics, where the idea of a varying number of clubs of nations is generally valid. In microeconomics, latent states often identify individual propensities, and new attitudes or behaviours might emerge or disappear over time, for example, in the study of fertility, work histories and retail. In our experience this is particularly common with multivariate outcomes, where new patterns (e.g., high income but low work intensity) might emerge or disappear over time. Similar examples exist in epidemiology and ecology: at an aggregate (e.g., area) level the number of clusters might change due to changing conditions (for the disease and/or risk factors in epidemiology, for climate in ecology). At the individual level in epidemiology, any intervention can cause the number of latent states to change (e.g., in drug abuse whenever a new drug is introduced, dealing strategies change, including prices, or a new campaign is launched to raise awareness). In ecology, latent states often identify behaviours of animals, which can change unexpectedly in response to cyclic patterns (e.g., rain) and due to interaction with other species. Finally, in any application the number of latent states might change over time for idiosyncratic reasons, and a standard LM model might yield biased fixed effects estimates in that case.
In our experience with standard LM models, some latent states can be almost empty at certain time points, which indicates a varying number of latent masses. For this reason, we suggest always exploring the class of rectangular LM models, even when the number of latent states is not expected to change over time.
We give two main methodological contributions. First of all, we specify a completely general rectangular LM model, where outcomes are a mix of continuous and categorical measurements and both the manifest and latent distributions are conditioned on covariates. Covariates for the latent distribution (Bartolucci et al., 2007, 2009) can be particularly useful to explain transitions, as in our motivating application, which indeed is based on Gaussian outcomes. Secondly, we derive an efficient transdimensional Markov Chain Monte Carlo sampler to obtain the posterior distribution of all parameters (therefore embedding the intensive model-choice step within the posterior approximation procedure). A natural by-product of our sampler is the posterior distribution for the sequence of the number of latent states.
Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) (Green, 1995) is not new in the LM context, see for instance Robert et al. (2000), Cappé et al. (2003) and Cappé et al. (2005). As in Spezia (2010) and Bartolucci and Pandolfi (2018), we will use random walk Metropolis steps and avoid the use of the augmented likelihood. In our experience, these two choices are particularly advantageous in the rectangular LM context, in terms of computational burden and mixing properties of the chain. See also Paroli and Spezia (2010).
The rest of the article is organized as follows: in the next section we introduce rectangular LM models with covariates, and detail how to compute the likelihood and to specify default prior distributions for the parameters involved. In Section 3 we give details on an RJ-MCMC algorithm that can be used to approximate the posterior distribution for all parameters involved, including the number of latent states. The approach is illustrated via a brief simulation study in Section 4 and through the analysis of our motivating dataset in Section 5. Some concluding remarks about the methodology and the implications of the data analysis are given in Section 6. R code with an implementation is available as supplementary material, together with files for reproducing the data analysis.
Rectangular LM models
In our setting, we wish to flexibly model an
A rectangular LM model is obtained when
where, for example, a proportion
The latent distribution can be conditioned on covariates through generalized linear models (see also Bartolucci et al., 2009, for the case of standard LM models). To be more precise, use of
whereas transition matrices can be modelled using lack-of-transition elements as baseline, for
For ease of notation, we collect initial coefficients in the array
The model is completed by specification of a manifest distribution
Interpretation of the latent states, regardless of the sequence
We now give an expression for the number of parameters when
We do not therefore make a homogeneity assumption for the variance of
Direct computation of the observed log-likelihood would be cumbersome. It would indeed involve a summation over all possible values that the sequence
and, when
Since
Our inferential procedure does not require additional computational overhead, being based on the observed likelihood. After sampling from the posterior we might anyway be interested in obtaining an estimate of
It is finally straightforward to see that
We might use the estimated latent trajectory also for an overall assessment of persistence and variability across latent states. Given our rectangular framework, this is not straightforward. To this end, we claim there is persistence if the same units are assigned to the same latent state over time. Let
where
Let
If covariates are used for the manifest distribution,
An additional prior might have to be specified for each vector of nuisance parameters involved in the conditional distributions of
Posterior inference
In this section, we describe an MCMC approach, based on a set of fixed-dimensional and transdimensional moves, in order to efficiently approximate the posterior distribution for the parameters and the configuration of time-varying latent states. Metropolis-Hastings steps are implemented in order to update the parameters of the model conditionally on
Fixed-dimensional moves
Fixed-dimensional moves are used to update model parameters conditionally on We then update only slopes associated with transitions that actually occur in the current configuration Similarly, we update latent intercepts and standard deviations for each number of latent states occurring at least once in the current configuration If covariates are used for the manifest distribution,
Each block of proposals is accepted or rejected at random. Candidates
where
In case transition matrices are constrained so that certain transitions are impossible, we simply do not update the corresponding parameters.
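A block update of this kind can be sketched in a few lines. The following is a minimal illustration and not the R implementation distributed with the article: it assumes a single Gaussian outcome, no covariates, and occasion-specific vectors of state means, and all function names (`observed_loglik`, `rw_metropolis_step`), the `frozen` mask, and the N(0, 10^2) prior are hypothetical choices made only for the example. The observed likelihood is evaluated through a forward recursion in which the transition matrix between two occasions may be rectangular, and entries flagged as frozen (for instance, parameters tied to impossible transitions) are simply left untouched by the proposal.

```python
import numpy as np

def gaussian_logpdf(y, mean, sd):
    # Log density of N(mean, sd^2); all arguments broadcast against each other.
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((y - mean) / sd) ** 2

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def observed_loglik(y, means, sds, init, trans):
    """Observed log-likelihood via forward recursion (illustrative assumptions).

    y: (n, T) outcomes; means[t], sds[t]: (k_t,) state parameters at occasion t;
    init: (k_0,) initial probabilities; trans[t]: (k_t, k_{t+1}) row-stochastic
    matrix, rectangular whenever k_t differs from k_{t+1}.
    """
    T = y.shape[1]
    log_alpha = np.log(init)[None, :] + gaussian_logpdf(
        y[:, [0]], means[0][None, :], sds[0][None, :])
    for t in range(1, T):
        # Marginalise the previous state: (n, k_{t-1}, 1) + (1, k_{t-1}, k_t).
        log_pred = logsumexp(
            log_alpha[:, :, None] + np.log(trans[t - 1])[None, :, :], axis=1)
        log_alpha = log_pred + gaussian_logpdf(
            y[:, [t]], means[t][None, :], sds[t][None, :])
    return float(np.sum(logsumexp(log_alpha, axis=1)))

def rw_metropolis_step(theta, log_post, step, frozen, rng):
    """One random walk Metropolis block update; frozen entries are not moved."""
    noise = step * rng.standard_normal(theta.shape)
    proposal = theta + np.where(frozen, 0.0, noise)
    log_ratio = log_post(proposal) - log_post(theta)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return theta, False

# Toy run: three occasions with 2, 3 and 2 latent states (assumed values).
rng = np.random.default_rng(0)
y = rng.standard_normal((50, 3))
sizes = [2, 3, 2]
sds = [np.ones(k) for k in sizes]
init = np.array([0.5, 0.5])
trans = [np.full((2, 3), 1 / 3), np.full((3, 2), 0.5)]

def unpack(theta):
    # Split the flat parameter block into one vector of means per occasion.
    return np.split(theta, np.cumsum(sizes)[:-1])

def log_post(theta):
    # Assumed N(0, 10^2) prior on every mean, plus the observed log-likelihood.
    log_prior = float(np.sum(gaussian_logpdf(theta, 0.0, 10.0)))
    return log_prior + observed_loglik(y, unpack(theta), sds, init, trans)

theta = np.array([-1.0, 1.0, -1.0, 0.0, 1.0, -1.0, 1.0])
frozen = np.zeros(theta.shape, dtype=bool)
accepted = 0
for _ in range(200):
    theta, was_accepted = rw_metropolis_step(theta, log_post, 0.1, frozen, rng)
    accepted += was_accepted
```

Working on the log scale keeps the recursion stable for longer panels, and because only the observed likelihood is needed, no latent trajectory has to be sampled within the update.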
Split/combine moves
A separate transdimensional sampling must be performed for each
In case a split step is selected, a latent state
This is performed differently according to whether
For a split move when
with
with
with
When
where the operation is intended elementwise. If
for
for
The split move at
where we used a tilde to denote the parameters characterized by
More details are given in the Supplement.
A similar reasoning shall be applied for each
If
with
One then shall check if subsequence
with
The split move is accepted with probability
where we let
The last set of split/combine moves corresponds to the case
A second set of transdimensional moves involve birth and death of regimes. In a birth move, if needed a new regime
When
where as anticipated we generate
When
where as before
Finally, if
Label switching is tackled in-line by reordering the parameters at the end of each iteration so that
In our implementation, after burn-in and thinning, we estimate the posterior distribution of parameters involved as usual. Namely, for the sequence of discrete parameters, we both look for the most frequent configuration sampled, and for the most frequent number of latent states for each
A simulation study
We report here results of a simulation study. We fix
For each scenario, we generate data
In Table 1 we report the probability of selecting the correct sequence of latent states for each method and scenario, and the square Root of the Median Squared Error (RMSE) for the posterior mean of each approach when the true sequence of latent states is varying. Table 2 reports analogous quantities to assess the performance of the proposed methodology in scenarios where the true sequence of latent states is constant. For the RMSE we restrict to estimates of
Simulation study with rectangular data generation process. Probability of correct identification of the sequence of latent states, and RMSE for three sets of parameter estimates, in different scenarios. RJC denotes our proposal, RJ our proposal with omitted covariates, and FC a standard LM model. Scenarios differ in the value of the covariate coefficients at data generation and in group separation.
Simulation study with standard data generation process, where the true sequence of latent states is constant. Probability of correct identification of the sequence of latent states, and RMSE for three sets of parameter estimates, in different scenarios. RJC denotes our proposal, RJ our proposal with omitted covariates, and FC a standard LM model. Scenarios differ in the value of the covariate coefficients at data generation and in group separation.
From Tables 1 and 2 it can be seen that our RJC/RJ approaches are able to correctly identify the true sequence of latent states with high probability; even when covariates are omitted, and independently of whether the true latent sequence varies with time or not. This probability increases with the sample size. Similarly, both with and without covariates estimates for
Well-being, development, and wealth are multidimensional and complex characteristics, which cannot be directly measured. There are several possible ways to indirectly measure these characteristics at the national level. We rely here on the Human Development Index (HDI), which is the official index of the United Nations Development Programme (1990). The HDI is a geometric average of measurements of three domains. The first domain involves income levels, as measured by the Gross National Income (GNI) per capita in purchasing power parity (PPP) international dollars. The second is an indicator of health, as measured by life expectancy at birth. The third is an indicator of education level, as measured by a weighted average of expected years of schooling and mean years of schooling. In this work, similarly to other papers investigating well-being of nations, we do not use HDI as a univariate endpoint, but instead model the
We collect data for each country and year of interest, over a time horizon spanning from 1998 to 2019 for
Table 3 reports yearly medians and interquartile ranges (IQR) for the described panel. A clear increasing trend is seen for all indicators, excluding Trade, which has a quadratic trend. All in all, the world has improved over the past twenty years in terms of HDI dimensions and government effectiveness. Shares of trade have slightly decreased after peaking. Looking at the IQR, we see that heterogeneity across nations in terms of education and health has decreased, while variability in terms of income has increased.
HDI data. Year-specific median and interquartile range (IQR) for HDI components, government effectiveness, and trade.
We now proceed with data analysis. We first remove overall trends from the data, including the covariates, by subtracting year-specific overall medians. We fit in this section the unconstrained model, while in the Supplement we report results about the model constrained to avoid transitions to non-adjacent states and admitting the number of groups to increase or decrease by at most one unit at each occasion. These are in any case very similar to the ones reported in this section. We then run our Markov Chain Monte Carlo algorithm for 250 000 iterations, which took about 72 hours to complete. We discard the initial 20 000 iterations as burn-in. In order to reduce computation time, only 12 000 iterations after burn-in include transdimensional updates. We checked convergence by evaluating trends and autocorrelation functions, both visually and with formal tests. These results are not shown for reasons of space. We also computed the Gelman and Rubin (1992) Potential Scale Reduction Factor with two parallel chains, obtaining a value of 1.001, well below the cut-off of 1.1 (Gelman et al., 2014). The final sequence is thus satisfactory in terms of convergence.
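As an illustration of the last diagnostic, the sketch below computes a basic Potential Scale Reduction Factor for one scalar parameter from parallel chains. It is a simplified version under stated assumptions: it omits the degrees-of-freedom correction of the original Gelman and Rubin (1992) proposal and does not split chains, the function name `potential_scale_reduction` is hypothetical, and the chains are simulated rather than taken from the analysis.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Basic PSRF for an (m, n) array: m parallel chains, n retained draws each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)        # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean()   # within-chain variance W
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return float(np.sqrt(pooled / within))

# Simulated example (not the HDI chains): two chains targeting the same
# distribution, and two chains centred three standard deviations apart.
rng = np.random.default_rng(1)
mixed = rng.standard_normal((2, 5000))
stuck = mixed + np.array([[0.0], [3.0]])
r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
```

Values close to one, as expected for `r_mixed`, indicate that between-chain and within-chain variability agree, whereas chains exploring different regions, as in `stuck`, inflate the factor above the 1.1 cut-off.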
The upper panel of Table 4 reports the posterior distribution for the number of latent states on each time occasion. Our RJC approach indicates the presence of
HDI data. Posterior distribution for the number of latent states at each time occasion. Upper panel: default priors. Lower panel: informative priors
In order to better compare our results with previous contributions, we fit our model again using an informative prior for the number of latent clusters at each time occasion. Specifically, we used a priori information with
Table 5 reports parameter estimates for the centroids and the standard deviations. It can be seen that the four latent states are well separated, and can be interpreted as representing increasing levels of overall well-being. Standard deviations for GNI clearly increase with the level of well-being. On the other hand, standard deviations for life expectancy decrease: some countries with low GNI and education are still successful in guaranteeing a good life expectancy, while others in the same group have a very low life expectancy. Rich countries show little difference in life expectancy at birth. Heterogeneity in education levels instead shows a parabolic shape over latent states.
HDI data. Posterior means for latent centroids and standard deviations for the latent states. 95% highest-posterior-density intervals are reported in parentheses.
Median estimated initial probabilities, after averaging across country-specific estimates, are
Results for the transition probabilities suggest high persistence of units across latent states over time. The index of persistency (2.6) indeed is equal to
HDI data. Posterior means for the coefficients of the initial and transition probabilities. Transitions to states with identical labels are used as a reference category for the multinomial logit transformation. For the initial probabilities, one latent state is used as reference. 95% highest-posterior-density intervals are reported in parentheses.
Table 6 summarizes the posterior means of coefficients modulating initial and transition probabilities, together with 95% credible intervals (CI). For initial probabilities, increments in the Gov. Eff. generally lead to higher probabilities of belonging to latent states with higher well-being. The increment is proportional as
We conclude this section by reporting on relevant estimated country-specific transitions. Using a MAP approach, it can be inferred that China has improved its level of well-being over time, with a transition occurring in 2011 from state 2 to state 3. India has transitioned from state 1 to state 2 in 2007. Countries such as the USA and Sweden persistently dwelled in latent state 4. Russia has persisted in latent state 3, and countries such as Nigeria, Niger and Bangladesh have not moved from state 1 during the period of observation. For reasons probably linked to political instability and war, Libya has experienced two latent transitions, one in 2007 and one in 2011, declining from state 4 to state 2 in a short time frame.
We have proposed a general framework for modelling rectangular LM models with covariates for the latent distribution. Our Bayesian framework allows us to make inference also on the sequence of latent states. The sampling strategy proposed does not use completion, and proves to be flexible, have good convergence properties, and avoid computational overheads. It can be used also for standard LM models, as we did in the simulation study. In our experience, our Bayesian fitting procedure is also advantageous over frequentist approaches in terms of computing times with slightly more complex models (e.g., more than one outcome, more than three or four covariates, longer panels).
Our simulation studies clearly indicate that more standard frameworks, which are embedded in our model class, might lead to biased estimates when the true data-generating mechanism is not well specified. Extensions include the case of mixed LM models (Altman, 2007; Maruotti, 2011; Bartolucci and Farcomeni, 2015; Naranjo et al., 2020), in which additional random effects can be used for clustered data; use of regularization (Farcomeni, 2017; Otting and Andreas, 2021), both for increased stability and use of more flexible (e.g., non-parametric) regression functions; and the use of copulas (e.g., Brunel and Pieczynski, 2005; Hardle et al., 2015; Otting et al., 2021) to relax the conditional independence assumption.
We stress that researchers need to be careful, as common with mixture models, about which covariates to include and whether they should model only the manifest, only the latent, or both distributions conditionally on them. Interpretation of the results is different, and more importantly inclusion of irrelevant covariates can have unpredictable effects on the estimates. It is known in the literature that including covariates might indeed either increase or decrease the true number and variability of the latent states across time, and similarly for standard errors of parameters. For some more details and additional remarks see Anderson et al. (2016); Bartolucci et al. (2013); Böckenholt (1997); Di Mari et al. (2022).
We used our model to investigate the dynamics of human development over the period 1998–2019. There is strong evidence that countries are clustered into four well-separated groups that correspond to different stages of well-being, in line with other contributions focusing on the identification of nations’ clubs (Phillips and Sul, 2007, 2009; Pittau et al., 2010). We estimated limited but relevant mobility between classes. The degree of government effectiveness, which reflects the capacity of the government to effectively formulate and implement sound policies, is clearly positively associated both with initial state and transition probability. Trade plays a more complex role. Our results are linked with the empirical literature on the convergence hypothesis, see, for example, Johnson and Papageorgiou (2020) for a review. Our findings are consistent with the theory of club convergence, with growth being somehow constrained to the limiting behaviour of countries. More in detail, while some countries are improving their conditions, others are left behind, in a growth acceleration process that is fragile and fragmented. As a consequence, the number of nations’ clubs is not reducing over time, as one would expect in a convergent world, and only their composition is changing.
Supplementary materials
Supplementary materials for this article are available online.
Supplemental Material for Covariate-modulated rectangular latent Markov models with an unknown number of regime profiles by Alfonso Russo, Alessio Farcomeni, Maria Grazia Pittau, Roberto Zelli, in Statistical Modelling
Footnotes
Acknowledgements
The authors wish to thank an AE and two anonymous referees for constructive suggestions.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
References
