Bayesian Adaptive Lasso for Ordinal Regression With Latent Variables

Abstract

We consider an ordinal regression model with latent variables to investigate the effects of observable and latent explanatory variables on the ordinal responses of interest. Each latent variable is characterized by correlated observed variables through a confirmatory factor analysis model. We develop a Bayesian adaptive lasso procedure to conduct simultaneous estimation and variable selection. Simulation studies demonstrate the empirical performance and other attractive features of the proposed methodology. The model is applied to a study on happiness and its potential determinants from the Inter-university Consortium for Political and Social Research.

Keywords

Bayesian adaptive lasso, latent variable, MCMC methods, ordinal response

Introduction

In the social, behavioral, and psychological sciences, ordinal data are routinely collected in surveys and can be described as a collection of ratings of individual items into ordered categories. The ordinal regression is among the most important tools to investigate the effects of explanatory variables (or predictors) on the ordinal responses of interest (see, e.g., Johnson 1996; McCullagh 1980; Moustaki 2003). Conventional ordinal regression assumes that predictors are observable and directly assessable. However, latent traits that should be characterized by several highly correlated observed indicators from different perspectives are common in many applications (Bennink, Croon, and Vermunt 2013; Bollen 1989; Lee and Song 2007; Skrondal and Rabe-Hesketh 2004). Typical examples include “life satisfaction” summarized by satisfaction in homelife, financial situation, and other things; as well as “quantitative ability” measured by test scores in mathematics, physics, and chemistry. Conventional models manage latent predictors by independently incorporating one or more of their indicators into the regression analysis. However, limitations are evident with such independent analysis. First, multicollinearity that could seriously distort the statistical inference may occur when the highly correlated indicators of a latent trait are simultaneously included in a regression. Second, given that each indicator only reflects a single characteristic of a latent trait, the independent analysis with the partial information of latent traits merely reveals incomplete relationships between variables and is therefore incapable of providing a comprehensible interpretation for the overall effect of a latent trait on the response of interest. Last but not least, high-dimensional covariates may cause the so-called curse of dimensionality problem.

The most popular technique for measuring latent variables on the basis of multiple observed indicators is the factor analysis model (Harman 1976; Kenny and Judd 1984; Lee 2007; Song and Lee 2001, among others). In this study, we propose a joint model to analyze ordinal data with latent variables. The proposed model comprises a confirmatory factor analysis (CFA) model and an ordinal regression model. The CFA model measures latent variables through correlated observed variables, whereas the ordinal regression assesses the effects of observed and latent predictors on ordinal responses of interest. Compared with conventional regression, this joint model incorporates the information from multiple indicators and thus greatly enhances its analytic power and provides scientific evidence for the effects of latent traits that are well known but hard to measure directly. The multicollinearity problem that often occurs in a regression can be naturally eliminated because the highly correlated observed variables are grouped into relatively independent latent factors through a CFA model. Under the joint modeling framework, our model simultaneously manages ordinal responses, the most important discrete data type, and unobservable predictors. Such multiple tasks cannot be accomplished by a simple regression analysis.

Although the CFA model can help reduce the dimensionality of the predictors in the regression, identifying relevant predictors to obtain a parsimonious model is always of importance in statistical inference. Most contemporary research on variable selection in regression models focuses on parametric or nonparametric regression without latent variables. Most existing methods follow one of two directions. The first is the use of criterion-based procedures such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), or penalized approaches, in a frequentist framework; the second is the development of Markov chain Monte Carlo (MCMC) methods, including analogues of the above procedures, in a Bayesian framework. Given that the present study involves latent variables and ordinal responses, which induce an intractable likelihood function, the likelihood-based methods in the former direction are not directly applicable. By contrast, the sampling-based Bayesian approach can provide an efficient and reliable analysis of the proposed joint model. Unlike traditional practices, which either perform pairwise comparisons via Bayesian model comparison statistics such as the Bayes factor (Kass and Raftery 1995) and the deviance information criterion (Spiegelhalter et al. 2002) or conduct a stochastic search variable selection (SSVS) over the entire model space, we propose the use of a Bayesian version of the least absolute shrinkage and selection operator (lasso; see, e.g., Park and Casella 2008) to perform simultaneous estimation and variable selection in the present study. As demonstrated in the subsequent numerical studies, the Bayesian lasso (BLasso) conducts internal variable selection on model components and automatically determines an appropriate model, thereby avoiding tedious pairwise comparisons and the convergence issues caused by the varying-dimensional parameter space in the implementation of the SSVS algorithm (Gilks, Richardson, and Spiegelhalter 1996).

Regularization methods such as lasso have been widely applied to substantive research in the past decades (Hoti and Sillanpää 2006; Magis, Tuerlinckx, and Boeck 2015; Tibshirani 1996). However, the lasso procedure has known limitations, including inconsistency under certain conditions and appreciable bias (Fan and Li 2001; Wang, Li, and Tsai 2007). To address these problems, the adaptive lasso and its variants have been proposed (see, e.g., Zou 2006). Given the high demand for Bayesian methods in the analysis of complex models and data types, the Bayesian adaptive lasso (BaLasso) has recently been developed and found to be highly efficient, conceptually simple, and easy to implement (Alhamzawi, Yu, and Benoit 2012; Leng, Tran, and Nott 2014). Nevertheless, BLasso and BaLasso have rarely been applied to regression with latent variables. To the best of our knowledge, this is the first study to use the BaLasso procedure to conduct simultaneous estimation and model selection in the context of the proposed model.

The rest of the article is organized as follows. The second section defines the proposed model, where the ordinal responses are postulated to be associated with the underlying continuous variables. The relation between each ordinal response and its underlying continuous measurement is defined via a threshold specification. The third section develops the BaLasso procedure along with the MCMC algorithm for posterior inference. The fourth section presents simulations to demonstrate the empirical performance of the proposed methodology. The fifth section reports an application to a study of happiness and its determinants on the basis of the World Values Survey. The sixth section concludes the article with discussion. The technical details are provided in the Online Appendices.

Ordinal Regression With Latent Variables

Model Description

Let $z_{i} = (z_{i 1}, \dots, z_{i s})^{T}$ be an s × 1 random vector of ordinal variables corresponding to the ith random sample of size n, where z_ij takes integer values in {1,2,…, H_j }. We consider the underlying continuous vector $y_{i} = {(y_{i 1}, \dots, y_{i s})}^{T}$ , where y_ij relates to z_ij through a threshold link function $h_{j} (\cdot)$ defined as follows (Johnson 1996, 2008; Muthén 1984; Yuan, Wu, and Bentler 2011). For i = 1,…, n and j = 1,…, s,

z_{i j} = h_{j} (y_{i j}) = \sum_{l = 1}^{H_{j}} l \cdot I (α_{j, l} \leq y_{i j} < α_{j, l + 1}),

where $I (\cdot)$ is an indicator function that takes the value 1 if $α_{j, l} \leq y_{i j} < α_{j, l + 1}$ and 0 otherwise, and ${- \infty = α_{j,1} < α_{j,2} < \dots < α_{j, H_{j}} < α_{j, H_{j} + 1} = + \infty}$ is a set of thresholds defining the H_j categories. As pointed out by Johnson (1996), the continuous measurements are introduced to underlie the generation of the resulting ordering of all subjects in the population.
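The threshold link in model (1) can be illustrated with a short Python sketch. This is only an illustration: the function name `threshold_link` and the threshold values below are assumptions made here and do not come from the model specification.

```python
from bisect import bisect_right


def threshold_link(y, inner_thresholds):
    """Map a continuous value y to an ordinal category in {1, ..., H}.

    inner_thresholds holds the finite thresholds (alpha_2, ..., alpha_H) in
    increasing order; alpha_1 = -inf and alpha_{H+1} = +inf are implicit.
    Category l is returned when alpha_l <= y < alpha_{l+1}.
    """
    # bisect_right counts the finite thresholds that are <= y
    return bisect_right(inner_thresholds, y) + 1


# Hypothetical thresholds for H = 4 categories
alphas = [0.0, 1.5, 2.5]
categories = [threshold_link(y, alphas) for y in (-0.3, 0.0, 1.49, 2.5, 7.0)]
```

A value equal to a threshold falls in the upper category, which matches the half-open intervals in the indicator function.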

Let $x_{i} = (x_{i 1}, \dots, x_{i p})^{T}$ be a p × 1 vector of observable covariates, and $ω_{i} = (ω_{i 1}, \dots, ω_{i q})^{T}$ be a q × 1 random vector of latent variables. To assess the effects of x _i and ω _i on z_ij , we propose a regression to model the underlying continuous measurement y_ij rather than z_ij as follows:

y_{i j} = β_{0 j} + β_{1 j}^{T} x_{i} + β_{2 j}^{T} ω_{i} + ∊_{i j},

where β _0j is an intercept, β _1j and β _2j are p × 1 and q × 1 vectors of regression coefficients, and ∊_ij is a random error distributed as $N [0, σ_{j}^{2}]$ and independent of ω _i.

Let $u_{i} = (u_{i 1}, \dots, u_{i m})^{T}$ be an m × 1 vector of observed indicators that characterize the q dimensional latent vector ω _i (m > q). The CFA model for relating u _i to ω _i is defined as:

u_{i} = μ + Λ ω_{i} + ζ_{i},

where μ is an m × 1 vector of intercepts, Λ is the m × q factor loading matrix, ω _i is assumed to be distributed as N[0,Φ] with a covariance matrix Φ, and ζ _i is an m × 1 vector of random errors independent of ω _i and distributed as N[0,Ψ] with a diagonal covariance matrix Ψ.

Notably, the joint model defined by models (2) and (3) is different from the conventional mixed-effect model with ordinal responses. Unlike random effects that only address the dependency of responses, the latent variables in ω _i represent latent traits (e.g., life satisfaction) that truly exist but cannot be fully characterized by a single observed indicator. Thus, ω _i in model (2) not only addresses the possible heterogeneity of the data but also provides insights into their impacts on the responses of interest, consequently increasing the capability of the model in terms of interpretation.

For notational simplicity, in the subsequent description, we suppress the subscript j in models (1) and (2) by assuming s = 1. Considering that y_ij (and thus z_ij ), j = 1,…, s are conditionally independent given ω _i , the methodology developed for s = 1 can be extended to the case of s > 1 without difficulty. By omitting the subscript j, the ordinal regression model defined by models (1) and (2) can be simplified as:

z_{i} = h (y_{i}) = \sum_{l = 1}^{H} l \cdot I (α_{l} \leq y_{i} < α_{l + 1}),

y_{i} = β_{0} + β_{1}^{T} x_{i} + β_{2}^{T} ω_{i} + ∊_{i},

where ${- \infty = α_{1} < α_{2} < \dots < α_{H} < α_{H + 1} = + \infty}$ is the set of thresholds determining the H categories, β ₀, β ₁, and β ₂ are regression parameters, and $∊_{i} \sim N [0, σ^{2}]$ .

Model Identification

The model defined in the model description subsection is not identifiable without imposing identifiability constraints. In this section, we discuss two model indeterminacies and propose appropriate identifiability constraints as follows.

The first indeterminacy is caused by the unknown scale of the underlying continuous measurement y_i corresponding to the ordinal variable z_i , which makes the parameters β ₁, β ₂, σ ², and the thresholds α ₂, …, α_H not simultaneously estimable. We follow a common practice to fix the first or last unknown threshold as well as the dispersion parameter σ ² at preassigned values to identify the location and the dispersion of the underlying normal distribution (Poon and Wang 2012; Shi and Lee 2000). In the present study, we fix α ₂ at 0 and σ ² at 1.

The second indeterminacy is associated with the CFA model (3). For an arbitrary nonsingular matrix M, model (3) can be rewritten as $u_{i} = μ + Λ M M^{- 1} ω_{i} + ζ_{i} = μ + Λ^{*} ω_{i}^{*} + ζ_{i}$ , where Λ* = ΛM and $ω_{i}^{*} = M^{- 1} ω_{i}$ . To address this indeterminacy, we again follow a common practice to fix appropriate elements of Λ at preassigned values (Song and Lee 2012). For instance, in the study of happiness and its determinants in the fifth section, we specify a nonoverlapping Λ according to the meanings of the observed variables in order to obtain an identified model and a clear interpretation of each latent variable.
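To see why model (3) is not identified without constraints, it may help to write out the covariance structure it implies. The following worked step is a sketch; the symbol Φ* is introduced here only for illustration.

```latex
\begin{aligned}
\mathrm{Cov}(u_{i}) &= Λ Φ Λ^{T} + Ψ \\
&= (Λ M) (M^{-1} Φ M^{-T}) (Λ M)^{T} + Ψ \\
&= Λ^{*} Φ^{*} (Λ^{*})^{T} + Ψ, \qquad Φ^{*} = M^{-1} Φ M^{-T} = \mathrm{Cov}(ω_{i}^{*}).
\end{aligned}
```

Because (Λ, Φ) and (Λ*, Φ*) imply the same normal distribution for u _i , the data alone cannot distinguish them. Fixing enough elements of Λ restricts the admissible M; for a nonoverlapping pattern with one loading per factor fixed at 1 and the remaining cross-loadings fixed at 0, the only M that preserves the fixed entries is the identity.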

Bayesian Inference

BaLasso

The lasso procedure, which simultaneously performs parameter estimation and model selection, was proposed by Tibshirani (1996) under the simple linear model framework:

Y = β_{0} 1_{n} + X β + ε, ε \sim N (0, σ^{2} I_{n}),

where $Y = (y_{1}, \dots, y_{n})^{T}$ , β ₀ is an intercept, 1 _n is an n × 1 vector of ones, X is an n × p design matrix with standardized columns, β is a p × 1 vector of coefficients, $ε = {(∊_{1}, \dots, ∊_{n})}^{T}$ , and I _n is an n × n identity matrix. Model (6) can be regarded as a matrix form of model (5) with observable Y and without latent variables ω _i . The lasso estimator of β in model (6) is the L ₁-penalized least squares estimate obtained from:

\arg \min_{β} \left\{ (\tilde{Y} - X β)^{T} (\tilde{Y} - X β) + γ \sum_{k = 1}^{p} | β_{k} | \right\},

where γ ≥ 0 is the tuning parameter, and $\tilde{Y} = Y - β_{0} 1_{n}$ .

Park and Casella (2008) later introduced the Bayesian version of lasso by assigning a conditional Laplace prior of β as follows:

π (β | σ^{2}) = \prod_{k = 1}^{p} \frac{γ}{2 σ} e^{- γ | β_{k} | / σ} .

They formulated BLasso through the following hierarchical representation:

\begin{array}{l} Y | β_{0}, X, β, σ^{2} \sim N (β_{0} 1_{n} + X β, σ^{2} I_{n}), \\ β | σ^{2}, τ_{1}^{2}, \dots, τ_{p}^{2} \sim N (0, σ^{2} D), \quad D = diag (τ_{1}^{2}, \dots, τ_{p}^{2}), \\ σ^{2}, τ_{1}^{2}, \dots, τ_{p}^{2} \sim π (σ^{2}) d σ^{2} \prod_{k = 1}^{p} \frac{γ^{2}}{2} e^{- γ^{2} τ_{k}^{2} / 2} d τ_{k}^{2}, \quad σ^{2}, τ_{1}^{2}, \dots, τ_{p}^{2} > 0. \end{array}

The conditional prior of β in equation (8) can be obtained by integrating out $τ_{1}^{2}, \dots, τ_{p}^{2}$ based on representation (9). BLasso provides a posterior sample that can be used to summarize the distribution of β . The posterior mean or mode of β can be regarded as its lasso estimator. Given that BLasso is a sampling-based method, it would not shrink the nonsignificant elements of β exactly to 0. However, as shown by Park and Casella (2008), BLasso indeed shrinks the coefficients of unimportant predictors close to 0 much faster than ridge regression does.
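The integration step mentioned above rests on the standard representation of the Laplace density as a scale mixture of normals. As a sketch for a single coefficient β_k, with the normal variance $σ^{2} τ_{k}^{2}$ and the exponential mixing density taken from representation (9):

```latex
\int_{0}^{\infty} \frac{1}{\sqrt{2 π σ^{2} τ_{k}^{2}}} \exp \left( - \frac{β_{k}^{2}}{2 σ^{2} τ_{k}^{2}} \right) \frac{γ^{2}}{2} \exp \left( - \frac{γ^{2} τ_{k}^{2}}{2} \right) d τ_{k}^{2} = \frac{γ}{2 σ} e^{- γ | β_{k} | / σ} .
```

Taking the product over k = 1,…, p recovers the conditional Laplace prior in equation (8).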

One problem with the lasso-type procedures is that the same tuning parameter γ is applied to different regression coefficients, implying that the same amount of shrinkage is introduced to all coefficients. This may add appreciable bias to the resulting estimates (Fan and Li 2001; Wang et al. 2007). To tackle the issue, Zou (2006) developed the adaptive lasso procedure, which modifies model (7) by assigning a distinct tuning parameter for each of the regression coefficients as follows:

\arg \min_{β} \left\{ (\tilde{Y} - X β)^{T} (\tilde{Y} - X β) + \sum_{k = 1}^{p} γ_{k} | β_{k} | \right\},

where γ _k ≥ 0, and $\tilde{Y} = Y - β_{0} 1_{n}$ . The adaptive lasso automatically imposes large penalties on unimportant coefficients and alleviates penalties on significant coefficients, so that it shrinks unimportant coefficients to 0 more efficiently and produces better estimation for significant coefficients than lasso does (Wang et al. 2007; Zou 2006). Naturally, a Bayesian version of adaptive lasso can be obtained by assigning a conditional Laplace prior with coefficient-specific tuning parameters as follows:

π (β | σ^{2}) = \prod_{k = 1}^{p} \frac{γ_{k}}{2 σ} e^{- γ_{k} | β_{k} | / σ} .

Similar to the adaptive lasso, BaLasso introduces different penalties to various coefficients to enhance its capability of producing good estimation and model selection results.

As highlighted in Tibshirani (1996) and Park and Casella (2008), the lasso estimator of β can guide the variable selection in model (6) because the observed covariates in X are standardized to the same scale. If $| β_{k} |$ is close to 0, then the kth covariate is unimportant and can be removed from the model. However, the proposed ordinal regression includes latent variables ω _i that should be updated in MCMC iterations. This feature makes the standardization of all predictors beforehand impossible, consequently disabling the identification of important and unimportant predictors based on their estimated coefficients. One possible way for addressing the problem is to standardize ω _i in each MCMC iteration based on the posterior samples. However, this method is time-consuming because it requires additional computation in each iteration. Another simple alternative is to obtain an initial estimate of ω _i based on model (3). For each component of ω _i , we then calculate its mean and variance for standardization. As shown by Guo et al. (2012) and the simulations in the fourth section, both methods perform satisfactorily but the latter simplifies the computation significantly.

Prior Specification

Based on model (10), the BaLasso estimators of the coefficients in model (5) can be defined as:

\arg \min_{β} \left\{ \sum_{i = 1}^{n} (y_{i} - β_{0} - β_{1}^{T} x_{i} - β_{2}^{T} ω_{i})^{2} + \sum_{k = 1}^{p} γ_{k} | β_{1 k} | + \sum_{k = 1}^{q} γ_{p + k} | β_{2 k} | \right\},

where $γ_{1}, \dots, γ_{p + q}$ are nonnegative tuning parameters, $β_{1} = (β_{11}, \dots, β_{1 p})^{T}$ and $β_{2} = (β_{21}, \dots, β_{2 q})^{T}$ . The size of each tuning parameter represents the magnitude of the penalty imposed on the corresponding coefficient. According to prior (11), the conditional Laplace prior of ( β ₁, β ₂) can be expressed as:

π (β_{1}, β_{2} | σ^{2}) \propto \exp (- \sum_{k = 1}^{p} \frac{γ_{k}}{σ} | β_{1 k} | - \sum_{k = 1}^{q} \frac{γ_{p + k}}{σ} | β_{2 k} |) .

This prior distribution can be reformulated through a hierarchical model as follows (see details in the Online Appendix A):

\begin{array}{l} y_{i} | x_{i}, ω_{i}, β_{0}, β_{1}, β_{2}, σ^{2} \sim N (β_{0} + β_{1}^{T} x_{i} + β_{2}^{T} ω_{i}, σ^{2}), i = 1, \dots, n, \\ β_{1 k} \overset{i n d}{\sim} N (0, σ^{2} τ_{k}^{2}), k = 1, \dots, p, \\ β_{2 k} \overset{i n d}{\sim} N (0, σ^{2} τ_{p + k}^{2}), k = 1, \dots, q, \\ τ_{k}^{2} \overset{i n d}{\sim} Gamma (1, \frac{γ_{k}^{2}}{2}), k = 1, \dots, p + q, \end{array}

where σ ² is fixed to identify the ordinal regression model. Given that the intercept is usually not of primary interest, we assign a noninformative uniform prior to β ₀.

For the tuning parameters γ_k , we assign the following gamma priors:

γ_{k}^{2} \overset{i n d}{\sim} Gamma (a_{k 0}, b_{k 0}), k = 1, \dots, p + q,

where a_k ₀ and b_k ₀ are hyperparameters with preassigned values. Following suggestions in the existing literature (e.g., Guo et al. 2012; Kyung et al. 2010; Song, Lu, and Feng 2014), we set a_k ₀ = 1 and b_k ₀ = 0.05 to obtain dispersed priors for γ_k . Such dispersed priors enable γ_k to be mainly determined by the data, thereby automatically imposing large penalties on unimportant coefficients and shrinking them toward 0 efficiently. The reason can be seen from the conditional posterior distributions of $γ_{k}^{2}$ and $τ_{k}^{2}$ as follows (see the Online Appendix B):

p (\frac{1}{τ_{k}^{2}} | \cdot) \overset{D}{=} Inverse Gaussian (\frac{γ_{k} σ}{| β_{1 k} |}, γ_{k}^{2}) I (τ_{k}^{2} > 0), k = 1, \dots, p,

p (\frac{1}{τ_{p + k}^{2}} | \cdot) \overset{D}{=} Inverse Gaussian (\frac{γ_{p + k} σ}{| β_{2 k} |}, γ_{p + k}^{2}) I (τ_{p + k}^{2} > 0), k = 1, \dots, q,

p (γ_{k}^{2} | \cdot) \overset{D}{=} Gamma (a_{k 0} + 1, b_{k 0} + \frac{1}{2} τ_{k}^{2}), k = 1, \dots, p + q,

where “ $p (\cdot) \overset{D}{=}$ ” denotes “the distribution $p (\cdot)$ is equal to.” Deriving the conditional distributions of $τ_{k}^{2}$ and $τ_{p + k}^{2}$ yields inverse Gaussian distributions for $1 / τ_{k}^{2}$ and $1 / τ_{p + k}^{2}$ . Based on distributions (16) and (17), if β _1k or β _2k is significant, then the corresponding $τ_{k}^{2}$ tends to be large and plays a dominant role in distribution (18), implying that γ_k is mostly data driven. On the contrary, if β _1k or β _2k is small, $τ_{k}^{2}$ is likely to be small and γ_k is then dominated by the dispersed gamma prior, which tends to result in large tuning parameter values. Thus, the proposed hierarchical prior in models (14) and (15) enables BaLasso to automatically and efficiently estimate and select relevant predictors.
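The updates in distributions (16)–(18) can be sketched in Python as one sweep over the penalty parameters. This is a minimal illustration under stated assumptions, not the sampler of the Online Appendix B: the function names are hypothetical, the gamma distribution is read in its rate parameterization, the inverse Gaussian draw uses the transformation method of Michael, Schucany, and Haas, and the floor on $1 / τ_{k}^{2}$ is a numerical guard added here.

```python
import math
import random


def sample_inverse_gaussian(mean, shape, rng):
    """Draw from an inverse Gaussian(mean, shape) law by the transformation method."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    x = (mean + mean * mean * y / (2.0 * shape)
         - (mean / (2.0 * shape)) * math.sqrt(4.0 * mean * shape * y + mean * mean * y * y))
    # Choose between the two roots so that the draw has the right distribution
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x


def update_tau2_gamma2(beta, gamma2, sigma, a0, b0, rng):
    """One sweep over (tau_k^2, gamma_k^2) given the current coefficients.

    beta   : current values of (beta_1, beta_2), length p + q, all nonzero
    gamma2 : current values of gamma_k^2, length p + q
    """
    tau2, new_gamma2 = [], []
    for b_k, g2_k in zip(beta, gamma2):
        # (16)-(17): 1/tau_k^2 ~ Inverse Gaussian(gamma_k * sigma / |beta_k|, gamma_k^2)
        inv_tau2 = sample_inverse_gaussian(math.sqrt(g2_k) * sigma / abs(b_k), g2_k, rng)
        t2 = 1.0 / max(inv_tau2, 1e-12)  # guard against numerical underflow
        # (18): gamma_k^2 ~ Gamma(a0 + 1, rate = b0 + tau_k^2 / 2);
        # random.gammavariate expects a scale, which is 1 / rate
        g2 = rng.gammavariate(a0 + 1.0, 1.0 / (b0 + 0.5 * t2))
        tau2.append(t2)
        new_gamma2.append(g2)
    return tau2, new_gamma2


# Hypothetical current state: one sizeable and one near-zero coefficient
tau2, gamma2 = update_tau2_gamma2([1.0, 0.02], [1.0, 1.0], 1.0, 1.0, 0.05, random.Random(0))
```

In a full sampler this sweep would alternate with the updates of the coefficients, the latent variables, and the remaining parameters.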

For the unknown parameters in model (3), we assign conjugate type priors as follows. Let μ_j and ψ_j be the jth element of μ and Ψ, respectively, and $Λ_{j}^{T}$ be the vector that contains the unknown parameters in the jth row of Λ. For j = 1,…, m,

\begin{array}{l} p (μ_{j}) \overset{D}{=} N (μ_{j 0}, σ_{j 0}^{2}), p (Λ_{j} | ψ_{j}) \overset{D}{=} N (Λ_{j 0}, ψ_{j} Σ_{j 0}), \\ p (Φ^{- 1}) \overset{D}{=} Wishart (R_{0}, ρ_{0}), p (ψ_{j}^{- 1}) \overset{D}{=} Gamma (a_{ζ j 0}, b_{ζ j 0}), \end{array}

where μ_j ₀, $σ_{j 0}^{2}$ , Λ _j ₀, ρ ₀, a_ζj ₀, b _ζj ₀, as well as positive definite matrices Σ _j ₀ and R ₀ are hyperparameters whose values are preassigned.

Finally, we specify the prior distribution for the threshold parameters. Given that the thresholds are rarely of direct interest and prior information about them is usually unavailable, we propose the use of a noninformative prior. Because α ₂ is fixed at 0 to identify the model, we let $α = (α_{3}, \dots, α_{H})^{T}$ and assign a noninformative prior for α as follows:

p (α) = p (α_{3}, \dots, α_{H}) \propto constant .

Posterior Inference

Let $Z = \{z_{1}, \dots, z_{n}\}$ be a set of ordinal variables, Y be a set of the underlying continuous measurements corresponding to Z, $α = \{α_{3}, \dots, α_{H}\}$ be a set of threshold parameters, $U = \{u_{1}, \dots, u_{n}\}$ be a set of response variables in the CFA model, $X = \{x_{1}, \dots, x_{n}\}$ be a set of covariates, $ω = \{ω_{1}, \dots, ω_{n}\}$ be a set of latent variables, $θ_{1} = \{β_{0}, β_{1}, β_{2}, τ_{1}^{2}, \dots, τ_{p + q}^{2}, γ_{1}, \dots, γ_{p + q}, σ^{2}\}$ be a set of parameters in the ordinal regression model, θ₂ = {μ, Λ, Φ, Ψ} be a set of parameters in the CFA model, and θ = { θ ₁, θ ₂}.

The posterior inference can be conducted by sampling from the joint posterior distribution $p (Y, α, ω, θ | Z, U)$ , which is highly intractable in general. Thus, the Gibbs sampler (Geman and Geman 1984) and MCMC techniques such as the Metropolis–Hastings algorithm (Hastings 1970; Metropolis et al. 1953) are employed to iteratively sample from the conditional posterior distributions of the unknown quantities. The full conditional distributions and other technical details are provided in Online Appendix B.
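One building block of such a sampler is the draw of the underlying continuous measurement y_i given its observed category z_i. Under models (4) and (5), this conditional is a normal distribution truncated to the interval between the two thresholds that bracket z_i. The Python sketch below illustrates that step by inverse-CDF sampling; the function name is hypothetical, the clamping constant is a numerical safeguard added here, and the exact scheme in the Online Appendix B may differ.

```python
import random
from statistics import NormalDist


def sample_latent_response(z, mean, inner_thresholds, rng, sd=1.0):
    """Draw y from N(mean, sd^2) truncated to [alpha_z, alpha_{z+1}).

    inner_thresholds holds the finite thresholds (alpha_2, ..., alpha_H);
    mean stands for beta_0 + beta_1' x_i + beta_2' omega_i.
    """
    cuts = [float("-inf")] + list(inner_thresholds) + [float("inf")]
    dist = NormalDist(mean, sd)
    lo, hi = dist.cdf(cuts[z - 1]), dist.cdf(cuts[z])
    # Uniform draw between the two CDF values, kept strictly inside (0, 1)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


rng = random.Random(3)
y_draw = sample_latent_response(2, 0.5, [0.0, 2.1, 3.1], rng)
```

The default `sd=1.0` reflects the identification constraint σ ² = 1.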

Simulation Study

In this section, we conduct simulations to evaluate the empirical performance of the proposed method. Simulation 1 considers a univariate ordinal regression and examines the performance of our method under different sample sizes. Simulation 2 compares the proposed method with the conventional approaches. The performance of our method in the case of s > 1 is also examined but not reported.

Simulation 1

We consider the model defined by models (3)–(5) with H = 4, p = 6, q = 4, m = 12, and three sample sizes n = 200, 500, and 1,000. The data sets are simulated as follows: covariates $x_{i 1}$ , $x_{i 2}$ , $x_{i 3}$ , and $x_{i 4}$ are independently generated from N(0,1), ordinal (0.25, 0.25, 0.25, 0.25), U(−1, 1), and t(3), respectively, where ordinal $(π_{1}, π_{2}, π_{3}, π_{4})$ denotes a four-category ordinal distribution with probabilities $(π_{1}, π_{2}, π_{3}, π_{4})$ , U(a, b) denotes the uniform distribution on [a, b], and t(3) denotes the t distribution with 3 degrees of freedom. Covariates $x_{i 5}$ and $x_{i 6}$ are jointly generated from a multivariate normal distribution N(0, Σ), where Σ is a correlation matrix with off-diagonal elements 0.5. The latent vector $ω_{i} = (ω_{i 1}, ω_{i 2}, ω_{i 3}, ω_{i 4})^{T}$ and the vector of random errors $ζ_{i} = (ζ_{i 1}, \dots, ζ_{i,12})^{T}$ are drawn from N(0,Φ) and N(0,Ψ), respectively. The structure of the loading matrix is:

Λ^{T} = [\begin{matrix} 1 & λ_{21} & λ_{31} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & λ_{52} & λ_{62} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & λ_{83} & λ_{93} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & λ_{11, 4} & λ_{12, 4} \end{matrix}],

where the 1s and 0s are fixed parameters, and the λs are unknown parameters. The population values of the unknown parameters are $μ_{1} = \dots = μ_{12} = 0.5$ , $λ_{21} = λ_{31} = λ_{52} = λ_{62} = λ_{83} = λ_{93} = λ_{11, 4} = λ_{12, 4} = 0.8$ , the diagonal elements in Ψ are $ψ_{1} = \dots = ψ_{12} = 0.4$ , and the elements in Φ are $\{φ_{11}, φ_{12}, φ_{13}, φ_{14}, φ_{22}, φ_{23}, φ_{24}, φ_{33}, φ_{34}, φ_{44}\} = \{1.0, 0.2, 0.2, 0.2, 1.0, 0.2, 0.2, 1.0, 0.2, 1.0\}$ . In the ordinal regression, $β_{0} = 0$ , $β_{1} = (β_{11}, β_{12}, β_{13}, β_{14}, β_{15}, β_{16})^{T} = (1, 0, 1, 0, 1, 0)^{T}$ , $β_{2} = (β_{21}, β_{22}, β_{23}, β_{24})^{T} = (1, 1, 0, 0)^{T}$ , and $σ^{2} = 1$ . To generate the ordinal response z_i , we first generate the underlying continuous y_i based on model (5), then transform y_i to z_i according to model (4) via thresholds $- \infty = α_{1} < α_{2} < α_{3} < α_{4} < α_{5} = \infty$ with $\{α_{2}, α_{3}, α_{4}\} = \{- 0.6, 2.1, 3.1\}$ . The probabilities of z_i taking values of 1, 2, 3, and 4 are around 0.4, 0.4, 0.1, and 0.1, which mimics the distribution of the ordinal variable “happiness” in the application of the fifth section (see Figure 1).
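The data-generating process described above can be sketched in plain Python. This is an illustrative reconstruction rather than the authors' code: the function names are hypothetical, the ordinal covariate is assumed to take the values 1 to 4, and the covariates are left unstandardized at this stage.

```python
import math
import random
from bisect import bisect_right


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def mvn(chol, rng):
    """Draw from N(0, L L^T) given the Cholesky factor L."""
    e = [rng.gauss(0.0, 1.0) for _ in chol]
    return [sum(row[k] * e[k] for k in range(i + 1)) for i, row in enumerate(chol)]


def simulate(n, rng):
    """Generate n records (z, x, u) following the Simulation 1 design."""
    l_phi = cholesky([[1.0 if a == b else 0.2 for b in range(4)] for a in range(4)])
    l_x56 = cholesky([[1.0, 0.5], [0.5, 1.0]])
    beta1, beta2 = [1, 0, 1, 0, 1, 0], [1, 1, 0, 0]
    alphas = [-0.6, 2.1, 3.1]  # alpha_2, alpha_3, alpha_4
    data = []
    for _ in range(n):
        x = [
            rng.gauss(0.0, 1.0),          # N(0, 1)
            rng.choice([1, 2, 3, 4]),     # ordinal(0.25, 0.25, 0.25, 0.25)
            rng.uniform(-1.0, 1.0),       # U(-1, 1)
            # t(3): standard normal over sqrt(chi-square(3) / 3)
            rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0),
        ] + mvn(l_x56, rng)               # correlated pair (x5, x6)
        omega = mvn(l_phi, rng)
        # CFA model: three indicators per factor with loadings (1, 0.8, 0.8)
        u = [0.5 + (1.0 if j % 3 == 0 else 0.8) * omega[j // 3]
             + rng.gauss(0.0, math.sqrt(0.4)) for j in range(12)]
        # Underlying response with beta_0 = 0 and sigma^2 = 1
        y = (sum(b * v for b, v in zip(beta1, x))
             + sum(b * w for b, w in zip(beta2, omega)) + rng.gauss(0.0, 1.0))
        data.append((bisect_right(alphas, y) + 1, x, u))
    return data


data = simulate(1000, random.Random(2024))
```

Tabulating the first element of each record gives the empirical category proportions, which can be compared with the approximate probabilities quoted above.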

Figure 1.

The histograms of the categorical variable z in the simulation study and the happiness data sets.

The hyperparameters of the prior distributions discussed in the subsection Prior Specification are specified as follows: for j = 1,…,12, μ_j ₀ = 0, σ_j ₀ = 100; the elements in Λ _j ₀ are set as 0.6, Σ _j ₀ = 10⁴ I; R ₀ = 15I, ρ₀ = 8; a_ζj ₀ = 9, and b_ζj ₀ = 3, where I denotes an identity matrix with an appropriate dimension. An initial estimate of ω _i is obtained based on the CFA model. In the analysis, all the predictors are standardized to the same scale so that the magnitudes of the coefficients reflect the importance of the predictors. Pilot runs show that the algorithm converges within 5,000 iterations. After a burn-in phase of 5,000 iterations, we collect 10,000 posterior samples for Bayesian inference. As discussed in the subsection BaLasso, the sampling-based BaLasso procedure would not shrink the nonsignificant elements of β ₁ and β ₂ exactly to 0. To identify nonsignificant coefficients, we follow a common practice of setting the cutoff value to be 0.1 (Guo et al. 2012; Hoti and Sillanpää 2006). If $| β_{1 k} | \leq 0.1$ or $| β_{2 k} | \leq 0.1$ , then the corresponding observed or latent predictor is deemed unimportant.
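The cutoff rule can be written in a few lines of Python. The sketch assumes that the rule is applied to posterior means computed from the retained MCMC draws; the function name and the toy draws are hypothetical.

```python
def select_predictors(posterior_draws, cutoff=0.1):
    """Flag a predictor as important when |posterior mean| exceeds the cutoff.

    posterior_draws: list of MCMC draws, each a list of coefficients
    (beta_1 followed by beta_2) on the standardized scale.
    """
    n_draws = len(posterior_draws)
    n_coef = len(posterior_draws[0])
    means = [sum(draw[k] for draw in posterior_draws) / n_draws for k in range(n_coef)]
    return [abs(m) > cutoff for m in means], means


# Hypothetical draws for three coefficients
draws = [[0.98, 0.03, -0.45], [1.05, -0.02, -0.50], [1.01, 0.05, -0.52]]
keep, means = select_predictors(draws)
```

Here the second coefficient has a posterior mean of about 0.02, so it would be treated as unimportant, whereas the first and third would be retained.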

The results presented below are based on 100 replications. We use the bias (BIAS) and the root mean square error (RMS) of the Bayesian estimates to assess the performance of the proposed method. Tables 1–3 report the estimation and model selection results obtained using the proposed method. The less interesting intercepts are not reported to save space. Results obtained using BLasso, which does not adapt the tuning parameter, are also provided for comparison. Although both methods produce satisfactory results, BaLasso performs better in terms of estimation and variable selection. This result is expected because BaLasso enables the data to determine the magnitude of the penalty, which not only adaptively penalizes the coefficients of unimportant predictors and shrinks them toward zero faster but also estimates significant coefficients more effectively. Such appealing features are demonstrated by the lower bias and RMS values under BaLasso in Table 1 and the higher correct rates produced by BaLasso in Table 3. Given that the advantage of BaLasso over BLasso lies mainly in the regression part, the parameter estimates of the CFA model in Table 2 are similar for the two procedures.
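The summary measures reported in Tables 1–3 can be computed as in the following Python sketch. It assumes the usual definitions, namely BIAS as the mean estimate minus the true value and RMS as the square root of the mean squared deviation from the true value over replications; the function names and the toy estimates are hypothetical.

```python
import math


def bias_and_rms(estimates, true_value):
    """BIAS and RMS of one parameter across replications."""
    r = len(estimates)
    bias = sum(estimates) / r - true_value
    rms = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    return bias, rms


def correct_selections(estimates, truly_nonzero, cutoff=0.1):
    """Count replications whose cutoff-based decision matches the truth."""
    return sum((abs(e) > cutoff) == truly_nonzero for e in estimates)


# Hypothetical estimates of a coefficient whose true value is 1
est = [1.1, 0.9, 1.2, 0.8]
bias, rms = bias_and_rms(est, 1.0)
n_correct = correct_selections(est, True)
```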

Table 1.

Bayesian Estimates of Parameters of the Ordinal Regression Model in Simulation 1.

Para	BLasso						BaLasso
	n = 200		n = 500		n = 1,000		n = 200		n = 500		n = 1,000
	BIAS	RMS	BIAS	RMS	BIAS	RMS	BIAS	RMS	BIAS	RMS	BIAS	RMS
β ₀ = 0	.042	.337	−.012	.209	.004	.125	.025	.287	−.010	.191	.001	.118
β ₁₁ = 1	.047	.210	−.001	.105	.015	.071	.014	.169	−.006	.102	.012	.071
β ₁₂ = 0	−.003	.101	.002	.060	.000	.035	−.004	.084	.002	.055	.000	.032
β ₁₃ = 1	.023	.270	.006	.153	.009	.090	−.032	.238	−.005	.152	.003	.088
β ₁₄ = 0	−.001	.067	−.001	.041	−.001	.025	−.002	.052	−.001	.037	−.001	.024
β ₁₅ = 1	.063	.202	.001	.117	.007	.069	.032	.168	−.003	.115	.005	.067
β ₁₆ = 0	.021	.107	.011	.089	.008	.055	.019	.087	.010	.080	.007	.051
β ₂₁ = 1	.068	.253	.002	.138	.013	.095	.027	.214	−.003	.135	.010	.093
β ₂₂ = 1	.044	.238	.002	.121	−.006	.088	.006	.188	−.004	.121	−.009	.088
β ₂₃ = 0	.005	.134	.001	.072	.000	.050	.006	.107	.001	.065	.000	.047
β ₂₄ = 0	.023	.112	.010	.079	.003	.051	.021	.087	.010	.072	.003	.047
α ₃ = 2.1	.255	.592	.030	.257	.020	.154	.166	.435	.019	.241	.013	.151
α ₄ = 3.1	.318	.710	.055	.336	.043	.211	.206	.521	.040	.320	.033	.203

Note: BLasso = Bayesian lasso; BaLasso = Bayesian adaptive lasso; RMS = root mean square error.

Table 2.

Bayesian Estimates of Some Parameters of the Factor Analysis Model in Simulation 1.

Para	BLasso						BaLasso
	n = 200		n = 500		n = 1,000		n = 200		n = 500		n = 1,000
	BIAS	RMS	BIAS	RMS	BIAS	RMS	BIAS	RMS	BIAS	RMS	BIAS	RMS
λ ₂₁ = .8	−.012	.066	−.002	.038	−.005	.030	−.014	.067	−.001	.038	−.006	.031
λ ₃₁ = .8	−.007	.063	−.008	.046	−.006	.035	−.009	.063	−.007	.046	−.006	.035
λ ₅₂ = .8	−.015	.061	−.001	.038	.001	.030	−.015	.062	−.001	.038	.001	.030
λ ₆₂ = .8	−.010	.062	−.008	.040	−.002	.030	−.010	.062	−.008	.040	−.001	.031
λ ₈₃ = .8	−.012	.068	.001	.043	.000	.034	−.011	.067	.001	.044	.000	.034
λ ₉₃ = .8	−.010	.069	−.002	.042	.001	.031	−.009	.070	−.002	.043	.001	.030
λ _11,4 = .8	.003	.068	−.003	.046	−.004	.030	.003	.067	−.004	.045	−.005	.030
λ _12,4 = .8	−.003	.063	−.006	.039	.000	.029	−.003	.063	−.006	.040	−.001	.030
ψ ₁ = .4	−.013	.055	−.005	.041	−.001	.026	−.015	.057	−.004	.042	−.001	.027
ψ ₂= .4	−.010	.043	−.002	.035	.002	.022	−.009	.042	−.002	.035	.002	.022
ψ ₃= .4	−.008	.044	.000	.032	−.001	.027	−.007	.044	.000	.032	.000	.026
ψ ₄= .4	−.012	.059	−.007	.040	−.001	.032	−.013	.058	−.007	.039	−.001	.033
ψ ₅= .4	−.008	.044	−.004	.031	.000	.023	−.008	.043	−.004	.032	.000	.024
ψ ₆= .4	−.005	.051	.004	.034	−.003	.025	−.005	.051	.004	.034	−.003	.025
ψ ₇= .4	−.018	.059	−.002	.037	−.003	.036	−.017	.059	−.002	.038	−.003	.036
ψ ₈ = .4	−.004	.049	−.001	.031	−.002	.027	−.004	.049	.000	.031	−.002	.028
ψ ₉ = .4	−.001	.043	−.003	.032	−.002	.024	−.001	.043	−.003	.032	−.002	.024
ψ ₁₀ = .4	−.014	.060	−.010	.042	−.002	.030	−.014	.059	−.010	.041	−.003	.031
ψ ₁₁ = .4	−.015	.048	.001	.036	.001	.026	−.014	.049	.001	.036	.001	.026
ψ ₁₂ = .4	−.008	.052	−.001	.035	.001	.028	−.008	.052	−.001	.035	.001	.029
ϕ ₁₁ = 1	.085	.160	.046	.094	.021	.075	.089	.163	.045	.091	.022	.075
ϕ ₁₂ = .2	−.001	.077	−.001	.051	−.003	.044	.001	.078	−.001	.051	−.003	.044
ϕ ₁₃ = .2	.000	.095	−.003	.056	.002	.036	.000	.095	−.003	.055	.002	.036
ϕ ₁₄ = .2	−.011	.091	.009	.052	.001	.036	−.012	.090	.008	.052	.001	.036
ϕ ₂₂ = 1	.092	.159	.048	.102	.022	.065	.093	.160	.048	.101	.021	.065
ϕ ₂₃= .2	.002	.080	−.001	.050	−.005	.039	.001	.081	.000	.050	−.005	.040
ϕ ₂₄ = .2	−.019	.085	.008	.050	.000	.037	−.020	.085	.008	.050	.000	.037
ϕ ₃₃= 1	.093	.160	.032	.094	.020	.073	.092	.160	.033	.095	.019	.073
ϕ ₃₄ = .2	−.005	.085	−.004	.057	.001	.045	−.006	.086	−.004	.058	.001	.046
ϕ ₄₄ = 1	.081	.168	.041	.093	.025	.073	.081	.167	.041	.094	.027	.074

Note: BLasso = Bayesian lasso; BaLasso = Bayesian adaptive lasso; RMS = root mean square error.

Table 3.

Number of Correct Variable Selections in Simulation 1.

Para	Truth	BLasso			BaLasso
Para	Truth	n = 200	n = 500	n = 1,000	n = 200	n = 500	n = 1,000
β ₁₁	≠ 0	100	100	100	100	100	100
β ₁₂	0	83	90	99	86	94	99
β ₁₃	≠ 0	100	100	100	100	100	100
β ₁₄	0	87	94	97	92	94	98
β ₁₅	≠ 0	100	100	100	100	100	100
β ₁₆	0	76	80	95	82	85	98
β ₂₁	≠ 0	100	100	100	100	100	100
β ₂₂	≠ 0	100	100	100	100	100	100
β ₂₃	0	78	90	98	82	91	98
β ₂₄	0	82	87	98	83	88	100

Note: BLasso = Bayesian lasso; BaLasso = Bayesian adaptive lasso.

To check whether the Bayesian results are sensitive to the prior inputs, we repeated the analysis under two other settings of prior (15), $a_{k0} = 1, b_{k0} = 0.5$ and $a_{k0} = 1, b_{k0} = 0.01$, together with some perturbation of the other hyperparameters in prior (19). The estimation and model selection results are similar and are not reported.

Simulation 2

In this section, the proposed method is compared with two conventional approaches. The first one is to simply regard the observed indicators of the latent variables as independent covariates. To compare the proposed joint analysis with such an independent analysis, the 100 replicated data sets that are generated in simulation 1 with n = 1,000 are reanalyzed using the following simple independent model:

y_{i} = β_{0} + β_{1}^{T} x_{i} + {\tilde{β}}_{2}^{T} u_{i} + ϵ_{i},

where all the components are the same as those in simulation 1 except that here ${\tilde{β}}_{2}$ is a 12 × 1 vector of unknown parameters and $u_{i} = (u_{i 1}, \dots, u_{i,12})^{T}$ .

Notably, a direct comparison of the estimation results is unavailable because the true value of ${\tilde{β}}_{2}$ is unknown. Therefore, we focus on the comparison of variable selection. Under the setup of simulation 1, $ω_{i1}$ and $ω_{i2}$ are significant predictors with coefficients β ₂₁ = β ₂₂ = 1. Thus, we expect their associated observed indicators $\{u_{i1}, u_{i2}, u_{i3}\}$ and $\{u_{i4}, u_{i5}, u_{i6}\}$ to be likewise relevant to the response. However, the independent analysis shows that ${\tilde{β}}_{21}, \dots, {\tilde{β}}_{26}$ are nonsignificant in a large number of replications: out of 100 replications, they are significant in only 52, 36, 35, 40, 35, and 35, respectively. By contrast, the estimates of β ₂₁ and β ₂₂ in the joint analysis are highly significant in every one of the 100 replications. A main reason for this undesirable result is the multicollinearity induced by simultaneously incorporating the highly correlated indicators $\{u_{i1}, u_{i2}, u_{i3}\}$ and $\{u_{i4}, u_{i5}, u_{i6}\}$ in the regression. Unlike the conventional remedy of simply removing one or more correlated indicators, which may discard important characteristics of a latent trait, the proposed method groups the highly correlated observed variables into relatively independent latent variables, thereby eliminating multicollinearity and providing an attractive interpretation of latent predictors.
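To make the multicollinearity argument concrete, the following minimal sketch computes the correlations among three indicators of a single latent variable implied by a one-factor model, using the true values listed for simulation 1 (free loadings of .8, unique variances ψ = .4, and factor variance ϕ ₁₁ = 1). Fixing the first loading at 1 is an assumption made here for illustration, mirroring the identification constraint used in the application; the function names are hypothetical.

```python
import math


def implied_covariance(loadings, phi, psi):
    """Covariance of the indicators of one factor.

    Off-diagonal entries are lambda_j * lambda_k * phi; the unique
    variance psi_j is added on the diagonal.
    """
    m = len(loadings)
    cov = [[loadings[j] * loadings[k] * phi for k in range(m)] for j in range(m)]
    for j in range(m):
        cov[j][j] += psi[j]
    return cov


def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    m = len(cov)
    sd = [math.sqrt(cov[j][j]) for j in range(m)]
    return [[cov[j][k] / (sd[j] * sd[k]) for k in range(m)] for j in range(m)]


# True values taken from the simulation-1 table; the leading 1.0 is an
# assumed identification constraint, not a value stated in this section.
loadings = [1.0, 0.8, 0.8]
phi = 1.0
psi = [0.4, 0.4, 0.4]

corr = to_correlation(implied_covariance(loadings, phi, psi))
for row in corr:
    print([round(v, 3) for v in row])
```

Under these assumed values the pairwise correlations fall between roughly 0.6 and 0.7, so entering all three indicators as separate covariates places strongly overlapping information in the same regression.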

The second comparison is between the proposed method and a continuous regression that simply regards the ordinal response as continuous. Again, the abovementioned 100 replicated data sets with n = 1,000 are reanalyzed by treating z_i as continuous. The estimates of the regression coefficients under the two methods are presented in Table 4, indicating that ignoring the ordinal nature of responses could substantially underestimate the effects of important predictors.

Table 4.

Comparison Between the Proposed Model and a Continuous Regression.

Para	Ordinal regression		Continuous regression
Para	BIAS	RMS	BIAS	RMS
β ₀ = 0	.001	.118	1.889	1.890
β ₁₁ = 1	.012	.071	−.652	.653
β ₁₂ = 0	.000	.032	−.001	.013
β ₁₃ = 1	.003	.088	−.657	.658
β ₁₄ = 0	−.001	.024	−.001	.010
β ₁₅ = 1	.005	.067	−.656	.657
β ₁₆ = 0	.007	.051	.003	.020
β ₂₁ = 1	.010	.093	−.691	.691
β ₂₂ =1	−.009	.088	−.694	.695
β ₂₃ = 0	.000	.047	.009	.019
β ₂₄ = 0	.003	.047	.011	.019
α ₃ = 2.1	.013	.151	N/A	N/A
α ₄ = 3.1	.033	.203	N/A	N/A

Note: n = 1,000. RMS = root mean square error; N/A = not applicable.

Further, we utilize the Watanabe–Akaike information criterion (WAIC; Gelman, Hwang, and Vehtari 2014; Vehtari, Gelman, and Gabry 2015; Watanabe 2010) to compare the proposed model with the abovementioned ones that regard the observed indicators of the latent variables as independent covariates and the ordinal responses as continuous, respectively. The WAIC is defined as:

\mathrm{WAIC} = \mathrm{lpd} - p_{\mathrm{waic}},

where $\mathrm{lpd} = \sum_{i = 1}^{n} \log \int p (z_{i} | θ) p (θ | \cdot) d θ$ denotes the log pointwise predictive density, $p (θ | \cdot)$ is the posterior distribution of the parameters, and $p_{\mathrm{waic}} = \sum_{i = 1}^{n} \mathrm{var}_{\mathrm{post}} (\log p (z_{i} | θ))$ represents the effective number of parameters in the model. This index seeks a balance between goodness of fit to the data and model complexity. If the difference in WAIC values between two competing models is statistically significant, then the model with the larger WAIC value is favored (see details in Vehtari et al. 2015). In the present study, lpd and $p_{\mathrm{waic}}$ can be approximated as follows:

\begin{aligned} \widehat{\mathrm{lpd}} & = \sum_{i = 1}^{n} \log \left( \frac{1}{T} \sum_{t = 1}^{T} p (z_{i} | θ^{(t)}) \right), \\ {\hat{p}}_{\mathrm{waic}} & = \sum_{i = 1}^{n} \frac{1}{T - 1} \sum_{t = 1}^{T} \left( \log p (z_{i} | θ^{(t)}) - \frac{1}{T} \sum_{t' = 1}^{T} \log p (z_{i} | θ^{(t')}) \right)^{2}, \end{aligned}

where $θ^{(t)}$ is the tth MCMC sample drawn from the posterior distribution after the burn-in phase. For the ordinal variable $z_{i}$, $p (z_{i} | θ^{(t)}) = p (y_{i}^{(t)} \in (α_{z_{i}}^{(t)}, α_{z_{i} + 1}^{(t)}) | θ^{(t)})$ , and $y_{i}^{(t)}$ follows the normal distribution $N ({β_{0}}^{(t)} + {β_{1}^{(t)}}^{T} x_{i}, σ^{2} + {β_{2}^{(t)}}^{T} Φ β_{2}^{(t)})$ in the proposed model defined by models (4) and (5).
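The approximation above can be sketched in a few lines. The code below is a minimal illustration rather than the authors' implementation: it assumes that the log-likelihood values $\log p (z_{i} | θ^{(t)})$ have already been collected in a T × n table, and that the thresholds are stored as a list with −∞ and +∞ at the two ends. The function names, the threshold layout, and the toy numbers are all hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_prob(z, mean, sd, thresholds):
    """P(alpha_z < y < alpha_{z+1}) for y ~ N(mean, sd^2).

    `thresholds` is an assumed layout [alpha_1, ..., alpha_{H+1}] with
    alpha_1 = -inf and alpha_{H+1} = +inf; categories z run from 1 to H.
    """
    lower, upper = thresholds[z - 1], thresholds[z]
    hi = 1.0 if upper == math.inf else normal_cdf((upper - mean) / sd)
    lo = 0.0 if lower == -math.inf else normal_cdf((lower - mean) / sd)
    return hi - lo


def waic(log_lik):
    """Return (WAIC, lpd_hat, p_waic_hat) with WAIC = lpd_hat - p_waic_hat.

    `log_lik[t][i]` holds log p(z_i | theta^(t)) for draw t and unit i.
    """
    T = len(log_lik)
    n = len(log_lik[0])
    lpd_hat = 0.0
    p_waic_hat = 0.0
    for i in range(n):
        col = [log_lik[t][i] for t in range(T)]
        # Log of the posterior-mean likelihood, computed with log-sum-exp
        # so that very small likelihood values do not underflow.
        top = max(col)
        lpd_hat += top + math.log(sum(math.exp(v - top) for v in col) / T)
        # Sample variance (divisor T - 1) of the log-likelihood draws.
        center = sum(col) / T
        p_waic_hat += sum((v - center) ** 2 for v in col) / (T - 1)
    return lpd_hat - p_waic_hat, lpd_hat, p_waic_hat


# Toy usage with hypothetical thresholds, draws, and responses.
thresholds = [-math.inf, 0.0, 1.0, 2.0, math.inf]
draws = [(0.4, 1.0), (0.6, 1.2), (0.5, 0.9)]  # (mean, sd) of y per draw
responses = [1, 3, 2, 2]
log_lik = [
    [math.log(ordinal_prob(z, mean, sd, thresholds)) for z in responses]
    for mean, sd in draws
]
print(waic(log_lik))
```

The sign convention follows the definition in the text, under which larger values are preferred; some software reports the same quantity multiplied by −2, so values should be compared only within one convention.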

Figure 2 summarizes the WAIC values corresponding to the three analyses: (1) the proposed joint analysis, (2) the independent analysis, and (3) the analysis that regards ordinal variables as continuous. The box plots of WAIC values show that the proposed joint analysis substantially outperforms the others.

Figure 2.

Summary of the Watanabe–Akaike information criterion values on the basis of 100 replications in simulation 2.

To examine the empirical performance of the proposed method in the case of s > 1, we likewise conducted a simulation with exactly the same setup as that of simulation 1 except that s = 2. The estimation and model selection results are similar to those in simulation 1 and are not reported.

A Study of Happiness and Its Determinants

In this section, we present an application to the analysis of the World Values Survey (World Values Study Group 1994), a global research project exploring the social and personal characteristics that influence people's values and beliefs. The World Values Survey collects information from participants around the world on contemporary societal issues such as individuals' happiness, their satisfaction with life and job, and their attitudes toward work, money, religious beliefs, and politics. The primary goal of the survey is to enable a cross-national and cross-cultural comparison of people's core values. In this application, we considered the data collected from Britain and Japan, which we take as representatives of Western and Eastern countries, respectively. We are particularly interested in individuals' happiness and its major determinants, and thereby in the cultural differences between the two populations.

A total of 802 and 601 respondents were included in the British and Japanese cohorts, respectively. The ordinal response variable “happiness” (z) was measured on a 4-point scale from 1 to 4, representing very happy to not at all happy. Six observed covariates related to the characteristics of respondents, namely gender (x ₁), marriage (x ₂), life meaning (x ₃), socioeconomic status (x ₄), health condition (x ₅), and religious belief (x ₆), were considered as possible determinants of happiness. In addition, three latent variables, job satisfaction (ω₁), homelife satisfaction (ω₂), and work attitude (ω₃), which were characterized by {u ₁, u ₂}, {u ₃, u ₄, u ₅}, and {u ₆, u ₇, u ₈}, respectively, were considered as potential influential factors of happiness. A description of the observed indicators u ₁, …, u ₈, together with the abovementioned response variable z and covariates x ₁, …, x ₆, is provided in Online Appendix C. Given that u ₁, …, u ₈ were measured on a 10-point scale, we treated them as continuous. Then, the model defined by models (3)–(5) with H = 4, p = 4, q = 4, and m = 8 was used to conduct the analysis. A path diagram depicting the CFA model and the ordinal regression is presented in Figure 3. In the proposed model, $β_{1} = (β_{11}, \dots, β_{16})^{T}$ , $x_{i} = (x_{i 1}, \dots, x_{i 6})^{T}$ , $β_{2} = (β_{21}, β_{22}, β_{23})^{T}$ , and $ω_{i} = (ω_{i 1}, ω_{i 2}, ω_{i 3})^{T}$ . The factor loading matrix has a nonoverlapping structure:

Figure 3.

The path diagram of the proposed model, where the solid and dashed lines indicate significant and nonsignificant effects.

Λ^{T} = \begin{bmatrix} 1 & λ_{21} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & λ_{42} & λ_{52} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & λ_{73} & λ_{83} \end{bmatrix},

where the 1s and 0s are fixed to obtain an identified model and a clear interpretation of the latent variables.
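As an illustration of the nonoverlapping structure, the sketch below builds the pattern of $Λ^{T}$ from the number of indicators assigned to each latent variable (2, 3, and 3 in this application). It is only a bookkeeping aid under the stated pattern: the function name and the string labels for free loadings are hypothetical, and no estimation is involved.

```python
def loading_pattern(blocks):
    """Build the pattern of Lambda^T (factors x indicators).

    `blocks` gives, for each latent variable in order, the number of
    indicators assigned to it. Each entry of the result is 1 (fixed),
    0 (fixed), or a string label marking a free loading.
    """
    m = sum(blocks)
    rows = []
    start = 0
    for k, size in enumerate(blocks, start=1):
        row = [0] * m
        for offset in range(size):
            j = start + offset + 1  # 1-based indicator index
            # The first indicator of each block is fixed at 1 for identification.
            row[start + offset] = 1 if offset == 0 else f"lambda_{j}{k}"
        rows.append(row)
        start += size
    return rows


pattern_t = loading_pattern([2, 3, 3])
for row in pattern_t:
    print(row)

# Nonoverlapping: every indicator (column) loads on exactly one factor.
n_cols = len(pattern_t[0])
assert all(
    sum(1 for row in pattern_t if row[j] != 0) == 1 for j in range(n_cols)
)
```

Each row reproduces one row of the matrix above, with the free loadings labeled by indicator and factor index.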

The prior distributions discussed in the subsection on prior specification, with hyperparameters similar to those in simulation 1, were used. The cutoff value for identifying significant predictors was again set to 0.1. An initial estimate of the latent variables was obtained from the CFA model for standardization. The BaLasso procedure was implemented to perform simultaneous estimation and model selection. On the basis of several test runs started from widely different initial values, we found that the MCMC algorithm converged within 10,000 iterations. After a burn-in phase of 10,000 iterations, 10,000 additional samples were generated for Bayesian inference.

The estimated factor loadings and the significant coefficients in the ordinal regression, along with their standard error estimates (in parentheses), are presented in Figure 3. For both populations, the factor loadings are substantially different from zero, indicating that each observed indicator contributes significantly to the characterization of the associated latent variable. Based on their operationalization, the latent variables can be interpreted as follows. Higher scores of ω ₁ and ω ₂ imply greater satisfaction with job and homelife, respectively. An increase in the score of ω ₃ indicates a more negative attitude toward work. For the British cohort, the significant predictors are health condition (x ₅), homelife satisfaction (ω ₂), and work attitude (ω ₃). Their estimated coefficients are ${\hat{β}}_{15} = 0.166 (0.046)$ , ${\hat{β}}_{22} = - 0.503 (0.066)$ , and ${\hat{β}}_{23} = 0.147 (0.054)$ . In addition to these predictors, the analysis of the Japanese cohort reveals other potential determinants, namely gender (x ₁), marriage (x ₂), and job satisfaction (ω ₁). The estimated coefficients of the relevant predictors in Japan are ${\hat{β}}_{11} = - 0.223 (0.061)$ , ${\hat{β}}_{12} = - 0.163 (0.056)$ , ${\hat{β}}_{15} = 0.174 (0.059)$ , ${\hat{β}}_{21} = 0.143 (0.069)$ , ${\hat{β}}_{22} = - 0.945 (0.102)$ , and ${\hat{β}}_{23} = 0.171 (0.073)$ . Similarities and differences in the determinants of happiness between Britain and Japan are interpreted as follows:

In both countries, health condition (x ₅) and homelife satisfaction (ω ₂) are closely related to happiness. Healthier people and those with higher satisfaction with their homelife tend to be happier. This result agrees with previous studies (see, e.g., Lee and Ono 2008; Stack and Eshleman 1998), in which similar associations were discovered.

Another common determinant of happiness shared by the two cohorts is work attitude (ω ₃). A positive work attitude generally raises happiness. Mohanty (2009) likewise reported a link between happiness and a positive attitude toward life, work, and daily affairs.

The effects of gender (x ₁) and marriage (x ₂) on happiness are nonsignificant for the British but significant for the Japanese. In general, female and married Japanese are happier than male and unmarried ones, respectively. This conclusion aligns with previous findings (Lee and Ono 2008; Oshio and Kobayashi 2010, 2011; Stack and Eshleman 1998), which investigated the gendered division of labor, marriage, and family roles as well as their impacts on happiness in Japan.

In the British cohort, the effect of job satisfaction (ω ₁) on happiness is again nonsignificant, suggesting that the British have a better work–life balance and are less dominated by their jobs. However, this effect is negative in the Japanese cohort. A possible reason is that high job satisfaction may lead the Japanese to devote themselves heavily to their jobs, consequently reducing their homelife quality and lowering their happiness. This finding agrees with those of Greenhaus and Beutell (1985) and Shimazu et al. (2011).

For both populations, life meaning (x ₃), socioeconomic status (x ₄), and religious belief (x ₆) are all irrelevant to happiness. Whether or not this conclusion can be generalized to other western or eastern countries requires further investigation.

For comparison, we conducted an independent analysis by incorporating u ₁, …, u ₈ as independent covariates into the regression. The estimated coefficients of x ₁, …, x ₆ are similar and are not reported. Those corresponding to the latent variables are reported in Table 5, which shows that all of u ₁, …, u ₈ become nonsignificant. Specifically, the estimates of ${\tilde{β}}_{21}, \dots, {\tilde{β}}_{28}$ are consistently less than twice their standard error estimates in absolute value. This typical sign of multicollinearity confirms the necessity of the proposed model in the application. We likewise calculated the WAIC values (see Table 5) for the two models. Again, the WAIC strongly supports the proposed model.

Table 5.

Comparison of the Joint and the Independent Analyses of Happiness and Its Determinants.

Country	Joint analysis $y_{i} = β_{0} + β_{1}^{T} x_{i} + β_{2}^{T} ω_{i}$			Independent analysis $y_{i} = β_{0} + β_{1}^{T} x_{i} + {\tilde{β}}_{2}^{T} u_{i}$
Country	Variable	Para	Estimation	Variable	Para	Estimation
Britain	ω ₁	β₂₁	.048 (.056)	u ₁	${\tilde{β}}_{21}$	.023 (.179)
				u ₂	${\tilde{β}}_{22}$	.039 (.053)
	ω ₂	β₂₂	−.503 (.066)	u ₃	${\tilde{β}}_{23}$	−.052 (.135)
				u ₄	${\tilde{β}}_{24}$	.132 (.086)
				u ₅	${\tilde{β}}_{25}$	−.116 (.208)
	ω ₃	β ₂₃	.147 (.054)	u ₆	${\tilde{β}}_{26}$	.039 (.085)
				u ₇	${\tilde{β}}_{27}$	.057 (.169)
				u ₈	${\tilde{β}}_{28}$	.064 (.104)
	WAIC		−738.47	WAIC		−962.27
Japan	ω ₁	β ₂₁	.143 (.069)	u ₁	${\tilde{β}}_{21}$	.092 (.159)
				u ₂	${\tilde{β}}_{22}$	−.067 (.092)
	ω ₂	β ₂₂	−.945 (.102)	u ₃	${\tilde{β}}_{23}$	−.163 (.191)
				u ₄	${\tilde{β}}_{24}$	.069 (.109)
				u ₅	${\tilde{β}}_{25}$	−.286 (.196)
	ω ₃	β ₂₃	.171 (.073)	u ₆	${\tilde{β}}_{26}$	.110 (.063)
				u ₇	${\tilde{β}}_{27}$	.029 (.255)
				u ₈	${\tilde{β}}_{28}$	−.004 (.053)
	WAIC		−515.54	WAIC		−651.95

Note: WAIC = Watanabe–Akaike information criterion.
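The two-standard-error screen mentioned in the text can be checked directly against the entries of Table 5. The sketch below is only a rule-of-thumb check on the reported estimates and standard errors; it is not the selection rule of the BaLasso procedure, which relies on the cutoff described earlier. The variable and function names are hypothetical.

```python
def exceeds_two_se(estimate, se):
    """True when |estimate| is larger than twice its standard error."""
    return abs(estimate) > 2.0 * se


# (estimate, standard error) pairs copied from Table 5.
joint = {
    "Britain": {"beta_21": (0.048, 0.056), "beta_22": (-0.503, 0.066),
                "beta_23": (0.147, 0.054)},
    "Japan": {"beta_21": (0.143, 0.069), "beta_22": (-0.945, 0.102),
              "beta_23": (0.171, 0.073)},
}
independent = {
    "Britain": [(0.023, 0.179), (0.039, 0.053), (-0.052, 0.135),
                (0.132, 0.086), (-0.116, 0.208), (0.039, 0.085),
                (0.057, 0.169), (0.064, 0.104)],
    "Japan": [(0.092, 0.159), (-0.067, 0.092), (-0.163, 0.191),
              (0.069, 0.109), (-0.286, 0.196), (0.110, 0.063),
              (0.029, 0.255), (-0.004, 0.053)],
}

# Coefficients of the latent variables that pass the screen, by country.
joint_flags = {
    country: [name for name, (est, se) in coefs.items() if exceeds_two_se(est, se)]
    for country, coefs in joint.items()
}
# Number of indicator coefficients that pass the screen, by country.
independent_counts = {
    country: sum(exceeds_two_se(est, se) for est, se in pairs)
    for country, pairs in independent.items()
}

print(joint_flags)
print(independent_counts)
```

With the tabulated values, none of the sixteen indicator coefficients in the independent analysis passes the screen, whereas the latent-variable coefficients reported as significant in the joint analysis do.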

To assess the sensitivity of the Bayesian inference to the prior inputs, the analysis was repeated by perturbing the prior inputs in the same manner as in simulation 1. The estimation and model selection results are close to those reported in Figure 3.

Discussion

In this article, we considered simultaneous estimation and model selection for ordinal regression with latent variables. Ordinal responses were formulated via underlying continuous measurements with a threshold specification. A BaLasso procedure coupled with MCMC methods was developed to conduct the analysis. The empirical performance of the proposed methodology was demonstrated by simulations. An application to a study of happiness and its determinants in Britain and Japan was presented.

The current study has limitations. First, we only considered the linear effects of predictors on ordinal responses. This linearity assumption may not hold in practice. Extending the proposed parametric model to a nonparametric framework is of scientific interest. Bayesian nonparametric techniques such as Bayesian penalized splines (Lang and Brezger 2004) and Gaussian processes (Rasmussen and Williams 2006) are plausible tools for this generalization. Second, although ordinal data are among the most common data types, a comprehensive model accommodating a wide variety of data types shows high potential in real applications (Song et al. 2013; Wang, Feng, and Song 2015). Finally, we characterized latent variables on the basis of multiple observed indicators via a CFA model, in which the number of latent factors and the operationalization of latent constructs are given in advance. While substantive study and common knowledge often provide such information, an exploratory factor analysis that determines the number of latent variables and their observed indicators entirely from the data is highly useful in practice. A general approach for simultaneously operationalizing latent factors and selecting relevant predictors is a promising direction but requires further investigation.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by GRF 14305014 from the Research Grants Council of the Hong Kong Special Administrative Region, NSFC 11471277 from the National Natural Science Foundation of China, and CUHK 4053138 from the Direct Grant of the Chinese University of Hong Kong.

Supplemental Material

Supplementary material for this article is available online.

References

Alhamzawi; Benoit, D. F. 2012. “Bayesian Adaptive Lasso Quantile Regression.” Statistical Modelling 12:279–97.

Andrews, D. F.; Mallows, C. L. 1974. “Scale Mixtures of Normal Distributions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 36:99–102.

Bennink; Croon, M. A.; Vermunt, J. K. 2013. “Micro–macro Multilevel Analysis for Discrete Data: A Latent Variable Approach and an Application on Personal Network Data.” Sociological Methods & Research 42:431–57.

Bollen, K. A. 1989. Structural Equations with Latent Variables. New York: Wiley.

Fan. 2001. “Variable Selection Via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96:1348–60.

Gelman; Hwang; Vehtari. 2014. “Understanding Predictive Information Criteria for Bayesian Models.” Statistics and Computing 24:997–1016.

Geman. 1984. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.” IEEE Transactions on Pattern Analysis and Machine Intelligence 6:721–41.

Gilks, W. R.; Richardson; Spiegelhalter, D. J. 1996. Markov Chain Monte Carlo in Practice. London, UK: Chapman & Hall.

Greenhaus, J. H.; Beutell, N. J. 1985. “Sources of Conflict Between Work and Family Roles.” Academy of Management Review 10:76–88.

Guo; Zhu; Chow, S. M.; Ibrahim, J. G. 2012. “Bayesian Lasso for Semiparametric Structural Equation Models.” Biometrics 68:567–77.

Harman, H. H. 1976. Modern Factor Analysis. Chicago: University of Chicago Press.

Hastings, W. K. 1970. “Monte Carlo Sampling Methods Using Markov Chains and Their Applications.” Biometrika 57:97–100.

Hoti; Sillanpää, M. J. 2006. “Bayesian Mapping of Genotype × Expression Interactions in Quantitative and Qualitative Traits.” Heredity 97:4–18.

Johnson, V. E. 1996. “On Bayesian Analysis of Multirater Ordinal Data: An Application to Automated Essay Grading.” Journal of the American Statistical Association 91:42–51.

Johnson, V. E. 2008. “Statistical Analysis of the National Institutes of Health Peer Review System.” Proceedings of the National Academy of Sciences 105:11076–80.

Kass, R. E.; Raftery, A. E. 1995. “Bayes Factors.” Journal of the American Statistical Association 90:773–95.

Kenny, D. A.; Judd, C. M. 1984. “Estimating the Nonlinear and Interactive Effects of Latent Variables.” Psychological Bulletin 96:201–10.

Kyung; Gill; Ghosh; Casella. 2010. “Penalized Regression, Standard Errors, and Bayesian Lassos.” Bayesian Analysis 5:369–412.

Lang; Brezger. 2004. “Bayesian P-splines.” Journal of Computational and Graphical Statistics 13:183–212.

Lee, K. S.; Ono. 2008. “Specialization and Happiness in Marriage: A US-Japan Comparison.” Social Science Research 37:1216–34.

Lee, S. Y. 2007. Structural Equation Modeling: A Bayesian Approach. Chichester, UK: Wiley.

Lee, S. Y.; Song, X. Y. 2007. “A Unified Maximum Likelihood Approach for Analyzing Structural Equation Models with Missing Nonstandard Data.” Sociological Methods & Research 35:352–81.

Leng, C. L.; Tran, M. N.; Nott. 2014. “Bayesian Adaptive Lasso.” Annals of the Institute of Statistical Mathematics 66:221–44.

Magis; Tuerlinckx; De Boeck. 2015. “Detection of Differential Item Functioning Using the Lasso Approach.” Journal of Educational and Behavioral Statistics 40:111–35.

McCullagh. 1980. “Regression Models for Ordinal Data.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 42:109–42.

Metropolis; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” Journal of Chemical Physics 21:1087–91.

Mohanty, M. S. 2009. “Effects of Positive Attitude on Happiness and Wage: Evidence from the US Data.” Journal of Economic Psychology 30:884–97.

Moustaki. 2003. “A General Class of Latent Variable Models for Ordinal Manifest Variables with Covariate Effects on the Manifest and Latent Variables.” British Journal of Mathematical and Statistical Psychology 56:337–57.

Muthén. 1984. “A General Structural Equation Model with Dichotomous, Ordered Categorical, and Continuous Latent Variable Indicators.” Psychometrika 49:115–32.

Oshio; Kobayashi. 2010. “Income Inequality, Perceived Happiness, and Self-rated Health: Evidence from Nationwide Surveys in Japan.” Social Science & Medicine 70:1358–66.

Oshio; Kobayashi. 2011. “Area-level Income Inequality and Individual Happiness: Evidence from Japan.” Journal of Happiness Studies 12:633–49.

Park; Casella. 2008. “The Bayesian Lasso.” Journal of the American Statistical Association 103:681–86.

Poon, W. Y.; Wang, H. B. 2012. “Latent Variable Models with Ordinal Categorical Covariates.” Statistics and Computing 22:1135–54.

Rasmussen, C. E.; Williams, C. K. I. 2006. Gaussian Processes for Machine Learning. Cambridge, MA: The MIT Press.

Shi, J. Q.; Lee, S. Y. 2000. “Latent Variable Models with Mixed Continuous and Polytomous Data.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62:77–87.

Shimazu; Demerouti; Bakker, A. B.; Shimada; Kawakami. 2011. “Workaholism and Well-being among Japanese Dual-earner Couples: A Spillover-crossover Perspective.” Social Science & Medicine 73:399–409.

Skrondal; Rabe-Hesketh. 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman & Hall.

Song, X. Y.; Lee, S. Y. 2001. “Bayesian Estimation and Test for Factor Analysis Model with Continuous and Polytomous Data in Several Populations.” British Journal of Mathematical and Statistical Psychology 54:237–63.

Song, X. Y.; Lee, S. Y. 2012. Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences. London, UK: Wiley.

Song, X. Y.; Z. H.; Cai, J. H.; H. S. 2013. “A Bayesian Modeling Approach for Generalized Semiparametric Structural Equation Models.” Psychometrika 78:624–47.

Song, X. Y.; Z. H.; Feng, X. N. 2014. “Latent Variable Models with Nonparametric Interaction Effects of Latent Variables.” Statistics in Medicine 33:1723–37.

Spiegelhalter, D. J.; Best, N. G.; Carlin, B. P.; van der Linde. 2002. “Bayesian Measures of Model Complexity and Fit (with Discussion).” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64:583–639.

Stack; Eshleman, J. R. 1998. “Marital Status and Happiness: A 17-nation Study.” Journal of Marriage and the Family 60:527–36.

Tibshirani. 1996. “Regression Shrinkage and Selection Via the Lasso.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 58:267–88.

Vehtari; Gelman; Gabry. 2015. “Efficient Implementation of Leave-one-out Cross-validation and WAIC for Evaluating Fitted Bayesian Models.” ArXiv E-prints. (http://arxiv.org/abs/1507.04544).

Wang, H. S.; G. D.; Tsai, C. L. 2007. “Regression Coefficient and Autoregressive Order Shrinkage and Selection Via the Lasso.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69:63–78.

Wang, Y. F.; Feng, X. N.; Song, X. Y. 2015. “Bayesian Quantile Structural Equation Models.” Structural Equation Modeling: A Multidisciplinary Journal. (http://www.tandfonline.com/doi/full/10.1080/10705511.2015.1033057).

Watanabe. 2010. “Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory.” The Journal of Machine Learning Research 11:3571–94.

World Values Study Group. 1994. World Values Survey, 1981-1984 and 1990-1993. ICPSR version. Ann Arbor: Institute of Social Research [producer]. Ann Arbor: Inter-university Consortium for Political and Social Research [distributor].

Yuan, K. H.; Bentler, P. M. 2011. “Ridge Structural Equation Modelling with Correlation Matrices for Ordinal and Continuous Data.” British Journal of Mathematical and Statistical Psychology 64:107–33.

Zou. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association 101:1418–29.
