Abstract
Researchers often combine longitudinal panel data analysis with tests of interactions (i.e., moderation). A popular example is the cross-lagged panel model (CLPM). However, interaction tests in CLPMs and related models require caution because stable (i.e., between-level, B) and dynamic (i.e., within-level, W) sources of variation are present in longitudinal data, which can conflate estimates of interaction effects. We address this by integrating literature on CLPMs, multilevel moderation, and latent interactions. Distinguishing stable B and dynamic W parts, we describe three types of interactions that are of interest to researchers: 1) purely dynamic or WxW; 2) cross-level or BxW; and 3) purely stable or BxB. We demonstrate estimating latent interaction effects in a CLPM using a Bayesian SEM in Mplus to apply relationships among work-family conflict and job satisfaction, using gender as a stable B variable. We support our approach via simulations, demonstrating that our proposed CLPM approach is superior to a traditional CLPMs that conflate B and W sources of variation. We describe higher-order nonlinearities as a possible extension, and we discuss limitations and future research directions.
Time-series and panel data methods are used to study a variety of phenomena in organization science (Bliese et al., 2020). Of these, the cross-lagged panel model (CLPM) and its variants are a popular approach (e.g., Zyphur, Allison, et al., 2020; called panel vector autoregressions or panel VARs in econometrics; see Abrigo & Love, 2016; Love & Zicchino, 2006). These models estimate lagged relationships to allow inferences that are consistent with the fact that causality unfolds over time (Zyphur, Voelkle, et al., 2020), typically estimated as structural equation models (SEM) with fewer than 10 observations over time.
The CLPM can also be used to study conditions under which lagged effects vary by testing interactions. 1 Such tests often use product terms such as xz for predictors x and z (Cortina et al., 2021; Preacher et al., 2006, 2016), which can be calculated as the products of lagged predictors. For example, Eby et al. (2015) used the product of lagged predictors to test the interaction between (past) organizational support and (past) mentoring to predict (current) organizational citizenship behaviors. Related examples include: 1) models that account for measurement error with latent two-way (e.g., Dormann & Zapf, 1999) and three-way interactions (e.g., Lian et al., 2014); 2) combining interactions and nonlinear terms (e.g., Lin et al., 2017); 3) tests for sub-group differences in lagged effects (e.g., Cieslak et al., 2007; Houkes et al., 2003; Zablah et al., 2016); and 4) lag-as-moderator models (LAM) wherein the amount of time between measurements is treated as a moderator of lagged effects (Selig et al., 2012).
However, recent work suggests caution is needed when testing interactions in panel data, because different types of interactions may be conflated. Specifically, panel data contain stable ‘between-level’ (B) variation (e.g., personality; Hamaker et al., 2015) and dynamic ‘within-level’ (W) variation due to time-varying factors (e.g., emotions; Zyphur, Allison, et al., 2020). The multilevel modeling literature notes the importance of separating these sources of variation when testing interactions to allow precisely matching theory to hypotheses while avoiding the conflation of B and W effects (Preacher et al., 2016). However, when testing interactions in CLPMs, researchers have overlooked the implications of W and B sources of variation. In turn, this may have led to interaction estimates being difficult to interpret or resulted in flawed conclusions pertaining to significance testing.
In this paper we address testing interactions in CLPMs by integrating the literatures on CLPMs, multilevel moderation, and latent interactions. We start with the multilevel nature of longitudinal data and how panel data allow testing three types of interactions: (1) purely within-level or WxW interactions among dynamic factors; (2) cross-level or BxW interactions among stable B and dynamic W factors; and (3) purely between-level or BxB interactions among stable factors (Preacher et al., 2016). By extension, we also describe how to incorporate nonlinear and higher-order interactions of various kinds, including but not limited to within-level or W2 nonlinearity; between-level or B2 nonlinearity; and combinations of these with cross-level interactions, which we call B2xW, BxW2, or B2xW2 cases.
We then show how CLPMs may conflate such effects, including when estimating latent interactions (e.g., Dormann & Zapf, 1999; Lian et al., 2014) or attempting to control for stable B factors (e.g., Carpenter & Fredrickson, 2001). To address this issue, we describe a computationally tractable SEM framework for testing interactions in CLPMs with a Bayes estimator and latent B and W components (Asparouhov & Muthén, 2020). We illustrate our method for estimating and interpreting CLPM interaction effects using a dataset with time-varying measures of job satisfaction and work-family conflict as well as a stable measure of gender. We also run simulations showing that our approach is less biased than alternatives. We provide code for implementing our approach in Mplus, as this software is widely known in organization research and elsewhere (see Cortina et al., 2021; Heck & Thomas, 2015; Preacher et al., 2010, 2016). We conclude by discussing how our general approach can be used in other types of longitudinal panel data models including the general cross-lagged panel model (GCLM; Zyphur, Allison, et al., 2020), and latent curve or latent growth models (LGM; Chan, 1998). In sum, by capitalizing on the multilevel nature of longitudinal data we offer guidance for testing different types of multilevel interaction effects in a familiar CLPM framework estimated as an SEM.
A Multilevel Understanding of Cross-Lagged Panel Models
Longitudinal data are inherently complex, involving multiple sources of variation. When emotions are measured over time, for example, the resulting data exhibit both stability due to time-invariant factors and instability due to occasion-specific factors (Beal, 2015; Gabriel et al., 2019), which implies a multilevel structure. From this perspective, stable factors are higher-level or between-level (e.g., between-person or merely B), whereas dynamic factors are lower-level or within-level (e.g., within-person or merely W) phenomena (Zyphur, Allison, et al., 2020). In brief, B components reflect stable factors such as time-invariant individual differences (e.g., personality), and W components reflect temporary deviations around stable levels, due to occasion-specific factors (e.g., random events). Thus, to represent W components necessitates repeated measures of a variable with W variation and based on these a stable B component in the same content domain can be treated as a latent variable reflected in the repeated measures. It is notable that for variables without W variation such as stable gender or race, a single stable measure of the B variable will suffice and drawing an analogy to multilevel discussions of ‘direct-consensus’ measures, it would also be possible to collect ‘direct-rating’ measures for B terms (e.g., participants’ reporting their ‘usual’ or ‘overall’ job satisfaction across a studied timespan). 2
We illustrate the multilevel nature of longitudinal data using two variables x and y, which might reflect work-family conflict and job satisfaction measured for a person i at an occasion t, where i = 1, 2, …, N and t = 1, 2, …, T. The multilevel logic of a stable B and dynamic W component can then be shown as follows (see Figure 1 for an SEM diagram):

An SEM diagram distinguishing W and B components in a CLPM for x and y measured at three occasions.
In multilevel parlance,
In this context, there are two primary ways to decompose a variable, namely manifest group-mean centering (i.e., an ‘observed-means’ approach) versus latent group-mean centering (see Lüdtke et al., 2008; Preacher et al., 2010). In the observed-means approach, researchers estimate the B part for each sampled entity using an average of the observed variables over time, with the deviations from this average representing the W part. In the latent centering case, a model-based approach is used to estimate the B and W variance components without attempting to estimate the underlying B and W scores directly. We compare these approaches later with real data and in simulations, but substantial prior research suggests that latent centering is superior to observed centering because the latter fails to account for uncertainty in the B and W components (Preacher et al., 2016). This is particularly important in longitudinal panel data models with lagged effects, wherein observed-means centering is known to cause ‘Nickell’ bias in lagged effect estimates (see Nickell, 1981; Zyphur, Allison, et al., 2020).
To illustrate, we start by describing the dynamic W terms used for lagged effects in CLPMs, using notation from Zyphur, Allison, et al. (2020) where
Equations 3 and 4 represent a familiar lagged-effects logic (for an associated logic of causality, although not the focus of our paper, see Granger, 1969; for a critical discussion see Zyphur, Allison, et al., 2020; Zyphur, Voelkle, et al., 2020). However, by substituting the lagged W terms from Eqs. 3 and 4 into the original multilevel decomposition in Eqs. 1 and 2, the model can be shown as a type of ‘fixed-effects’ CLPM (in econometrics terms) that implies stable B terms are held constant when examining W effects (see Figure 1):
To continue, a more general structural specification allows regression among B terms when causal effects among them can be theoretically justified, such as when studying individual differences or organizational climate and culture. For example, consider a model for
By substituting the B model for y from Equation (7) into the larger model for y in Equation (6), we can show a full B + W representation with both stable and dynamic parts as follows:
Latent Interactions in Cross-Lagged Models
Based on previous work on multilevel moderation (Preacher et al., 2016), we can also begin drawing conclusions about interactions in CLPMs. An interaction exists when the effect of a predictor x on an outcome y varies across the levels of another variable z. To test this, x and z are usually multiplied and their product xz is used as a predictor of y. As noted above, lagged-effects models that test interactions use this or conceptually similar approaches with lagged predictors. The idea behind using product terms in CLPM is that “extending this method to autoregressive models is straightforward, and no modifications to the mathematics are required to accommodate lagged values. That is, rather than (or in addition to) controlling for information available in prior lags, use lag variables as moderators” (Hayes, 2015, p. 16). This belief about the generality of methods for testing interactions has motivated the use of related interaction testing methods in longitudinal data (e.g., Selig et al., 2012; Zablah et al., 2016). However, some modifications to these approaches are needed to ensure that different sources of multilevel variation and effects are not conflated.
First, product terms among observed variables will confound different types of multilevel interactions when observed variables have B and W parts—consider the implications of expanding the product term
In sum, multilevel research shows that testing interactions with multilevel data requires (1) decomposing observed variables into B and W parts, and then (2) specifying latent interactions separately with these parts (Preacher et al., 2016; Zyphur et al., 2018). Complementing this, recent work on autoregressive confirmatory factor analysis (AR-CFA) shows how such effects can be tested with latent interactions (see Ozkok et al., 2019). To adapt this research for CLPMs, we start by describing three types of interactions with panel data: purely within-level (WxW) interactions among dynamic factors; cross-level (BxW) interactions among stable B and dynamic W factors; and purely between-level (BxB) interactions among stable factors. We then elaborate on other cases.
Purely Within-Level or WxW Interactions
Many studies that test interactions using CLPMs seek to examine how CL effects vary across the levels of another dynamic variable. For example, Zablah et al. (2016) investigate whether the effect of past customer satisfaction on future job satisfaction is moderated by past customer engagement. In such cases, lagged predictors and moderators are often described as time-varying (alluding to W effects) and discussions of the effects imply that time-varying moderating variables are of interest. This focus on the dynamic W parts of variables also exists in studies of time-varying moderators of AR terms, such as investigations of how time-varying stress moderates the persistence (i.e., AR term) of emotional states (see Koval & Kuppens, 2012). Thus, the literature has various examples that analyze W interactions between lagged predictors (
With an interest in forming latent interactions among W factors we first present purely W moderation effects. We show the simplest WxW case that does not require any external moderators and instead forms latent interactions among the lagged W components of x and y from Eqs. 3 and 4, which are always available in CLPMs (see Figure 2a):

CLPM for x and y at three occasions with latent WxW interactions.

CLPM for x and y at three occasions with latent BxW interactions.

CLPM for x and y at three occasions with latent BxB interactions.

CLPM for x and y at three occasions with latent WxW, BxW, and BxB interactions.
The point is that only the W part of each variable should form product terms to estimate a purely W interaction. This means that the stable B components are not only controlled, they also do not bias estimates of the interaction among x and y (a concern in the multilevel literature; see Preacher et al., 2016). Indeed, this would apply even when assessing WxW effects among variables that are relatively stable over time (e.g., personality), because these might still exhibit some W variation (e.g., across contexts; Fleeson & Gallagher, 2009; Tett & Burnett, 2003) and tests of any dynamic W parts should not be biased by their stable B parts. Finally, our exploration of WxW effects shows how estimating these has always been possible in CLPMs because, by design, these models always include at least two lagged predictors. This realization opens up new opportunities to use CLPMs to test interactions—consider that every published CLPM can be re-estimated to test WxW interactions.
Cross-Level or BxW Interactions
Next, consider that time-invariant B variables, such as the stable components of personality, gender, or the environment may moderate lagged relationships in the W model. This is a familiar type of cross-level interaction that in CLPMs involves a B variable moderating a lagged W effect. An example from the emotions literature is that stable trait neuroticism (B) is associated with stronger persistence of negative emotions over time (i.e., slower reversion to the mean), such that the W AR terms for negative emotions are larger at higher levels of the stable B variable neuroticism (Koval et al., 2016; Suls et al., 1998). Similarly, the stable B part of negative emotions measured over time may moderate its own W AR effect (for insight see Figure 3 in Hamaker et al., 2018). Cross-level interactions may be present elsewhere, such as in Zablah et al. (2016), if the lagged W effect of customer satisfaction

Graphic representation of the full model with estimates.
For this, the B and W parts of observed variables must be decomposed or else the product terms will conflate W and B variation when estimating interactions (Preacher et al., 2016). To illustrate a solution to this problem, below we study how stable B gender
To formalize this familiar type of cross-level moderation we start with Eqs. 3 and 4, adding latent interactions among B and W terms (see Figure 2b). The equations are rather cumbersome, so for concision we show only the model for y. For clarity, we separate the purely W effects (including the WxW term) from the cross-level terms wherein a B variable moderates AR and CL effects in the W model. For the cross-level terms, Equation (11) describes a single B variable z moderating both W effects:
Here, we focus on interpreting the effects in Equation (12) for readers who prefer either a traditional interaction/moderation depiction of coefficients or a multilevel random-slope depiction of cross-level interaction effects (as in Raudenbush & Bryk, 2002). We begin with a traditional depiction wherein ‘main effects’ and interactions are grouped as a compound coefficient on each primary lagged W predictor in the model, showing interactions separately for AR and CL terms as follows:
Alternatively, the same BxW terms can be shown in a multilevel random-slope style as follows, with arbitrary but conceptually useful equations for the AR and CL coefficients:
This formulation clarifies the typical ‘slopes-as-outcome’ interpretation of multilevel interactions wherein the AR and CL coefficients
As our approach thus far makes clear, not only is it possible to conceptualize W interactions in CLPMs, but it is also possible to formulate cross-level BxW interactions because panel data imply B components that may serve as moderators of W lagged effects. Seen this way, the bulk of the literature on multilevel moderation appears relevant to CLPMs including for interpreting effects. For example, when B moderators are mean-centered the ‘main effects’
Purely Between-Level or BxB Interactions
Many researchers propose effects among variables that are conceptualized as having strong stable components, such as personality and job performance (see Barrick & Mount, 1991; Tett et al., 1991). In these cases, it is possible to test interactions among B variables. For example, Zablah et al. (2016) could have focused their theorizing on B effects, such that the stable B part of customer satisfaction predicts the stable B part of job satisfaction, and this effect is moderated by the stable B part of customer engagement. This model would describe a purely B form of moderation, which we refer to as a BxB interaction. This type of interaction can involve observed or latent variables at the B level. For example, gender could be treated as an observed B moderator
With this kind of approach, the B model for y in Equation (7) could be shown as involving a B moderator
Nonlinearity and Higher-Order Interactions
With these three types of interactions, more complex and interesting pictures of dynamic systems are available for theory testing (see Figure 2d for all interactions in one model). Yet, our approach also allows incorporating nonlinearity and higher-order interactions. Consider Lian et al.'s (2014) three-way interaction wherein abusive supervision, self-control capacity, and intention to quit at t = 1 were modeled as interacting to predict organizational deviance at t = 2. Their model focused on W relationships but did not account for stable B components. With our approach, building on Eqs. 9 and 10, it is possible to model pure WxWxW interactions of three variables by forming latent W interactions in a CLPM. A WxWxB interaction could also be modeled by using the B part of a variable and forming a three-way interaction with two W lagged predictors.
As another example, consider Lin et al. (2017) who studied a type of nonlinear moderation using squared predictors multiplied by a moderator. Specifically, perceived underemployment at t = 1 and task crafting at t = 2 had an inverted U-shaped relationship (modelled by including a squared term for underemployment), which was moderated by organizational identification at t = 1. Using our approach (assuming a sufficient number of measurement occasions) it would be possible to model the purely W nonlinear effect of underemployment on task crafting as a latent WxW interaction of underemployment with itself (what we call a
Although the possibilities for testing nonlinear and higher-order interactions are essentially unlimited, we point out five potential cases that should be of primary interest and rather intuitive for most researchers. Notably, our examples here contain a variety of different types of interactions and nonlinear effects, not all of which may be of interest in any given application, but which we show for completeness. The first is a purely W quadratic effects or
Next is the case of a quadratic B effect, which we refer to as
Now consider three unique cross-level moderation cases that combine the logic of the two previous
In the second case of nonlinear Bx
In the third case of nonlinear cross-level moderation,
In all cases above, our approach decomposes the dynamic W and stable B parts of observed variables before forming product terms. This yields unbiased estimates of same-level and cross-level effects in CLPMs, while helping researchers focus on theory and hypothesis tests that are sensitive to interactions at different levels of analysis. As a beneficial byproduct, any heteroskedasticity that would otherwise exist in residuals can be accounted for by the interaction effects in the model (Raudenbush & Bryk, 2002). Before continuing, however, we would emphasize that simply because so many interactions and nonlinearities can be modeled does not imply that they should be. Indeed, a benefit of ‘unconflating’ B and W effects is that researchers need only model effects that are theoretically relevant.
Confounding in Tests of Interaction Effects
Given our presentation thus far, it is now straightforward to show how confounding occurs when variables are not disaggregated into B and W parts. For example, when the product of x and y is included in a typical CLPM, the net result can be shown as follows for the y variable (again, temporarily ignoring intercepts or an ‘occasion effect’
However, the multilevel decomposition in Equations (1) and (2) suggests that this formulation has various potential shortcomings, which we help illustrate by using a compound residual term
By not decomposing y and x into their level-specific components, interaction terms that conflate B and W parts can be shown by expanding the
Method
Participants
Data were taken from the Swiss Household Panel (SHP) survey, which is a nationally representative random sample measured annually, beginning in 1999 with around 5,000 Swiss households and adding two additional samples in 2004 and 2013 (see Voorpostel et al., 2018). Individual household members are surveyed via telephone on a broad range of topics including health, employment, and leisure activities. The SHP has been used extensively in fields such as public health (e.g., Knecht et al., 2011), sociology (e.g., Oesch & Lipps, 2013), and economics (e.g., Frei & Sousa-Poza, 2012).
In the present study, we used data collected in the years 2004 through 2007. We chose 2004 as the onset of our study because this year marked the first that all study variables were measured annually, and the first year that included a second SHP sample, which increased our sample size. Although the SHP's sample included around 10,000 working people in 1999, due to attrition this number decreased to 6,000 in 2004. The addition of a second sample in 2004 increased the number of working participants to over 13,000. Furthermore, the waves 2004 to 2007 consistently demonstrated response rates of more than 70%.
We included all working individuals that responded to at least one of the four waves surveyed between 2004 and 2007. This brought our sample size to 7,748 individuals. The participants in our sample (51.6% women, 48.4% men) were on average 39.28 (SD = 14.50) years old and had 12.98 (SD = 3.20) years of education.
Measures
Descriptive statistics are reported in Table 1. We used measures of job satisfaction and work-family conflict. Past research shows that behavior-based types of work-family conflict (i.e., behavioral incompatibilities of different roles) have the strongest relationship to job satisfaction, particularly composite-based satisfaction (averaged across facets) rather than global measures (Bruck et al., 2002).
Descriptive Statistics and Correlations of Study Variables.
Note: N = 7,748 from the Swiss Household Panel (SHP) with variables named in accordance with the SHP waves; J9-J6 = job satisfaction measured in 2007–2004; C9-C6 = work-family conflict measured in 2007–2004; H9-H6 = hours worked measured in 2007–2004; G = gender coded 0 for men and 1 for women; σ2 = estimated population variance.
Job Satisfaction
Participants indicated the extent to which they were satisfied with their job by responding to six items created for the SHP (0 = not at all satisfied to 10 = completely satisfied). The lead-in question was “Can you indicate your degree of satisfaction for each of the following points?” The six items were “your job in general”, “the level of interest in your tasks”, “the amount of your work”, “the income you get from your job”, “your work conditions”, and “the atmosphere between you and your colleagues.” For simplicity, we computed observed-variable mean scores across these six job satisfaction items. However, alternative higher-order latent variable models would also be possible given the general flexibility of SEMs (see Mulder & Hamaker, 2021). The alpha coefficients for the four waves were 0.78, 0.78, 0.79, and 0.79, respectively.
Work-Family Conflict
Participants rated the extent to which their work negatively impacted their family life by a single-item measure: “How strongly does your work interfere with your private activities and family obligations more than you would want this to be?” (0 = not at all to 10 = very strongly).
Hours Worked
Participants reported the number of hours worked per week, which we included as an observed variable in our model with the same W and B decomposition to control for hours worked per week when interpreting all effects. Before analysis we multiplied this variable by 4 to improve convergence when setting observed variances to small values as noted in the next section.
Gender
In every wave, participants indicated their gender using a dummy variable (0 = man, 1 = woman). Responses across all participants were invariant across all years, allowing this to serve as a purely B observed variable in our CLPM as described next.
Specification and Estimation
We first offer a baseline model decomposing W and B parts of observed variables (as shown in Equations. (1) and (2)). This model includes lagged effects for the W parts (as in Equations (5)and (6)); a B model with stable B job satisfaction predicted by stable B work-family conflict, gender, and hours worked (as in Eq. 7); stable B work-family conflict predicted by gender and hours worked; and gender predicting B hours worked. Details on this SEM specification can be found in Hamaker et al. (2015), including the addition of a time-varying intercept
To estimate our models, we considered using latent moderated structural equations or LMS (e.g., Preacher et al., 2016), but this is computationally infeasible in high dimensions (Zyphur et al., 2018). This problem is addressed with the Bayes estimator in Mplus, which provides optimal estimates and faster compute times using parallel processing (Asparouhov & Muthén, 2020). To do this requires first checking compute times for 400 iterations with different numbers of physical (rather than logical or ‘virtual’) processing cores, which we did for up to 48 cores using Amazon Web Services (AWS) ‘c5.24xlarge’ EC2 instances (with 48 physical and 96 virtual cores). We found that, in a Windows environment, 4 or 8 cores were optimal for models with fewer interactions, but with many interactions 16 cores was best. However, in the same AWS platform, we found that in a Linux environment estimation scaled much better with higher core counts. For example, estimating the WxW model below with 60,000 total iterations (as we note below, thinning every other iteration and requesting 30,000 iterations), we observed the following estimation times with our comparatively large model: 2 cores = 25.92 h; 4 cores = 13.33 h; 8 cores = 7 h; 16 cores = 3.58 h; 32 cores = 1.83 h; 48 cores = 1.36 h. As such, we recommend using a Linux environment rather than Windows, particularly given the Linux instance is roughly half price on AWS (∼$4.50 hly vs. Windows at ∼$9.00 hly).
For all models we checked convergence using posterior scale reduction (PSR) values < 1.05, at least doubling iterations after this to ensure stable convergence (Asparouhov & Muthén, 2010; Zyphur & Oswald, 2015). For this we were conservative and estimated more than will typically be required for other researchers using similar data, for example thinning every other iteration with 30,000 iterations set, for a total of 60,000 in the WxW case. To aid in convergence, it is notable that decomposing the W and B parts of observed variables requires accounting for all variance in an observed variable, which implies zero residual variances. This causes slow convergence with Bayes estimators, so we set residual variances to roughly 1–3% of the total observed variation to improve convergence (Asparouhov & Muthén, 2020), while allowing the decomposed W and B variances to be freely estimated.
Results for Real-Data Example
All model results are shown in Table 2 in a side-by-side format to allow easy comparisons of parameter estimates across models. We present results for each model separately, starting with the baseline model to help draw comparisons with those that follow, including interaction effects and finally the conflated model that does not decompose B and W parts with observed-variable products for interactions. Standardized effects can be computed using formulas presented in Asparouhov and Muthén (2020), but for brevity we present raw coefficients, as is often done in multilevel literature testing interactions. For brevity, Figure 3 contains results only for the full model with all effects except for the paths involving the control variable (hours worked) and with only two occasions of measurement shown to aid readability. Also, all reported p-values are two-tailed and calculated using Bayesian posteriors SDs (rather than their frequentist counterpart SEs).
Summary of Model Estimates.
Note: All estimates are unstandardized.
W = Within; B = Between; J = Job satisfaction; C = Work-family conflict; G = Gender; H = Hours worked; Arrows represent direction.
*p < 0.05, **p < 0.01, ***p < 0.001. Source: Swiss Household Panel (SHP).
The following CLPM was implemented in Mplus (see the online files Baseline CLPM.inp and .out) to compute baseline W and B parts and slope coefficients as follows:
The AR effects for work-family conflict, job satisfaction, and hours worked were significant with
Estimates for B terms supported a negative effect of average work-family conflict on average job satisfaction as
In sum, the baseline model effects are important because researchers who use CLPMs are interested in knowing what dynamic effects are present in the W parts of their models. The B model is also relevant for those who are comfortable with their model specification and any inferences that follow from it. However, given the possibility of estimating different types of latent interactions, this typical CLPM may omit important moderation effects.
Moving on to the WxW interactions, we found statistically significant effects. First, when predicting work-family conflict, we found a positive interaction effect

Plot of The WxW model's interaction effect of work-family conflict (C) and Job satisfaction (J). Note: Plotted using only AR and CL effects for simplicity, showing how the W part of C and the W part J can interact to influence the future state of the W part of C (from the model estimates: C Wt +1 = .087 * C Wt − .064 * J Wt + .044 C Wt * J Wt ).

Plot of The WxW model's interaction effect of work-family conflict (C) and Job satisfaction (J). Note: Plotted using only AR and CL effects for simplicity, showing how the W part of C and the W part J can interact to influence the future state of W part of J (from the model estimates: J Wt +1 = .197 * J Wt + .005 * C Wt + .03 C Wt * J Wt ).
Again, as in the WxW model, here the B effects were very similar to the baseline, with average hours worked associated with increased job satisfaction,
Finally, the estimated effect of gender on average job satisfaction was similar yet slightly smaller,
Starting with AR terms, we estimated
For average (stable B) job satisfaction and work-family conflict the story is simpler, as greater levels of both appeared to increase persistence of dynamic W perturbations for job satisfaction, respectively
In terms of CL effects, accounting for cross-level interactions leads all of these to become significant compared to the baseline model, with all terms interpreted as the average CL effect across the sample for men (when gender = 0) as follows:
In terms of gender: women might have a larger W effect of job satisfaction on work-family conflict but this interaction is not significant,
In terms of the B model, cross-level interactions appear to have reduced uncertainty in these estimates, leading to smaller p-values. However, cross-level interactions also markedly change the point estimates, which was initially vexing given that cross-level interactions should impact only W rather than B parameters (Preacher et al., 2016). Specifically, the B estimates were:
Importantly, the reason for these changes appears to be the use of the B parts of the job satisfaction and work-family conflict variables for BxW interactions. In brief, the B parts of these or any other variables may correlate with the B variation in the W effects implied by cross-level interactions—in the multilevel context this would be typical covariation among a random intercept (stable B part) and a random slope for a W coefficient (implying a potential cross-level interaction). By estimating BxW interactions using the B parts of the observed variables, this controls for the B part of job satisfaction and work-family conflict associated with BxW interactions. Thus, when introducing BxW interactions we eliminate any B variance in the observed variables that is correlated with the ‘random slope’ implied by the BxW interactions, which in turn adjusts the B parameters. Therefore, researchers may want to interpret B parameters separately from models that include such BxW interactions.
Second, the full model contains two WxW interactions whereas the BxW does not, and therefore it is relevant to compare the WxW terms against those in the full model. Consistent with accounting for significant variance with the addition of many interaction terms in the full model, the WxW interactions in the full model have smaller p-values (the estimates are also somewhat larger, but this may be merely due to variance caused by the estimator).
AR effects were all significant and substantially larger than in the baseline model:
The CL terms were also very different, including two that became significant in the conflated model. First, job satisfaction predicting work-family conflict
In terms of gender, a stable B variable, we used this to predict all occasions past the first (i.e., t > 1). Although this specification is uncommon (e.g., Shin & Konrad, 2017), it is the correct way to control for a B variable by using it to predict all dependent variables in the model. Results show
Finally, the product term estimates were not significant, with
Simulations
To provide the reader with some preliminary insights into how CLPMs with latent interactions perform, we also ran simulations for the two cases that we believe will be most common in future research using our approach: purely within (WxW) models and (BxW) models. In the WxW case, this allowed us to compare our approach to observed-means centering and an uncentered approach with raw variables (this uncentered approach is equivalent to the ‘conflated model’ described previously and in Table 2 as a traditional CLPM; for all simulation results see Table 3). Alternatively, in the BxW case, we compared our approach to only observed-means centering because it would seem incoherent to propose a cross-level interaction but use uncentered variables to test this (see results in Tables 4a and 4b). As we will now describe, our proposed CLPM approach outperformed the alternatives, and for some parameters drastically so.
Results of Simulation Study 1 Examining WxW Interaction Effects.
Note. Parameter estimates are based on 500 replications. The AR effects were fixed at 0.5. The CL effects were fixed at 0.3. The B covariance was fixed at 0.3. Each replication has N = 300. Bias was calculated using the formula (Population parameter – Parameter estimate) / (Population parameter). 95% Coverage indicates the proportion of replications whose 95% confidence intervals include the population parameter. Proportion of Significant Coefficients indicates the proportion of replications for which the null hypothesis is rejected at p = .05. This value represents the power when the population parameter is nonzero and the Type-I error rate when the population parameter is zero.
Results of Simulation Study 2 Examining BxW Interaction Effects for N = 300 (with T = 3/T = 4 Results Shown).
Note. Parameter estimates are based on 500 replications. The first value in a cell indicates the findings for T = 3. The second value in a cell indicates the findings for T = 4. The AR effects were fixed at 0.5. The CL effects were fixed at 0.3. The B covariance was fixed at 0.3. Each replication has N = 300. Bias was calculated using the formula (Population parameter – Parameter estimate) / (Population parameter). 95% Coverage indicates the proportion of replications whose 95% confidence intervals include the population parameter. Proportion of Significant Coefficient indicates the proportion of replications for which the null hypothesis is rejected at p = .05. This value represents the power when the population parameter is nonzero and the Type-I error rate when the population parameter is zero.
Results of Simulation Study 2 Examining BxW Interaction Effects for N = 2,000 (with T = 3/T = 4 Results Shown).
Note. Parameter estimates are based on 500 replications. The first value in a cell indicates the findings for T = 3. The second value in a cell indicates the findings for T = 4. The AR effects were fixed at 0.5. The CL effects were fixed at 0.3. The B covariance was fixed at 0.3. Each replication has N = 300. Bias was calculated using the formula (Population parameter – Parameter estimate) / (Population parameter). 95% Coverage indicates the proportion of replications whose 95% confidence intervals include the population parameter. Proportion of Significant Coefficient indicates the proportion of replications for which the null hypothesis is rejected at p = .05. This value represents the power when the population parameter is nonzero and the Type-I error rate when the population parameter is zero.
Each simulation condition was replicated 500 times. The true model always had a B variance of 1.0 and a model-implied W variance of 1.0 (adjusting W residuals appropriately to account for W effects), giving an ICC(1) = .50 as is often found in longitudinal data. These unit variances imply estimated AR, CL, and interaction effects can be interpreted as being standardized. In all simulation conditions, AR terms were fixed to .5 to reflect a reasonable degree of persistence and CL terms were fixed at .3 to reflect modest cross-lagged effects. We also specified a covariance of .3 among B components to reflect a moderate correlation among B factors. To improve convergence with the Bayes estimator, we allowed for a small observed-variable residual of .03, so the total observed variance for any variable was 2.03. To account for this, we fixed observed-variable residuals to .03 when estimating our latent CLPM. To ensure that this small variance did not impact the performance of the alternative models, for both the observed-means centering and uncentered approaches we specified a latent variable for each observed variable with variance fixed at .03, but uncorrelated with all other variables, thus adding no estimated parameters and not impacting model df or fit. For both the WxW and the BxW simulations, we varied the interaction effects in three conditions: fixed to −0.3, 0, and 0.3 reflecting moderately negative, zero, and moderately positive interaction effects, respectively. We chose these values to be in line with effect sizes found in prior work (e.g., Ernst Kossek & Ozeki, 1998, reported a meta-analyzed relationship of around -.3 between work-family conflict and job satisfaction).
In Simulation Study 1, we generated three waves of data (T = 3), as would be fairly common in applied research, and we set the sample size to N = 300 with a WxW interaction among the W parts of two observed variables x and y. This Simulation Study 1 thus has a 1 × 1 × 3 design: 1 (Sample size: N = 300) by 1 (Occasion: T = 3) by 3 (Interaction effect sizes: −.3, 0, .3). Results shown in Table 3 support the latent means approach, showing good statistical coverage and only comparatively small levels of bias for all parameters across all conditions. Alternatively, the observed-means approach underestimates AR terms, the uncentered approach overestimates AR terms, both of these methods underestimate CL terms to different degrees, and only in the no-interaction condition do these approaches offer good estimates of the interaction effect—in the −.3 and .3 interaction cases, the estimates are markedly shrunk towards zero.
Simulation Study 2 examines BxW interactions with a single interaction among the B part of y and the W part of x to predict the future W part of y (shown as XW*YB →YW in Tables 4a and 4b), contrasting our proposed latent CLPM approach with an observed-means centering approach. In this simulation, we set T = 3 and N = 300 (shown in Table 4a), but because cross-level interactions are known to be sensitive to the number of lower-level observations per higher-level unit (see Raudenbush & Bryk, 2002), we further varied the number of waves to T = 4. Also, for comparison, we included conditions with the sample size of N = 2,000 (see Table 4b). 4 Simulation Study 2 thus has a 2 × 2 × 3 design: 2 (Sample size: N = 300 and N = 2,000) by 2 (Occasion: T = 3 and T = 4) by 3 (Interaction effect sizes: −.3, 0, .3). Results show that the latent CLPM approach outperforms the observed-means centering method in all cases, with a similar pattern for the observed-means approach: underestimated AR terms; CL terms shrunk towards zero; and cross-level interactions markedly shrunk towards zero. Notably, the larger sample size of N = 2,000 in Table 4b does not appear to improve estimates very much, but the larger T = 4 sample size does improve estimates for the observed-means case—this is to be expected given the longitudinal panel data modeling literature on the topic (see Nickell, 1981; Zyphur, Allison, et al., 2020; Zyphur, Voelkle, et al., 2020).
Before continuing, a relevant finding in both studies was that fewer latent CLPMs converged when interaction effects were zero, particularly with smaller samples (see Tables 3, 4a, and 4b). This may point to inherent instabilities with very small interaction effects, but in our view, this more likely suggests that a larger number of iterations may be required in the presence of very small interaction effects and small samples—we used a maximum of 20,000 iterations in our simulation runs. Requesting more iterations or thinning iterations should therefore alleviate these issues—something that is more feasible when estimating only one data set instead of 500, as was the case in our simulations. Conveniently, estimation becomes faster as sample sizes decrease, and therefore increasing the number of iterations should not present computational issues in practice. In sum, our results suggest taking a latent mean centering approach over others for estimating interactions in CLPMs. In both simulation studies, we find that our proposed approach outperforms both an observed-means as well as uncentered method with raw variables, providing less biased estimates with greater coverage to improve statistical inference. The full simulation files and results can be found in our Online Appendices.
Overview of Software Packages for Estimating Latent Interactions in SEMs.
Discussion
This paper describes how to estimate interactions among stable (between-level or B) and/or time-varying (within-level or W) parts of latent variables in CLPMs. We propose that prior research examining interaction effects in CLPMs overlooks the potential importance of separating different types of (latent) interaction effects. As a consequence, interpretations of interaction effects in past CLPMs and conclusions with regards to hypothesis testing may have been flawed. At the same time, some researchers may have concluded that interaction effects did not exist when, in fact, they did if B and W components had been properly separated. To address this issue and assist researchers interested in testing interaction effects in CLPMs, we offer a latent interactions approach that can be used to detect previously unconsidered forms of moderation involving B and/or W parts of predictors. As noted above, although not all of the many types of interactions and nonlinear effects that our framework allows may be of interest, the convenient feature of our proposed method is that any one or more of these effects can be precisely specified and estimated to test theory, while excluding others that are not theoretically relevant.
Conveniently, our approach can be easily applied to other methods that have a similar process of forming latent W terms, including the general cross-lagged panel model or GCLM wherein W terms are referred to as ‘impulses’ u (see Zyphur, Allison, et al., 2020); the ‘xtdpdml’ panel data model (see Allison et al., 2017; Moral-Benito et al., 2019; Williams et al., 2018); as well as other methods (e.g., Bollen & Brand, 2010). In general, whenever a lagged-effects structure exists that separates stable B and dynamic W components, we recommend formally delineating WxW, BxB, and BxW interactions—although in other models the B terms may need to be rescaled (for discussion see Zyphur, Allison, et al., 2020).
Furthermore, it is notable that WxW and BxW interactions can be applied to indirect effects using AR and CL terms (for general discussion see Preacher et al., 2016; Zyphur et al., 2018). As Zyphur, Allison, et al. (2020) note, the logic of an ‘impulse response’ can be used to model the long-run effects of a 1-unit change in a predictor variable as effects ‘flow’ through a system over time along AR and CL paths. The implied impulse responses at low and high levels of a moderating variable involved in an interaction can be plotted to gain insight into potential long-run effects at different values of the W or B components of variables involved in an interaction. The approach we take here can also be extended to additional cases, including latent-indicator models to account for measurement error or multiple-group models (see Mulder & Hamaker, 2021), which can be easily implemented in SEM (see Little, 2013), as well as indirect-effects in the B model.
In terms of our applied example, prior research has come a long way in attempts to tease apart causal associations and potential interactions/moderation effects. However, much of this work has not accounted for potential stable B versus dynamic W components (e.g., Spector et al., 2007). For example, Grandey et al. (2005) found that an annual change in job satisfaction was predicted by work-family conflict for women only. Alternatively, using our data and our B and W decomposition with latent interactions we found an effect of past work-family conflict on future job satisfaction, which was actually stronger for men than for women, after accounting for hours worked. Using the framework we advance here, future studies can better interrogate the causal relationships among these variables including potential higher-order interactions and forms of nonlinearity.
Limitations and Future Research
We note four key limitations of our proposed method. First, although our method is computationally tractable, it can still be burdensome to estimate many interaction effects in large datasets. To speed up estimation, we used an AWS EC2 instance with up to 48 physical cores (96 virtual cores), which could still take a few hours to converge—along with relevant checks for convergence as recommended by Asparouhov and Muthén (2020). Run on a Linux instance rather than Windows allowed using all 48 cores to speed up processing. However, discovering this fact by running Windows and Linux instances on AWS for quite some time, including thousands of simulation runs, cost our research team roughly $3,000 USD—an expense future researchers will not need to incur if using cheaper and much faster Linux instances on AWS. Furthermore, our sample size was comparatively large with almost 8,000 people measured at 4 occasions for 3 separate variables, resulting in 12 latent W variables and 3 latent B variables. Hopefully our efforts will have removed much of the required trial and error, and costs, associated with latent interaction CLPMs.
Furthermore, future research may be able to draw from advances in the field of Bayesian statistics to reduce convergence time (e.g., Hamiltonian Monte Carlo; see Gelman et al., 2013) and utilize upcoming CPUs with even greater core counts and higher performance. Also, in practice, researchers can exclude any latent interactions that are not theoretically relevant, thus reducing computation times. In fact, for Simulation Study 1, it took about 50 h to estimate a total of 1,500 models involving two WxW interaction effects (T = 3, N = 300) with only 2 time-series variables x and y using our proposed CLPM approach on a standard office computer. Assuming this sample size will be relatively standard, this is a modest 2 min of estimation time per model.
The second limitation is that our approach is currently feasible only in Mplus using a wide (single-level) SEM framework and is thus limited to approximately 10 occasions of measurement (T = 10). Mplus is currently the only available software package, to our knowledge, that can deal effectively with many latent interactions in SEMs. For the interested reader, we present commonly-used software packages and their limitations with regards to our proposed approach in Table 5. We hope that future work will provide researchers with alternatives to proprietary software. Regarding the number of measurement occasions, we suppose that for T > 10, there may be too many observed variables to ensure reasonable computation times. In such cases, researchers may opt for a multilevel framework such as DSEM (see Asparouhov et al., 2018), which currently allows cross-level BxW interactions but not yet latent WxW or BxB interactions (see Asparouhov et al., 2018; Asparouhov & Muthén, 2019; Hamaker & Muthén, 2020).
Relatedly, a third limitation is that our proposed approach makes it burdensome to estimate ‘random slopes’ for AR and CL terms, as in multilevel models (including DSEM). Although BxW interactions typically involve a ‘slopes-as-outcomes’ model, where a W effect is allowed to have a B variance (see Raudenbush & Bryk, 2002), the model input and logic in the single-level case (such as ours) is rather complicated. Although it is technically possible to specify a latent variable to capture B variation in AR and CL terms, results of our second simulation study suggest that estimates of this B variation will probably not be trustworthy given the small T involved (contrast T = 3 and T = 4 results in Tables 4a and 4b). Therefore, we have not pursued this approach here and we do not recommend it. Instead, we recommend our approach wherein BxW terms are treated as fixed, which the reader can selectively interpret as a slopes-as-outcomes model that excludes a B residual for the slope.
The fourth limitation is the fact that models that include BxW interactions with the stable B part of a longitudinal variable may bias estimates of purely B regressions, and likewise B regressions may bias the estimation of BxW interactions. This may occur with few occasions of measurement because of the inherent uncertainty in B estimates that are involved in these model components with small T. Therefore, if researchers are interested in testing hypotheses associated with purely B effects, then these should likely be evaluated in models that exclude BxW interactions among the B variables involved. However, we would point out that this does not seem like a very serious problem given that the entire point of lagged-effects models, such as CLPMs, is to avoid having to make causal inferences based on purely B ‘effects’. Instead, we recommend focusing on AR and CL terms in the W part of a CLPM and estimating WxW and BxW interactions as needed to test theory (while allowing for covariance among B parts to hold them constant, i.e., a ‘fixed-effects’ method).
In terms of future research, we propose that a key direction for innovation is the development of standardization methods for latent interaction effects in CLPMs. Typically, obtaining standardized effect estimates when testing interactions is done by standardizing variables before forming product terms. However, as we have decomposed observed variables into latent B and W parts for analysis, standardizing observed variables before analysis is not possible. Instead, it is theoretically possible to standardize the latent B and W variables in our model by imposing a unit variance and zero mean on them. To this end, Asparouhov and Muthén (2020) suggest a standardization approach that could overcome the need to impose such a model specification. When doing so, previous work has pointed out that it may be important to account for B variation in W variances used for standardization (see Schuurman et al., 2016). Future research should examine how this process can be facilitated in the context of latent interaction effects like those we have presented here.
Supplemental Material
sj-rar-1-orm-10.1177_10944281211043733 - Supplemental material for Interaction Effects in Cross-Lagged Panel Models: SEM with Latent Interactions Applied to Work-Family Conflict, Job Satisfaction, and Gender
Supplemental material, sj-rar-1-orm-10.1177_10944281211043733 for Interaction Effects in Cross-Lagged Panel Models: SEM with Latent Interactions Applied to Work-Family Conflict, Job Satisfaction, and Gender by Ozlem Ozkok, Manuel J. Vaulont and Michael J. Zyphur, Zhen Zhang, Kristopher J. Preacher, Peter Koval, Yixia Zheng in Organizational Research Methods
Footnotes
Acknowledgment
The first three authors Ozlem Ozkok, Manuel J. Vaulont, and Michael J. Zyphur contributed equally to this manuscript. All authors are grateful for input provided by Bengt Muthén, Linda Muthén, and Tihomir Asparouhov. This study has been realized using data collected by the Swiss Household Panel (SHP), which is based at the Swiss Centre of Expertise in the Social Sciences FORS. The SHP project is supported by the Swiss National Science Foundation.
Supplemental Material
Supplemental material for this article is available online.
Notes
Author Biographies
![]()
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
