Abstract
Several studies have stressed the importance of simultaneously estimating interaction and quadratic effects in multiple regression analyses, even if theory only suggests an interaction effect should be present. Specifically, past studies suggested that failing to simultaneously include quadratic effects when testing for interaction effects could result in Type I errors, Type II errors, or misleading interactions. Research investigating this issue has been limited to multiple regression models. However, structural equation modeling is a more appropriate analysis when hypotheses include latent variables. The current study utilized Monte Carlo simulation to investigate whether quadratic effects should be included in the latent variable interaction model. Consistent with previous research, it was found that including latent variable quadratic effects in the model successfully reduced the frequency of spurious interaction effects but at a cost of low power to detect true interaction effects, inaccurate parameter estimates, inaccurate standard error estimates, and reduced convergence rates. Based on findings from the current study, we recommend that researchers hypothesizing interactions between latent variables test for these relations using the latent variable interaction model rather than the interaction-quadratic model. If researchers are concerned about spurious interactions, then they may want to consider including quadratic effects in the model, provided that they have sample sizes of at least 500 and high indicator reliability. We caution all researchers to base higher order effects models on theory.
Structural equation modeling (SEM) refers to an analytic method for modeling a hypothesized pattern of linear relations among a set of observed and latent variables (Bollen, 1989). In recent years, this type of modeling has been extended to integrate nonlinear effects in the structural model. Like their manifest (nonlatent) variable regression counterparts, structural models that presume an additive, linear relation among latent variables may be too inflexible to adequately account for the complexities of the exogenous–endogenous relation often found in practice (Harring, 2013; Wall, 2009; Wall & Amemiya, 2007). For example, if a slope relating a continuous latent predictor and endogenous criterion is thought to systematically change across levels of a second latent predictor, the inclusion of a cross-product term could augment the simple additive structural model in a straightforward manner to account for this relation, which is often referred to as an interaction effect (Kenny & Judd, 1984).
Another example in which the inclusion of a cross-product term could be beneficial is the case of a quadratic relation in which the effect of the exogenous variable on the endogenous variable changes depending on the level of the exogenous variable itself. For example, consider an exogenous variable,
Several studies have stressed the importance of simultaneously estimating interaction and quadratic effects in multiple regression analyses to control for spurious interactions (Cortina, 1993; Ganzach, 1997; Lubinski & Humphreys, 1990). Because multiple regression for measured variables can be cast as a special case of regression models falling within the SEM framework, it may appear on the surface that the results from these past studies would generalize to the SEM context in a straightforward manner. However, we argue that because SEM takes into account random measurement error in the measured variables, the reliability (or lack thereof) of the measures will be pivotal in determining whether these multiplicative effects in a latent variable framework are spurious. Simulation studies that have examined interaction, quadratic, and other nonlinear effects in a latent framework have been limited in that they have only included simulated data from a population with no interaction effect in order to determine if one (a spurious interaction) appears. Previous research has not investigated the impact that testing for unhypothesized quadratic effects could have on Type II errors, and thus power. The current study used Monte Carlo simulation to address this model specification issue in order to provide recommendations to applied researchers hypothesizing these types of relations.
The Evolution of the Constrained Approach
The evolution of estimating interactions and quadratic effects in structural equation models began with Kenny and Judd (1984) who hypothesized the interaction between two latent exogenous variables and used cross-products to represent the latent interaction effect. Hayduk (1987) was the first to expand the Kenny–Judd model to include a latent endogenous variable. Many methods have been suggested to estimate nonlinear latent variable models (Arminger & Muthén, 1998; Bollen & Paxton, 1998; Klein & Moosbrugger, 2000; Ping, 1996; Wall & Amemiya, 2003). A convenient summary of work on the latent variable interaction model is the edited volume by Schumacker and Marcoulides (1998) that featured methods or critical assessments that specifically underscored approaches known as product indicator methods.
In SEM, the LISREL specification for the structural model in which two exogenous variables interact with one another can be written as
where
Jöreskog and Yang (1996) extended the Kenny–Judd model to include a mean structure. They noted that in the Kenny–Judd model, even if the observed variables were mean-centered, their products would not necessarily be mean-centered. Consequently, the latent interaction variable,
Algina and Moulder (2001) extended the Jöreskog and Yang (1996) model, and the model advocated by Yang-Wallentin and Jöreskog (2001), by mean-centering the independently observed variables. Algina and Moulder (2001) referred to this model as the “constrained” model and found that it was more likely to converge, was less biased, and had better Type I error control than the Jöreskog and Yang (1996) uncentered model. The Algina and Moulder (2001) constrained approach applied to a latent interaction model imposed four types of constraints on the model.
The first constraint required the loadings for each of the interaction indicators to be constrained to equal the product of the two indicators that created the interaction indicator. The second constraint required the mean of the interaction to be set equal to the covariance between
The Unconstrained Approach
Marsh, Wen, and Hau (2004) introduced an unconstrained model in which all four of the constraints were released. Like the generalized appended product indicator (Wall & Amemiya, 2001), this model also did not require the stringent normality assumption of
For the empirical study described below, we assume that there are three measured indicator variables for each of the two exogenous factors as well as for the endogenous factor. Furthermore, a matched-paired cross-product indicator strategy, proposed by Marsh et al. (2004), is adopted in which the product of measured indicators from each of the linear factors is used to create indicators of the latent interaction term (and subsequent quadratic terms as well). Figure 1 depicts the unconstrained approach with the following specifications. The structural model for the unconstrained approach is the same as in Equation 1. The measurement model connecting the measured variables to the latent variables is specified as
where

Figure 1. A path diagram of a latent interaction model with three observed indicators for each latent variable under the unconstrained model.
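The matched-pair strategy described above can be illustrated with a short Python sketch. It is not the code used in the study (which relied on SAS and LISREL); the function names, the simulated arrays, and the choice to mean-center the indicators before multiplying are assumptions made for illustration only.

```python
import numpy as np

def matched_pair_products(x_block, z_block):
    """Multiply matched columns of two mean-centered indicator blocks.

    Column j of the result is (x_j - mean(x_j)) * (z_j - mean(z_j)),
    so three indicators per factor yield three product indicators.
    Mean-centering is an assumption of this sketch.
    """
    x_centered = x_block - x_block.mean(axis=0)
    z_centered = z_block - z_block.mean(axis=0)
    return x_centered * z_centered

def squared_indicators(x_block):
    """Square each mean-centered indicator to form quadratic indicators."""
    x_centered = x_block - x_block.mean(axis=0)
    return x_centered ** 2

# Hypothetical data: 100 cases, three indicators per exogenous factor.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
z = rng.normal(size=(100, 3))

interaction_indicators = matched_pair_products(x, z)
quadratic_indicators = squared_indicators(x)
```

Each indicator is used in exactly one product, which is what distinguishes the matched-pair strategy from forming all possible cross-products.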
The Latent Quadratic Model
Similarly to the latent interaction model, one may express the latent quadratic model as
where
Model Misspecification Issues With Interaction and Quadratic Effects
When first-order terms are perfectly correlated, then there is no need to differentiate between an interaction effect and a quadratic effect because they are the same. When the correlation between first-order terms is zero, then one can clearly see the differences between interaction effects and quadratic effects. However, as the relation between the first-order terms increases (i.e., as multicollinearity between first-order terms increases), it becomes more difficult to differentiate between interaction effects and quadratic effects. Thus, an interaction effect may be statistically significant because an endogenous variable and an exogenous variable are quadratically related, or vice versa.
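The overlap between the two kinds of higher order terms can be made concrete with a small Python sketch. It assumes two standard bivariate normal predictors with correlation rho, for which the correlation between the product term and a squared term works out to sqrt(2)·rho / sqrt(1 + rho²): zero when rho is zero and approaching one as rho approaches one. The function names and the rho values below are illustrative, not taken from the study.

```python
import numpy as np

def product_square_correlation(rho, n=200_000, seed=0):
    """Simulated correlation between x1*x2 and x1**2 for correlated normals."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return float(np.corrcoef(x1 * x2, x1 ** 2)[0, 1])

def theoretical_correlation(rho):
    """Closed form under standard bivariate normality (assumption of the sketch)."""
    return float(np.sqrt(2.0) * rho / np.sqrt(1.0 + rho ** 2))

# Illustrative correlations between the first-order terms.
for rho in (0.0, 0.3, 0.6, 0.9):
    simulated = product_square_correlation(rho)
    expected = theoretical_correlation(rho)
    print(f"rho={rho:.1f}  simulated={simulated:.3f}  theoretical={expected:.3f}")
```

As the first-order correlation grows, the product and squared terms carry increasingly similar information, which is why an omitted quadratic effect can surface as an apparent interaction.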
Several studies have stressed the importance of simultaneously estimating interaction and quadratic effects in multiple regression analyses, even if theory only suggests an interaction effect (Cortina, 1993; Ganzach, 1997; Lubinski & Humphreys, 1990). Specifically, these studies found that failing to simultaneously test for quadratic effects when testing for an interaction effect could have resulted in one of the following situations:
A significant interaction effect may be observed when in reality there is no interaction effect (Type I error, also referred to as a spurious interaction; Cortina, 1993; Lubinski & Humphreys, 1990).
An interaction may appear to be nonsignificant when in fact there is an interaction effect (Type II error; Ganzach, 1997).
The interaction may appear positive when in fact it was negative (misleading interaction; Ganzach, 1997).
There have been mixed recommendations about whether or not one should test for quadratic effects in order to exert Type I error control when interactions are hypothesized (Aiken & West, 1991; Cortina, 1993; Ganzach, 1997; Lubinski & Humphreys, 1990; Shepperd, 1991). Some researchers suggested a more tolerant approach, requiring theory before including nonlinear effects in a model (Aiken & West, 1991). Other researchers stated that nonlinear effects should only be included when the correlation between the first-order effects is nonzero (i.e.,
The majority of previous studies addressing this issue are limited in that they only investigated this recommendation using multiple regression models (Cortina, 1993; Lubinski & Humphreys, 1990; MacCallum & Mar, 1995; Shepperd, 1991). However, many social and behavioral science researchers hypothesize interactions between latent variables, whose observed indicators contain measurement error. Therefore, when interactions are hypothesized between latent variables, latent variable analyses (such as SEM) are more appropriate than the traditional measured variable analyses.
To date, only one study has investigated the simultaneous estimation of quadratic and interaction effects using latent variable models (Klein, Schermelleh-Engel, Moosbrugger, & Augustin, 2009). The authors of this study found that testing for quadratic effects in addition to an interaction effect resulted in lower Type I error and thus fewer spurious interactions. However, this study was limited in that it utilized only one approach to testing for nonlinear effects (the latent moderated structures approach), although many other methods exist. Additionally, the simulation design of the Klein et al. (2009) study only investigated spurious interaction effects (i.e., the first of the three situations listed above, but neither the second nor the third).
While controlling for spurious interaction effects is a valid concern, controlling for Type II errors should also be of concern. If a researcher solely focuses on avoiding spurious interactions, true interactions may go undetected. This is a power issue, which has not been addressed in previous studies. Additionally, testing for quadratic effects that are not based in theory could potentially result in spurious quadratic effects. Thus, when attempting to avoid making a Type I error with regard to an interaction effect, one may in turn make a Type I error with regard to a quadratic effect. The goal of the current study was to use Monte Carlo simulation to investigate whether or not quadratic effects should be included in the latent variable interaction model. This study builds on previous research in that Type II errors, power, and spurious quadratic effects are considered in addition to spurious interaction effects.
Method
To address the research questions, three models were compared, a first-order effects only model (Model A)
an interaction model like that in Equation 1 (Model B), and an interaction-quadratic model (Model C),
Using these three models, two scenarios were created. In the first scenario, Model A was true, and Models B and C were tested. This scenario allowed for investigation of spurious interactions and spurious quadratic effects. In the second scenario, Model B was true, and Models B and C were tested. This scenario allowed for investigating Type II errors associated with the interaction effect, as well as spurious quadratic effects.
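As a reading aid, the three structural models can be sketched in the notation commonly used for Kenny–Judd-type models. The symbols and the coefficient numbering below are assumptions made for this sketch: η is the endogenous factor, ξ₁ and ξ₂ are the exogenous factors, α is an intercept, the γ terms are structural coefficients, and ζ is the disturbance.

```latex
\begin{align}
\text{Model A:}\quad \eta &= \alpha + \gamma_1 \xi_1 + \gamma_2 \xi_2 + \zeta \\
\text{Model B:}\quad \eta &= \alpha + \gamma_1 \xi_1 + \gamma_2 \xi_2 + \gamma_3 \xi_1 \xi_2 + \zeta \\
\text{Model C:}\quad \eta &= \alpha + \gamma_1 \xi_1 + \gamma_2 \xi_2 + \gamma_3 \xi_1 \xi_2
  + \gamma_4 \xi_1^2 + \gamma_5 \xi_2^2 + \zeta
\end{align}
```

Under this notation, Scenario 1 corresponds to a population with γ₃ = γ₄ = γ₅ = 0, and Scenario 2 to a population with γ₃ ≠ 0 and γ₄ = γ₅ = 0.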
All variables were simulated to come from a population in which
Thus, while
The effect size associated with the interaction effect represents the additional variance that the interaction explains in
Previous studies have set this value at 5% (Harring, Weiss, & Hsu, 2012; Jaccard & Wan, 1995; Klein & Muthén, 2007; Little, Bovaird, & Widaman, 2006; Marsh et al., 2004; Moulder & Algina, 2002; Weiss, 2010). The current study used similar values allowing the interaction to account for 0% (Model A), and 5% of the unique variance in
The measurement model depicted in Figure 1 was used throughout, except to allow for the different structural models. Reliability of the indicators has been known to affect estimates of structural coefficients (Dimitruk, Schermelleh-Engel, Kelava, & Moosbrugger, 2007). Consistent with past simulation studies (see, e.g., Algina & Moulder, 2001; Jaccard & Wan, 1995), loadings for indicators were set equal to either 0.5 or 0.9 in the population-generating model. Less reliable loadings were thought to result in lower power, which corresponds to an increase in the probability of making a Type II error (Lubinski & Humphreys, 1990). Thus, the less reliable loading condition was chosen to investigate the power Models B and C had to identify real interaction effects. The loading of 0.9, which denotes a high level of measured variable reliability, was chosen based on previous research (Marsh et al., 2004; Weiss, 2010). We chose the loadings to be equal across the indicator variables. This decision was driven in part by the practical realities of executing the simulation, although we recognize that in applied research settings the loadings (and thus the reliabilities) for each observed indicator could very well be distinct.
Two different sample sizes were used (referred to as small and large here), which corresponded to n = 100 and n = 500, respectively. The sample sizes were chosen partially based on past simulation studies that used similar sample sizes (Jaccard & Wan, 1995; Klein & Muthén, 2007; Marsh et al., 2004; Moulder & Algina, 2002; Schermelleh-Engel, Klein, & Moosbrugger, 1998). Furthermore, it is known that small sample size has an indirect effect on power. Thus, the sample size of 100 was chosen to investigate the power of Models B and C. Wall and Amemiya (2003) evaluated methods using sample sizes as large as 1,000. However, in preliminary analyses, little difference was found in the bias and precision of parameter estimates between sample sizes of 500 and 1,000.
The correlation between the two first-order latent variables
In summary, the simulation design was 2 (Model A and Model B) × 2 (loadings) × 2 (sample sizes) × 3 (correlations between first-order effects), resulting in 24 conditions. All 24 conditions were analyzed with both Models B and C, resulting in 48 sets of analyses. All data were simulated to come from a normal distribution. Due to the complicated nonlinear constraints associated with the constrained approach, the unconstrained approach was used to test for the interaction and quadratic effects. Previous research has shown that when data are normally distributed, the unconstrained approach is unbiased in most conditions, and almost as unbiased as the constrained approach (Marsh et al., 2004; Weiss, 2010). Data were simulated using SAS 9.2. Maximum likelihood was used to estimate model parameters. The Hessian matrix from the Newton–Raphson algorithm was used in the computation of the standard errors. Data for each of the 24 conditions were simulated 1,000 times. Simulated data were analyzed using LISREL 8.8 and SAS 9.2.
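The factorial design can be enumerated in a few lines of Python to confirm the condition counts. This is only a bookkeeping sketch, not the SAS code used in the study; the variable names are hypothetical, and the three correlation levels are left as placeholder labels rather than numeric values.

```python
from itertools import product

true_models = ("A", "B")            # population-generating model
loadings = (0.5, 0.9)               # indicator loadings
sample_sizes = (100, 500)           # small and large samples
# Placeholder labels for the three correlation levels (values not assumed here).
correlation_levels = ("level_1", "level_2", "level_3")
fitted_models = ("B", "C")          # models fit to every condition
replications = 1000                 # data sets simulated per condition

# Fully crossed population conditions.
conditions = list(product(true_models, loadings, sample_sizes, correlation_levels))

# Every condition is analyzed with both fitted models.
analyses = [(condition, model) for condition in conditions for model in fitted_models]

print(len(conditions))                 # 24 conditions
print(len(analyses))                   # 48 sets of analyses
print(len(conditions) * replications)  # 24000 simulated data sets
```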
Results
Convergence Rates
Table 1 contains the convergence rates for each condition. In all conditions, Model B was more likely to converge than Model C. Convergence rates decreased as
Table 1. Convergence Rates (Expressed in Percentages).
Spurious Interactions
By definition, spurious interactions occur when a statistically significant interaction effect is observed when in reality there is no interaction effect. Spurious interactions are numerically represented by Type I error rates. Type I error for the interaction effect was computed from the proportion of converged solutions that had a statistically significant interaction effect in the simulated data when the population interaction effect was zero (i.e., the proportion of times that
Scenario 1, in which Model A was true and Models B and C were tested, allowed for investigation of spurious interactions. Figure 2 contains the Type I error rates for Scenario 1. When loadings were 0.5, and Model B was tested, Type I error was above the desired .05 level, indicating that there was a higher than desired rate of spurious interactions. When loadings were 0.5, Model C (which included quadratic effects) resulted in successful reduction of Type I error rates. When loadings were 0.9, spurious interactions were not problematic for either Model B or Model C. Altering the correlation between first-order effects did not affect the rate of Type I errors for most conditions. In nearly all conditions, increasing the sample size resulted in lower Type I error rates, with the exception of the Model C,

Figure 2. Type I error for interaction effect (true Model A, tested Models B and C).
Power
Power was calculated as one minus the proportion of converged solutions in which
Figure 3 contains the power estimates when Model B was the true model and Models B and C were tested. As expected, Model B had more power to detect true interaction effects than Model C in all conditions. However, for Model B the conditions in which loadings were 0.9 were the only conditions in which statistical power reached values that were near desirable. That is, when loadings were 0.5, power was low regardless of sample size and regardless of which model was tested. In most conditions, power for both models also decreased as multicollinearity increased.

Figure 3. Empirical power for interaction effect (true Model B, tested Models B and C).
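Both the Type I error rates and the power estimates are rejection proportions computed over converged solutions only. A minimal Python sketch of that calculation follows. It assumes a two-sided Wald z test with a critical value of 1.96 (the .05 level), which is an assumption of the sketch; the function name and example values are hypothetical.

```python
import numpy as np

def rejection_rate(estimates, std_errors, converged, critical_z=1.96):
    """Proportion of converged replications with a significant estimate.

    With a zero population effect this is an empirical Type I error rate;
    with a nonzero population effect it is empirical power.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    converged = np.asarray(converged, dtype=bool)
    if not converged.any():
        return float("nan")
    # Select converged replications first so missing values elsewhere are ignored.
    z = estimates[converged] / std_errors[converged]
    return float(np.mean(np.abs(z) > critical_z))

# Hypothetical replications: the last one did not converge and is excluded.
est = [0.10, 0.50, 0.00, 0.30]
se = [0.10, 0.10, 0.10, 0.10]
ok = [True, True, True, False]
rate = rejection_rate(est, se, ok)  # one of three converged solutions rejects
```

Because non-converged replications are dropped from the denominator, conditions with low convergence rates rest on fewer solutions, and their rates should be read with that in mind.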
Spurious Quadratic Effects
Both Scenarios 1 and 2 allowed for the investigation of spurious quadratic effects. Type I error for the quadratic effects (i.e., the rate of spurious quadratic effects) was computed by calculating the proportion of converged solutions that had a statistically significant quadratic effect in the simulated data when the population quadratic effect was zero (i.e., the proportion of times that
Similar Type I error rates were found for both quadratic effects in Model C; thus, only the rates for the first quadratic effect associated with

Spurious quadratic effects when Model C is tested (true Model A).

Spurious quadratic effects when Model C is tested (true Model B).
Bias and Relative Bias
Relative bias was also used as an evaluation criterion for investigating spurious interaction effects. Relative bias is defined by the average difference between the parameter estimates and the population-generating value divided by the population-generating parameter value. In the conditions in which the population-generating parameter value
Table 2 contains the bias and relative bias estimates for the interaction effect across the 48 conditions. Values close to zero are desirable because they indicate low bias. Hoogland and Boomsma (1998) suggested that unbiased parameter estimates should be within an absolute value of 0.05. When Model B was tested, bias was low when loadings were 0.9 or when the sample size was 500. The strength of the relation between the first-order factors did not have a meaningful impact on bias across most conditions.
Table 2. Bias and Relative Bias for the Interaction Effect.
This is an indication that regardless of whether or not an interaction effect is present, testing for the interaction effect results in relatively unbiased estimates of the interaction for the conditions simulated in this study, provided that loadings are 0.9 or the sample size is 500.
When there was no true interaction effect (i.e., Model A was true), Model C also resulted in unbiased estimates of the interaction effect when loadings were 0.9 or sample size was 500. However, when a true interaction effect existed (i.e., Model B was true), Model C yielded biased estimates of the interaction in almost all conditions. In most conditions, this bias was positive, indicating that Model C overestimated the interaction effect.
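The bias and relative bias criteria defined in this section can be sketched in Python as follows. The function name and example values are hypothetical. Relative bias is returned as None when the population value is zero, since the ratio is undefined in that case.

```python
import numpy as np

def bias_and_relative_bias(estimates, true_value):
    """Average deviation from the population value, raw and relative.

    Relative bias divides the bias by the population value, so it is
    only defined when that value is nonzero.
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = float(estimates.mean() - true_value)
    relative_bias = bias / true_value if true_value != 0 else None
    return bias, relative_bias

# Hypothetical estimates from converged replications.
bias, rel_bias = bias_and_relative_bias([0.18, 0.22, 0.26], true_value=0.20)

# Check against the 0.05 guideline attributed above to Hoogland and Boomsma (1998).
within_guideline = rel_bias is not None and abs(rel_bias) <= 0.05
```

In this made-up example the average estimate is 0.22, so the bias is 0.02 and the relative bias is 0.10, which falls outside the 0.05 guideline.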
Precision
Precision was evaluated by calculating a ratio of the average estimated standard error,
Table 3 contains the standard error to standard deviation ratios associated with estimating the interaction effect. When loadings were 0.9, ratios were closer to 1.0 for Model B than for Model C in all conditions. With loadings of 0.9 and sample size of 100, Model C resulted in inaccurate standard error estimates. When loadings were 0.5, both Model B and Model C resulted in inaccurate standard error estimates; in the majority of conditions these standard errors were overestimated.
Table 3. SE/SD Ratio for the Interaction Effect.
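The precision criterion can be sketched in Python as the average estimated standard error divided by the empirical standard deviation of the estimates across replications. The function name, the example values, and the use of the sample standard deviation (ddof=1) are assumptions of this sketch.

```python
import numpy as np

def se_sd_ratio(estimates, std_errors):
    """Average estimated SE divided by the empirical SD of the estimates.

    Values near 1.0 suggest accurate standard errors; values above 1.0
    suggest overestimated standard errors, values below 1.0 underestimated ones.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    return float(std_errors.mean() / estimates.std(ddof=1))

# Hypothetical replications: the estimates have an empirical SD of 1.0.
ratio = se_sd_ratio(estimates=[1.0, 2.0, 3.0], std_errors=[0.9, 1.0, 1.1])
```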
Discussion
Several previous studies have stressed the importance of simultaneously estimating interaction and quadratic effects, even if theory only suggests an interaction effect exists (Cortina, 1993; Ganzach, 1997; Klein et al., 2009; Lubinski & Humphreys, 1990; Shepperd, 1991). The current study investigated the impact that incorporating nontheoretical quadratic effects had on Type I error rates, power, relative bias estimates, standard error estimates, and convergence rates in a latent variable framework. Consistent with previous research, we found that including latent variable quadratic effects in the model successfully reduced the frequency of spurious interaction effects. However, this also resulted in a large reduction of statistical power to detect interaction effects particularly when loadings (and thus reliability) of indicators were 0.5, samples were 100, and first-order terms were strongly, linearly related. Only in the most ideal conditions (loadings of 0.9 and sample size of 500) was statistical power acceptable (.70 or greater) when nontheoretical quadratic effects were included in the model, and in these conditions the interaction-quadratic model resulted in 16% to 79% bias of the interaction effect.
Also consistent with previous research, testing the interaction-only model resulted in spurious interaction effects in some conditions. However, provided that loadings (and thus reliability) of indicators were 0.9, the rate of spurious interaction effects was consistent with alpha, and thus similar to expected values. Also, in most conditions the convergence rates were higher and the parameter estimates and standard error estimates associated with the interaction effect were more accurate with the interaction-only model in comparison to the interaction-quadratic model.
Based on findings from the current study, we recommend that researchers hypothesizing interactions between latent variables test for these relations using the latent variable interaction model instead of the interaction-quadratic model. Compared with the interaction-quadratic model, the latent variable interaction model, as tested here, had higher convergence rates, adequate Type I error rates, higher power (and thus lower Type II error rates), unbiased estimates of the interaction effect, and more accurate standard error estimates across most conditions. Our recommendation is contrary to that of previous studies, which recommended that researchers always test for both quadratic and interaction effects (Cortina, 1993; Ganzach, 1997; Klein et al., 2009; Lubinski & Humphreys, 1990).
Of note, the sample size and strength of the relation between first-order effects did not impact the bias of the interaction effects with the interaction-only model. Rather, the reliability of the indicators had the greatest impact on Type I error rates associated with the interaction effect. This finding is similar to those from previous empirical studies (see, e.g., Dimitruk et al., 2007; Jaccard & Wan, 1995). Also, because the relation between first-order factors is related to the value of the interaction effect, relation between first-order factors, and the conditional variance of eta, power is directly related to these effects, and thus we are limited in making direct comparisons of the relation of the first-order factors and statistical power.
While the interaction-quadratic model may not be necessary to estimate interaction effects when loadings (and thus reliability) of indicators are adequate, it could be beneficial in reducing Type I error rates when loadings are low. If researchers are concerned about spurious interactions (e.g., if high-stakes decisions are to be made based on findings of statistically significant interaction effects), then they may want to consider including quadratic effects in the model to reduce spurious findings. However, if nontheoretical quadratic effects are to be included in models for this purpose, sample sizes should be at least 500 and indicator reliability estimates should be high. Of course, caution should be taken in interpreting results when loadings are low with any model, not just the interaction-quadratic model.
Recommendations from this study do not mean that quadratic effects should never be included in latent interaction models. Researchers should test the model that they theorize. That is, if a researcher hypothesizes a quadratic effect, then he or she should test for that effect. This recommendation is consistent with Aiken and West (1991), who stated that theory should be present before nonlinear effects are incorporated into a model. Thus, we caution all researchers to base higher order effects models on theory.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported here was funded by a grant from the Institute of Education Sciences, U.S. Department of Education, to Boston College (#R305A140114). The opinions expressed are those of the authors and do not represent views of the institute or the U.S. Department of Education.
