Abstract
Path models with observed composites based on multiple items (e.g., mean or sum score of the items) are commonly used to test interaction effects. Under this practice, researchers generally assume that the observed composites are measured without errors. In this study, we reviewed and evaluated two alternative methods within the structural equation modeling (SEM) framework, namely, the reliability-adjusted product indicator (RAPI) method and the latent moderated structural equations (LMS) method, which can both flexibly take into account measurement errors. Results showed that both methods generally produced unbiased estimates of the interaction effects. In contrast, the path model, which does not consider measurement errors, led to substantial bias and a low confidence interval coverage rate for nonzero interaction effects. Other findings and implications for future studies are discussed.
Testing interaction effects is an important and common practice in social and behavioral research, as researchers are interested in determining whether the relationship between two variables stays the same or changes depending on the level of a third variable (i.e., the moderator). In practice, both the predictor and the moderator are measured by either a single item (e.g., socioeconomic status, age, or gender) or a scale containing multiple items. For interaction effects involving multiple-item exogenous variables, methodologists have proposed several statistical methods within the structural equation modeling (SEM) framework. These statistical methods are capable of modeling the latent interaction effects while simultaneously taking into account any measurement errors in the items (Jöreskog & Yang, 1996; Kenny & Judd, 1984; Klein & Moosbrugger, 2000; Klein & Muthén, 2007; Lin, Wen, Marsh, & Lin, 2010; Little, Bovaird, & Widaman, 2006; Marsh, Wen, & Hau, 2004; Moulder & Algina, 2002; Wall & Amemiya, 2001).
Despite methodological advancements in recent years, however, applied researchers still generally use observed composites (e.g., the mean or sum from a multiple-item scale) for both the predictor and the moderator when testing interaction effects. For example, a review of the articles (N = 120) published in the Journal of Applied Psychology in 2015 identified 22 (18.3%) articles testing at least one interaction effect using observed composites. Of these 22 articles, only 2 corrected for the measurement errors of the exogenous variables, but in neither study did the authors consider measurement errors in the interaction terms (Eby, Butts, Hoffman, & Sauer, 2015; Mitchell, Vogel, & Folger, 2015). In the remaining 20 (90.9%) articles, all the manifest variables and the corresponding interaction effects were assumed to be measured accurately (i.e., without any measurement errors). These findings echo those of Cole and Preacher (2014), who reviewed 44 issues of seven American Psychological Association journals published in 2011, and found that more than one tenth of the studies conducted path analyses without correcting for measurement errors in the manifest variables. Thus, ignoring measurement errors of the manifest variables and the corresponding interaction effects in path analyses is still quite common. Yet, perfectly reliable manifest variables rarely exist in real data (Cohen, Cohen, West, & Aiken, 2003) and, as a result, path analyses with observed variables uncorrected for measurement errors could result in biased (either under- or overestimated) path coefficients (e.g., Aiken & West, 1991; Busemeyer & Jones, 1983; Cole & Preacher, 2014) and lead to reduced statistical power (e.g., Marsh, Wen, Nagengast, & Hau, 2012).
Given the potential problems raised by failing to properly address measurement errors when observed composites are used, in this study, two alternative methods were reviewed and evaluated: the latent moderated structural equations (LMS) method and the reliability-adjusted product indicator (RAPI) method, both of which can properly take into account measurement errors when testing interaction effects based on observed composite measures. The LMS method, developed by Klein and Moosbrugger (2000), originally focused on testing interaction effects with multiple-indicator exogenous variables. In the present study, we illustrated how to impose error variance constraints on the exogenous variables while using the LMS method to estimate interaction effects based on observed composite variables. With regard to the RAPI method, even though it can be traced back to the late 1970s and early 1980s (Bohrnstedt & Marwell, 1978; Busemeyer & Jones, 1983), it has seldom been used in applied research.
To our knowledge, the performance of these two alternative approaches in terms of the estimation accuracy of interaction effects with observed composites has yet to be investigated. Therefore, in the present study, we compared the LMS and the RAPI methods with the commonly used path analysis approach, which assumes no measurement error for all the observed composites and the corresponding interaction effect, under conditions of varying sample sizes, reliability levels, and magnitudes of the interaction effects.
Methods for Estimating Interaction Effect With Composite Scores
As mentioned, the most common way to estimate interaction effects with observed composite scores is the traditional path model (see Figure 1), in which both the predictor and the moderator are represented as observed variables and are assumed to be free of measurement error. In contrast, the distribution analytic method (see Figure 2) and the reliability-adjusted product indicator (RAPI) method (see Figure 3) can take into account the measurement errors of the exogenous variables while estimating interaction effects. A key feature of these alternative approaches is the application of a reliability adjustment to each observed composite by constraining the corresponding error variance. Below we first discuss how to impose the error-variance constraint with the use of reliability. We then present examples of applying these reliability adjustments to both the LMS and RAPI methods.

Figure 1. The path model for estimating one interaction effect with a single predictor variable (X) and a single moderator (M). Both X and M are composites from multiple items; XM is the product term of X and M.

Figure 2. The latent moderated structural equations (LMS) method (Klein & Moosbrugger, 2000) for estimating one interaction effect with a single predictor variable (X) and a single moderator (M). Both X and M are composites from multiple items. The equations for defining the variances of

Figure 3. The reliability-adjusted product indicator (RAPI) method for estimating one interaction effect with a single predictor variable (X) and a single moderator (M). Both X and M are composites from multiple items; XM is the product term of X and M. The equations for defining the variances of
Reliability Adjustment for the Interaction Effect Between Observed Composites
In the classical test theory (CTT) framework (Crocker & Algina, 1986; Lord & Novick, 1968), score reliability of a composite variable, X, is defined as the proportion of variance in X that can be attributed to the true score. Multiple approaches have been proposed to estimate reliability coefficients under conditions where the true-score variance cannot be directly obtained (Crocker & Algina, 1986). Among these approaches, structural equation modeling (SEM) is one of the techniques that yield more precise estimation of reliability coefficients (Raykov, 1997; Yang & Green, 2010). Let Xi be the ith observed item of a scale measuring the latent construct,
where
where
If information about the individual item is unknown or unavailable (e.g., use of secondary data), one can only use the composite score, X = ΣXi, as the single indicator for the latent variable,
given that the only factor loading between X and
Given the reliability coefficient,
Equations (4) and (5) are the key elements in specifying the error variance constraints for the interaction effects under the RAPI method. Note that the discussion is equally applicable to mean composite scores, which are simply rescaled versions of the sum composite scores.
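As a rough numeric illustration of the reliability adjustment described above, the sketch below applies the classical test theory identity that the error variance of a composite equals (1 − reliability) times its observed variance, while the true-score variance equals reliability times the observed variance. The function names and the example values (observed variance 2.0, reliability .80) are hypothetical and are not taken from the study; it is an assumption that the error-variance constraints referred to above take this form.

```python
def error_variance(observed_variance: float, reliability: float) -> float:
    """Error variance implied by CTT: (1 - reliability) * observed variance."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    if observed_variance < 0.0:
        raise ValueError("observed variance must be non-negative")
    return (1.0 - reliability) * observed_variance


def true_score_variance(observed_variance: float, reliability: float) -> float:
    """True-score variance implied by CTT: reliability * observed variance."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    if observed_variance < 0.0:
        raise ValueError("observed variance must be non-negative")
    return reliability * observed_variance


# Hypothetical composite: observed variance 2.0, reliability .80.
var_x, rho_x = 2.0, 0.80
fixed_error_variance = error_variance(var_x, rho_x)      # value to fix in the model
implied_true_variance = true_score_variance(var_x, rho_x)
print(fixed_error_variance, implied_true_variance)
```

Rescaling a sum composite to a mean composite divides the observed, true-score, and error variances by the same squared number of items, so the reliability used in the constraint is unchanged.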
Distribution Analytic Approach
Researchers can apply the distribution analytic approach to estimate interaction effects by either the LMS method (Klein & Moosbrugger, 2000) or the quasi-maximum likelihood (QML) method (Klein & Muthén, 2007) under the SEM framework with specific data distributional assumptions. Figure 2 shows the simplest scenario in which a one-indicator predictor composite and a one-indicator moderator composite predict a single outcome. By using Equations (4) and (5) to constrain the error variances of the observed composites according to the corresponding reliability coefficient such as Cronbach’s alpha (Bollen, 1989) or factor structure reliability (Raykov, 1997), one can estimate the latent interaction effect with the observed composite scores via the distribution analytic approach, which takes into account the measurement errors for the observed composites (Figure 2).
Based on Equations (4) and (5),
while
Although this is a very powerful approach, access to both the LMS and QML methods is quite limited. For example, the LMS method is exclusively built into Mplus (Muthén & Muthén, 1998-2013) whereas the QML method is a stand-alone program available only from the developer Andreas Klein (Kwok, Im, Hughes, Wehrly, & West, 2016). Additionally, the overall model chi-square test and the commonly used model fit indices (e.g., comparative fit index [CFI], root mean square error of approximation [RMSEA], and standardized root mean square residual [SRMR]) are not available in these methods.
Reliability-Adjusted Product Indicator Method
Researchers can also create a latent interaction effect factor by having the observed interaction effect term (i.e., the product of the predictor and the moderator) load on it (see Figure 3). Similar to the distribution analytic approach, the reliability-adjusted constraints can be directly applied to the exogenous variables (i.e., the predictor X and moderator M) under the RAPI approach, with the use of the same error-variance constraints as presented in Equations (4) and (5).
As for the observed interaction variable, XM, which is the product term of X and M, the variance of this interaction effect can be defined as the following equation (reproduced from Equation A7 in Appendix A), under the assumption of independent measurement errors and double mean-centered variables (Lin et al., 2010):
The procedure to create the double mean-centered variable is straightforward. First, both X and M are mean-centered; then the product term of the mean-centered X and M is itself mean-centered. The variance of the observed interaction variable,
The corresponding derivations are described in Appendix A. Accordingly, in Equation (6), we can substitute the measurement error variances and the true-score variances of X and M with their corresponding reliability estimates and observed variances. Hence, the error variance of the latent interaction effect can be expressed as follows (Bohrnstedt & Marwell, 1978; Busemeyer & Jones, 1983):
Equation (8) is the key equation to set up the nonlinear constraint for the error variance of the latent interaction effect when using the RAPI method.
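To make the nonlinear constraint more concrete, the sketch below computes an error variance for the product term from the reliabilities and observed variances of X and M. It assumes zero-mean (centered) true scores and errors, with the errors independent of the true scores and of each other; under those assumptions the error part of the product, (true X × error M) + (true M × error X) + (error X × error M), has a variance equal to the sum of the three products of variances. The function name and input values are hypothetical, and the result is offered as one standard form of such a constraint rather than a reproduction of the article's equation.

```python
def product_error_variance(var_x: float, rho_x: float,
                           var_m: float, rho_m: float) -> float:
    """Error variance of the product XM under independence assumptions.

    Each observed variance is split into a true-score part (rho * var)
    and an error part ((1 - rho) * var). The error component of the
    product then has variance
        true_x * err_m + true_m * err_x + err_x * err_m.
    """
    true_x, err_x = rho_x * var_x, (1.0 - rho_x) * var_x
    true_m, err_m = rho_m * var_m, (1.0 - rho_m) * var_m
    return true_x * err_m + true_m * err_x + err_x * err_m


# Hypothetical inputs: unit observed variances, reliability .70 for both composites.
print(product_error_variance(1.0, 0.70, 1.0, 0.70))
```

With perfectly reliable composites (both reliabilities equal to 1) the function returns zero, which corresponds to the assumption built into the conventional path model.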
Purpose of the Study
This study compared three methods of examining the interaction effects with observed composite scores to determine the estimation accuracy of the interaction effects. A Monte Carlo simulation study was conducted to compare methods with and without the consideration of measurement errors of the manifest variables. Both the LMS and RAPI methods were compared with the conventional path model. We chose the LMS method because it is currently the only distribution analytic approach that is feasible in a general SEM program (i.e., Mplus).
Method
In this Monte Carlo study, we compared different methods for estimating the magnitude of the interaction effect
where

Figure 4. The pseudo population model for generating simulation data sets.
Monte Carlo Simulation Study
The model shown in Figure 4 was used to generate the population data. The two latent variables,
The items corresponding to
Sample Size, N
Based on the conditions used in past simulation studies (Cham, West, Ma, & Aiken, 2012; Chin, Marcolin, & Newsted, 2003; Lin et al., 2010; Marsh, Wen, & Hau, 2004; Maslowsky, Jager, & Hemken, 2015), we chose 100, 200, and 500 to represent small, medium, and relatively large sample sizes.
Reliability,
We manipulated the reliability,
Interaction Effect,
We manipulated the magnitude of the interaction effect
Mplus 7.11 (Muthén & Muthén, 1998-2013) was used to generate 2,000 data sets for each condition. Given that the data were generated at the item level (i.e., three items per latent factor), we computed the mean composite scores for X and for M by averaging the corresponding items. Hence, we had three new observed composite scores; namely, the two observed composite variables X and M, and the corresponding product (or observed interaction effect) term XM. The data sets were then analyzed by fitting the three methods as shown in Figures 1, 2, and 3, respectively. For all three methods, the double-centering strategy (Lin et al., 2010) was applied. Therefore, before analyzing the data using the three methods, X and M were first mean-centered; the product term XM was first computed using the mean-centered X and M and then mean-centered afterward. The annotated Mplus syntax for specifying the models with these three methods is presented in Appendix B.
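The data preparation steps described above (averaging items into composites, mean-centering X and M, forming the product, and mean-centering the product) can be sketched as follows. This is an illustrative Python version with hypothetical function names and toy data; the study itself used Mplus, and its syntax is not reproduced here.

```python
from statistics import fmean


def mean_composite(item_rows):
    """Average each respondent's item scores into one composite score."""
    return [fmean(row) for row in item_rows]


def center(values):
    """Subtract the sample mean from every value."""
    mu = fmean(values)
    return [v - mu for v in values]


def double_mean_center(x, m):
    """Return centered X, centered M, and the re-centered product term."""
    xc = center(x)
    mc = center(m)
    product = [a * b for a, b in zip(xc, mc)]  # product of the centered scores
    return xc, mc, center(product)             # second centering step


# Toy data: four respondents, three items per scale (hypothetical values).
x_items = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [6, 6, 6]]
m_items = [[2, 2, 2], [0, 0, 0], [1, 1, 1], [5, 5, 5]]
xc, mc, xmc = double_mean_center(mean_composite(x_items), mean_composite(m_items))
print(xc, mc, xmc)
```

The second centering step matters because the product of two centered variables generally has a nonzero mean whenever X and M are correlated.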
Path Model
The first method tested was the conventional path model (see Figure 1), with one predictor, one moderator, and the product term predicting one outcome variable. The measurement errors of the manifest exogenous variables were assumed to be zero. The three exogenous variables were allowed to be correlated.
Latent Moderated Structural Equations Method
For the second method, the LMS method, no product indicator was created, as depicted in Figure 2. Instead, a maximum likelihood estimator with robust standard errors using numerical integration was used to estimate the latent interaction effect, based on the information of X and M. The measurement error variances for both X and M were constrained by using Equations (4) and (5). The two latent factors,
Reliability-Adjusted Product Indicator Method
In the RAPI method, we utilized the reliability of each composite to constrain the corresponding measurement error. These nonlinear constraints are shown in Figure 3. All the common factor loadings were fixed to 1 for model identification purposes whereas the factor variances were freely estimated. All the latent factors were allowed to be correlated.
Evaluation Criteria
Four criteria were applied to evaluate the performance of the three methods in examining the interaction effects with observed composite scores. The first two criteria, a 95% confidence interval (CI) coverage rate and the standardized bias, were used to evaluate bias—the average difference between the estimator and the true parameter. For the 95% CI coverage, the Wald interval was obtained, with a coverage rate >91% considered acceptable (Muthén & Muthén, 2002). The standardized bias was the ratio of the average raw bias over parameter standard errors. Therefore, the standardized bias can be interpreted in a standard deviation unit, like Cohen’s d. The standardized bias of the latent interaction effect estimates was compared with the cutoff value of 0.40. An absolute value <0.40 was regarded as acceptable (Collins, Schafer, & Kam, 2001).
The third criterion was the relative standard error (SE) bias of the interaction effect estimates; it was designed to evaluate the precision of the interaction estimators. Estimators with smaller relative SE bias show less variability across simulation replications. As recommended by Hoogland and Boomsma (1998), relative SE bias values <10% were considered acceptable.
Finally, the root mean square error (RMSE) was calculated to evaluate both the accuracy and precision of the parameter estimations for the three methods. The smaller the RMSE values, the more accurate the parameter estimations were across the 2,000 replications.
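The four criteria can be computed from the per-replication estimates and standard errors roughly as in the sketch below. The formulas are assumptions based on common simulation practice, since the text describes the criteria in words only: the standardized bias divides the average raw bias by the empirical standard deviation of the estimates, and the relative SE bias compares the average estimated standard error with that same empirical standard deviation. The function name and the toy inputs are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def evaluate(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation replications for one parameter.

    estimates / std_errors: one entry per replication.
    z: critical value for the Wald interval (1.96 for a 95% CI).
    """
    empirical_sd = stdev(estimates)  # variability of estimates across replications
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    squared_errors = [(est - true_value) ** 2 for est in estimates]
    return {
        "coverage": sum(covered) / len(covered),
        "standardized_bias": (fmean(estimates) - true_value) / empirical_sd,
        "relative_se_bias": (fmean(std_errors) - empirical_sd) / empirical_sd,
        "rmse": sqrt(fmean(squared_errors)),
    }


# Toy example with three replications and a true effect of 0.50.
summary = evaluate([0.4, 0.5, 0.6], [0.1, 0.1, 0.1], true_value=0.5)
print(summary)
```

The cutoffs cited in the text (coverage above 91%, absolute standardized bias below 0.40, relative SE bias below 10%) would then be applied to the returned values, with the relative SE bias multiplied by 100 if a percentage is wanted.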
Results
The results of the conventional path model (without considering any measurement errors of the exogenous variables) and the models applying the RAPI and the LMS methods were compared in terms of the 95% CI coverage rate of the interaction effect, the standardized bias, relative standard error bias, and RMSE of the interaction effect estimates. The simulation results for
95% Confidence Interval (CI) Coverage Rate, Standardized Bias, Relative Standard Error (SE) Bias, and Root Mean Square Error (RMSE) for
Note. N = sample size; ρ = reliability estimate; PM = path model; RAPI = reliability-adjusted product indicator method; LMS = latent moderated structural equations method.
Values exceeding the recommended cutoffs are in boldface.
95% Confidence Interval (CI) Coverage Rate, Standardized Bias, Relative Standard Error (SE) Bias, and Root Mean Square Error (RMSE) for
Note. N = sample size; ρ = reliability estimate; PM = path model; RAPI = reliability-adjusted product indicator method; LMS = latent moderated structural equations method.
Values exceeding the recommended cutoffs are in boldface.
Convergence and Inadmissible Solutions
All simulation replications converged without any issues. Only 12 inadmissible solutions occurred with the RAPI method under the condition of non-zero interaction effect (
Coverage of 95% Confidence Interval of
As shown in Table 1, for conditions with interaction effect (
When the interaction effect was nonzero, the conventional path model, which does not take measurement errors into account, generally resulted in the lowest coverage rate. For example, as shown in Table 2, coverage rates were very low for the conventional path model, ranging from 0% to 79.2%. By comparison, under the same conditions, the coverage rates for the RAPI method continued to range from 93.3% to 97.1%. Similarly, the coverage rates for the LMS method were higher than those for the conventional path model, ranging from 90.0% to 94.6%. In other words, when a true interaction effect existed, the model that did not directly take measurement errors into account (i.e., the conventional path model) had the lowest chance of identifying the true effect.
Standardized Bias of
When the true interaction effect,
When the true interaction effect was not zero (=0.50), the standardized biases of the interaction effects differed for the three methods across simulation conditions. For the conventional path model, substantial underestimations of the interaction effects were observed, with a range from −6.50 to −0.91 across all the conditions. By contrast, interaction effects were slightly overestimated for the RAPI method. These overestimations, however, were still within the acceptable criteria across all conditions. Standardized biases were larger (ranged from 0.16 to 0.30) under the low reliability (.70) condition, compared with those (ranged from 0.03 to 0.07) under the high reliability (.90) condition when using the RAPI method. On the other hand, slightly underestimated interaction effects were found for the LMS method, with standardized biases ranging from −0.13 to −0.07 under the low reliability (.70) condition, and from −0.07 to −0.04 under the high reliability (.90) condition.
Relative Standard Error Bias of
As shown in Table 1, the absolute values of relative SE bias when
When
Root Mean Square Error in Estimating
Generally, the RMSE values decreased as sample size or reliability increased. Under the condition of
On the other hand, different RMSE patterns were observed when
Discussion
Despite the existence of the SEM approach for decades, applied researchers still commonly test interaction effects with the presumably measurement error–free observed composite scores. In this study, we reviewed two alternative methods, namely, the RAPI and the LMS methods, and compared their performance with that of the conventional path model through a Monte Carlo study.
Our simulation results showed a substantial negative standardized bias and a considerably low coverage rate when the conventional path model (without adequately taking into account measurement errors of the observed composites) was employed in testing interaction effects. Thus, the interaction effect under the conventional path model is likely to be underestimated relative to the true population value when measurement errors are not adequately taken into account in the analysis. These findings reaffirm past research, which has shown biased results due to imperfectly reliable measurement when testing interaction effects (Dunlap & Kemery, 1988; Evans, 1985; Feucht, 1989). Accordingly, the conventional path models, which do not adjust for measurement errors of the manifest predictors, are not recommended for testing interaction effects.
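The direction of this bias can be illustrated with a small simulation. The sketch below is a deliberately simplified setup and not the study's design: the two true scores are independent standard normal variables, the outcome depends only on their product (slope 0.50), and a simple regression slope on the product term stands in for the full path model. Under these assumptions the slope based on error-laden composites is expected to shrink by roughly the product of the two reliabilities (0.50 × .70 × .70 ≈ 0.245). All names, the seed, and the sample size are hypothetical choices.

```python
import random
from statistics import fmean


def center(values):
    mu = fmean(values)
    return [v - mu for v in values]


def simple_slope(y, x):
    """Ordinary least squares slope of y on a single predictor x."""
    yc, xc = center(y), center(x)
    return sum(a * b for a, b in zip(yc, xc)) / sum(b * b for b in xc)


def simulate_slopes(n=20000, beta=0.5, rho=0.7, seed=1):
    """Return (slope on true-score product, slope on observed product)."""
    rng = random.Random(seed)
    # True-score variance is 1, so this error SD gives reliability rho.
    err_sd = ((1.0 - rho) / rho) ** 0.5
    xi_x = [rng.gauss(0, 1) for _ in range(n)]
    xi_m = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * a * b + rng.gauss(0, 1) for a, b in zip(xi_x, xi_m)]
    x = [a + rng.gauss(0, err_sd) for a in xi_x]  # observed composite X
    m = [b + rng.gauss(0, err_sd) for b in xi_m]  # observed composite M
    true_prod = center([a * b for a, b in zip(center(xi_x), center(xi_m))])
    obs_prod = center([a * b for a, b in zip(center(x), center(m))])
    return simple_slope(y, true_prod), simple_slope(y, obs_prod)


true_slope, observed_slope = simulate_slopes()
print(true_slope, observed_slope)
```

The slope based on the observed product should come out well below the slope based on the true scores, which is the attenuation pattern that reliability-adjusted constraints are meant to correct.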
On the other hand, the two alternative methods discussed here, namely, the RAPI and LMS methods, can directly adjust for the measurement errors of the observed composites by using either the factor structure reliability calculated from the measurement model or the conventional coefficient alpha. The major difference between these two methods is how the interaction effect is specified or captured: RAPI requires the creation of a product indicator for the latent interaction effect, whereas LMS does not. Results from the present study have shown that the RAPI method performed comparably to the LMS method in estimating the interaction effects. Additionally, when the true interaction effects were nonzero, RAPI yielded slightly overestimated (but still acceptable) coefficients, whereas LMS yielded slightly underestimated coefficients. Hence, the LMS method may be preferable for applied researchers who aim to be more conservative by preventing overestimated effects.
Both sample size and the magnitude of reliability played important roles in estimating the non-zero interaction effect. The standardized biases became smaller as sample size increased for both RAPI and LMS methods, suggesting that the reliability-adjusted measurement error constraints worked better with larger sample sizes. Reliability had a similar effect on standardized biases. With the same sample size, higher reliability (.90) produced more accurate interaction effect estimates than lower reliability (.70). Additionally, the RAPI method yielded less stable estimates than the LMS method under the low reliability and small sample size condition. Hence, the LMS method is preferable when the exogenous variables are less reliable and the sample is small (e.g., N = 100).
Although our simulation results showed the benefits of controlling for measurement errors when testing interaction effects, this step sometimes comes at the price of increasing variability. For example, comparing four latent interaction modeling approaches, Cham et al. (2012) found that latent variable models can correct for bias but sometimes lose statistical power. When estimating the nonzero interaction effects in our simulation, the relative SE biases of the interaction effects from RAPI and LMS were higher than those from the path model under the high reliability (.90) condition. Given the reciprocal relationship between measurement error and reliability, these results suggest that constraining measurement errors for highly reliable variables may lead to over-correction, especially when the sample size is small. However, if we consider precision and bias together, the RMSE results showed that both the RAPI and LMS methods in general outperformed the conventional path model. Hence, these measurement error adjustment methods are recommended for testing interaction effects with composites, with the recognition that the RAPI method may produce less precise or less accurate estimates than the LMS method under conditions with small samples and less reliable measures.
Practically speaking, there are several situations where researchers will find both the RAPI and LMS methods preferable to the multiple-item latent factor model in empirical data analyses. For example, if the predictors or the moderators are measured by a large number of items, fitting the hypothesized structural model at the item level may lead to convergence issues due to the complexity of the model.
Another example would be when researchers analyze secondary data and have limited or no access to the original items. As mentioned earlier, the factor structure reliability in SEM is comparable to the conventional internal consistency reliability (i.e., Cronbach’s alpha or coefficient alpha) with tau-equivalent items (i.e., items with equal factor loadings and possibly unequal error variances). Hence, as long as the reliability information of the composites is available, we advocate using this information to constrain the error variances for the observed composites and conducting the analyses with either the RAPI or LMS method to obtain interaction effect estimates.
Limitations and Future Research Directions
Two limitations of the present study must be addressed. First, since the interaction effect is the product term of the predictor and moderator, having a low reliability on either or both variables can amplify the measurement error of the interaction effect (Aiken & West, 1991). It is, therefore, worth investigating how changes in the reliability of the interaction term influence the interaction effect estimation. Second, the scope of this study was the traditional single-level interaction effect. Future research is needed to investigate the impact of ignoring measurement errors when testing interaction effects with observed composites under more complex data structures, such as multilevel data.
Conclusions
When an interaction effect is examined based on observed composite scores without properly taking measurement errors into account, the result may be a considerable underestimation of the interaction effect. Thus, we encourage researchers to apply either the LMS or the RAPI method, which can directly take into account the measurement errors in the manifest variables. For researchers who have very limited access to SEM programs, the RAPI model is by far the most feasible way (i.e., it can be implemented in most SEM programs) to generate unbiased interaction estimates. Moreover, the overall model chi-square test and other commonly used model-fit indices are only available for the RAPI method. On the other hand, the LMS method produces relatively more conservative interaction effect estimates. Additionally, for those who have small samples or less reliable measures, the LMS method would be preferable.
Appendix A
Appendix B
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
