Abstract
To assess the direct and indirect effect of an intervention, multilevel 2-1-1 studies with intervention randomized at the upper (class) level and mediator and outcome measured at the lower (student) level are frequently used in educational research. In such studies, the mediation process may flow through the student-level mediator (the within indirect effect) or a class-aggregated mediator (the contextual indirect effect). In this article, we cast mediation analysis within the counterfactual framework and clarify the assumptions that are needed to identify the within and contextual indirect effect. We show that unlike the contextual indirect effect, the within indirect effect can be unbiasedly estimated in linear models in the presence of unmeasured confounders of the mediator–outcome relationship at the upper level that exert additive effects on mediator and outcome. When unmeasured confounding occurs at the individual level, both indirect effects are no longer identified. We propose sensitivity analyses to assess the robustness of the within and contextual indirect effect under lower and upper-level confounding, respectively.
To assess the impact of instructional methods on students’ performance, cluster-randomized trials have become the gold standard in educational psychology. Such designs are useful when entire classes are the focus of the intervention. They allow one, for instances, to estimate the causal relationship between teacher behavior and students’ engagement in learning and achievement. Unraveling this causal relationship may further deepen the understanding into the underlying processes. For example, Eccles’s expectancy-value model of achievement motivation (Wigfield & Eccles, 2000) links instructional style, motivational factors, and achievement, whereby performances are assumed to be influenced by students’ motivational beliefs, which themselves are influenced by students’ perceptions of teachers’ behavior. Mediation analysis allows one to test such theories and can disentangle the effect of a treatment or intervention T on an outcome Y into the effect going through a mediating variable M, called the mediator, and the remaining direct effect.
Multilevel models that account for the hierarchical nature of educational data are typically used to address mediation in cluster-randomized trials (MacKinnon, 2008). When the treatment is randomized at the upper (or second) level, but the mediator and outcome are measured at the lower (or first) level, this multilevel mediation setting is sometimes referred to as a 2-1-1 setting. There is currently much debate on how mediation may operate in this setting. Pituch and Stapleton (2012) nicely summarize the two prevailing opposing views. On the one hand, Preacher and colleagues (Preacher, Zyphur, & Zhang, 2010; Zhang, Zyphur, & Preacher, 2009) argue that “The effect of T on Y must be a strictly Between; because T is constant within a given group, variation in T cannot influence individual differences within a group” (Preacher et al., 2010, p. 210). In other words, they assert that a student-level variable cannot mediate the association between an intervention at the teacher level and an outcome measured at the student level and regard any association between the student-level mediator and outcome as irrelevant. These authors therefore purely focus on mediation processes at the cluster-level (left panel of Figure 1) or cluster-level only mediation. Other scholars (Krull & Mackinnon, 2001; Pituch & Stapleton, 2012; VanderWeele, 2010) argue that a teacher-level treatment may impact a student-level outcome through a student-level mediator. This is referred to as cross-level mediation. These authors allow that the effect of the intervention on the outcome may be mediated not only by a variable at the student level but also through a class-level aggregate of this mediator (right panel of Figure 1). We take the latter view and, in this article, particularly focus on the estimation of the indirect effect through the student-level mediator that will be referred to as the within indirect effect.

(A) The indirect effect in a 2-1-1 setting following the view of Zhang, Zyphur, and Preacher (2009). (B) The indirect effect in a 2-1-1 setting following the view of Pituch and Stapleton (2012).
The rationale for focusing on the within indirect effect can be understood from the illustrating example that we will use throughout this article. In a randomized experimental study, De Naeghel, Van Keer, Vansteenkiste, Haerens, and Aelterman (in press) evaluated the impact of a need-supportive teacher training grounded on self-determination theory. The experimental condition consisted of 12 teachers participating in a teacher professional development workshop (throughout this article referred to as “training”) aimed at providing the knowledge and skills necessary to implement an autonomy-supportive and structuring teaching style, whereas the control condition included 25 teachers who continued their current teaching repertoire. The researchers hypothesized that need-supportive teachers would change the autonomous reading motivation in fifth-grade students, which in turn would change reading frequency. Reading motivation was measured using the self regulation questionnaire (SRQ)-Reading Motivation (De Naeghel, Van Keer, Vansteenkiste, & Rosseel, 2012). To obtain a score for autonomous motivation, the average over the 17 items of this questionnaire, each measured on a 5-point Likert-type scale, was calculated. The reading frequency of students was measured as the average over 4 items on a 4-point Likert-type scale (1 = (almost) never, 2 = sometimes, 3 = often, 4 = (almost) always). For both measures, a pretest and posttest were obtained from 628 fifth-grade students from 37 classes in total. Note that it is an unbalanced design with an average group size of 17 (min = 7, max = 28). We consider for both autonomous reading motivation and reading frequency the change between scores on posttest and pretest. Since the researchers hypothesized that the effect of the need-supportive teaching style on individual reading performance change occurs via the individual autonomous motivation change of each student, we concur with Pituch and Stapleton (2012) that the cluster-level only mediation approach would not be well suited for this example. In the cross-level approach that we will consider, two mediated effects are of interest: the first through the individual change in autonomous motivation of the student and the second through the class-level aggregate of this mediator. This can be understood upon noting that the association between a student-level mediator and the outcome within a class, known as the within-group association, may be different from the association between the class means of the mediator and the outcome, known as the between-group association. This may be due to confounding factors; for example, classes may differ among each other in terms of factors such as socioeconomic status of the students, education level, and so on, which may be related to the outcome too. However, there may also be a difference because the between-group association measures a truly different effect. In many cases, there is thus no reason to assume that the within effects and between effects are similar; the contextual effect is formally defined as the difference between these two effects. While it is clear that in our motivating example, we are primarily interested in the effect of the teacher’s training on the student’s reading frequency through the change in the student’s reading motivation, interest in other settings may lie in the contextual indirect effect (e.g., Nagengast & Marsh, 2012).
In this article, we will clarify the assumptions under which the within and contextual indirect effects in the cross-level approach can be identified. We will therefore cast mediation analysis within the counterfactual framework. This framework has proved its usefulness in making the assumptions needed to identify direct and indirect effects in simple settings with independent observations more explicit. With a few notable exceptions (VanderWeele, 2010; VanderWeele, Hong, Jones, & Brown, 2013), such investigations are mostly absent in multilevel mediation settings as the one described here (Preacher, 2015). One particular complication in this context is interference, which is present when the treatment received by one individual may affect the outcomes of other individuals. This complication is not addressed by most of the literature on causal inference, which relies on an assumption of “no interference” (Tchetgen & VanderWeele, 2012). In particular, within Rubin’s (1976) causal framework, an assumption referred to as the “Stable Unit Treatment Value Assumption” includes a no-interference assumption. Under no interference, there is a single value of each potential outcome associated with each intervention for each student, regardless of how interventions are assigned and of which interventions are received by other students. Rubin (1980) cautioned that this assumption becomes problematic in educational settings, for example, where interventions are given to students who interact with each other in class. In our setting, when students in the same class experience higher autonomous motivation due to a need-supportive teacher, not only the individual change in autonomous motivation but also the changed autonomous motivation as a group may lead to a higher reading frequency in each student. Building on work by Hong and Raudenbush (2006) and VanderWeele, Hong, Jones, and Brown (2013), we present a potential outcome framework for causal inference in which not only one’s own but also the treatment of classmates can affect students’ potential outcome. This framework will enable us to obtain identification results for within and contextual indirect effects.
This article is further organized as follows: We first describe the multilevel approach proposed by Pituch and Stapleton (2012) aiming to estimate the within and contextual indirect effect and contrast it with an approach based on regression of differences (the ROD approach). The latter approach is also referred to as the “fixed effect” approach, the most common approach in the econometrics literature for handling upper-level confounding in panel data (Hausman & Taylor, 1981). Inspired by the work of Raudenbush and Willms (1995), the fixed effect approach has recently been introduced in the educational setting by Castellano, Rabe-Hesketh, and Skrondal (2014) for cross-sectional data. These authors show that this approach is preferable over more common approaches in educational research that estimate contextual effects by including class means of student covariates in addition to student-level covariates, particularly when there are unmeasured common causes of the predictor and outcome at the class level. In this article, we elaborate on their findings in the mediation setting and assess the impact of unmeasured confounding (i.e., unmeasured common causes) of the M-Y relationship at the upper and lower level on the estimation of within and contextual indirect effects. We show that in the presence of unmeasured upper-level M-Y confounders that exert an additive effect on mediator and outcome, the within indirect effect can still be unbiasedly estimated by both approaches in the linear models that we consider. We further derive bias expressions for the Pituch and Stapleton estimator of the contextual indirect effect in the presence of unmeasured upper-level M-Y confounding. In the presence of unmeasured lower-level M-Y confounding, the within indirect effect is no longer identified. We therefore propose a sensitivity analysis within the ROD framework that allows us to explore how the estimated within indirect effect is affected by violations of the no unmeasured lower-level M-Y confounding assumption. When interest lies in the contextual indirect effect, a sensitivity analysis in the Pituch and Stapleton framework can be performed to assess the impact of upper-level M-Y confounding. The ROD approach and the sensitivity analysis are illustrated with our motivating example. For ease of exposition, we focus primarily on simple settings with no interactions (i.e., no treatment mediator, treatment baseline covariate, or mediator baseline covariate interactions) but illustrate later how ideas extend to more complex settings involving such interactions. We end with a discussion.
Identification of the Direct and Indirect Effects in a 2-1-1 Model
The Counterfactual Framework
We first introduce some notation for the observed variables in a randomized 2-1-1 multilevel setting. Assume there is a random sample of K classes with sample sizes n 1, n 2, …, nK . Let T.j denote a binary treatment variable for group j (j = 1, … , K), which takes the value 1 for the experimental condition and 0 for the control condition. Mij and Yij represent the mediator and outcome of student i (i = 1, … , nj ) in group j, respectively. The vector Cij collects all measured baseline covariates at individual and group level and contains information about cluster membership. For example, if measured, the number of years of teacher experience and the distance between the student’s home and the closest public library could be considered as covariate at, respectively, the group and individual level.
To define total, direct, and indirect effects in the 2-1-1 multilevel mediation setting, we will rely on the counterfactual framework of causal inference (Pearl, 2001; Robins & Greenland, 1992). A counterfactual or potential outcome Yij (t) is the outcome that would be observed in individual i (i = 1, … , nj ) of cluster j (j = 1, … , K) if (possibly contrary to the fact) the group-level treatment T.j in cluster j is set to t. Given a binary treatment, each individual has two potential outcomes, Yij (1) and Yij (0). In order to observe at least one of the potential outcomes for each subject, we need to make the consistency assumption for the outcome, which implies that the potential outcome for a given treatment T = t equals the observed outcome for subjects that are exposed to treatment t (VanderWeele, 2008). This assumption holds by design in randomized studies but may be more questionable with less manipulable treatments that are often seen in observational studies. While the individual total causal effect, defined as Yij (1) − Yij (0), cannot be calculated since individuals are only observed under a single intervention, the average total causal effect of treatment E(Yij (1) − Yij (0)) is identified as E(Yij|T.j=1)−E(Yij|T.j=0) in randomized trials; this latter difference can easily be estimated based on the observed data. We further make the intact cluster assumption (Hong & Raudenbush, 2008), which implies that the cluster to which individuals belong does not change because of the interventions at the cluster level. In educational research, this assumption generally holds since the intervention does not lead to reorganization of class groups. Note that for observational studies, one would additionally need to assume that there are no unmeasured confounders for the treatment–outcome relation. Similarly, let Mij (1) and Mij (0) represent the counterfactuals for the mediator under experimental and control condition, which can be used to define the causal effect on the mediator. In a randomized trial, the latter is identified as E(Mij|T.j=1)−E(Mij|T.j=0) when making the consistency assumption for the mediator and can be unbiasedly estimated based on the observed data.
To define direct and indirect effects, we need to introduce additional counterfactuals that are more complex than in the simple setting with independent observations. Spillover effects occur when the exposure of one individual affects the outcomes of other individuals. This phenomenon of spillover, also referred to as “interference,” is common whenever an outcome depends upon social interactions between individuals (VanderWeele, 2015). Similar to VanderWeele et al. (2013), we relax the no-interference assumption at the individual level by explicitly accounting for contextual effects through a function
With the definition of these nested counterfactuals, the average total causal effect can be decomposed into a natural direct effect (NDE) and a natural indirect effect (NIE):
In our example, the NDE reflects the difference in reading frequency change for a need-supportive teacher versus a control teacher, while fixing the autonomous motivation of the student and the average autonomous motivation of his or her classmates to the level that would have been observed under the control condition. A simple way to view this is to note that values change for Y’s first arguments, but not for second and third, implying that Y is influenced by T only directly. The NIE reflects the difference in reading frequency change when the autonomous motivation of the student and the average autonomous motivation of his or her classmates changes from the level that would have been observed with a need-supportive trained teacher to the level that would have been observed in the control group, while fixing the intervention to the experimental condition otherwise. Now the first argument of Y is fixed, but the second and third argument change and hence the NIE reflects all of the effect of the intervention on the outcome going through the individual and classmates’ autonomous motivation. It can be further decomposed into a contextual indirect effect, Equation 1, and a within indirect effect, Equation 2:
The within indirect effect reflects the difference in reading frequency change when the autonomous motivation of the student changes from the level that would have been observed with a need-supportive teacher to the level that would have been observed in the control group, while fixing the intervention to the experimental condition and the average autonomous motivation of his or her classmates to the level that would have been observed with a teacher of the control group. The contextual indirect effect reflects the difference in reading frequency change when the average autonomous motivation of his or her classmates changed from the level that would have been observed with a teacher in the need-supportive group to the level in the control group, while fixing the intervention to the experimental condition and the autonomous motivation of the individual student to the level that would have been observed with a need-supportive trained teacher.
Identification Assumptions
Throughout, we will interpret the causal diagram in Figure 1 (right) as a nonparametric structural equation model with independent errors (Pearl, 2001). Following VanderWeele et al. (2013), Table 1 then provides a summary of different causal assumptions that suffice to identify natural direct and indirect (within and contextual) effects in our setting; note that alternative assumptions may also suffice (Pearl, 2014). We first discuss those assumptions.
Assumptions for the Identification of Causal Effects in the Presence of Contextual Indirect Effects in a 2-1-1 Multilevel Setting (VanderWeele et al., 2013)
Note. For a set of random variables A, B, and D, A ╨ B|D denotes that A is independent of B conditional on D. Cij (i = 1, …, nj ; j = 1, …, K) represents a set of measured baseline or pretreatment covariates, that is, covariates that are not influenced by treatment and contains information about cluster membership.
A1, A2, A3, and A4 encode assumptions about the absence of unmeasured confounders for the relations between treatment, mediator, and/or outcome. They are called ignorability assumptions. Although these assumptions are rarely explicitly stated in educational research literature, it is key to realize that a randomized treatment only implies that Assumptions A1 and A2 hold, but does not guarantee that Assumption A3 is satisfied. One may indeed think of several factors at the student and/or class level that influence both the mediator and the outcome. Conditioning on such factors, contained in a set of baseline covariates C, may make the assumption of no unmeasured confounding for the mediator–outcome relationship more plausible. Assumption A4 is violated if some confounders of the M-Y relationship are affected by treatment. This assumption would for example not hold if training leads to a higher number of reading hours in the class, and the latter influences both the change in reading motivation and the change in reading. Assumption A5 states that information on the counterfactual Mij (t′) provides no information on the counterfactual M−ij (t*) conditional on observed covariates. This assumption will be violated if there is an unmeasured covariate, for instance, learning attitude of an individual student, which influences both the individual reading motivation and the motivation of other students in the class.
If the ignorability Assumptions A1, A2, A3, and A4 hold, the average counterfactual outcome
The expression in Equation 3 can be used to identify the NDE and the NIE.
If in addition Assumption A5 holds, then the average counterfactual outcome
This expression can be used to identify the within indirect effect, Equation 2, and the contextual indirect effect, Equation 1.
Modeling Assumptions
Besides the above causal assumptions, we will also need to make modeling assumptions throughout the article. For ease of exposition, we start by considering the following simple data generating mechanisms for the mediator and outcome:
with M.j equal to the class average of the mediator. We consider the outcome of student i in class j to depend on his or her own mediator but also on the class average M.j, rather than the average of the classmates M−ij . In groups of size 20 or more, M−ij will be typically approximately equal to M.j. Although Mij and M.j are not independent, we will further follow the view of VanderWeele et al. (2013) and replace M−ij by the average of Mij for all individuals in class j, M.j, in the remainder of this article. This does not fundamentally change the estimators that we will discuss later (see the online Appendix B1, available at http://jeb.sagepub.com/supplemental) but simplifies the required calculations.
In Equations 5 and 6, the parameters β0 and θ0 represent the intercepts, β1 and θ1 represent the direct effect of treatment on, respectively, the mediator and outcome, θ2 the effect of the individual mediator on outcome, θ3 the effect of the class mean of the mediator on outcome, and β2 and θ4 the effect of (a vector of) measured baseline covariates (student- and/or class level-specific) on mediator and outcome. uj
,
Note that we allow unmeasured factors uj
such as teacher or class characteristics that influence both Mij
and M.j. For example, if Mij
represents autonomous reading motivation of individual i in class j and M.j the average autonomous motivation over all classmates, it is unlikely that those are independent, given the same teacher training. Furthermore, as it is often unlikely that all common causes of the mediator and outcome at the class level are measured, we will require that
We do, however, assume that
A second and third important modeling assumption implied by the data generating mechanisms in Equations 5 and 6 can be phrased as follows:
and
In other words, we assume homogeneous effects across classes M2 and across individuals in the same class M3, given the observed data. The data generating mechanisms in Equations 5 and 6 do not allow for such moderation of the treatment effect by baseline covariates (neither at the student nor at the class level) for both mediator and outcome, as well as moderation of the mediator–outcome relationship, and assumes constant effects for all students in all classes. The fourth modeling assumption is:
For ease of exposition, we start from this simple setting without interactions, but later we show how to allow for such moderated mediation.
Derivation of Direct and Indirect Effects in Linear Models
We now derive the natural direct and indirect effects under the linear models in Equations 5 and 6 that obey Modeling Assumptions M1–M4. With a slight abuse of notation, let Cij
include uj and
while the NIE equals
Interestingly, the direct and indirect effects do not depend on uj
and
and the natural within indirect effect:
More details on these derivations can be found in the online Appendix B2 (available at http://jeb.sagepub.com/supplemental). Again, none of these conditional effects depend on uj
or
Methods for Multilevel Mediation Analysis
The Method of Pituch and Stapleton
The method of Pituch and Stapleton (2012), further abbreviated as PS method, considers two separate multilevel models:
with
As shown in the right panel of Figure 1 (leaving out Cij
for clarity), the PS method aims to distinguish between the contextual indirect effect (estimated as
To unbiasedly estimate the contextual indirect and the direct effect separately, as one aims to do with the PS method, we thus require the strong assumption that
The ROD Method
We here propose an alternative strategy to the PS method that effectively eliminates unmeasured upper-level confounding by applying cluster-mean centering (Castellano, Rabe-Hesketh, & Skrondal, 2014). Consider the data generating models in Equations 5 and 6, without the assumption that uj
and
Hence, by regressing the group-centered outcome (Yij−Y.j) on the group-centered mediator (Mij−M.j) and baseline confounders (Cij−C.j), that is,
unmeasured confounding at the group level is elegantly canceled out. This strategy thus identifies θ2; even if
Next, along the lines of Goetgeluk, Vansteelandt, and Goetghebeur (2008) and Loeys, Moerkerke, Raes, Rosseel, and Vansteelandt (2014), one may obtain an estimator of the effect of the treatment on the outcome not going through this individual mediator (Figure 2) by removing the effect of the individual mediator on the outcome and regressing this new outcome on the treatment:

Second step of the regression of difference method: by subtracting δWIE Mij of Yij we take away the edge between Mij and Yij .
An estimator for δDE, denoted as
Since one fits Equations 9 and 11 consecutively, standard errors for
The ROD method described above closely corresponds to the two steps of the Hausman–Taylor (HT) estimator introduced by Castellano et al. (2014) in the educational setting. Similar to our procedure but not specifically for a mediation context, the idea is to first eliminate the upper-level confounding by estimating the within effect of interest with a fixed effect estimator (Step 1) and next to consider the between-level effect that is appropriately adjusted for this within effect (Step 2). We refer the interested reader to appendix A of Castellano et al. (2014) for more technical details. The procedure for the HT estimator also produces valid standard errors. In the four steps of the procedure of the HT estimator, one obtains naive estimators in the first two steps, but next both parameters are reestimated simultaneously in the next two steps, so that the uncertainty in the estimate of δWIE is not ignored when estimating the standard error of
Unmeasured Confounding of the M–Y Relationship
In this section, we obtain expressions for the bias of different estimators in the different methods under lower and upper-level confounding of the M-Y relationship. To this end, we consider an alternative formulation for the data generating mechanism models in Equations 5 and 6:
with now both
We first assess the bias of the estimator for δWIE in the model in Equation 10 of the ROD approach under data generating mechanisms in Equations 12 and 13. Replacing rij
in Equation 13 by
The ROD approach thus yields:
and δWIE in the ROD approach will reflect the effect θ2 + ρ under models in Equations 12 and 13 and hence, there is no bias under upper-level M-Y confounding (η ≠ 0).
We next develop a sensitivity analysis in the ROD framework that enables analysts to investigate the robustness of the within indirect effect estimator
where
Upon noting that
where δWIE can be replaced by
In practice, we proceed as follows. We first estimate δWIE,
Next we consider the PS approach. In balanced designs, the estimators for the within indirect effect and the combined direct effect are mathematically equivalent to those obtained from the ROD method. In unbalanced designs, the within indirect effect is still mathematically equivalent but no longer the combined direct effect. This follows from the fact that the generalized least squares (GLS) estimator and OLS estimator for fixed effect parameters from linear mixed models are identical in balanced designs (Searle, Casella, & McCulloch, 2006, p. 160; Zyskind & Martin, 1969). Closed-form expressions for the GLS estimators from a linear mixed model with only a random intercept in unbalanced designs reveal that the GLS estimator for within-cluster effects (i.e., effects for cluster mean–centered predictors) is equal to the OLS estimator in unbalanced designs but no longer for between-cluster effects (Demidenko, 2013, Zyskind, 1967). The PS approach further disentangles the combined direct effect into the direct and contextual indirect effect but yields biased estimators for those effects in the presence of unmeasured upper-level M-Y confounding (under Assumption A3.B). We now quantify this bias. Assuming Equations 12 and 13 with η ≠ 0 but ρ = 0, the following expressions for the large sample bias of the PS estimators for θ1 and θ3 based on Equations 7 and 8 can be derived (see the online Appendix B5, available at http://jeb.sagepub.com/supplemental):
These expressions are derived under the assumption of the same number of students within each class, but the qualitative findings also hold as the number of students varies between classes. It is interesting to note that since the effect of the treatment on the mediator (i.e., β1 under the data generating mechanisms in Equations 5 and 6) is still unbiasedly estimated by
Simulation Study
Using simulations, we assess the finite sample performance of the estimators of the direct and indirect effect estimators using the PS method and the ROD method and contrast the empirical bias under upper- or lower-level M-Y confounding with the above-derived bias expressions. We also compare the performance of six approaches to estimate the imprecision of the estimated within indirect effect and the combined direct effect. More specifically, we assess the empirical coverage of the 95% CIs of these effects using the naive approach, HT approach, the sandwich estimator, the NPBS-C, and the PPBS-R.
Using realistic assumptions about the number of classes and students, we explore four different scenarios using data generating mechanisms in Equations 12 and 13:
no M-Y confounding and a contextual indirect effect (η = 0, ρ = 0, and θ3 = 0.3).
upper-level M-Y confounding but no contextual indirect effect (η = 0.2, ρ = 0, and θ3 = 0).
upper-level M-Y confounding and contextual indirect effect (η = 0.2, ρ = 0, and θ3 = 0.2).
upper- and lower level M-Y confounding and contextual indirect effect (η = 0.2, ρ = 0.4, and θ3 = 0.2).
The following parameter choices were further made. The intraclass correlation coefficient for generating the mediator and the outcome in the clusters equals 0.2, β1 = 0.1, θ1 = 0.2, and θ2 = 0.2. Sample size and group size vary as in Pituch and Stapleton (2012): 40 groups (gr = 40)/20 individuals (ind = 20), gr = 40, ind = 10 and gr = 20, ind = 20 and gr = 20, ind = 10. Note that similar results are obtained for unbalanced designs but not reported here.
Interest lies in estimation of the following effects: the direct effect θ1, the within indirect effect β1θ2, the contextual indirect effect β1θ3, and the combined direct effect θ1 + β1θ3.
Those four effects are estimated using the PS method (2012), while only the second and fourth are estimated using the ROD method. Each setting was repeated 1,000 times. R Version 3.1 was used to simulate and analyze the data (R Core Team 2015).
Results
Results are summarized in Table 2. Under setting (A), the PS method yields unbiased estimators of the direct
Results of the Simulation Studies When (A) no M-Y confounding is present, (B) upper-level M-Y Confounding Is Present but No Contextual Indirect Effect, (C) Upper-Level M-Y Confounding and Contextual Indirect Effect Are Present, and (D) Upper- And Lower-Level M-Y Confounding and Contextual Indirect Effect Are Present
Note. Mean estimates over 1,000 simulations are presented. Bias (denoted as Bxxx ) is the expected bias based on the formulas in Equations 17, 18, and 14. ROD = regression of difference; PS = Pituch and Stapleton.
Under setting (B), the ROD method leads to unbiased estimators for the within indirect effect and the direct effect, that is,
Under setting (C), the estimator
Under setting (D), all parameters of interest are estimated with bias under both approaches. We find that the bias for the estimators obtained by the PS method can be calculated as the sum of the bias formulas under upper- and lower-level confounding, that is,
So far, we solely focused on the bias of the estimators from the ROD and PS methods. Next, we explore the coverages of the 95% CI of the ROD estimators and compare five different methods. Table 3 shows the mean of the estimated standard errors and the coverage of the 95% CI for δWIE under those five approaches. Scenario (D) was dropped, as the estimator for θ2 is biased in this setting. The NPBS-C (using 2,000 bootstraps) tends to underestimate the variability of
Results of the Simulation Studies When (A) No M-Y Confounding Is Present, (B) Upper-Level M-Y Confounding Is Present but No Contextual Indirect Effect, and (C) Upper-Level M-Y Confounding and Contextual Indirect Effect Are Present
Note. The fourth column shows the empirical standard error of
Results of the Simulation Studies When (A) No M-Y Confounding Is Present, (B) Upper-Level M-Y Confounding Is Present but No Contextual Indirect Effect, and (C) Upper-Level M-Y Confounding and Contextual Indirect Effect Are Present
Note. The fourth column shows the empirical standard error of
Illustration
We now illustrate the ROD method and the sensitivity analysis with our motivating example. Interest lies in the effect of a need-supportive teaching style on change in reading frequency, and the extent to which this effect can be explained by a change in the students’ autonomous reading motivation. Before we start the data analysis, we discuss the plausibility of our causal assumptions. Assumptions A1 and A2 hold by randomization. If one is willing to assume that the effect of variables that affect both the reading frequency and the autonomous reading motivation (such as native language) is constant over time (i.e., at the pre- and postmeeting), such variables are no longer associated with the change in reading frequency and the change in autonomous reading motivation. By looking at the change, one can thus allow for unmeasured confounders that have a time-constant effect. However, if there are additional unmeasured covariates which have a different effect on pre- and postmeasurements of mediator and outcome, Assumption A3 would still be violated. Assumption A4 requires that there are no variables (such as the number of reading hours in the class) that depend on the need-supportive teaching style and that are associated with the change in reading frequency and the change in autonomous reading motivation, while Assumption A5 requires that the potential motivation of an individual student under some exposure is independent from the potential motivation of his or her friends under an assumed exposure in the same class. While both assumptions may not hold unconditionally, it is reasonable to assume that both assumptions may still hold conditional on unobserved class characteristics (such as the atmosphere in the class). That is sufficient to still identify the within indirect effect, provided that additive effects of these unmeasured class characteristics can be assumed. Further, it should be noted that schools are geographically separated but that in some schools, more than one class participated in this study, which may imply that the “no interference between classes” assumption may not fully hold. If part of the effect of the intervention on reading frequency is mediated by the reading motivation in other classrooms (hereby assuming an additive effect), but we ignore this and consider only the effect mediated through a child’s own classroom, then the direct effect will capture both the true NDE and the between-classes spillover-mediated effect (VanderWeele et al., 2013). As the ROD method only aims to disentangle the within indirect effect and the remaining “combined” direct effect, it is not impacted by violation of the no interference between classes assumption.
The total effect of the intervention on reading frequency, obtained by using a linear mixed model for the change in reading frequency with a fixed effect for intervention and a random intercept for each class, equals 0.093 (95% CI [−0.032, 0.217]). In other words, students of the treatment group have a change in reading frequency which lies 0.093 units above the change in the control group. Given that reading frequency is measured on a 4-point Likert-type scale, this difference is rather small and corresponds to approximately a 0.16 standard deviation increase. Interestingly, a −0.079 decrease and a 0.014 increase in reading frequency were observed in the control and intervention condition, respectively. In the traditional causal steps approach of the Baron and Kenny (1986) framework for mediation, a significant test for the total effect is considered a prerequisite for the test for the indirect effect. However, several scholars (Loeys, Moerkerke, & Vansteelandt, 2015; Shrout & Bolger, 2002) have argued that when primary interest lies in the mediation effect, the need for a significant total effect may be abandoned. We therefore continue to disentangle this nonsignificant total effect. Analyzing the data using the ROD method leads to a significant within indirect effect of training on reading frequency through autonomous reading motivation (0.048, 95% CI [0.010, 0.086]). Further, the estimate for the combined direct effect (δDE) is 0.042 (95% CI [−0.085, 0.170]) and is statistically nonsignificant (95% CI for the within indirect effect and the combined direct effect are based on the sandwich estimator). These estimates indicate that treatment leads to a higher change in reading frequency through the change of individual autonomous reading motivation (effect of 0.048) and to a higher change in reading frequency through the combination of all other paths (effect of 0.042). Again, these effects are rather small, given the 4-point Likert-type scale of reading frequency. We also analyze the data with the PS method. We find the same estimate and CI for the within indirect effect of training on reading frequency (0.048, 95% CI [0.012, 0.089], based on PPBS-R). The contextual indirect effect (0.043, 95% CI [−0.005, 0.119]) and the direct effect (−0.002, 95% CI [−0.120, 0.116]) are not statistically significant. It should be noted that these estimated effects rely on the absence of upper-level M-Y confounding. If we calculate the combined direct effect as the sum of the two latter effects, we obtain a similar but not identical estimate as the one obtained by the ROD method (0.042, 95% CI [−0.080, 0.153]).
While the ROD method and the PS method yield an unbiased estimator for the within indirect effect with unmeasured M-Y confounding at the class level (Assumption A3.B), this no longer holds in the presence of such confounding at the student level. As there might still be unmeasured common causes of change in autonomous reading motivation and change in reading frequency at the individual level, over and beyond such common causes at the group level, we perform a sensitivity analysis as outlined in the previous section. Figure 3 presents the results of the sensitivity analysis. For negative values of ρ*, the within indirect effect remains statistically significant. However, in this study, negative values are less realistic, given that one may expect that unmeasured common causes would impact change in autonomous reading motivation and change in reading frequency in the same direction. For instance, the distance between home and closest public library, which was unmeasured, may have an impact on reading motivation and on reading frequency. For positive values of ρ*, the indirect effect decreases with increasing ρ* becomes nonsignificant for ρ* > 0.23 and equals 0 when ρ* = 0.29. For very large values of ρ*, the indirect effect is again statistically significant but negative. However, we expect ρ* to be relatively small in our study. Indeed, as noted before, by studying the change in autonomous motivation and reading frequency rather than the posttest measurements, effects of potential confounders for autonomous motivation and reading frequency that are constant over time are canceled out. As an example, consider parent commitment, which was measured in this study. While parent commitment is strongly correlated with posttest autonomous motivation and reading frequency (correlation = .33 and .48, respectively), it is weakly correlated with the change from baseline in those measures (correlation = .09 and .14). The latter correlations imply a sensitivity parameter value equal to .01, and an analysis that explicitly adjusted for parent commitment indeed had a negligible impact on the within indirect effect. Although we cannot preclude confounding of the relation between change in autonomous motivation and change in reading frequency at the individual level, we thus conclude that for (expected) small positive correlations, the positive indirect effect is still significant but for (rather unexpected) moderate positive correlations, it would no longer be statistically significant.

Effect of teacher training on reading frequency that is mediated by autonomous reading motivation and the individual level (with corresponding 95% confidence interval) for a range of different sensitivity parameters, ρ*.
Estimation of Indirect Effects in the Presence of Interactions
In practice, it is not always realistic to assume that there are no interactions between treatment, mediator, and/or covariates. For illustrative purposes, we will focus here on a setting with a treatment–mediator interaction (which implies that Modeling Assumption M4 is no longer necessary), and instead of models in Equations 5 and 6 consider the following set of data generating models (with Cij dropped for clarity):
with ∊
ij
╨ rij
, but not necessarily
Counterfactual Definitions of Causal Effects in the Presence of T-M Interactions in a 2-1-1 Multilevel Setting
Upon noting that under the model in Equation 20:
it is easy to see that using the ROD approach, whereby one regresses the group-centered outcomes on the group-centered mediators and its interaction with treatment, one can easily obtain unbiased estimators of the pure within indirect effect and the interactive within-mediated effect. Denoting those estimators by
Using similar arguments as before, it can be shown that
Discussion and Conclusion
In this article, we have clarified under which assumptions causal direct, within, and contextual indirect effects can be identified in the 2-1-1 setting. When focusing on the linear setting in particular, we have shown that the within indirect effect can still be identified when additionally allowing for unmeasured common causes of mediator and/or outcome at the class level that have an additive effect. This result extends the findings of Castellano et al. (2014) to the mediation setting. Similar to their fixed effects approach, we proposed the ROD method as an easy-to-implement strategy to estimate the within indirect effect under those assumptions. The second step of our approach also straightforwardly yields an unbiased estimator of the “combined direct effect,” which we defined as the sum of the direct effect and the contextual indirect effect. While neither of the latter two can be identified under upper-level M-Y confounding, their sum is identified under our set of assumptions. In contrast, the PS method aims to estimate those two effects separately, but as shown in this article, stronger assumptions are required to this end. As the within indirect effect can no longer be identified under unmeasured lower-level M-Y confounding, we introduced a sensitivity analysis that allows analysts to assess the robustness of that effect against the presence of such confounders. Similarly, a sensitivity analysis can be used to check the robustness of estimates for the contextual indirect effect of the PS approach in the presence of upper-level confounding. Of note, the ROD approach and the PS approach yield mathematically equivalent estimators for the within indirect effect and the combined direct effect in balanced designs. In unbalanced designs, only the within indirect effect is exactly the same, but simulations (results not shown) revealed negligible differences for the combined direct effect estimators of the two approaches when the linearity and normality assumptions hold.
We have restricted our study to the linear multilevel setting with continuous mediators and outcomes. This type of model is most common in educational research. Mediation analysis from a causal perspective in the more general multilevel setting has received little attention so far in the literature (Preacher, 2015). One of the rare exceptions is the R mediation package (Tingley, Yamamoto, Hirose, Keele, & Imai, 2014). This approach relies on the mediation formula (Imai, Keele, & Tingley, 2010) and will yield unbiased estimators for the direct effect and within indirect effects under the identification Assumptions A1 through A5 outlined in this article when models for mediator and outcome are correctly specified but may suffer from the same weaknesses as the PS method under unmeasured upper-level M-Y confounding (i.e., under Assumptions 3.B). Moreover, the easy-to-implement sensitivity analysis assessing the robustness against unmeasured M-Y confounding that can be performed with the mediation package in the simple setting with independent observations is not yet available in the hierarchical setting. Throughout, we have primarily focused on models with no interactions but gave an introduction on how the proposed ROD method can be elegantly extended to allow for M-Y interactions for example. Researchers should be aware that properties and interpretation of the different estimators rely on the assumption that models are correctly specified. Throughout the article, we assume for example that, conditional on covariates, causal effects are constant across students. When interactions are omitted, for instance, between treatment and covariates, the obtained effects are still indicative as average effects across covariate levels (Liu & Gustafson, 2008), provided that appropriate adjustment for confounders of the M-Y relationship is made.
Another important assumption made in this article is the absence of measurement error. It is well known that, similar to unmeasured confounding, measurement error in the mediator and/or outcome may introduce bias in the direct and indirect effect estimators (Cole & Preacher, 2014; Hoyle & Kenny, 1999; Ledgerwood & Shrout, 2011). Multilevel structural equation modeling (MSEM) offers much flexibility in terms of including latent variables and accounting for measurement error for upper-level covariates (Preacher, Zhang, & Zyphur, 2011). As Lüdtke et al. (2008) pointed out, the appropriateness of multilevel latent covariate or multilevel manifest covariate (MMC) approaches, which treat class-level constructs as latent and observed, respectively, depends on the research question and the nature of the upper-level construct under study. For example, if all students within a class are asked to rate the autonomous motivation of their class as a whole or his or her own autonomous motivation, respectively, the aggregated upper-level construct may be considered as a reflective or formative construct, respectively. Treating the class average motivation as observed, the MMC approach may result in biased estimates of contextual effects for the former. More importantly though, the within effect that we are particularly interested in can still be unbiasedly estimated for either type of the two constructs. Note that throughout the simulation studies, we did not analyze the data with MSEM, because under the scenario of no measurement error the method yields on average the same estimates as the approach of PS, but with more imprecision (Lüdtke et al., 2008).
Finally, in the cluster randomized trials discussed here, only two levels were considered: the student and the class level. However, in a study where schools are randomly assigned to instructional methods, multiple teachers within a given school may be involved. If the student outcome is of interest and each teacher is responsible for a single class, three levels would have to be considered: student, class, and school. While methods are available for the analysis of such three-level data (Pituch, Murphy, & Tate, 2009; Preacher, 2011), the identification assumptions for the different causal effects of interest in such setting remain to be clarified.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The computational resources (Stevin Supercomputer Infrastructure) and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by Ghent University, the Hercules Foundation and the Flemish Government - department EWI. The authors would like to thank the Flemish Research Council (FWO) (Grant G.0111.12).
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
