Abstract
The increasing availability of data from multisite randomized trials provides a potential opportunity to use instrumental variables (IV) methods to study the effects of multiple hypothesized mediators of the effect of a treatment. We derive nine assumptions needed to identify the effects of multiple mediators when using site-by-treatment interactions to generate multiple instruments. Three of these assumptions are unique to the multiple-site, multiple-mediator case: (1) the assumption that the mediators act in parallel (no mediator affects another mediator); (2) the assumption that the site-average effect of the treatment on each mediator is independent of the site-average effect of each mediator on the outcome; and (3) the assumption that the site-by-compliance matrix has sufficient rank. The first two of these assumptions are nontrivial and cannot be empirically verified, suggesting that multiple-site, multiple-mediator IV models must be justified by strong theory.
Introduction
In canonical applications of the instrumental variable method, exogenously determined exposure to an instrument induces exposure to a treatment condition which in turn causes a change in a later outcome. A crucial assumption known as the exclusion restriction is that the hypothesized instrument can influence the outcome only through its influence on exposure to the treatment of interest (Heckman and Robb 1985b; Imbens and Angrist 1994). It may be the case, however, that an instrument affects the outcome through multiple treatments, in which case a single instrument will not suffice to identify the causal effects of interest.
To cope with this problem, analysts have recently exploited the fact that a causal process is often replicated across multiple sites, generating the possibility of multiple instruments in the form of site-by-instrument interactions. These multiple instruments can, in principle, enable the investigator to identify the impact of multiple processes regarded as the mediators of the effect of an instrument. Kling, Liebman, and Katz (2007), for example, used random assignment in the Moving to Opportunity (MTO) study as an instrument to estimate the impact of neighborhood poverty (NP) on health, social behavior, education, and economic self-sufficiency of adolescents and adults. Reasoning that the instrument might affect outcomes through mechanisms other than NP, they control for a second mediator, use of the randomized treatment voucher. To do so, they capitalize on the replication of the MTO experiment in five cities, generating 10 instruments (site-by-randomization interactions) to identify the impact of the two mediators of interest, NP and experimental compliance. Using a similar strategy, Duncan, Morris, and Rodrigues (2011) use data from 16 implementations of welfare-to-work experiments to identify the impact of family income, average hours worked, and receipt of welfare as mediators.
Clearly, this strategy for generating multiple instruments has potentially great appeal in research on causal effects in social science. For example, Spybrook (2008) found that, among 75 large-scale experiments funded by the U.S. Institute of Education Sciences over the past decade, the majority were multisite trials in which randomization occurred within sites. In principle, these data could yield a wealth of new knowledge about causal effects in education policy. It is essential, however, that researchers understand the assumptions required to pursue this strategy successfully. To date, we know of no complete account of these assumptions.
Our purpose therefore is to clarify the assumptions that must be met if this “multiple-site, multiple-mediator” instrumental variables (hereafter, MSMM-IV) strategy is to identify the average treatment effects (ATE) in the populations of interest. For simplicity of exposition and corresponding to the applications of MSMM-IV to date, we consider the case where a single instrument (which we denote as
We begin by delineating the assumptions required for identification in the case of a single instrument and a single mediator within a single-site study. We describe the assumptions needed to identify the “local average treatment effect” (LATE) described by Angrist, Imbens, and Rubin (1996) and the (slightly different) assumptions needed to identify the average treatment effect (ATE) among the population. Additionally, we consider the general case where both the instrument and the mediator may be continuous or multivalued.
Following a discussion of the single-site, single-mediator case, we then turn our attention to the case of primary interest: the MSMM-IV design. We specify a set of nine assumptions required for the MSMM-IV model to identify the ATEs of the mediators, three of which are specific to the MSMM-IV case, and which we discuss in some detail.
The Single-Site, Single-Mediator Case
Notation
Suppose that each participant in a single-site study is exposed to a treatment T taking on values in the domain
Note that our terminology and notation differ here from those in standard econometric discussions of instrumental variables (IV). In the econometric tradition, an instrument Z is used to identify the effect of a treatment T on an outcome Y. In this tradition, the reduced form effect of Z on Y is often not of substantive interest; rather, Z is of interest to the econometrician largely because it may be “instrumental” in identifying the effect of T on Y. In our terminology, however, assignment to a treatment T (such as an intervention or policy condition) is used as an instrument to identify the effect of mediator M on an outcome Y. Our terminology derives from the program evaluation tradition, in which both the reduced-form effect of T on Y and the effects of the mediators through which T may operate are of interest. Throughout the remainder of this article, we shall use T to denote a treatment assignment condition that is used as an instrument, and we shall use M to denote an experienced mediator condition.
Figure 1 summarizes our notation. We refer to the effect of T on M as the “compliance;” the person-specific compliance is denoted

Figure 1. Mediated and reduced-form effects of T on Y.
Identifying Assumptions
In order to define a set of causal estimands of interest, we first require the assumption that an individual’s potential outcomes depend only on the treatment condition and mediator condition to which that particular individual is exposed (and not on the treatment and mediator conditions of others), known as the Stable Unit Treatment Value Assumption (SUTVA; Rubin 1986). In the standard potential outcomes framework, we typically require a single SUTVA assumption stating that one individual’s potential outcomes do not depend on others’ treatment status. In the IV model, however, the presence of three variables of interest—the treatment T, a mediator M, and an outcome Y—necessitates a pair of such assumptions (Angrist et al. 1996), stated formally below.
Assumption (i): SUTVA:
(i.a) Each unit i has one and only one potential value of the mediator M for each treatment condition t: in particular, for a population of size N,
(i.b) Each unit i has one and only one potential outcome value of Y for each pair of values of treatment condition t and mediator value m: in particular, for a population of size N,
Given the SUTVA assumptions, we can represent the potential outcome Y for a participant who experiences treatment t and mediator value
Our second assumption is that T affects Y only through its impact on the mediator M. This is the standard exclusion restriction assumption:
Assumption (ii): Exclusion restriction:
The exclusion restriction combined with the second SUTVA Assumption (i.b) implies a third SUTVA condition: (i.c) Each unit i has one and only one potential outcome value of Y for each value of the mediator m: in particular, for a population of size N,
The SUTVA assumptions are necessary in order to define the causal estimands of interest. If the treatment variable is binary, for example, the first SUTVA Assumption (i.a) implies that we can define the person-specific causal effect of the treatment on M as
Assumption (iii): Person-specific linearity of the mediator M in T: the person-specific effect of T on mediator M is linear. That is,
Likewise, it will be useful to assume that the person-specific effect of M on Y is linear in M. This is a standard, if not unproblematic, assumption in IV models. In this case, the third SUTVA condition (i.c) implies that we can define the person-specific causal effect of the mediator on Y as
Assumption (iv): Person-specific linearity in m: the person-specific effect of the mediator
The combination of (ii), (iii), and (iv) implies that the person-specific effect of T on Y is linear in T:
Thus, defining B as the person-specific effect of T on Y, we can relate the person-specific effects of T on M and of M on Y to the person-specific effect of T on Y by:
The population average ITT effect of interest here is
Assumption (v): Ignorable treatment assignment:
Likewise, Assumption (v) enables us to estimate
Assumption (vi): Effectiveness of the instrument:
In the simple case in which we have a single instrument and a single mediator, the target of the IV estimator is the ratio of the ITT effect to the average compliance:
Equation (3) may be regarded as defining a “compliance-weighted average treatment effect” (CWATE) because each person’s treatment effect Δ is weighted by his or her compliance, Γ. This is a rather unsatisfying estimand, as we are typically interested in estimating δ, the ATE, rather than a weighted ATE, particularly where the weights are some unobservable and instrument-specific set of Γ’s (Heckman and Robb 1985a, 1986; Heckman, Urzua, and Vytlacil 2006).
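To make the weighting concrete, the following minimal sketch computes the ratio of the average ITT effect to the average compliance for a small, entirely hypothetical population. It assumes the person-specific ITT effect is the product ΓΔ, as described above; the function name `cwate` and all numeric values are illustrative assumptions, not part of the original analysis.

```python
from statistics import fmean


def cwate(gammas, deltas):
    """Ratio of the average ITT effect to the average compliance.

    gammas: person-specific effects of T on M (compliances).
    deltas: person-specific effects of M on Y.
    """
    itt = fmean([g * d for g, d in zip(gammas, deltas)])  # average of Gamma * Delta
    avg_compliance = fmean(gammas)
    if avg_compliance == 0:
        raise ValueError("average compliance is zero; the instrument is ineffective")
    return itt / avg_compliance


# Hypothetical population: people with low compliance have large effects.
gammas = [0.0, 0.5, 1.0, 1.0]
deltas = [4.0, 2.0, 1.0, 1.0]

ate = fmean(deltas)                       # unweighted average effect
weighted = cwate(gammas, deltas)          # compliance-weighted average effect

# Covariance between compliance and effect drives the gap between the two.
cov = fmean([g * d for g, d in zip(gammas, deltas)]) - fmean(gammas) * fmean(deltas)
gap = cov / fmean(gammas)

print(ate, weighted, ate + gap)
```

In this illustration the compliance-weighted quantity falls below the unweighted average because compliance and effect covary negatively; when every person has the same effect, the two coincide.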
There are two different solutions to this problem that yield a well-defined estimand. First, we can simply assume:
Assumption (vii a): No person-specific compliance-effect covariance:
in which case equation (3) identifies the population ATE as
In the case where both T and M are binary, we can adopt an alternative assumption that may be more tenable than (vii.a). In this case, Angrist et al. (1996) note that Γ can take on only three possible values: Γ = 1 for those for whom the instrument T determines their mediator value (“compliers”); Γ = 0 for those for whom the instrument does not affect the mediator (“always-takers” and “never-takers”); or
Assumption (vii.b): No defiers (or "monotonicity"):
Under this assumption, we can simplify the expression for the CWATE in equation (3) to
where
Summary of Single-Site, Single-Mediator IV Assumptions
Approaching the IV model from a potential outcomes framework is particularly useful when we allow mediator effects to be heterogeneous. After imposing Assumptions (i)–(vi) (SUTVA, exclusion restriction, linearity, ignorable treatment assignment, and instrument effectiveness), this framework reveals the importance of either (vii.a), the no-compliance-effect-covariance assumption, or (vii.b), the no-defiers assumption. If both of these assumptions fail, the IV estimand is a CWATE: those persons whose mediator is most affected by the instrument will be assigned the greatest weight in the estimand.
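The role of the no-defiers assumption in the binary case can be illustrated with a short sketch. It assumes the standard coding in which compliers have Γ = 1, always-takers and never-takers have Γ = 0, and defiers have Γ = −1; the population, the effect values, and the helper names `wald_ratio` and `split` are hypothetical.

```python
from statistics import fmean


def wald_ratio(gammas, deltas):
    """Average ITT effect divided by average compliance."""
    return fmean([g * d for g, d in zip(gammas, deltas)]) / fmean(gammas)


def split(population):
    """Separate (compliance, effect) pairs into two parallel lists."""
    return [g for g, _ in population], [d for _, d in population]


# Hypothetical (compliance, effect) pairs for a binary T and binary M.
compliers = [(1, 1.0), (1, 3.0)]
non_compliers = [(0, 10.0), (0, 6.0)]   # always-takers and never-takers
defiers = [(-1, 5.0)]

# Average effect among compliers only.
late = fmean([d for _, d in compliers])

# Without defiers, the ratio equals the complier average effect,
# no matter how large the effects are among non-compliers.
ratio_no_defiers = wald_ratio(*split(compliers + non_compliers))

# With a defier, the ratio no longer equals any subgroup's average effect.
ratio_with_defier = wald_ratio(*split(compliers + non_compliers + defiers))

print(late, ratio_no_defiers, ratio_with_defier)
```

Every person-specific effect in this illustration is positive, yet the ratio becomes negative once a single defier is present, which is why monotonicity is needed to give the estimand a clear interpretation.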
The IV Model with Multiple Sites and Multiple Mediators
In the single-site, single-mediator case, our challenge was to derive assumptions that define the ATE (δ) or the LATE (
Six of our assumptions are straightforward extensions of the assumptions derived above in the single-site, single-mediator case. These include SUTVA, the exclusion restriction, the two linearity assumptions, the assumption of ignorable assignment to T, and either a no compliance-effect covariance assumption (to identify ATE) or a “no defiers” assumption in the binary treatment, binary mediator case (to identify LATE). The assumption of nonzero average compliance that was needed in the single-site case is generalized to the assumption that there exists a full column rank site-by-compliance matrix, literally a design matrix within a multiple regression framework. Standard requirements of regression then generate two additional assumptions: an assumption that one mediator does not affect another and an assumption of independence among the site-level compliances and site-level causal effects. These assumptions are described below.
We first assume that both SUTVA assumptions hold (i.a and i.b) with respect to the vector of P mediators:
Assumption (i): SUTVA:
(i.a) Each unit i has one and only one potential value of the vector of mediators
(i.b) Each unit i has one and only one potential outcome value of Y for each treatment condition t and each vector of mediator values
We next assume that assignment to T influences Y only through the list of P distinct and observable mediators
Assumption (ii): Exclusion restriction: The treatment T affects Y only through its impact on the set of P mediators,
As above, we also assume person-specific linearity of each M in T (iii) and person-specific linearity of Y in each of the mediators (iv). Specifically, we assume that the outcome Y is a linear function of the mediators and that there are no interactions among the mediators.
Assumption (iii): Person-specific linearity of each mediator in T: the person-specific effect of T on each mediator
Assumption (iv): Person-specific linearity of Y in
These imply, respectively, that the person-specific causal effect of T on
We next assume that assignment to T does not influence a given mediator
Assumption (v): Parallel mediators:
Together, the five assumptions above define the person-specific ITT effect as:
Equation (5) says that the person-specific effect of T on Y can be written as the sum of the products of the person-specific effects of T on each mediator and the person-specific effects of that mediator on the Y (we discuss the implications of a failure of the parallel mediator assumption in the Discussion section below). Taking the expectation of equation (5) over the population within a site s yields:
As in the single-site case, we shall need unbiased estimates of the average compliances and ITT effects within each site. Letting K denote the number of sites, we invoke:
Assumption (vi): Ignorable within-site treatment assignment: The assignment of the instrument T must be independent of the potential outcomes within each site:
As in the single-site case, it will next be useful to make either a set of no-compliance-effect covariance assumptions, analogous to (vii.a), or a set of “no defiers” assumptions, analogous to (vii.b). The assumptions made here determine whether the model identifies the ATE or the LATE.
First, if we wish to identify the ATEs of the mediators, we may make the assumption that there is no within-site covariance between
Assumption (vii.a): No within-site compliance-effect covariance:
Alternatively, in the case where T and each of the mediators $M_1, M_2, \ldots, M_P$ are binary and we wish to identify LATE, we invoke:
Assumption (vii.b): No defiers (or "monotonicity"):
Either of these two assumptions, in combination with Assumptions (i–vi) generates a multiple regression equation in which an estimable site-average ITT effect
where
If, in contrast, we have a binary M and seek to estimate LATE, we invoke Assumption (vii.b), generating a multiple regression equation of exactly the same form. Specifically, we can write equation (6) as:
where
Equations (7) and (8) use the same outcome
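Assumption (viii): Site-by-mediator compliance matrix has sufficient rank. In particular, the assumption requires that (a) the compliance of at least P − 1 of the mediators varies across sites; (b) there are at least as many sites as mediators (K ≥ P); and (c) there is some subset of at least P site-specific compliance vectors that are linearly independent.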
The sufficient rank assumption is a generalization of the familiar instrument effectiveness assumption (Assumption [vi] in the first section). Note that when there is a single mediator (P = 1), the site-by-mediator compliance matrix will have rank 1, so long as
Our final assumption requires that the error term
Assumption (ix.a): Between-site compliance-effect independence: The site-average compliance of each mediator is independent of the site-average effect of each mediator. That is,
Likewise, to identify the LATEs, we assume:
Assumption (ix.b): Between-site compliance-effect independence: The site-average compliance of each mediator is independent of the site complier average effect of each mediator. That is,
Under Assumption (ix.a), we can write the expected value of the error
By the same logic, Assumption (ix.b) implies that the expected value of the error term
Note that Assumptions (ix.a) and (ix.b) are each stronger than an assumption of no between-site compliance-effect covariance (the latter requires only no linear association between compliance and effect; the former requires no association whatsoever). Moreover, note that Assumptions (ix.a) and (ix.b) require not only that there be no compliance-effect association for a given mediator but also that there be no cross-mediator compliance-effect association. That is, the site-average effect of T on a given mediator
Discussion
Summary of Multiple-Site, Multiple-Mediator IV Assumptions
To summarize, in the case of a multisite study in which a treatment T may affect the outcome Y through multiple mediators, we require a number of assumptions in order to identify the average causal effects of the mediators using MSMM-IV methods. In order to identify the ATE in the population, the relevant assumptions are as follows:
(i) SUTVAs;
(ii) Exclusion restriction;
(iii) Person-specific linearity of the mediators with respect to the treatment;
(iv) Person-specific linearity of the outcome with respect to the mediators;
(v) Parallel mediators;
(vi) Within-site ignorable treatment assignment;
(vii.a) Zero within-site compliance-effect covariance for each mediator;
(viii) Compliance matrix has sufficient rank;
(ix.a) Between-site cross-mediator compliance-effect independence.
In order to identify the LATE in the case of a binary treatment and binary mediators, Assumption (vii.a) is replaced by Assumption (vii.b), no defiers for any mediator; and Assumption (ix.a) is replaced by (ix.b), between-site independence of the compliance and complier average effects.
Note that six of these assumptions—SUTVA, the exclusion restriction, the two linearity assumptions, ignorable treatment assignment, and either the zero within-site compliance-effect covariance assumption or the no defiers assumption—are identical to those required for the single-site, single-instrument, single-mediator case (though often the two linearity assumptions are ignored because they are met trivially when the instrument and mediators are binary). Assumptions (v), (viii), and (ix) are specific to the multiple-site, multiple-mediator case (though the sufficient rank Assumption [viii] is equivalent to the instrument effectiveness assumption when there is a single site and single mediator, as we note above). We discuss these three assumptions in more detail below.
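The identification logic summarized above can be sketched numerically for two mediators. The sketch assumes that, under the nine assumptions, each site-average ITT effect is a linear combination of that site's average compliances, with the mediator effects as coefficients and no intercept, so that a least-squares fit across sites recovers the effects. The function name `fit_two_mediators`, the variable names, and all numeric values are hypothetical, and the example uses error-free site averages rather than estimates.

```python
def fit_two_mediators(itt, comp1, comp2):
    """No-intercept least squares of site ITT effects on two site compliances.

    itt:   site-average effects of T on Y, one per site.
    comp1: site-average effects of T on mediator 1.
    comp2: site-average effects of T on mediator 2.
    Returns the two fitted mediator effects.
    """
    s11 = sum(a * a for a in comp1)
    s22 = sum(b * b for b in comp2)
    s12 = sum(a * b for a, b in zip(comp1, comp2))
    s1y = sum(a * y for a, y in zip(comp1, itt))
    s2y = sum(b * y for b, y in zip(comp2, itt))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        # Compliances are collinear across sites: the effects cannot be separated.
        raise ValueError("compliance matrix is rank deficient")
    effect1 = (s22 * s1y - s12 * s2y) / det
    effect2 = (s11 * s2y - s12 * s1y) / det
    return effect1, effect2


# Hypothetical study with K = 4 sites and P = 2 mediators.
comp1 = [0.2, 0.5, 0.3, 0.6]
comp2 = [0.4, 0.1, 0.5, 0.2]
true_effects = (2.0, -1.0)

# Site ITT effects implied by the assumed linear relation.
itt = [true_effects[0] * a + true_effects[1] * b for a, b in zip(comp1, comp2)]

print(fit_two_mediators(itt, comp1, comp2))
```

If the second compliance vector were an exact multiple of the first, the function would raise an error, which corresponds to a failure of the sufficient rank assumption.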
The Parallel Mediators Assumption
The assumption that the mediators impact an outcome in parallel is a nontrivial assumption (see Appendix A, which can be found at http://smr.sagepub.com/supplemental/, for a detailed discussion). Consider the Duncan et al. (2011) study described above. In this study, 16 implementations of random assignment welfare-to-work experiments were used to estimate the impact of three hypothesized mediators of the programs: income, hours worked, and welfare receipt. The MSMM-IV models used assume that none of these mediators affects the others. However, this is an implausible assumption, given that both hours worked and welfare receipt are clearly linked to income.
The MTO study analyzed in Kling et al. (2007) provides an opportunity to consider the parallel mediators assumption in concrete terms. In this study, random assignment to a voucher was hypothesized to affect outcomes via two potential mediators—use of the voucher and NP. Because NP could not be influenced except through use of the voucher, the implied structural model is that shown in Figure 2.

Figure 2. Hypothesized treatment and mediator effects in the MTO study.
In this model, treatment assignment affects NP only through use of a voucher (V). Both NP and V may then affect an outcome Y. As detailed in Appendix A, which can be found at http://smr.sagepub.com/supplemental/, identification of
The Site-average Compliance-Effect Independence Assumption
The assumption that the site-average compliances are independent of the site-average effects is nontrivial. Because site-average compliance effects are not randomly assigned to sites, they may not be independent of the site-average mediator effects. Consider a simple example. Suppose we have a multisite study of the impacts of welfare-to-work programs, as in Duncan et al. (2011), where the programs are hypothesized to affect child outcomes by affecting mothers’ hours worked, income, and welfare receipt. Suppose that entry-level wages and the cost of living are higher in some sites than others. In this case, randomized assignment to a training program may induce a greater increase in hours worked and income (higher compliance) in high-wage sites than in low-wage sites (because the wage benefits of work are greater); however, the effect of increased income on child achievement may be lower in high-wage sites than in low-wage sites, because the cost of child care, preschool, and school quality is higher. Such a pattern would induce a negative correlation between the work and income effects of the program and the effects of income on children, violating the assumption of site-average compliance-effect independence.
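A stylized single-mediator version of this example shows how such a pattern distorts the result. The sketch assumes that each site-average ITT effect equals the product of the site's average compliance and its average mediator effect, and that the mediator effect is recovered by a no-intercept fit of site ITT effects on site compliances. All values and the names `slope_through_origin` and `site_itt` are hypothetical.

```python
from statistics import fmean


def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def site_itt(compliances, effects):
    """Site ITT effects implied by site compliances and site mediator effects."""
    return [g * d for g, d in zip(compliances, effects)]


# Hypothetical site-average compliances (e.g., income gains induced by the program).
compliance = [0.2, 0.4, 0.6, 0.8]

# Case 1: the mediator effect is the same in every site, so it is
# trivially unrelated to compliance.
effects_constant = [2.25, 2.25, 2.25, 2.25]

# Case 2: the mediator effect is smaller where compliance is larger,
# as in the high-wage versus low-wage example.
effects_declining = [3.0, 2.5, 2.0, 1.5]

slope_constant = slope_through_origin(compliance, site_itt(compliance, effects_constant))
slope_declining = slope_through_origin(compliance, site_itt(compliance, effects_declining))

# Both cases share the same average mediator effect across sites.
print(fmean(effects_constant), fmean(effects_declining))
print(slope_constant, slope_declining)
```

Both cases have the same cross-site average mediator effect, but only the first recovers it; in the second, sites with high compliance dominate the fit and pull the result toward their smaller effects.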
Although the compliance-effect independence assumption is not empirically verifiable, it may be falsifiable, given sufficient data. Equation (9) implies that, in a multisite study with P mediators and in which each of the nine assumptions is met, a plot in (P + 1) space of the site-average ITT effects (the
In Online Appendices B and C, which can be found at http://smr.sagepub.com/supplemental/, we derive expressions for the bias in the two-stage least squares MSMM-IV estimator when the site-average compliance-effect independence assumption fails.
The Sufficient Rank Assumption
The sufficient rank assumption is relatively straightforward. In order to identify the effects of P mediators using an MSMM-IV model, we require at least as many sites as mediators; we require that the effect of treatment assignment on the mediators varies across sites (for at least P − 1 of the mediators); and we require that there are at least P sites among which these effects are linearly independent. In many practical applications, these assumptions are likely to be met. The average effect of treatment assignment on a mediator is likely to vary across sites for a variety of reasons, including differential implementation, heterogeneity of populations, and differences among sites in baseline conditions or capacity. Moreover, unless the mediators are conceptually very similar, the effects of treatment assignment on the mediators are unlikely to be perfectly collinear.
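These conditions can be checked directly on a matrix of site-average compliances. The sketch below computes the rank of a K × P matrix by Gaussian elimination and compares it with the number of mediators; it treats the compliances as known rather than estimated, and the helper names `matrix_rank` and `is_identified`, the tolerance, and the example matrices are hypothetical.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def is_identified(compliance, n_mediators):
    """True if there are enough sites and the compliance matrix has full column rank."""
    return len(compliance) >= n_mediators and matrix_rank(compliance) == n_mediators


# Rows are sites, columns are mediators (hypothetical site-average compliances).
varied = [[0.2, 0.4], [0.5, 0.1], [0.3, 0.5], [0.6, 0.2]]
collinear = [[0.2, 0.4], [0.5, 1.0], [0.3, 0.6], [0.6, 1.2]]   # column 2 = 2 * column 1
too_few_sites = [[0.2, 0.4, 0.1]]                               # one site, three mediators

print(is_identified(varied, 2), is_identified(collinear, 2), is_identified(too_few_sites, 3))
```

In applied work the compliances are estimated with error, so an exact rank check of this kind will rarely fail; near-collinearity is the more practical concern.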
Nonetheless, in practical applications, the effects of treatment assignment on the mediators are likely to be somewhat correlated (though not perfectly) across sites. This may occur because in sites where a treatment is well implemented, the treatment may affect all mediators more than in sites where it is poorly implemented. Or it may occur because the mediators are correlated in the world, leading to a correlation of compliances. For example, because income is correlated with hours worked, sites in which a treatment—such as a welfare-to-work experiment—induces large changes in hours worked will tend to also be sites in which the same treatment induces large changes in income.
Although such correlations among the
Conclusion
If each of the nine assumptions described above is met, the effects of each mediator are, in principle, identifiable from observed data. Such models provide a possible approach to estimating the effects of the mediators of treatment effects when such mediators cannot themselves be easily assigned at random. The assumptions necessary for consistent identification in MSMM-IV models are not, however, trivial. In addition to the usual IV assumptions, such models require several additional assumptions. The parallel mediator and site-average compliance-effect independence assumptions, in particular, are relatively strong, and cannot be empirically verified (though with large samples the compliance-effect independence assumption may be falsifiable). Justification of such models must rely, therefore, on sufficiently strong theory or prior evidence to warrant these assumptions.
Although we have framed our discussion in the context of a multisite randomized trial, where “sites” are specific locations (different cities in the MTO example, different studies and cities in the welfare-to-work example), the same logic would apply to any study in which randomization occurs within identifiable subgroups of individuals. Thus, one could stratify the sample of a large randomized trial by sex, age, and race, and treat each sex-by-age-by-race cell as a “site” in order to create multiple “site”-by-treatment interactions as instruments. This would, in principle, allow one to identify the effects of multiple mediators within a single (large) randomized trial, but only under the set of assumptions we describe above. Alternately, one could estimate a set of propensity scores, indicating each individual’s “propensity to comply” with each mediator, and then stratify the sample by vectors of these propensity scores. Using such strata as “sites” in an MSMM-IV model would have two advantages: It would ensure there is no or little within-site compliance-effect covariance (because compliance would be near constant within compliance strata); and it may allow one to create strata among which the site-average compliances are uncorrelated, which may increase the precision of the estimates. Estimating “propensity to comply,” however, is itself a nontrivial enterprise, relying on an additional set of rather strong assumptions (which we do not address here).
Several important issues remain to be addressed in order to fully understand the use of MSMM-IV models. First, although failure of the assumptions will lead to inconsistent estimates, it is not clear how severe the bias resulting from plausible failures of the parallel mediators and compliance-effect independence assumptions will be. Second, we have not discussed the properties of specific estimators of MSMM-IV models or the computation of standard errors from such models. Both issues merit further investigation.
Finally, although the nine assumptions we outline above ensure the consistent estimation of the effects of multiple mediators, they do not ensure unbiased estimation in finite samples. In single-site single-mediator IV models, finite sample bias is a concern when the average compliance is small relative to its sampling variance. In multiple-site, multiple-mediator models, finite sample bias is more complex. In general, however, finite sample bias is likely to be a concern when both the average compliance (across sites) is small and the variance of the site-average compliances is small, relative to the sampling variation of the site-average compliances. A full discussion of finite sample bias is beyond the scope of this article, however.
Footnotes
Authors’ Note
An earlier version of this article was presented at the Annual Meeting of the Society for Research on Educational Effectiveness, Washington, DC, March 2011. All errors are our own.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a grant from the Institute for Education Sciences (R305D090009), and benefited enormously from lengthy conversations with Howard Bloom, Fatih Unlu, Pei Zhu, and Pamela Morris.
