Abstract
Conventional advice discourages controlling for postoutcome variables in regression analysis. By contrast, we show that controlling for commonly available postoutcome (i.e., future) values of the treatment variable can help detect, reduce, and even remove omitted variable bias (unobserved confounding). The premise is that the same unobserved confounder that affects treatment also affects the future value of the treatment. Future treatments thus proxy for the unmeasured confounder, and researchers can exploit these proxy measures productively. We establish several new results: Regarding a commonly assumed data-generating process involving future treatments, we (1) introduce a simple new approach and show that it strictly reduces bias, (2) elaborate on existing approaches and show that they can increase bias, (3) assess the relative merits of alternative approaches, and (4) analyze true state dependence and selection as key challenges. (5) Importantly, we also introduce a new nonparametric test that uses future treatments to detect hidden bias even when future-treatment estimation fails to reduce bias. We illustrate these results empirically with an analysis of the effect of parental income on children’s educational attainment.
Introduction
Hidden bias from unobserved confounding is a central problem in the social sciences. If unobserved variables affect both the treatment and the outcome, then conventional regression and matching estimators cannot recover causal effects (e.g., Morgan and Winship 2015; Rosenbaum 2002). One set of strategies for mitigating confounding bias that has been used in scattered contributions in sociology and economics involves future treatments, that is, values of the treatment that are realized after the outcome has occurred. The basic intuition behind these strategies is that the same unobserved confounder that affects the treatment variable before the outcome often also affects a future value of the treatment variable, measured after the outcome. If so, future values of the treatment are proxy measures of the unmeasured confounder and may help remove bias.
A few authors have previously appealed to this intuition and proposed a variety of different estimators. For instance, prior research has exploited future treatments in structural equation models (SEM) (Mayer 1997), used future treatments to measure and subtract unobserved bias (Gottschalk 1996), and employed them as instrumental variables (Duncan, Connell, and Klebanov 1997). 1
We posit that future-treatment strategies hold significant promise for social science research for several reasons. First, future treatments can help detect, reduce, and even remove bias from unobserved confounding. Second, future values of the treatment are routinely available in panel data. Third, since future-treatment strategies require only that the treatment variable varies over time (i.e., not the outcome), they are available even when individual-level fixed-effects panel estimators are not. Fourth, since different future-treatment strategies impose different assumptions about the data-generating process (DGP), they are applicable across a wide range of different substantive settings.
In this article, we analyze several prior future-treatment strategies and propose a new strategy that is simpler yet more robust. We discuss the conditions under which future values of the treatment can reduce or fully remove confounding bias. We also highlight the conditions under which future-treatment strategies introduce more bias than they remove. Specifically, we show that future-treatment strategies are vulnerable in two scenarios: where the outcome affects future treatment (selection) and where past treatment affects future values of the treatment (true state dependence). Yet, even when future-treatment strategies fail to reduce bias, they can still be used for detecting the presence of bias. Thus, we develop a new nonparametric test for hidden bias.
We investigate the performance of future-treatment strategies across a range of data-generating processes, and we assess their relative performance compared to regular regression estimates without corrections for unobserved confounding. We present our analysis in two complementary formats. First, we present our analysis graphically to assist empirical researchers in determining quickly whether a future-treatment strategy is appropriate for their substantive application. Second, we assume linearity to link our graphical results to familiar regression models and to quantify biases. (Online Appendices discuss related approaches and instrumental variables estimation and provide proofs.) Finally, we illustrate the application of future-treatment strategies with an empirical example that estimates the effect of parental income on children’s educational attainment. The analysis rules out that conventional treatment effect estimates are unbiased and underlines the attractiveness of our control estimate, which implies a smaller causal effect of parental income on children’s educational attainment than that yielded by traditional regression.
Preliminaries: Directed Acyclic Graphs (DAGs), Linear Models, and Identification
In this section, we describe the tools of our formal identification analyses, following Pearl (2013). Since the causal interpretation of statistical analyses is always contingent on a theoretical model of data generation, we first review DAGs to notate the assumed DGP. Second, we state Wright’s (1921) rules, which link the causal parameters of the DGP to observable statistical associations (covariances and regression coefficients) in linear models. 2 Readers familiar with DAGs and Wright’s rules may skip this section.
We use DAGs to notate the causal structure of the analyst’s presumed DGPs (Elwert 2013; Pearl 2009). DAGs use arrows to represent the direct causal effects between variables. We mostly focus on DAGs comprising four variables: a treatment, T, an outcome, Y, a future (postoutcome) value of the treatment, F, and an unobserved variable, U. In keeping with convention, we assume that the DAG shows all common causes shared between variables, regardless of whether these common causes are observed or unobserved. For example, in the DAG
DAGs empower the analyst to determine whether the observed association (e.g., a regression coefficient) between treatment and outcome identifies the causal effect or is biased. The observed association between treatment and outcome is said to identify the causal effect if the only open path connecting treatment and outcome is the causal pathway,
For most of this article, we assume a linear DGP with homogeneous (constant) effects, the conventional workhorse of social science. The assumption of linearity may not always be realistic, but it has the advantage of convenience, as it links DAGs directly to ordinary least squares (OLS) regression and conventional SEM methodology. Under linearity and homogeneity, DAGs become linear path models, and every arrow in a linear path model is fully described by its path parameter, p, which quantifies its direct causal effect. Since path parameters are causal effects, they cannot be observed directly. 3 Later, we also consider fully nonparametric models when developing a new test for unobserved confounding.
We work with standardized variables (zero mean and unit variance) throughout for ease of exposition. Standardized path parameters cannot exceed 1 in magnitude. 4 To prevent model degeneracies, we assume that all path parameters lie strictly inside the interval (−1, 1).
Wright’s (1921) path rule: The marginal (i.e., unadjusted) covariance between two standardized variables A and B equals the sum of the products of the path parameters along all open paths between A and B.
That is, to calculate the marginal covariance between two variables A and B, first compute the product of the path parameters for each of the open paths between A and B and then sum these products across all open paths. Next, we link the marginal covariances to OLS regression coefficients (with or without control variables). The OLS regression coefficient on T in the unadjusted regression
We call
We call
We omit observed control variables (other than F) from the analysis because they do not contribute to intuition. All of our results generalize to the inclusion of pretreatment control variables. 5
Putting these elements together, the subsequent analyses proceed in four steps. First, we draw the DAG for the DGP we wish to analyze. Second, we use Wright’s rule to express the marginal covariances between observed variables in terms of the true path parameters. Third, we plug these covariances into equations (1)–(3) to obtain the regression coefficients. Finally, we investigate whether any of these regression coefficients, or functions of regression coefficients, equal (or “identify”) the desired causal effect of the treatment on the outcome, and we quantify possible biases.
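The four steps can be traced in a few lines of code. The following is a minimal sketch rather than a definitive implementation: it uses the textbook formulas for OLS coefficients among standardized variables, which we take to correspond to the regression coefficients discussed above, and applies them to a simple confounded DGP. The function names and the path labels a (effect of U on T) and c (effect of U on Y) are assumptions introduced for illustration.

```python
def unadjusted_coef(s_yt):
    """OLS coefficient on T in the regression of Y on T (standardized variables)."""
    return s_yt


def adjusted_coefs(s_yt, s_yf, s_tf):
    """OLS coefficients on T and on F in the regression of Y on T and F.

    Inputs are marginal covariances (correlations) of standardized variables.
    """
    denom = 1.0 - s_tf ** 2
    coef_t = (s_yt - s_yf * s_tf) / denom
    coef_f = (s_yf - s_yt * s_tf) / denom
    return coef_t, coef_f


# Steps 1 and 2: a DGP with T -> Y (b), U -> T (a), U -> Y (c).
# Wright's rule sums the products of path parameters over the open paths
# T -> Y and T <- U -> Y. The labels a and c are hypothetical.
b, a, c = 0.3, 0.5, 0.4
s_yt = b + a * c

# Steps 3 and 4: the unadjusted coefficient differs from b by a * c.
bias = unadjusted_coef(s_yt) - b
print(round(bias, 3))
```

The same two functions can be reused for any DGP once Wright's rule has supplied the three marginal covariances.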
The Problem: Unobserved Confounding
Figure 1 highlights the problem of unobserved confounding and illustrates our running example. The DAG shows the DGP for an observational study to estimate the total causal effect of a treatment, T (e.g., parental income), on an outcome, Y (e.g., children’s years of completed education). Since treatment is not randomized, the effect of T on Y is likely confounded by one or more factors, U, that jointly affect T and Y (e.g., parental ambition). If so, the unadjusted association between T and Y will be biased for the causal effect of T on Y, because the association will be a combination of the association transmitted along the open causal path

Directed acyclic graph for an observational study of parental income, T, on children’s years of education, Y, with unobserved confounder(s), U, for example, parental ambition.
This regression coefficient is obviously biased for the true causal effect of T on Y, b. The bias equals
Strategies of Bias Correction With Future Treatments
Future treatments can be used to reduce and even fully remove bias from unobserved confounding, depending on both the analytic strategy (e.g., the chosen regression specification) and the assumptions of the DGP. In this section, we introduce two future-treatment strategies under the assumptions of the DGP shown in Figure 2. This model represents a best-case scenario for future-treatment strategies and is commonly assumed in the literature (e.g., Mayer 1997). The model assumes that the causal effect of T on Y is confounded by one or more unobserved variables, U, and that all unobserved factors, U, that confound T and Y also affect the future value of the treatment, F. In other words, F is a proxy measure for the unobserved confounder U. 6

A confounded study where the future value of the treatment, F, is a proxy for the unobserved confounder(s), U.
The assumption that all unobserved confounders of T and Y also affect F is central for future-treatment strategies. Because the assumption cannot be tested empirically, it has to be defended on theoretical grounds. In many applications, it is eminently credible. For example, if parents’ unmeasured ambition, U, affects parental income, T, prior to the child completing education, Y, it likely also affects parental income after the child has completed education, F.
Control Strategy of Future Treatments
Most future-treatment strategies in one way or another exploit the fact that F is a proxy for U. Here, we propose a simple estimator that exploits this fact directly: Since F is a proxy that carries information about U, controlling for F in the regression
Definition 1 (control-strategy estimator): The control-strategy estimator, bC, for the causal effect of T on Y, b, is given by the F-adjusted regression coefficient on T,
Result 1 evaluates the control-strategy estimator for data generated by the DGP in Figure 2:
Result 1 (bias of the control-strategy estimator in the best case): In data generated by the DGP in Figure 2, the control-strategy estimator evaluates to:
Clearly, the control estimator remains biased because
Result 2 (strict bias reduction of the control-strategy estimator in the best case): In data generated by the DGP in Figure 2, the control-strategy estimate is strictly less biased than the unadjusted OLS estimate.
To see this, note that the control-strategy estimator multiplies the unadjusted OLS bias,
Figure 3 illustrates bias reduction in the control-strategy estimator compared to the unadjusted OLS estimator by graphing the absolute value of the bias multiplier of the control strategy,

Absolute bias multiplier,
The stronger the effect of U on F,
The control strategy gives empirical researchers a straightforward tool for reducing bias from unobserved confounding. All it takes is adding F as a regressor to the regression of Y on T. Controlling for F would also work if researchers additionally included pretreatment covariates, X, in the regression. The bias formulas would have to be adjusted somewhat, but the logic would remain the same. To return to our running example, under the model assumptions of Figure 2, bias in the estimated effect of parental income measured before children complete education would be strictly reduced by controlling for future parental income measured after children complete their education.
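The strict bias reduction of the control strategy can be checked numerically. The sketch below is an illustration under stated assumptions: we label the effect of U on T as a, of U on Y as c, and of U on F as d (these labels and the function name are hypothetical), apply Wright's rule to the DGP of Figure 2, and use the textbook formula for a partial OLS coefficient among standardized variables.

```python
def control_estimate(b, a, c, d):
    """F-adjusted coefficient on T implied by the Figure 2 DGP.

    Hypothetical labels: b = T -> Y, a = U -> T, c = U -> Y, d = U -> F.
    """
    s_yt = b + a * c            # open paths: T -> Y and T <- U -> Y
    s_tf = a * d                # open path:  T <- U -> F
    s_yf = c * d + a * b * d    # open paths: F <- U -> Y and F <- U -> T -> Y
    return (s_yt - s_yf * s_tf) / (1.0 - s_tf ** 2)


b, a, c = 0.3, 0.5, 0.4
ols_bias = a * c
for d in (0.2, 0.5, 0.8):
    control_bias = control_estimate(b, a, c, d) - b
    multiplier = control_bias / ols_bias
    print(d, round(multiplier, 3))
```

Under these assumptions the multiplier works out to (1 − d²)/(1 − a²d²), which lies strictly between 0 and 1 whenever d is nonzero, so the printed values shrink toward zero as the effect of U on F grows.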
Mayer’s Strategy
Mayer (1997) takes a different approach to bias reduction with future treatments. Instead of simply controlling for F in a regression model, she solves the structural equations of the DGP in Figure 2 under the additional assumption that the unobserved confounder, U, affects the future treatment, F, exactly like it affects the treatment, T,
This system is solved uniquely for the desired causal effect, 8
Definition 2 (Mayer’s [1997] estimator): Mayer’s estimator for the causal effect of T on Y, b, is given by,
The advantage of Mayer’s estimator is that it removes all bias under the assumptions that the data are generated as in Figure 2 and that U affects T exactly as it affects F,
Second, in contrast to the control-strategy estimator, Mayer’s estimator can increase the unadjusted OLS bias, as shown in result 4.
In other words, bias amplification occurs either (1) when U affects T and F in opposite directions or (2) when U affects T and F in the same direction, but the magnitude of the effect
The solid line in Figure 3 illustrates bias reduction and bias amplification of Mayer’s estimator by graphing the absolute value of the bias multiplier,
The possibility of bias amplification in Mayer’s estimator has not been noted previously. Whether bias amplification occurs depends on the empirical setting and must be carefully evaluated based on subject matter knowledge. We believe that bias amplification can be excluded in many settings. First, in many applications, U will not affect T and F in opposite directions. In our example, it is not generally plausible that parental ambition, U, increases parental income early on, T, but decreases it later, F. Second, since the shared unobserved confounder U is by assumption a baseline characteristic that is temporally closer to T than to F, the effect of U on T will likely exceed the effect of U on F, that is,
On the other hand, we cannot entirely rule out the possibility of bias amplification, even in our running example. Suppose, for example, that we analyze the effect of parental income on children’s educational outcomes among young parents. Young parents with high ambition may still be enrolled in college and hence earn little compared to their less ambitious counterparts who already have jobs. Later, however, these highly ambitious parents may become high-earning professionals, whereas their less ambitious counterparts remain stuck in lower paying jobs. Hence, the effects
Implementing Mayer’s Strategy as a Difference Estimator
The original presentation of Mayer’s estimator required customized programming. Next, we show that Mayer’s estimator straightforwardly equals the difference between two OLS regression coefficients. This enables estimation via all standard statistical software packages and provides additional intuition.
Definition 3 (difference estimator): The difference estimator for the effect of T on Y is the difference between the coefficients on T and F in the regression
Result 5 (equivalence of the difference and Mayer’s estimators):
The equivalence between Mayer’s estimator and the difference estimator holds for all DGPs—not just the DGP of Figure 2—because the definition of the estimators only draws on empirical covariances and not on the structure of the DGP.
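Because the difference estimator is built from two OLS coefficients, it can be computed from the three marginal covariances alone. The sketch below does so for covariances implied by the DGP of Figure 2. The path labels a (U on T), c (U on Y), and d (U on F) and the function names are assumptions introduced for illustration.

```python
def difference_estimate(s_yt, s_yf, s_tf):
    """Coefficient on T minus coefficient on F in the regression of Y on T and F."""
    denom = 1.0 - s_tf ** 2
    coef_t = (s_yt - s_yf * s_tf) / denom
    coef_f = (s_yf - s_yt * s_tf) / denom
    return coef_t - coef_f


def figure2_covariances(b, a, c, d):
    """Marginal covariances implied by Wright's rule (hypothetical labels)."""
    return b + a * c, c * d + a * b * d, a * d   # s_yt, s_yf, s_tf


b, c = 0.3, 0.4
for a, d in [(0.5, 0.5), (0.5, 0.3), (0.3, -0.3)]:
    bias = difference_estimate(*figure2_covariances(b, a, c, d)) - b
    print(a, d, round(bias, 3), round(a * c, 3))  # difference bias, OLS bias
```

With equal effects (a = d) the bias vanishes. With effects of opposite sign, the difference estimate lies farther from b than the unadjusted estimate does.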
Equating Mayer’s estimator with the difference estimator provides additional insight: The idea behind the difference estimator is to use future treatments first to measure and then to remove the spurious association between T and Y.
This fact is best appreciated by investigating the difference estimator under the assumption that the effect of U on T equals the effect of U on F,
Expressing Mayer’s estimator as a difference estimator helps explicate the properties that we claimed for it above. First, the Mayer/difference estimator removes all bias only if
We note that the difference estimator for future treatments has some history in social science methodology. Versions of this differencing logic are discussed by Gottschalk (1996), who explicitly uses future treatments, and by DiNardo and Pischke (1997) and Elwert and Christakis (2008), who analyze structurally similar models without future treatments. Online Appendix A evaluates Gottschalk’s (1996) estimator.
The Difference Strategy of Future Treatments Is Different From Difference-in-differences (DiD)
Despite superficial similarities, the Mayer/difference strategy of future treatments differs from conventional DiD, or gain score, estimation. While both approaches assume the same qualitative causal structure for the DGP, shown in Figure 2, they impose different parametric constraints on this structure and hence derive different estimators. Mayer’s approach interprets F as a future (postoutcome) value of the treatment and assumes that U affects T and F equally,
Choosing Between Future-treatment Estimators
Next, we compare the performance of the two future-treatment strategies and provide guidance for choosing between them. We continue to assume that the data are generated by the DGP of Figure 2. Obviously, maximally cautious analysts should always prefer the control estimator, because, in contrast to the Mayer/difference estimator, it guarantees bias reduction when the data are produced by the DGP in Figure 2 regardless of the relative size of the path parameters. Bias reduction with the control estimator, however, is often quite modest. For most values of the effect
Analysts can sometimes decide between the two future-treatment estimators by comparing the relative positions of the unadjusted OLS, control, and Mayer/difference estimates. Figure 4 illustrates the decision process. Since the control estimate, in expectation, is closer to the true treatment effect than is the unadjusted OLS estimate, the difference between the control estimate and the unadjusted OLS estimate reveals the direction of the unadjusted OLS bias. For example, if the unadjusted OLS estimate is

Illustration of the heuristic for choosing between estimates. The relative position of the control (C), difference (D), and unadjusted OLS (O) estimates can help the analyst decide between alternative estimates. In data generated by Figure 2, the location of the control estimate indicates the direction of unadjusted OLS bias (in this example, upward bias). In scenarios (1) and (2), the control estimate is preferred. In scenario (3), additional assumptions are needed to decide between the control and difference estimates.
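The heuristic in Figure 4 can be written as a small classification rule. This is a sketch only: the function name is hypothetical, the numbers in the usage lines are made up for illustration, and the rule compares point estimates without accounting for sampling variability.

```python
def figure4_scenario(ols, control, difference):
    """Classify the relative position of three point estimates.

    Returns 1, 2, or 3 for the scenarios of Figure 4, or None when the
    control estimate equals the unadjusted OLS estimate (no direction revealed).
    """
    shift_control = control - ols
    shift_difference = difference - ols
    if shift_control == 0:
        return None
    if shift_control * shift_difference <= 0:
        return 1   # difference estimate does not move toward the control estimate
    if abs(shift_difference) <= abs(shift_control):
        return 2   # difference estimate lies between OLS and control (inclusive)
    return 3       # difference estimate moves farther than the control estimate


# Hypothetical estimates with upward OLS bias (control below OLS).
print(figure4_scenario(0.40, 0.35, 0.45))
print(figure4_scenario(0.40, 0.35, 0.38))
print(figure4_scenario(0.40, 0.35, 0.20))
```

In scenarios 1 and 2 the control estimate is preferred; scenario 3 requires additional assumptions about the path parameters.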
If the control and Mayer/difference estimators change the unadjusted OLS estimate in the same direction, but the Mayer/difference estimator is farther away from the unadjusted OLS estimate than is the control estimate (Figure 4, scenario 3), then it does not follow that the Mayer/difference estimator is automatically preferred. For example, with
Challenges to Bias Correction With Future Treatments
The DGP of Figure 2, analyzed so far, provides a best-case scenario for future-treatment strategies to reduce confounding bias in unadjusted OLS regressions because it guarantees bias reduction for the control estimator and even complete bias removal with the Mayer/difference estimator if
True State Dependence: When Treatment Affects Future Treatment
Past and future values of the treatment are typically correlated over time. One reason for this association could be mutual dependence of T and F on the unmeasured confounder U along the path

An unconfounded study with true state dependence of treatment,
To isolate the problem of true state dependence, we first analyze the performance of future-treatment strategies when the effect of T on Y is not confounded (no arrow
Future-treatment strategies are vulnerable to true state dependence because needlessly controlling for F would introduce bias. Since F is a collider variable on the noncausal path
Note that the control estimator in this scenario is biased even though the unadjusted OLS estimator is not. As expected, the bias in the control estimator under true state dependence is a function of the path parameters on the noncausal path
The Mayer/difference estimator is also biased in this scenario, even though the unadjusted OLS estimator is not. Comparing equations (13) and (14) shows that true state dependence introduces less bias into the control-strategy estimator than into the Mayer/difference estimator, unless true state dependence is strongly positive,
Next, we analyze the empirically more interesting DGP of Figure 6, which combines Figure 2 with Figure 5 to form a scenario of true state dependence with unobserved confounding. Here, U is a confounder of T and F, which motivates the use of F as a proxy control to reduce bias in the unadjusted OLS estimator, but T also directly causes F via true state dependence, thus introducing bias into both future-treatment estimators. Without further restrictions, the analytic expressions for the control and difference estimators are unwieldy and scarcely informative (not shown). Depending on the exact parameter constellation, both future-treatment strategies could reduce bias or increase bias in the unadjusted OLS estimator. Hence, analysts must carefully consider existence, direction, and size of true state dependence in their empirical applications.

Nonetheless, future-treatment strategies remain promising if the analyst can defend certain parametric restrictions on the relative size of the path parameters. Consider, for example, the restriction that U affects T to the same extent as it affects F,
Result 6 (bias of the control estimator with true state dependence): In data generated by the model in Figure 6 with the constraint
Result 7 (bias of the Mayer/difference estimator with true state dependence): In data generated by the model in Figure 6 with the constraint
The bias multipliers of the control and Mayer/difference estimators, RC and RM, are obviously closely related, though their behavior is somewhat surprising. Explorations of the parameter space (see Online Appendix E) reveal several facts, summarized in Table 1:
Performance of the Control Estimator and the Mayer/Difference Estimator in the Presence of State Dependence and Assuming
Result 8 (relative performance of the control and Mayer/difference estimators under confounding and true state dependence): In data generated by the model in Figure 6 with the constraint
With (often unrealistic) negative 11 state dependence, (1) The Mayer/difference estimator is strictly bias reducing. (2) The control estimator is strictly bias amplifying, and the bias increases as true state dependence becomes more negative.
With (often realistic) positive state dependence, (3) The Mayer/difference estimator reduces bias up to moderate positive state dependence, 0 ≤ f ≤ 0.5, for up to moderately strong confounding, |a| < 0.5. It strictly amplifies bias for f > 0.5. (4) The control estimator reduces bias for most values of positive state dependence, 0 ≤ f ≤ 0.78, for up to moderately strong confounding, |a| < 0.5. It strictly amplifies bias above f ≥ 0.78. (5) The control estimator is less biased than the Mayer/difference estimator above f = 0.37, and more biased otherwise.
Table 1 underlines that true state dependence, which is a common concern in sociology, ruins the strict bias-reduction property of the control estimator. To muddy matters further, judging the size of the standardized path parameters in this scenario is not easy in practice. The difficulty is that the variances of T and F almost certainly differ when true state dependence is present. Hence, it is difficult to assert the equality of
Nonetheless, under realistic values of moderate positive true state dependence, both the control and the Mayer/difference estimators are likely bias reducing. And for moderate and strong positive state dependence, the control estimator outperforms the Mayer/difference estimator and remains (strongly) bias reducing as long as the effects of U on T and F are not too large.
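The pattern summarized above can be explored numerically. The following sketch assumes the DGP of Figure 6 with equal effects of U on T and F, labels that common effect a and the effect of T on F f, and derives the covariances from Wright's rule. The function name and the values chosen for b and c are hypothetical. Each multiplier is the estimator's bias divided by the unadjusted OLS bias, so b and c cancel.

```python
def bias_multipliers(a, f):
    """Bias multipliers of the control and difference estimators.

    Hypothetical labels: a = effect of U on T and on F (equal by constraint),
    f = effect of T on F (true state dependence). Each multiplier is the
    estimator's bias divided by the unadjusted OLS bias a * c, so the
    values of b and c cancel and are set arbitrarily below.
    """
    b, c = 0.3, 0.4
    s_yt = b + a * c
    s_tf = f + a * a
    # F-Y paths: F<-T->Y, F<-U->Y, F<-U->T->Y, F<-T<-U->Y
    s_yf = f * b + a * c + a * a * b + f * a * c
    denom = 1.0 - s_tf ** 2
    coef_t = (s_yt - s_yf * s_tf) / denom
    coef_f = (s_yf - s_yt * s_tf) / denom
    r_control = (coef_t - b) / (a * c)
    r_difference = (coef_t - coef_f - b) / (a * c)
    return r_control, r_difference


a = 0.1  # weak confounding
for f in (-0.3, 0.2, 0.4, 0.6, 0.9):
    r_c, r_m = bias_multipliers(a, f)
    print(f, round(abs(r_c), 2), round(abs(r_m), 2))
```

An absolute multiplier below 1 indicates bias reduction relative to unadjusted OLS, and a value above 1 indicates bias amplification. For weak confounding, the printed values should follow the pattern described in the text: with negative f only the difference estimator reduces bias, and with strongly positive f both estimators eventually amplify it.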
Selection Bias: When the Outcome Affects Future Treatment
Selection also complicates future-treatment strategies for unobserved confounding. We say that selection is present when the outcome exerts a causal effect on the future value of the treatment, as captured by the arrow Y → F in Figure 7. Selection is a concern in many situations. For example, in a study of the effect of parental income on educational attainment, college enrollment might affect parents’ income if parents adjust their labor supply to the financial needs of the child. In other scenarios, selection may be absent. For example, when studying the effect of parental income on children’s test scores, it is implausible to believe that children’s test scores affect future values of parental income (except, perhaps, when a child’s abysmal test scores inspire a parent to quit their job to tutor the child).

An unconfounded study with selection,
Figure 7 isolates the problems of selection. Since the effect of T on Y is unconfounded, the unadjusted OLS estimator again recovers the true causal effect,
and the difference strategy estimator evaluates to
Since

A confounded study with selection,
The bias-reduction properties of both future-treatment estimators with confounding and selection strongly depend on the underlying path parameters. Simulations (see Online Appendix) suggest that the Mayer/difference estimator usually performs worse, and often dramatically so, than the control estimator as long as the path parameters, p, are not too large,
Table 2 summarizes the divergent performance of the control and the Mayer/difference estimator. The upshot is that for scenarios in which path parameters are at most moderately strong
Performance of the Control Estimator and the Mayer/Difference Estimator in the Presence of Selection and Weak to Moderate Path Parameters,
Future-treatment Tests for Unobserved Confounding
Future treatments can also be used to detect, and even formally test for, the presence of unobserved confounding between T and Y. Importantly, testing for bias is possible under substantially weaker assumptions than bias reduction or bias removal. In this section, we develop a nonparametric test for unobserved confounding via two results. (Readers uninterested in the technical details may skip directly to result 12.)
Definition 4: Let V be an unobserved variable that directly causes treatment, T,
Assumption 1: The DGP contains at least one unobserved variable V.
Assumption 1 is quite minimal compared to models discussed before. First, assumption 1 requires only partial knowledge of the DGP rather than a fully articulated DAG. Second, it is nonparametric, that is, it puts no restrictions on the functional form of the effects. Third, it does not require constant effects across individuals.
Result 11: Given assumption 1, independence between F and Y conditional on T and X,
By contraposition, result 11 says that unobserved confounding between T and Y will lead to a conditional association between Y and F (given T and

The causal effect of T on Y is confounded after controlling for
Result 11 thus turns the conditional independence between F and Y into an indicator for the absence of unobserved confounding between T and Y. However, result 11 does not yet justify a formal test for unobserved confounding because the result is not symmetric: Independence between F and Y (given T and

F and Y are associated even though there is no unobserved confounding between T and Y. Assumption 1 holds because

F and Y are associated even though there is no unobserved confounding between T and Y. Assumption 1 holds because
The asymmetry in result 11 is fixed, and a proper test for unobserved confounding is provided, by assumptions 2 and 3 in result 12.
Definition 5: Let
Assumption 2: All variables in
Like assumption 1, assumption 2 is nonparametric, that is, it does not assume linearity or effect homogeneity. Nonetheless, assumption 2 is stronger than assumption 1 (which it implies). Assumption 2 requires that all unobserved causes of F (except those in F’s idiosyncratic error term) also cause T. This rules out unobserved confounding between Y and F by any event that occurs after T. Assumption 2, however, still does not require that all unobserved confounders of T and Y also cause F, as we had assumed in prior sections of this article (e.g., Figure 2).
Assumption 3: Y does not directly or indirectly cause F (no selection).
Result 12: Given assumptions 2 and 3, independence between F and Y conditional on T and
Result 12 says that any (nonparametric or parametric) test of conditional independence between F and Y given T and
using a conventional two-sided t test is a valid test of the null hypothesis of no unobserved confounding. 15
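As an illustration of the parametric version of the test, the sketch below simulates a linear DGP in which U affects T and F, and optionally Y, and then computes the t statistic on F in the OLS regression of Y on T and F. The DGP, the parameter values, and the function names are assumptions made for illustration, and the OLS routine is a bare-bones stand-in for standard statistical software.

```python
import math
import random


def ols_t_stat_on_last(rows, y):
    """Return the OLS t statistic for the last regressor in rows.

    rows: list of regressor lists (include a leading 1.0 for the intercept).
    Solves the normal equations by Gauss-Jordan elimination.
    """
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Augment X'X with [X'y | I] to obtain the coefficients and the inverse.
    aug = [xtx[i] + [xty[i]] + [1.0 if i == j else 0.0 for j in range(k)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    beta = [aug[i][k] for i in range(k)]
    inv_last = aug[k - 1][2 * k]   # last diagonal element of (X'X)^-1
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    se = math.sqrt(rss / (n - k) * inv_last)
    return beta[-1] / se


def simulate_t(c, n=5000, seed=1):
    """t statistic on F when U affects Y with strength c (hypothetical DGP)."""
    rng = random.Random(seed)
    a = d = 0.5          # U -> T and U -> F
    b = 0.3              # T -> Y
    rows, y = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        t = a * u + rng.gauss(0, 1)
        f = d * u + rng.gauss(0, 1)
        rows.append([1.0, t, f])
        y.append(b * t + c * u + rng.gauss(0, 1))
    return ols_t_stat_on_last(rows, y)


print(simulate_t(c=0.5))   # confounded: U -> Y present
print(simulate_t(c=0.0))   # unconfounded: no U -> Y
```

In this simulated setting, a large absolute t statistic on F signals unobserved confounding, whereas without the arrow from U to Y the statistic should behave like a draw from its null distribution.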
Empirical Illustration
Motivation
We illustrate the utility of future-treatment strategies by elaborating on Mayer’s (1997) original empirical example of the causal effect of parental income on children’s educational attainment. Parental income is widely observed to be positively associated with children’s test scores. This association could be at least partially causal because high income allows parents to invest more in their children’s education (Kornrich and Furstenberg 2012; Schneider, Hastings, and LaBriola 2018), for instance, by providing private tutors (Buchmann, Condron, and Roscigno 2010), which promotes educational success (see also Mayer 1997:45ff). On the other hand, the observed association could also be due to unobserved confounding, for example, by parents’ ambition, which may increase not only parents’ own income but also the educational success of their children. Of more than merely historical interest, this example remains salient for contemporary debates on intergenerational transmission, which are plagued by concerns about unobserved confounding (Morgan and Winship 2015; Sobel 1998).
Data
We analyze data from the Panel Study of Income Dynamics (PSID 2019). In an effort to replicate the estimates provided by Mayer (1997:161ff), we closely follow her decisions in the construction of the analytic samples (covering birth cohorts 1954 through 1968) and variables as described there. The outcome, Y, is children’s years of education completed by age 24 (
Results
Our analyses replicate Mayer’s published results almost perfectly. For instance, Mayer’s main analysis (based on cohorts born between 1954 and 1968) estimates the unstandardized coefficient of logged family income on children’s years of education as
Our empirical illustration of future-treatment strategies estimates a series of OLS models that regress the outcome (Y, years of education) on different combinations of the covariates: The treatment (T, parental income), the future treatment (F, future parental income), and all observed pretreatment control variables mentioned above
Estimating the Causal Effect of Parental Income on Children’s Years of Education With and Without Future Treatments.
Note. Standardized OLS regression coefficients (standard errors in parentheses); weighted. Significance tests for the difference between coefficients across models are based on seemingly unrelated regression.
Statistical significance: † p < .10. * p < .05. ** p < .01. *** p < .001 (two-tailed tests).
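The model series described above can be sketched in a few lines. The following is a minimal illustration on synthetic data, not a replication of our PSID analysis: the variable `U`, the single control `X`, the sample size, and all path values are hypothetical, and the sketch is unweighted, whereas the reported estimates are weighted.

```python
import numpy as np

def ols(y, *columns):
    """OLS coefficients of y on the given columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def standardize(v):
    """Rescale to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()

# Synthetic stand-in data (NOT the PSID); all path values are hypothetical.
rng = np.random.default_rng(0)
n = 50_000
U = rng.normal(size=n)                                  # unobserved confounder
X = rng.normal(size=n)                                  # one observed pretreatment control
T = 0.6 * U + 0.4 * X + rng.normal(size=n)              # treatment
Y = 0.3 * T + 0.5 * U + 0.3 * X + rng.normal(size=n)    # outcome
F = 0.6 * U + 0.4 * X + rng.normal(size=n)              # future treatment

T, F, X, Y = (standardize(v) for v in (T, F, X, Y))

# Coefficient on T (index 1, after the intercept) in each of the four models.
coef_T = {
    "model 1 (T)": ols(Y, T)[1],
    "model 2 (T, F)": ols(Y, T, F)[1],
    "model 3 (T, X)": ols(Y, T, X)[1],
    "model 4 (T, F, X)": ols(Y, T, F, X)[1],
}
for name, b in coef_T.items():
    print(f"{name}: standardized coefficient on T = {b:.3f}")
```

In this synthetic setup, adding F lowers the coefficient on T both without and with the observed control, which mirrors the comparison of models 1 and 2 and of models 3 and 4.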
Best-case Scenario
The best-case scenario for future-treatment estimation is given by the DGP in Figure 2. We recall that this scenario assumes that all confounders of T and Y also affect F and that all confounders of F and Y also affect T. It also assumes the absence of true state dependence (no arrow
We begin by testing for the absence of unobserved bias in the unadjusted association between parental income, T, and child’s educational attainment, Y. Model 1 gives this unadjusted association as
Now that we have provided evidence for the existence of bias, we use the control strategy to reduce it. The control strategy focuses on the F-adjusted coefficient in model 2,
The Mayer/difference method, applied to model 2, estimates the treatment effect as the difference between the partial coefficients on T and F,
From the application of the control strategy, we learned that the naive estimate of model 1 is upwardly biased. If the Mayer/difference estimator had yielded a larger estimated treatment effect than the naive estimate (cf. Figure 4, scenario 1), we would have concluded that the Mayer/difference strategy amplifies rather than reduces existing bias. If, by contrast, the Mayer/difference estimator had fallen between the naive and the F-adjusted estimate of the treatment effect, then, by the argument of Figure 4 (scenario 2), we would have concluded that the difference strategy is less effective in reducing bias than the control strategy. In both instances, we would have preferred the control estimate to the Mayer/difference estimate.
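The comparison logic of this paragraph can be written down as a small decision rule. The sketch below is illustrative only: the function names and the example numbers are hypothetical, and expressing the rule through the direction of the control estimate's shift (so that it also covers a downwardly biased naive estimate) is our assumption, since the discussion above considers upward bias. The third branch covers the remaining case, in which the difference estimate moves further than the control estimate in the same direction, and the comparison alone cannot rank the two.

```python
def mayer_difference(b_T, b_F):
    """Difference estimate: partial coefficient on T minus partial coefficient on F,
    both taken from the F-adjusted model."""
    return b_T - b_F

def classify(naive, control, difference):
    """Return the scenario number (1, 2, or 3) for three point estimates.

    1: the difference estimate moves away from the naive estimate in the
       direction opposite to the control estimate (read as bias amplification).
    2: the difference estimate lies between the naive and the control estimate
       (read as less bias reduction than the control strategy).
    3: the difference estimate moves in the same direction as the control
       estimate but further (ranking undetermined without more assumptions).
    """
    shift_control = control - naive
    if shift_control == 0:
        raise ValueError("control and naive estimates coincide; no direction to compare")
    # Shift of the difference estimate, measured in units of the control shift.
    relative_shift = (difference - naive) / shift_control
    if relative_shift < 0:
        return 1
    if relative_shift <= 1:
        return 2
    return 3

# Hypothetical numbers, not the estimates reported in this article.
naive, b_T, b_F = 0.50, 0.40, 0.15
print(classify(naive, control=b_T, difference=mayer_difference(b_T, b_F)))
```

In scenarios 1 and 2 the rule points to the control estimate; in scenario 3 the choice depends on additional assumptions about the path parameters.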
In our application, however, the Mayer/difference estimate moves the naive estimate in the same direction as, but more strongly than, the control estimate (Figure 4, scenario 3). Yet, without further assumptions about the strength of the path parameters, we do not know whether the Mayer/difference estimate is closer to the true causal effect than the control estimate. The most conservative analyst may therefore prefer the estimate provided by the control strategy in this empirical application, noting, however, that bias reduction may be relatively modest unless the effect
In some applications, the analyst may have reasonable expectations about the direction and sign of the effects
Next, we estimate a treatment effect by controlling for observables (model 3). This covariate-adjusted model estimates the treatment effect as
Although, in model 3, we control for a number of important control variables, the careful analyst will still worry about unobserved bias. This worry is addressed in the future-adjusted model 4. Again, the null hypothesis of no unobserved bias cannot be rejected since the coefficient on F,
The control estimate of model 4 is smaller than the baseline treatment effect of model 3 (
True State Dependence and Selection
When the analyst is not willing to rule out true state dependence and selection, the interpretation of the estimates presented in Table 3 may or may not change. In our empirical example, true state dependence,
In our empirical example, selection,
Conclusion
The problem of unobserved confounding is profound. Most research in the social sciences is observational and observational studies cannot rule out bias from unobserved confounding. The direction and especially the size of the bias are often difficult to gauge, in part because the bias could originate in confounders that are as yet unknown to science.
In this article, we have discussed future values of the treatment variable as a tool for detecting, reducing, and removing bias from unobserved confounding. Future treatments have occasionally been used for bias removal in prior research. Here, we have subjected several easily computed future-treatment strategies to a detailed analysis, introduced a simple new strategy, and compared the relative strengths and weaknesses of these estimators to each other and to baseline conventional regression estimates. While we identify challenges to future-treatment strategies, we do not stop there. To maximize the usefulness of future-treatment estimators in applied research, we also demonstrate how additional assumptions about effect sizes can help choose between estimators and inform their interpretation.
The idea behind future-treatment strategies is intuitive: Any variable that affects the treatment variable before the outcome likely also affects it after the outcome has been measured. In other words, future treatments can proxy for unobserved confounders. We have used this insight directly and proposed controlling for future treatments as a covariate in a regression (our control estimator). This estimator has the great advantage of being strictly bias reducing for some linear DGPs.
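To see why controlling for the future treatment reduces bias in the simplest case, consider the following worked example. It is a sketch under an illustrative parameterization that is our assumption here (the symbols $a$, $c$, $g$, $\tau$, $\sigma_T^2$, $\sigma_F^2$ are introduced only for this example): a single unobserved confounder $U$ with unit variance, mutually independent errors, no true state dependence, and no selection.

```latex
% Illustrative linear DGP (symbols are assumptions of this example)
\begin{align*}
T &= aU + \varepsilon_T, & \operatorname{Var}(\varepsilon_T) &= \sigma_T^2,\\
F &= cU + \varepsilon_F, & \operatorname{Var}(\varepsilon_F) &= \sigma_F^2,\\
Y &= \tau T + gU + \varepsilon_Y. &&
\end{align*}

% Large-sample bias of the naive estimator (Y on T)
\[
\operatorname{plim}\hat b_{\text{naive}} - \tau
  = \frac{g\,a}{a^2 + \sigma_T^2}.
\]

% Large-sample bias of the control estimator (Y on T and F)
\[
\operatorname{plim}\hat b_{\text{control}} - \tau
  = \frac{g\,a\,\sigma_F^2}{a^2\sigma_F^2 + c^2\sigma_T^2 + \sigma_T^2\sigma_F^2}.
\]

% Ratio of the two biases
\[
\frac{\text{bias}_{\text{control}}}{\text{bias}_{\text{naive}}}
  = \frac{a^2\sigma_F^2 + \sigma_T^2\sigma_F^2}
         {a^2\sigma_F^2 + \sigma_T^2\sigma_F^2 + c^2\sigma_T^2}
  \in [0, 1), \qquad \text{whenever } c \neq 0 \text{ and } \sigma_T^2 > 0 .
\]
```

In this example the two biases share the same sign, and the control estimator's bias is strictly smaller in magnitude whenever the confounder affects the future treatment ($c \neq 0$). The bias vanishes only in the limiting case $\sigma_F^2 = 0$, in which F is a perfect proxy for U.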
Analyzing important prior future-treatment strategies, we have noted that Mayer’s (1997) estimator is not strictly bias reducing even in the best-case scenario and may in fact amplify conventional OLS bias. The same is true of Gottschalk’s (1996) future-treatment estimator (Online Appendix A). Nonetheless, Mayer’s estimator holds promise because, in certain situations, it reduces bias more than the control estimator.
Future-treatment strategies have several advantages over other strategies for dealing with unobserved confounding. One advantage lies in the ready availability of future-treatment measures in most panel data. Another is the ease of implementation—including future treatments as control variables in a conventional regression analysis. In contrast to fixed-effects estimation, future-treatment strategies to reduce unobserved bias do not require repeated measures of the outcome nor do they require long panels (three periods suffice; see also Vaisey and Miles [2017] for a critical discussion of fixed-effects estimation based on three observation points). Several large social science surveys newly facilitate the application of the future-treatment strategy. For example, recent waves of the General Social Survey included three-wave panels (Hout 2017), and the redesigned Survey of Income and Program Participation includes four-wave panels.
Finally, future-treatment strategies can be used for the dual purpose of detecting and reducing—sometimes even removing—unobserved confounding. Indeed, we have shown that future treatments can detect the presence of bias even when they cannot reduce this bias.
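The detection idea can be illustrated with a simple linear stand-in: regress Y on T and F and inspect the t statistic on F. This sketch is not the nonparametric test introduced in this article. It uses classical homoskedastic standard errors, synthetic data, and hypothetical path values, and it assumes that the outcome does not itself affect the future treatment.

```python
import numpy as np

def t_stat_on_future(Y, T, F):
    """t statistic of the coefficient on F in an OLS regression of Y on T and F."""
    X = np.column_stack([np.ones(len(Y)), T, F])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    resid = Y - X @ beta
    sigma2 = resid @ resid / (len(Y) - X.shape[1])   # residual variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))          # classical standard errors
    return beta[2] / se[2]

def simulate(confounding, n=20_000, seed=1):
    """Synthetic data; `confounding` is the (hypothetical) effect of U on Y."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)                 # unobserved confounder
    T = 0.6 * U + rng.normal(size=n)       # treatment
    F = 0.6 * U + rng.normal(size=n)       # future treatment
    Y = 0.3 * T + confounding * U + rng.normal(size=n)
    return Y, T, F

t_confounded = t_stat_on_future(*simulate(confounding=0.5))
t_unconfounded = t_stat_on_future(*simulate(confounding=0.0))
print(f"t on F with confounding:    {t_confounded:.1f}")
print(f"t on F without confounding: {t_unconfounded:.1f}")
```

With confounding, F is associated with Y conditional on T and the t statistic is large; without confounding, the coefficient on F is zero in expectation.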
A limitation shared with all strategies for reducing and removing hidden bias from unobserved confounding is that causal inference always requires detailed knowledge of the DGP. Within the confines of linear and homogeneous models, we have highlighted two conditions that pose particular challenges for future-treatment estimators: true state dependence (when prior treatment causes future treatment) and selection (when the outcome causes the future treatment). In both scenarios, all future-treatment estimators may increase or decrease bias in unadjusted OLS estimates. And whereas selection may be ruled out in many substantive applications, true state dependence often remains a credible threat. Based on our analytic results, however, we have argued that the control estimator remains bias reducing for moderate confounding under moderate true state dependence and is surprisingly robust to selection as well.
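How true state dependence and selection alter the control estimator's bias can be explored numerically. The sketch below computes population (large-sample) biases from the covariance matrix implied by a linear model. The parameterization, the unit error variances, and all default values are hypothetical choices for illustration, not the path parameterization used in our analytic results.

```python
import numpy as np

def estimator_biases(a=0.6, c=0.6, g=0.5, tau=0.3, d=0.0, s=0.0):
    """Population bias of the naive (Y on T) and control (Y on T and F) estimators of tau.

    Hypothetical linear model with independent, unit-variance errors:
        T = a*U + e_T
        Y = tau*T + g*U + e_Y
        F = c*U + d*T + s*Y + e_F
    d is true state dependence (T -> F); s is selection (Y -> F).
    """
    U, T, Y, F = range(4)
    B = np.zeros((4, 4))              # B[i, j] = direct effect of variable j on variable i
    B[T, U] = a
    B[Y, T], B[Y, U] = tau, g
    B[F, U], B[F, T], B[F, Y] = c, d, s
    A = np.linalg.inv(np.eye(4) - B)  # reduced form: variables = A @ errors
    Sigma = A @ A.T                   # implied covariance (errors have identity covariance)
    naive = Sigma[Y, T] / Sigma[T, T]
    idx = [T, F]
    control = np.linalg.solve(Sigma[np.ix_(idx, idx)], Sigma[idx, Y])[0]
    return naive - tau, control - tau

for d in (-1.0, 0.0, 0.5):
    naive_bias, control_bias = estimator_biases(d=d)
    print(f"d = {d:+.1f}: naive bias = {naive_bias:.3f}, control bias = {control_bias:.3f}")
```

Under these hypothetical values, the control estimator reduces bias without state dependence (d = 0) and with moderate positive state dependence (d = 0.5), but amplifies it for strongly negative state dependence (d = -1). The argument `s` allows the same exploration for selection.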
Since future-treatment strategies make different demands on the DGP than fixed effects or instrumental variables estimators (see Online Appendix B), and because future-treatment measures are widely available in panel data, future-treatment strategies promise help where other popular strategies may fail.
Supplementary Material
Supplemental Material, ElwertPfeffer2019_OnlineAppendix for The Future Strikes Back: Using Future Treatments to Detect and Reduce Hidden Bias by Felix Elwert and Fabian T. Pfeffer in Sociological Methods & Research
Footnotes
Acknowledgment
We thank Yongnam Kim and Zeyu Wei for valuable advice and N. E. Barr for copy editing. This work was supported by a grant from the University of Wisconsin Graduate School Research Competition and a Vilas Mid-Career Fellowship from the University of Wisconsin. We gratefully acknowledge use of the services and facilities of the Center for Demography and Ecology at the University of Wisconsin–Madison, funded by NICHD Center Grant P2C HD047873, and the Population Studies Center at the University of Michigan, funded by NICHD Center Grant R24 HD041028. The collection of data used in this study was partly supported by the National Institutes of Health under grant number R01 HD069609 and the National Science Foundation under award number 1157698.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a grant from the University of Wisconsin Graduate School Research Competition and a Vilas Mid-Career Fellowship from the University of Wisconsin; Center for Demography and Ecology at the University of Wisconsin–Madison, funded by NICHD Center Grant P2C HD047873; and the Population Studies Center at the University of Michigan, funded by NICHD Center Grant R24 HD041028. The collection of data used in this study was partly supported by the National Institutes of Health under grant number R01 HD069609 and the National Science Foundation under award number 1157698.
