Abstract
Social scientists routinely address temporal dependence by adopting a simple technical fix. However, the correct identification strategy for a causal effect depends on causal assumptions. These need to be explicated and justified; almost no studies do so. This article addresses this shortcoming by offering a precise general statement of the (nonparametric) causal assumptions required to identify causal effects under temporal dependence. In particular, this article clarifies when one should condition or not condition on lagged dependent variables (LDVs) to identify causal effects: one should not condition on LDVs, if there is no reverse causation and no outcome autocausation; one should condition on LDVs if there are no unobserved common causes of treatment and the lagged outcome, or no unobserved persistent causes of the outcome. When only one of these is true (with one exception), the incorrect decision will induce bias. Absent a well-justified identification strategy, inferences should be appropriately qualified.
Introduction
Temporal dependence poses a problem for causal inference. Temporal dependence is the property that, after conditioning on covariates, observations are not independent across time; for parametric estimation, this means that the error term is not independent across time. 1 Concerns about temporal dependence are prevalent in studies of time series (cross-sectional) data; some of the most highly cited works in political methodology concern this issue (Beck and Katz 1995; Beck, Katz, and Tucker 1998; Box-Steffensmeier and Jones 2004). 2 Failure to properly account for temporal dependence can lead to biased, usually too small, standard errors. Further, depending on the cause of the temporal dependence, estimates of causal effects 3 can be biased and inconsistent. It is, therefore, imperative that scholars who seek to draw causal inferences understand well the implications of temporal dependence.
One especially common response to temporal dependence involves conditioning on (possibly transformed) lagged dependent variables (LDVs). Conditioning is the process of adjusting an estimate based on the values of a set of covariates (the “conditioning set”). This adjustment is usually done by including control variables in a regression model, though adjustment can also be done using other methods such as stratification, matching, and inverse-probability weighting. A lagged dependent variable is the dependent variable from a previous time period. Event history models based on “gap time” 4 implicitly condition on (transformed) LDVs (see Online Appendix A1).
In some areas of social science, such as in international relations, almost all work conditions on (transformed) LDVs; it is taken for granted that one should “control for temporal dependence.” For example, there was a recent exchange in Political Analysis about the appropriate way to control for temporal dependence, with Carter and Signorino (2010) recommending cubic polynomials of time since the last event against cubic splines of time since last event (Beck et al. 1998); these are each a specific transformation of the LDVs, see Online Appendix A1. Assumed by this conversation was that controlling for some function of the LDVs would improve estimates of the causal parameters of interest. This article contributes to this conversation by making clear that this is not necessarily the case. 5 In fact, for the case of the study of the democratic peace, which this article considers, the identification assumptions for excluding LDVs are arguably more plausible than the identification assumptions for including them.
Estimates are not necessarily improved by controlling for temporal dependence, no matter what specification is used. Whether estimates will be improved or harmed depends on causal assumptions that need to be explicated and justified. 6 This article will systematically show what nonparametric assumptions are needed to justify using (transformed) LDVs for causal identification. This article joins other work that has pointed out some of the causal assumptions required of different temporal specifications (Glynn and Quinn 2013; Keele and Kelly 2006; Morgan and Winship 2015:chap. 11; Wilson and Butler 2007) and of the dangers of controlling for LDVs (Achen 2000). More broadly, it speaks to the importance of understanding the assumptions underlying our causal estimates (Collier, Sekhon, and Stark 2010; Dunning 2008; Sekhon 2009; Shalizi and Thomas 2011).
Nonparametric identification involves identifying causal effects without any knowledge of the functional form of the causal processes. Nonparametric identification is a useful place to start for a few reasons. First, the question of nonparametric identification is in one sense prior to the question of parametric identification, since if an effect can be nonparametrically identified then it can be parametrically identified, and we need not rely on strong parametric assumptions to do so. Second, in social science we rarely have confidence about the functional form of causal processes; accordingly, inferences that don’t depend on parametric assumptions are often more credible. Third, insights arising from studying nonparametric identification provide a foundation for understanding the challenges for identification in many of the species of parametric models, including better understanding of the existence and character of particular kinds of biases, such as omitted variable, included variable, selection, and measurement biases (works doing so using similar tools as this article include Elwert and Winship 2014; Glynn and Gerring 2013; Glynn and Kashin 2013; Glynn and Quinn 2013; Morgan and Winship 2015; Pearl 2009b).
Scholars can determine the appropriate conditioning set for nonparametric identification using the tools of structural causal models (Pearl 2009b). Using these tools, this article establishes that nonparametric identification is not possible for a general temporal causal process (process D). Not conditioning on the LDV leads to confounding biases; conditioning on the LDV leads to collider biases. Further, this result generalizes to the broader class of temporal causal processes, DG , whose causal graph contains process D as a subgraph.
This article then articulates the minimal assumptions on process D, and therefore also necessary assumptions on DG, required to nonparametrically identify the causal effect using backdoor adjustment. 7 The principal takeaway result is that we should not condition on LDVs if we can assume both that there is no reverse causation (no effect of Y_{t−k}, the lagged outcome, on treatment) and no outcome autocausation (no effect of Y_{t−k} on Y_t not mediated by treatment); we should condition on LDVs if we can assume either that there are no unobserved common causes of treatment and the lagged outcome, or no unobserved persistent causes of the outcome. However, the causal processes implied by the above sets of minimal identification assumptions are observationally indistinguishable; the data alone cannot tell us which conditioning strategy is appropriate. This article then illustrates how to think about these empirical commitments using the study of the democratic peace as an example.
To summarize, temporal dependence is a symptom of potentially deep problems for causal inference. Overcoming these problems requires making causal assumptions. These assumptions should be explicated and justified. 8 This article will show how to do so.
Temporal Dependence in Social Science
Temporal dependence is the property that, after conditioning on covariates, observations of the same unit are dependent across time. In the context of parametric models, temporal dependence is the property that the error term is dependent across time. Temporal dependence is typically diagnosed by looking for autocorrelation in the residuals of a regression (e.g., Greene 2012:§20.7). Since most statistical procedures assume independence of observations, conditional on some covariates, 9 the belief that data are temporally dependent often means that a central assumption is violated.
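As a rough illustration of the residual-autocorrelation diagnostic just described, the following sketch simulates a regression whose disturbances follow an AR(1) process and then computes the lag-1 autocorrelation of the OLS residuals. The data-generating values (intercept 1, slope 2, AR coefficient 0.7) and all function names are assumptions chosen for illustration; they do not come from the article.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a series."""
    n = len(r)
    m = sum(r) / n
    num = sum((r[t] - m) * (r[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in r)
    return num / den

def simulate(n=5000, phi=0.7, seed=0):
    """Hypothetical process: y_t = 1 + 2*x_t + e_t, with e_t = phi*e_{t-1} + noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    e, y = 0.0, []
    for t in range(n):
        e = phi * e + rng.gauss(0, 1)
        y.append(1.0 + 2.0 * x[t] + e)
    return x, y

x, y = simulate()
a, b = ols_simple(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
rho = lag1_autocorr(resid)
print(f"slope estimate: {b:.3f}, lag-1 residual autocorrelation: {rho:.3f}")
```

In this particular setup the regressor is independent of the disturbances, so the slope estimate stays close to its true value even though the residuals are strongly autocorrelated; the diagnostic flags dependence but does not by itself say whether the causal estimate is biased.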
Temporal dependence can arise from many causal processes, such as omitted unit-specific effects, omitted temporally correlated variables, autocorrelated disturbances, and outcome autocausation (an effect of Y_{i,t−k} on Y_{i,t} not going through treatment). 10 Some of these causes of temporal dependence are relatively benign, making estimates of standard errors inconsistent, but not biasing estimates of the causal parameters of interest. Other causes of temporal dependence will be more harmful, making it impossible to identify (and hence have consistent estimators for) causal parameters of interest. A large toolbox of solutions has been developed to address particular sources of temporal dependence, including: estimating a multilevel model to account for cross-sectional unit-specific effects (Gelman and Hill 2006; Greene 2012:§11.4-11.5; Wooldridge 2010:chap. 10-11); conditioning on lags of the causal factor or other variables; conditioning on lags of the outcome, sometimes with a specific structure as with first-difference models (Allison 1990; Morgan and Winship 2015:chap. 11); quasi-differencing the outcome (Cochrane and Orcutt 1949; Prais and Winsten 1954); and estimating the variance–covariance matrix so as to allow for serial correlation such as with the Newey–West robust consistent estimator for serial correlation (Newey and West 1987; also see Beck and Katz 1995; Freedman 2006; King and Roberts 2015). For reviews, see Beck and Katz (2011), Greene (2012), Hamilton (1994), King (1998), Wilson and Butler (2007), and Wooldridge (2009, 2010).
Each of these methods is justified by showing that it leads to estimators with desirable properties for particular causal processes. Thus, a crucial step in the use of any method for causal inference is the articulation and justification of the causal assumptions underlying the method. In practice, however, most articles seem to treat temporal dependence as a problem amenable to a “technical fix,” offering no statement or defense of the causal conditions required for their method (Wilson and Butler 2007). Part of the difficulty is that the identification assumptions for many methods are bundled together and expressed as a parametric model, which is hard to unpack and judge in terms of the relative plausibility of particular causal assumptions. By contrast, social scientists are most able to evaluate claims about the existence of causal effects, sometimes the sign of effects, and sometimes the rough magnitude of effects, but rarely the functional form of effects.
This article addresses this shortcoming by offering a precise statement of the nonparametric causal commitments required to identify causal effects under temporal dependence and to justify the decision to condition or not to condition on LDVs. To do so, we now introduce the problem of nonparametric identification and the tool of causal graphs for analyzing it.
Nonparametric Identification under Temporal Dependence
Social scientists want consistent estimators for causal effects. 11 In order to have a consistent estimator for a causal effect, we must be able to identify it. A causal effect is identifiable if we could determine its value, given an “unlimited number of observations” (Manski 2007:3). 12 An effect is nonparametrically identifiable if we can identify the effect without knowledge about the functional form of causal processes. 13
Nonparametric identification can be evaluated using a set of methods involving causal graphs, referred to by some as structural causal models (Pearl 2009a). Structural causal models provide a means for evaluating the circumstances under which conditioning on a particular set of covariates could, given unlimited data, nonparametrically recover causal effects. Structural causal models are similar to structural equation models but rather than identify effects using parametric assumptions, they identify effects using assumptions about the nonexistence of certain causal effects that make the causal model Markovian and acyclic (see below; Pearl 2009a:§3.2.3). If we have additional insight about functional form then we can supplement these tools with those structural features, but typically in social science we are not confident about functional form. Structural causal models are also consistent with the potential outcomes framework. 14
For textbook references on nonparametric identification using causal graphs, and closely related concepts, see Hernan and Robins (2015), Morgan and Winship (2015), Pearl (2009a), Shalizi (in press:chap. 20-24), Spirtes, Glymour, and Scheines (2001). Because these methods are relatively new to social science, the following subsection (and Online Appendix A) introduces some key concepts.
Nonparametric Identification Using Causal Graphs
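Following the Neyman–Rubin model of potential outcomes (see Online Appendix A.2), the effect of W is identifiable if treatment assignment is conditionally ignorable (Rosenbaum and Rubin 1983; Stone 1993), that is, if the potential outcomes are independent of treatment conditional on a set of covariates C: Y(w) ⊥ W | C for every treatment value w.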
A directed graph is a set of vertices (V) and a set of directed edges (E). For example, graph

Graph
A directed acyclic graph (DAG), like graph
In order to apply DAGs to any given problem, we need to know when we can represent a causal process using a causal graph. A nonparametric process 16 D can be analyzed using a DAG G, if the probability function on observable variables P that D generates is Markov relative to G, which means that every variable in P is independent of its nondescendants in G, conditional on its parents (Pearl 2009b:theorem 1.2.7). A process D and probability distribution P that are Markov relative to a DAG G are said to satisfy the Parental Markov Condition. G is a causal DAG if G represents a causal process that is Markov relative to G. In simple terms, a causal process D can be depicted using a causal DAG if no variable can affect itself (no cycles), and the only causal effects not depicted are independent disturbances.
To establish ignorable treatment assignment, we want to determine whether treatment is independent of the potential outcomes. To do so, we will use a property called d-separation (d is for directional), which refers to whether two variables are not causally connected to each other. If two variables in a causal DAG are d-separated, then they will be statistically independent. On the other hand, if two variables are d-connected, then they will typically 17 be statistically dependent. See the Online Appendix A.3 for the formal definition of d-separation.
There are three basic kinds of causal relationships that will d-connect two variables. (1) Two variables will be d-connected if one is the direct cause of the other; for example, in Figure 1, c′ and W are d-connected because c′ → W. (2) Two variables will be d-connected if they share a common cause that is not conditioned upon; in Figure 1, W and Y′ are d-connected because W ← c′ → Y′. (3) Two variables will be d-connected if they share a common effect (a collider) that is conditioned upon; in Figure 1, c′ and e′ are d-connected through c′ → Y′ ← e′ when Y′ is conditioned upon, in which case we say that Y′ is an activated collider between c′ and e′; more on this below.
In general, any two variables will be d-connected if they are on a path of d-connected variables. For example, in Figure 1, Y′ and Y are d-connected through two paths:
.
18
An indirect causal effect is a sequence of directed edges with all arrows pointing in the same direction; denote an indirect effect as
A collider is a variable that is affected by two other variables (Y′ in
To summarize, dependence in a causal DAG can (and typically will) flow through a path that consists only of unblocked chains of causation, unblocked common causes, and activated colliders. Dependence will not flow through unactivated colliders. If there is no path d-connecting two variables, then they are d-separated and statistically independent.
Finally, to identify the causal effect of W on Y we want to isolate all causal paths W → Y, so-called front-door paths because they involve causal paths that begin with an arrow pointing out of W (out the “front-door” of W). To do so, we need to block all confounding causal paths between W and Y, so-called back-door paths because they involve an arrow pointing into W (into the “back-door” of W). If we are able to find a set of observed variables C that blocks all back-door paths, then we can identify the effect of W on Y, and we say that C satisfies the back-door criterion relative to (W,Y) (see Online Appendix A.4 for the formal definition). When there exists a C that satisfies the back-door criterion relative to (W,Y), the causal effect is identifiable using the back-door adjustment formula (see Online Appendix A.4). However, when selecting C we have to be careful not to condition on colliders that would unblock a back-door path, inducing bias. This gives rise to the counterintuitive result that conditioning on a pretreatment covariate can induce bias in what was otherwise an unbiased estimator. This is known as M bias because the causal graph makes an “M” (Greenland 2003; Ding and Miratrix 2015; Thoemmes 2015). 19 In fact, it is precisely because of M bias that it is sometimes problematic to condition on LDVs, as examined below.
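The M-bias point can be checked mechanically. The sketch below is a minimal illustration, not the article's own procedure: it tests d-separation with the standard moralized-ancestral-graph method and applies it to a hypothetical M-shaped graph in which a pretreatment covariate M is a collider between two unobserved causes U1 and U2. The node names and the helper functions are assumptions introduced here.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(edges, x, y, z):
    """True if x and y are d-separated given the set z in the DAG `edges`."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    # Step 1: keep only x, y, z and their ancestors.
    keep = ancestors(parents, {x, y} | set(z))
    # Step 2: moralize (link each parent to its child, and co-parents to each other).
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for p in ps:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Step 3: search for an undirected path from x to y that avoids z.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen or v in z:
            continue
        seen.add(v)
        stack.extend(nbrs[v])
    return True

# Hypothetical M-shaped graph: W <- U1 -> M <- U2 -> Y, plus the effect W -> Y.
M_GRAPH = [("U1", "W"), ("U1", "M"), ("U2", "M"), ("U2", "Y"), ("W", "Y")]

# Back-door check: delete arrows out of W, then ask whether W and Y are d-separated.
backdoor_graph = [(a, b) for a, b in M_GRAPH if a != "W"]
ok_without_m = d_separated(backdoor_graph, "W", "Y", set())
ok_with_m = d_separated(backdoor_graph, "W", "Y", {"M"})
print("no adjustment blocks all back-door paths:", ok_without_m)
print("adjusting for M blocks all back-door paths:", ok_with_m)
```

With the empty conditioning set the only back-door path is blocked at the collider M; conditioning on M activates the collider and opens the path, which is the pattern that makes conditioning on an LDV potentially harmful.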
We can now put all of this together. If a causal process D (and induced probability distribution P) satisfy the parental Markov condition, then D can be represented by causal DAG G. Using G, we can look for a set of covariates C that will block all spurious (back-door) causal paths between W and Y, leaving only the desired (front-door) causal effects of W on Y. However, we have to be careful in selecting C that it does not activate a collider that opens a back-door path. If we find such a C that satisfies the back-door criterion relative to (W,Y) then, given infinite data and positivity, we can recover the causal effect—the full marginal distributions of the potential outcomes—by conditioning on C and using the back-door adjustment formula.
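For reference, the back-door adjustment formula in its standard textbook form (Pearl 2009b) can be written as below. The do(·) notation is Pearl's and is an assumption about presentation here; the article's own statement is in Online Appendix A.4 and may use different symbols. The formula presumes that C satisfies the back-door criterion relative to (W,Y) and that positivity holds, so that every conditional probability on the right-hand side is defined.

```latex
P\bigl(Y = y \mid \mathrm{do}(W = w)\bigr)
  \;=\; \sum_{c} P\bigl(Y = y \mid W = w,\, C = c\bigr)\, P\bigl(C = c\bigr),
\qquad \text{with } P(W = w \mid C = c) > 0 \text{ whenever } P(C = c) > 0 .
```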
A General Causal Process: Process D
This section will introduce a general temporal causal process, denoted process D. To do so, I adopt as few assumptions about the causal process as possible and I enumerate all assumptions that I make. Assumptions in a causal DAG consist of statements about the absence of causation between observed variables, the absence of unobserved shared causes between observed variables, and the absence of other observed variables that are d-connected to at least two variables.
For ease of exposition, we first assume that there are only three observable variables of interest: the lagged outcome Y′, the treatment variable W (the causal factor of interest), and the outcome Y.
(A_V) Observables: V_O = {Y′, W, Y}, where V_O is the set of observable variables.
Later, we will weaken this assumption, demonstrating that the main result generalizes to causal processes with many observable variables, so long as the new variables do not provide exhaustive mechanisms for certain effects in process D. 20
In order for W to cause Y, W must occur prior to Y, therefore
(A1) Y is posttreatment, which implies that Y does not cause W.
For ease of exposition, I will also assume that Y′ occurs prior to W. Modifying this assumption will not change any of the main results.
(A2) Y′ is pretreatment: W does not cause Y′.
In order for it to be feasible to identify the effect of W on Y, we must assume:
(A3) No fundamental confounding: there is no unobserved variable U s.t. W ← U → Y.
In process D, all pairs of causal factors are allowed to have unobserved common causes, except for W and Y (by A 3). c′ denotes all unobserved common causes of W and Y′: W ← c′ → Y′. Likewise, e′ denotes all unobserved common causes of Y′ and Y: Y′ ← e′ → Y.
This process could generate data in a time series, in a cross section, or both, so long as the observations are sufficiently independent that an infinite draw of them would identify all relevant quantities. For a depiction of what this process could look like when strung together in a time series, see Online Appendix A.5. For discussion of other issues, causal quantities, and identification strategies in time-series data, see Blackwell (2013), Blackwell and Glynn (2014), Pearl and Robins (1995), and Robins, Hernan, and Brumback (2000).
Because D has independent errors, and all causality is directed acyclic, process D satisfies the parental Markov condition and can therefore be represented by a DAG, specifically Graph D in Figure 2 and analyzed using the tools of structural causal models (Pearl 2009b:theorem 1.2.7; Shalizi in press:§20.2).

Process D and Graph D.
Backdoor Paths in Process D
To identify the causal effect (through backdoor adjustment) in process D, we want to find a conditioning set C that satisfies the backdoor criterion. For a given conditioning set, the backdoor criterion will not be satisfied if there is an open backdoor path, which is a path of dependence (d-connected nodes) connecting W and Y, starting with an arrow pointing into W (in the “back door” of W).
In process D there are only two possible conditioning sets: the set containing the LDV, C = {Y′}, and the empty set, C = {}. When conditioning on the LDV, C = {Y′}, there are open backdoor paths: W ← c′ → [Y′] ← e′ → Y (where the box, shown here as square brackets, denotes conditioning). These are denoted as Bcoll because they involve a bias arising from conditioning on a collider (the LDV); “B” is used to serve as a reminder that these refer to “backdoor paths” that, when open, will induce a “Bias”.
There are also open backdoor paths when not conditioning on the LDV, C = {}. Denote these as Bcon because they involve a confounding bias that could be blocked through the LDV.

Sources of bias. The red dashed arrows denote the biasing paths. The top figure illustrates the collider biases Bcoll that arise when conditioning on Y′. The blue square denotes conditioning on this variable. The bottom figures illustrate the confounding biases Bcon that arise when not conditioning on Y′.
Therefore, there is no conditioning set that satisfies the backdoor criterion for process D. There are confounding biases when not conditioning on the LDV; there are collider biases when conditioning on the LDV. Coupled with the fact that there is no exhaustive mediator of the treatment effect, this gives us the first result (the proof is in Online Appendix A.6).
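The dilemma can be reproduced by brute force on a small graph. The sketch below assumes an edge list for Graph D reconstructed from the surrounding description (c′ affects W and Y′; e′ affects Y′ and Y; Y′ affects W and Y; W affects Y); the figure itself is not reproduced here, so this edge list is an assumption. It enumerates every backdoor path from W to Y and reports which are open under each of the two possible conditioning sets. The ASCII names `Yp`, `c`, and `e` stand in for Y′, c′, and e′.

```python
# Assumed edge list for Graph D (see the lead-in for the naming convention).
EDGES = [("c", "W"), ("c", "Yp"), ("e", "Yp"), ("e", "Y"),
         ("Yp", "W"), ("Yp", "Y"), ("W", "Y")]

def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    out, stack = set(), [node]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in out:
                out.add(b)
                stack.append(b)
    return out

def backdoor_paths(edges, treatment, outcome):
    """Simple paths from treatment to outcome whose first edge points into treatment."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        last = path[-1]
        if last == outcome:
            paths.append(tuple(path))
            return
        for nxt in sorted(nbrs[last]):
            if nxt not in path:
                walk(path + [nxt])

    for parent in sorted(a for a, b in edges if b == treatment):
        walk([treatment, parent])
    return paths

def is_open(edges, path, conditioned):
    """Apply the d-separation blocking rules to each interior node of the path."""
    edge_set = set(edges)
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on.
            activated = ({mid} | descendants(edges, mid)) & conditioned
            if not activated:
                return False
        elif mid in conditioned:
            # A conditioned chain or fork node blocks the path.
            return False
    return True

def open_backdoor_paths(edges, conditioned):
    return [p for p in backdoor_paths(edges, "W", "Y") if is_open(edges, p, conditioned)]

for label, cond in [("C = {}", set()), ("C = {Y'}", {"Yp"})]:
    print(label)
    for p in open_backdoor_paths(EDGES, cond):
        print("   open:", " - ".join(p))
```

Under these assumptions, three paths running through the lagged outcome are open when nothing is conditioned on, and the single path through the collider Y′ is open when the LDV is conditioned on, so neither conditioning set blocks every backdoor path.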
Generalization to Causal Processes with Other Observable Variables
For process D, there is no identification strategy. To what extent will this result generalize to other nonparametric causal processes in which there are additional observable variables? Informally, the answer is that it will generalize so long as the new variables do not provide exhaustive mechanisms for any of the causal paths in process D (formally, the answer is that the result will generalize so long as there remains a hedge in the new graph, see Shpitser and Pearl 2006). When exhaustive mechanisms exist, various solutions become possible.
To see this, note that the fundamental problem for nonparametric identification in process D is that when we don’t condition on the LDV there is a spurious association through Bcon, and if we do condition on the LDV we induce spurious association through Bcoll. Adding observable variables to process D cannot change this basic dilemma so long as they don’t exhaustively mediate all of the causal connections on one of these sets of paths of spurious association. 23
We can therefore generalize Proposition 1 to the set of nonparametric causal processes, denoted DG for general, that have graph D as a subgraph (proof in Online Appendix A.7):
Nonparametric identification through backdoor adjustment can only become less attainable in DG : for any conditioning set C 1, backdoor paths that are open in process D will also be open in DG but there may now also be additional open backdoor paths in DG .
To summarize, Theorem 1 shows that causal effects are not identified for a large class of nonparametric causal processes. Implicit in this theorem is the insight that any time series methods that can identify the effect of W in DG must be identifying off of parametric assumptions (typically, additive effects in a linear link function) or other assumptions not made in DG. For example, as noted by Beck and Katz (2011:339), Hamilton (1994:226) shows how in a linear version of D, in which e′ consists of first-order autoregressive, AR(1), disturbances, the effect of W can be identified using an ADL(2,1) model (autoregressive distributed lag model of order 2 in autoregression and order 1 in distributed lags). That is, the analyst estimates a regression of the outcome on its first two lags, the treatment, and the first lag of the treatment.
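To see where the two outcome lags and the single treatment lag come from, consider the following algebraic sketch. It assumes a simple linear structural equation with an AR(1) disturbance; the symbols β, γ, φ, and u_t are introduced here for illustration and are not taken from Hamilton (1994) or from the article. Substituting the lagged structural equation into the disturbance process yields an equation of ADL(2,1) form with a serially uncorrelated error.

```latex
\begin{align*}
Y_t &= \beta W_t + \gamma Y_{t-1} + e_t, \qquad e_t = \phi\, e_{t-1} + u_t, \\
e_{t-1} &= Y_{t-1} - \beta W_{t-1} - \gamma Y_{t-2}, \\
Y_t &= \beta W_t - \phi\beta\, W_{t-1} + (\gamma + \phi)\, Y_{t-1} - \phi\gamma\, Y_{t-2} + u_t .
\end{align*}
```

The last line is only a rearrangement of the first two; whether its coefficients can be estimated consistently still depends on the additional assumption that the remaining error u_t is independent of the included regressors.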
Methodological texts typically focus on special cases of DG, often parametric special cases, in which identification is possible. However, such a focus may mislead scholars by understating the difficulty of eliminating biases. Social phenomena are complex and our knowledge of them limited. Even when we restrict ourselves to causal processes that are not fundamentally confounded (A 3), the causal process may still not permit causal identification, no matter what conditioning strategy we use. When not conditioning on LDVs, we will have bias from Bcon; when conditioning on LDVs, we will have bias from Bcoll. In such circumstances, a pragmatic way forward will be to evaluate our estimators under different presumed causal processes (e.g., Beck and Katz 1995, 1996; Bertrand, Duflo, and Mullainathan 2002; Box-Steffensmeier, Boef, and Joyce 2007; Franzese and Hays 2007; Freedman 2005; Glynn and Quinn 2013; Keele 2010; Keele and Kelly 2006) to sign and bound these biases (Blackwell 2014; Rosenbaum 2002) and qualify our inferences appropriately.
If practitioners do move to a special case of DG in which causal identification is possible, the move should be made with caution, detailing and justifying the additional assumptions employed for causal identification. To assist scholars in understanding the assumptions underlying claims to causal identification, the following section will systematically state the minimal nonparametric assumptions needed for identification in process D.
Assumptions for Identification
Nonparametric identification in process D, and therefore also DG , is impossible because of the two sources of bias, Bcoll and Bcon . Nonparametric identification (through backdoor adjustment) requires that one of these backdoor paths be absent. This section will introduce the assumptions we can make on process D and will offer some substantive discussion to illustrate how scholars might want to evaluate their plausibility. This section will then systematically summarize the minimal nonparametric conditions 25 that must be assumed for process D (and DG ) to rule out these backdoor paths so as to permit identification.
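A small simulation can make the stakes concrete. The sketch below assumes a linear-Gaussian version of process D with unit coefficients and a true treatment effect of 1; the functional form, the coefficient values, and the function names are illustrative assumptions, not part of the article's nonparametric argument. It compares the regression coefficient on W without and with the lagged outcome Y′ in three scenarios: the full process, a process with no reverse causation and no outcome autocausation, and a process with no unobserved common cause of W and Y′.

```python
import random

TAU = 1.0  # assumed true effect of W on Y

def simulate(n, a_c, b_c, b_y, g_y, seed=0):
    """Draw n observations (Y', W, Y) from an assumed linear version of process D.

    a_c: effect of c' on Y'    b_c: effect of c' on W
    b_y: effect of Y' on W     g_y: effect of Y' on Y (not through W)
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c, e = rng.gauss(0, 1), rng.gauss(0, 1)
        yp = a_c * c + e + rng.gauss(0, 1)
        w = b_c * c + b_y * yp + rng.gauss(0, 1)
        y = TAU * w + g_y * yp + e + rng.gauss(0, 1)
        rows.append((yp, w, y))
    return rows

def cov(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

def estimates(rows):
    """Return the OLS coefficient on W (without Y', with Y' as a control)."""
    yp, w, y = zip(*rows)
    s_ww, s_pp, s_wp = cov(w, w), cov(yp, yp), cov(w, yp)
    s_wy, s_py = cov(w, y), cov(yp, y)
    unadjusted = s_wy / s_ww
    adjusted = (s_wy * s_pp - s_wp * s_py) / (s_ww * s_pp - s_wp ** 2)
    return unadjusted, adjusted

N = 20000
full = estimates(simulate(N, a_c=1, b_c=1, b_y=1, g_y=1))
no_reverse_no_auto = estimates(simulate(N, a_c=1, b_c=1, b_y=0, g_y=0))
no_common_cause = estimates(simulate(N, a_c=0, b_c=0, b_y=1, g_y=1))

for name, (u, a) in [("full process D", full),
                     ("no reverse causation, no autocausation", no_reverse_no_auto),
                     ("no unobserved common cause of W and Y'", no_common_cause)]:
    print(f"{name}: without LDV = {u:.2f}, with LDV = {a:.2f} (truth = {TAU})")
```

In this assumed setup, both estimators miss the true effect under the full process; omitting the LDV recovers it only in the second scenario, and including the LDV recovers it only in the third. The scenarios differ in structure rather than in anything the fitted regression reports, so the choice between them has to rest on substantive argument.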
Discussion of Identification Conditions
To identify the causal effect (via backdoor adjustment), we need to assume away some of the causal connections (edges) so as to break one of the backdoor paths. There are four sets of causal effects (edges) in process D that we could assume away. I discuss these roughly in order of my assessment of their plausibility for causal processes of interest to social scientists. I will also not evaluate whether these conditions are strictly true, since for nonexperimental social phenomena it is hard to rule out any causal connection. Instead I will evaluate whether the conditions could be approximately true, so that any net effects are likely to be small, as smaller effects will typically lead to smaller biases.
To illustrate how these conditions can be evaluated I will discuss them in the context of the study of the democratic peace (Dafoe 2011; Hayes 2012; Russett 1993). The causal proposition at the heart of the democratic peace is that some set of characteristics of democratic regimes promotes peace amongst democracies (or inversely, some characteristics of autocratic regimes promote conflict against democracies). Evaluating these conditions is not an easy task as it requires assessing the plausibility of several specific kinds of causal connections. Accordingly, the following discussion is merely an illustration of how scholars can approach the task: what an argument for or against a condition would look like. Future research could devote more concentrated efforts into evaluating each of these conditions for specific fields. Online Appendix A.9 also briefly considers two other examples.
(A4) No “reverse causation”:
(A 4) No reverse causation implies that the (lagged) outcome (e.g., whether a war occurred last year) does not have an effect on the causal factor of interest (whether the government is democratic). The red dashed edge in the accompanying figure depicts the effect that is assumed to be absent.
To argue that this condition is likely to be false requires arguing that reverse causation is likely: that war occurrence affects regime type. To make such an argument, we could appeal to examples where the occurrence of war had a profound effect on regime type. For example, after World War II, the United States and Soviet Union each promoted democratic and communist governments in their respective spheres of influence. We could also appeal to scholarship that has investigated this specific causal connection. For example, Thompson (1996) argues for reverse causation: zones of peace promote democracy. However, interpreting claims about causal connections is subtle because different causal connections often look similar to each other. Thompson’s argument illustrates this, since Thompson (1996) also invokes “aspirations to regional hegemony” and “domestic concentrations of economic and political power” as common causes of regional conflict and autocracy, which would be better characterized as common causes of W and Y′ (Y′ ← c′ → W), not as Y′ → W.
To argue that condition A 4 is likely to be true requires arguing that reverse causation is unlikely: that war occurrence does not have systematic effects on regime type. To do so, we could argue that political institutions are deeply rooted and highly persistent (Acemoglu et al. 2008), immune to all but the most penetrating conflicts. Thus, we could say that A 4 is more likely to be true for the study of smaller level conflicts, like militarized interstate disputes, than for the study of large wars. In response to the World War II example, we could argue that the effects seem as likely to be positive (more democratic) as negative (less democratic), so that the average effect is approximately zero. We could also invoke other theory and research that finds no evidence of an effect of war occurrence on regime type (Mousseau and Shi 1999).
In my assessment, (A 4) no reverse causation seems to be approximately true for the democratic peace, at least as applied to low level conflict events.
(A5) No outcome autocausation (not mediated by treatment):
A 5 assumes that the realization of the outcome (war this year) does not have a causal effect on the outcome in future time periods (war in the future), other than through treatment (democracy).
We could argue that this condition is likely to be false for the democratic peace by pointing out the various ways that war occurrence makes war more (or less) likely in the future. The occurrence of war could make war more likely through the hardening of hatred between peoples, the strengthening of hawkish domestic coalitions, and the centralization of authority. War could make future war less likely through the depletion of military and industrial resources, the exhaustion of resolve, the revelation of information, or the resolution of disputes.
We could argue that this condition is likely to be (approximately) true by showing how these purported mechanisms are weak, or that the positive and negative effects balance so that the average effect is close to zero. Note that outcome autocausation cannot be identified using autocorrelation in the outcome unless we believe we have controlled for all persistent causes of the outcome (all e′).
In my assessment no outcome autocausation (A 5) is somewhat plausible for the democratic peace.
(A6) No unobserved common causes of W and Y′: c′ does not affect W OR c′ does not affect Y′.
A 6 assumes that there are no unobserved factors that affect both W (democracy) and Y′ (lagged peace).
Whereas A 4 and A 5 involved assuming no causal effect between known and observed factors, A 6 and A 7 will require making assumptions about the absence of causal effects on a subset of all other factors, known and unknown. For this reason A 6 and A 7 are much harder to empirically corroborate; they cannot, even in principle, be definitively evaluated through experiments, and our ability to interrogate them is constrained by our imagination about possible causes.
A 6 (W and Y′ do not have any common unobserved causes) is very similar to the condition of no confounding (W and Y do not have any common unobserved causes). As with the proposition of no confounding, we can affirm A 6 if we have confidence that treatment is as if randomly assigned, independently across time, which implies A 6 and A 4. 26
We could argue that A 6 is approximately true for the democratic peace in a similar manner as we would argue that democracy is not fundamentally confounded with peace. We could argue that we have adequately conditioned on the most important common causes of democracy and (lagged) peace. We could diagnose this claim through placebo tests, for example, by looking to see whether some pretreatment variable that doesn’t affect democracy is uncorrelated with democracy. We could argue that A 6 is false by pointing out important common causes that are not adequately controlled and through failed placebo tests. In my assessment, A 6 is not approximately true for the democratic peace: There are many common causes of democracy and lagged peace that have not been (and cannot be) adequately conditioned away (for a descriptive perspective on the democratic peace, see Dafoe 2011; Dafoe, Oneal, and Russett 2013).
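The placebo logic can be sketched in a few lines. What follows is a minimal illustration on simulated data, not an analysis of the democratic peace: the data-generating process, the coefficients, and the helper `placebo_check` are all hypothetical. The idea is that a pretreatment variable with no effect on treatment should be uncorrelated with treatment unless the two share an unobserved cause.

```python
import numpy as np

def placebo_check(placebo, treatment):
    """Return the placebo-treatment correlation r and a rough z-statistic.

    Under the null of no association, r * sqrt(n) is approximately
    standard normal in large samples.
    """
    placebo = np.asarray(placebo, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    r = float(np.corrcoef(placebo, treatment)[0, 1])
    z = float(r * np.sqrt(placebo.size))
    return r, z

# Simulated data; every name and coefficient here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 20_000
c = rng.normal(size=n)                  # unobserved common cause
placebo = c + rng.normal(size=n)        # pretreatment variable; has no effect on treatment
w_confounded = c + rng.normal(size=n)   # treatment that shares the cause c
w_as_if_random = rng.normal(size=n)     # treatment assigned independently of c

r_bad, z_bad = placebo_check(placebo, w_confounded)
r_ok, z_ok = placebo_check(placebo, w_as_if_random)
print(f"confounded treatment:   r = {r_bad:.3f}, z = {z_bad:.1f}")
print(f"as-if-random treatment: r = {r_ok:.3f}, z = {z_ok:.1f}")
```

A passed placebo test of this kind does not establish A 6; it only fails to reveal one particular violation, which is why our ability to interrogate A 6 remains limited by the placebo variables we can think of.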
Unlike A 4 and A 5, A 6 can be made true by improved theory and empirics. Specifically, if we know all of the common causes of W and Y′ and have sufficiently good measures of them, then we can make A 6 true by conditioning on them. In addition, if we can find exhaustive isolated mechanisms for the causal effect of any particular c′ on Y′ or c′ on W, then we can condition on them to block this source of bias. Graphically, to be an exhaustive mechanism requires that on every path from c′ to Y′ (or from c′ to W) there is an observed mediator. To be an isolated mechanism requires that this mediator not be a collider on another backdoor path (e.g., it is not a consequence of e′). 27
One especially promising mechanism is W′, the lagged treatment, because it is often plausible that much of the dependence between W and Y′ flows through the dependence between W and W′. For example, suppose
(A7) No unobserved persistent causes of the outcome: ∀e′, either e′ does not cause Y′ OR e′ does not cause Y.
A 7 assumes that there are no unobserved factors that affect both Y′ (lagged peace) and Y (peace).
Like A 6, A 7 is a very strong assumption: it assumes away a constellation of causal effects for all other known and unknown factors; empirical interrogation of A 7 is limited by our imagination and patience; and A 7 cannot be experimentally verified or disproven, even in principle. A 7 is especially implausible since we can be confident that we have not observed all causes of the outcome, and causes of the outcome, like most phenomena, are usually temporally persistent. For the democratic peace, we could argue against A 7 by pointing out the many temporally persistent causes of peace and war that cannot be adequately controlled for: unresolved disputes over territory, proximity, enduring rivalries, deep-rooted amity or enmity, historical grievances, and geopolitical insecurity. A 7 becomes more plausible to the extent that we think we are aware of and can adequately measure the most important persistent causes of the outcome. I regard A 7 as false for the democratic peace.
Minimal Identification Assumptions
The following theorem states the minimal 29 assumptions that we need to make about process D for identification. These sets of assumptions correspond to the maximal 30 subgraphs of process D that permit identification for a particular conditioning set. These assumptions achieve identification by eliminating either Bcoll or Bcon.
Possible nonparametric assumptions on Process D are:
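(A 4) No “Reverse Causation”: Y′ does not cause W.
(A 5) No Outcome Autocausation (not Mediated through Treatment): Y′ does not cause Y, other than through W.
(A 6) No Unobserved Common Causes of W and Y′: ∀c′, either c′ does not cause W OR c′ does not cause Y′.
(A 7) No Unobserved Persistent Causes of the Outcome: ∀e′, either e′ does not cause Y′ OR e′ does not cause Y.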
The maximal subgraphs of Process D that permit identification for a particular conditioning set are:
By eliminating Bcoll:
By eliminating Bcon:
Given temporal dependence, Bcon simplifies to:
When Bcoll are absent, nonparametric identification is possible with conditioning set
Recall that Bcoll = {W ← c′ → Y′ ← e′ → Y}. Therefore, Bcoll will be eliminated if we assume: A 6 OR A 7.
Recall that Bcon = {BRC1, BRC2, BAC}, where BRC1 = {W ← Y′ → Y}, BRC2 = {W ← Y′ ← e′ → Y}, and BAC = {W ← c′ → Y′ → Y}. Therefore, Bcon will be eliminated if we assume: (A 4 OR A 5) AND (A 4 OR A 7) AND (A 5 OR A 6).
), or believe it is present, we can simplify the above condition to:
The causal processes implied by these assumptions can be represented by the following graphs, denoted by their assumptions in subscripts as follows (∧ denotes the AND logical operator, ∨ the OR logical operator):

The three maximal subgraphs of D that permit identification. Subscript denotes the (positive) assumptions that define the graph.
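The path-elimination logic can be checked mechanically. The sketch below is my own encoding, not part of the formal result: the function names are hypothetical, any assumption not listed in `held` is treated as violated (the worst case), and the blocking rules are taken from the paths stated above. Conditioning on Y′ blocks the three Bcon paths, where Y′ is a non-collider, but opens Bcoll, where Y′ is a collider; not conditioning on Y′ does the reverse.

```python
from itertools import combinations

ASSUMPTIONS = ("A4", "A5", "A6", "A7")

def open_paths(held, condition_on_ldv):
    """List the biasing paths left open, given the assumptions that hold."""
    a4, a5, a6, a7 = (name in held for name in ASSUMPTIONS)
    if condition_on_ldv:
        # Only the collider path can bias; it is removed by A6 OR A7.
        return [] if (a6 or a7) else ["B_coll"]
    paths = []
    if not (a4 or a5):
        paths.append("B_RC1")  # W <- Y' -> Y
    if not (a4 or a7):
        paths.append("B_RC2")  # W <- Y' <- e' -> Y
    if not (a5 or a6):
        paths.append("B_AC")   # W <- c' -> Y' -> Y
    return paths

def identifying_strategies(held):
    """Return the conditioning strategies that leave no biasing path open."""
    options = (("no LDV", False), ("LDV", True))
    return [label for label, ldv in options if not open_paths(held, ldv)]

def minimal_identifying_sets():
    """Smallest sets of assumptions under which some strategy identifies."""
    found = []
    for size in range(len(ASSUMPTIONS) + 1):
        for combo in combinations(ASSUMPTIONS, size):
            candidate = frozenset(combo)
            has_smaller = any(m < candidate for m in found)
            if identifying_strategies(candidate) and not has_smaller:
                found.append(candidate)
    return found

for s in minimal_identifying_sets():
    print(sorted(s), "->", identifying_strategies(s))
```

Under this encoding the minimal sets are A 6 alone, A 7 alone, and A 4 together with A 5, which is one way to see why there are exactly three maximal subgraphs.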
Observational Indistinguishability
Suppose that we are willing to assume that we are studying one of the causal processes that permit identification, but we are not sure which. Formally, we assume
While diagnostics are often available in parametric settings (e.g., De Boef and Keele 2008), in this nonparametric setting and absent additional assumptions there are no empirical implications that can be used to determine the underlying causal process. The set of empirical predictions of causal processes
Generalizing to Process DG
These identification assumptions for process D have a natural generalization to process DG . The above showed that the minimal nonparametric assumptions sufficient for identification in process D are AI . Similarly, any set of nonparametric assumptions sufficient for identification in process DG must include AI . AI are necessary, but possibly not sufficient, assumptions for identification in process DG . This follows from the fact that graph DG , being a supergraph of graph D, will contain all the backdoor paths in graph D, and possibly more. Therefore, any set of sufficient identifying assumptions for process DG must include those necessary to rule out the backdoor paths in process D.
Conclusion
The results of this article are summarized in the Venn diagram in Figure 5. For process D, depicted by the rectangle, there are two biasing paths Bcoll and Bcon that make nonparametric identification impossible. If we are willing to assume

Venn diagram of the set of nonparametric (temporally dependent) causal processes based on process D, demarcated by identifying assumptions.
A number of insights can be drawn from these results. (1) Most fundamentally, these results affirm the truth of the proposition, for this context, that causal inference depends on causal assumptions. In the setting examined here, without additional causal assumptions we are in the space of processes D in which nonparametric identification is not possible. Estimators that don’t condition on the LDV will suffer from Bcon biases, and estimators that do condition on the LDV will suffer from Bcoll biases. (2) There are multiple identification strategies. These rely on different causal assumptions, and they imply different estimators, some of which include LDVs and some of which exclude them. There is a conventional wisdom that it is always a good idea for reducing bias to include an LDV in your estimator, especially in the presence of temporal dependence. These results show that this is only true (nonparametrically) if we are willing to rule out Bcoll by assuming A 6 OR A 7.
We can also simplify our problem if we can make, or rule out, certain assumptions. (4) For example, assume A 4 no reverse causation. Our problem then simplifies to determining which is a more likely source of temporal dependence: outcome autocausation or unobserved persistent causes of the outcome. If the temporal dependence comes mostly from outcome autocausation, then we should condition on LDVs; if mostly from temporally persistent causes of the outcome, then we should not condition on LDVs. The logic for this rule is described in Online Appendix A.10. (5) Alternatively, for many areas of study it is not plausible that we could measure all persistent causes of the outcome. Thus, we would reject A 7. We then see that an LDV is only justified if we think there are no unobserved causes of W and Y′ (A 6), whereas no LDV is preferred if we think there is no reverse causation and no outcome autocausation (A 4 and A 5).
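A small simulation can make the stakes concrete. This is a hedged sketch under a linear Gaussian model that I have made up for illustration; the article's results are nonparametric, and the function names, unit coefficients, and true effect of 1.0 are all assumptions. In both scenarios A 7 is false (e′ affects both Y′ and Y). In the first, A 4 and A 5 hold but A 6 fails; in the second, A 6 holds but A 4 and A 5 fail.

```python
import numpy as np

TRUE_EFFECT = 1.0  # assumed causal effect of W on Y

def simulate(n, b_c, g, rho, seed):
    """Draw (W, Y', Y) from an illustrative linear process.

    b_c: effect of c' on W (nonzero violates A6)
    g:   effect of Y' on W (nonzero violates A4, reverse causation)
    rho: effect of Y' on Y (nonzero violates A5, outcome autocausation)
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)   # unobserved cause of Y' (and of W if b_c != 0)
    e = rng.normal(size=n)   # unobserved persistent cause of Y' and Y
    y_lag = c + e + rng.normal(size=n)
    w = b_c * c + g * y_lag + rng.normal(size=n)
    y = TRUE_EFFECT * w + rho * y_lag + e + rng.normal(size=n)
    return w, y_lag, y

def coef_on_w(y, w, controls=()):
    """OLS coefficient on W, with an intercept and optional controls."""
    X = np.column_stack([np.ones_like(w), w, *controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

n = 200_000
# Scenario A: A4 and A5 hold, A6 and A7 fail.
w, y_lag, y = simulate(n, b_c=1.0, g=0.0, rho=0.0, seed=0)
a_no_ldv = coef_on_w(y, w)
a_ldv = coef_on_w(y, w, controls=(y_lag,))
# Scenario B: A6 holds, A4, A5, and A7 fail.
w, y_lag, y = simulate(n, b_c=0.0, g=1.0, rho=1.0, seed=1)
b_no_ldv = coef_on_w(y, w)
b_ldv = coef_on_w(y, w, controls=(y_lag,))

print(f"Scenario A: no LDV {a_no_ldv:.3f}, with LDV {a_ldv:.3f}")
print(f"Scenario B: no LDV {b_no_ldv:.3f}, with LDV {b_ldv:.3f}")
```

With these coefficients the population values are 1.0 without the LDV and 0.8 with it in Scenario A, and 2.0 without the LDV and 1.0 with it in Scenario B. The same technical fix that removes bias in one process introduces it in the other.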
We can apply these lessons for nonparametric identification to the democratic peace. In my assessment: A 4 is plausible, A 5 is somewhat plausible, A 6 is not plausible (though maybe with W′), and A 7 is not plausible. Accordingly, this recommends not conditioning on the LDV (contrary to almost all work on this subject), or, if there is still sufficient variation after conditioning on lagged democracy, conditioning on both the LDV and lagged democracy. For other questions, different conditioning strategies will be appropriate. Online Appendix A.9 considers the plausibility of these assumptions for the study of the resource curse (government rents from natural resources promote authoritarianism, e.g., Haber and Menaldo 2011) and the economic determinants of vote share for a presidential incumbent (gross domestic product promotes incumbent vote share, e.g., Lewis-Beck and Stegmaier 2000).
In my assessment of the resource curse, A 4 is plausible, A 5 is not plausible, A 6 is somewhat plausible (and especially with W′), and A 7 is not plausible. This recommends conditioning on the LDV, probably also with lagged treatment. For the economic determinants of incumbent vote share, A 4 and A 5 are plausible, A 6 is probably not plausible, and A 7 is not plausible. This recommends not conditioning on the LDV. Of course, in each of these applications there are reasonable alternative interpretations of the plausibility of the assumptions. What is important is that scholars articulate and justify the assumptions underlying their inference.
Temporal dependence is a sign that our causal estimates may be biased. There is no purely technical fix for these biases. Any fix depends on causal assumptions, and if the wrong fix is used, biases can be introduced where there were none to begin with. To provide more informed causal inferences, scholars should articulate and defend their causal identification assumptions. This article outlined what these causal assumptions are for nonparametric identification under temporal dependence; parametric identification assumptions are often similar, though they obviously also typically depend on assumptions about functional form. Given that social scientists rarely have confident causal knowledge about the processes we study, we should continue to evaluate the robustness of our estimators to violations of their assumptions. By doing so, we will be better able to recognize when our inferences rely on certain causal assumptions, to direct research toward investigation of those crucial assumptions, and to accurately appraise the confidence of our inferences.
Footnotes
A. Appendix for “Nonparametric Identification of Causal Effects Under Temporal Dependence.”
Acknowledgment
For helpful comments, I am grateful to Chris Achen, Peter Aronow, Larry Bartels, Neal Beck, Scott Bennett, Henry Brady, Giacomo Chiozza, Thad Dunning, Robert Franzese, Kristian Gleditsch, Sophia Hatz, John Henderson, Greg Huber, Kosuke Imai, Xiaojun Li, Daniel Masterson, Will Moore, Betsy Ogburn, Judea Pearl, Paul Poast, Jonathan Renshon, Heiner Schulz, Ilya Shpitser, Jasjeet Sekhon, James Stimson, Laura Stoker, Michael Tomz, Guadalupe Tunon, Jiahua Yue, Baobao Zhang, Magnus Überg, and especially David Freedman.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplementary material for this article is available online.
