Path Analysis for Binary Random Variables

Abstract

The decomposition of the overall effect of a treatment into direct and indirect effects is here investigated with reference to a recursive system of binary random variables. We show how, for the single mediator context, the marginal effect measured on the log odds scale can be written as the sum of the indirect and direct effects plus a residual term that vanishes under some specific conditions. We then extend our definitions to situations involving multiple mediators and address research questions concerning the decomposition of the total effect when some mediators on the pathway from the treatment to the outcome are marginalized over. Connections to the counterfactual definitions of the effects are also made. Data coming from an encouragement design on students’ attitude to visit museums in Florence, Italy, are reanalyzed. The estimates of the defined quantities are reported together with their standard errors to compute p values and form confidence intervals.

Keywords

directed acyclic graph, logistic regression, recursive system, effect decomposition, multiple mediators

Introduction

The decomposition of the total effect of a treatment X on an outcome variable Y into direct and indirect effects is a central topic in empirical research. In linear models, the relationship between total, direct, and indirect effects is well understood (Alwin and Hauser 1975; Bollen 1987; Baron and Kenny 1986; Cochran 1938) and a simple decomposition is available. Such a decomposition is based on the linearity of the marginal model for Y against X, where the coefficient of X is equal to the sum of the direct effect and the indirect effects. Outside the linear case, this simplicity is lost, as in the marginal model of Y against X only, either the effect of X on Y is a complex function of the original parameters or the error term does not possess nice properties or both.

We here consider situations where the outcome Y is a binary random variable. Contributions have addressed the case of one continuous mediator (see, e.g., Breen, Karlson, and Holm 2013, 2018; Karlson, Holm, and Breen 2012; MacKinnon et al. 2007). Recent results concern the exact parametric form of the marginal effect of X on Y on the log odds ratio scale when the mediator is also binary (Stanghellini and Doretti 2019). In this setting, when X is continuous the marginal model of Y against X is nonlinear unless some conditional independence assumptions hold, and a rather complex formula links the marginal and conditional effect of X on Y. Similarly, for a discrete X, the parameters of the conditional model combine in a nonlinear fashion to form the marginal effect. For analogous results on the log relative risk scale, see Lupparelli (2019).

Starting from the results in Stanghellini and Doretti (2019), we here elaborate a novel proposal for the direct and indirect effect definitions on the log odds scale for a treatment variable X that is either continuous or discrete. The postulated system can be represented by a directed acyclic graph (DAG); see Lauritzen (1996, chap. 2) for definitions and Elwert (2013) for an account in the sociological context. Our proposal is based on zeroing the path-specific regression coefficients. Graphically, this corresponds to deleting one arrow in the associated DAG and thereby represents the analogue of the path analysis method.

We initially focus on a single-mediator context and show that the marginal effect can be written as the sum of the indirect and direct effects plus a residual term that vanishes under some specific conditions. For the specific setting under investigation, the proposed parametric relationship resolves the debate on which method, the product method or the difference method, should be used to disentangle the total effect (Breen et al. 2018). It also avoids fitting two nested models, thereby sidestepping the issue of unequal variance (Winship and Mare 1983). We then extend our derivations to the case of multiple binary mediators, also modeled as a recursive system of univariate logistic regressions. In this context, additional path-specific effects can be defined and different research questions addressed. Although the paper draws from the derivation in Stanghellini and Doretti (2019), some novel results are also presented. With reference to a single mediator, we consider a general formulation of the functional form linking the log odds of the mediator and of the outcome to the covariates. This is then extended to the multiple mediator context, for which a strategy for deriving the direct and indirect effects when marginalizing over an intermediate or outer mediator is also illustrated.

Our approach is developed in a purely associational context that, in general, holds no interpretation for causal inference. However, if the recursive system of equations is structural (Pearl 2009, chap. 7) and no unmeasured confounders exist, the total effect and some of its components can be endowed with a causal interpretation. Notice that a decomposition of the total effect based on counterfactual entities has been given by Pearl (2001, 2012) and extended to the odds ratio scale by VanderWeele and Vansteelandt (2010). This parallelism is also addressed in this article.

In the second section, we offer the general theory for the case of a single mediator. A case study concerning a randomized encouragement experiment on cultural consumption performed in Florence (Italy) is also presented as a guiding example. Results of a simulation study investigating maximum likelihood (ML) estimation of the effects and other related measures are reported in the third section, while the extension to the multiple mediator setting is contained in the fourth section. In the fifth section, we address other complex issues concerning path-specific effects, whereas links with counterfactual definitions are explored in the sixth section. Finally, in the seventh section, we draw some conclusions.

Figure 1.

Data generating process when (A) no conditional independences hold, (B) $X ╨ Y | W$ , (C) $W ╨ Y | X$ , and (D) $W ╨ X$ .

Effect Decomposition With a Single Mediator

We first focus on a very simple model for a binary outcome Y, a binary mediator W, and a treatment X, that can be either discrete or continuous (see Figure 1A for the corresponding DAG). Our aim is to decompose the total effect of X on Y on the log odds scale. Our postulated models are a logistic regression for Y given X and W and for W given X, that is,

log \frac{P (Y = 1 | X = x, W = w)}{P (Y = 0 | X = x, W = w)} = β_{0} + β_{x} x + β_{w} w + β_{x w} x w,

and

log \frac{P (W = 1 | X = x)}{P (W = 0 | X = x)} = γ_{0} + γ_{x} x .

Notice that we allow for the interaction between X and W in the outcome equation. In order to make the paper self-contained, the derivations in Stanghellini and Doretti (2019) that express the total effect as a function of the parameters of models (1) and (2) are reproduced here, prior to the introduction of its decomposition into direct, indirect, and residual effects.

As a guiding example through this section, we consider an experiment aiming at identifying the best incentives to offer high school students in Florence to enhance cultural interest and increase art museum attendance. Three treatment levels are considered: A flyer given to the students with the main information about the Palazzo Vecchio museum constitutes the first level; a flyer and a presentation of the museum from an expert constitute the second level; a flyer, the presentation, and a reward in the form of extra-credit points for their final school grade constitute the third level. All students receive a free entry ticket to Palazzo Vecchio. The aim of the experiment is not only to assess the total effect of the treatment (X) on students’ museum attendance (Y) but also to understand to what extent this effect could be stimulated by students’ visits to Palazzo Vecchio (W); see Forastiere et al. (2019); Lattarulo, Mariani, and Razzolini (2017).

The interest is in the marginal model of Y against X, as a function of the parameters in equations (1) and (2). From first principles of probabilities, it follows that:

log \frac{P (Y = 1 | X = x)}{P (Y = 0 | X = x)} = - log \frac{P (W = w | Y = 1, X = x)}{P (W = w | Y = 0, X = x)} + log \frac{P (Y = 1 | W = w, X = x)}{P (Y = 0 | W = w, X = x)} .

The second term of the right-hand side (RHS) of the above equality is given by model (1), while the parametric expression of the first term is not immediately derived from models (1) and (2), as it involves the probability of W after conditioning on X and Y (not after conditioning on X only). However, by repeated use of the previous relationship, we have:

log \frac{P (W = 1 | Y = y, X = x)}{P (W = 0 | Y = y, X = x)} = log \frac{P (Y = y | W = 1, X = x)}{P (Y = y | W = 0, X = x)} + log \frac{P (W = 1 | X = x)}{P (W = 0 | X = x)} .

Using equations (1) and (2), after some simplifications, we find:

log \frac{P (W = 1 | Y = y, X = x)}{P (W = 0 | Y = y, X = x)} = y (β_{w} + β_{x w} x) + log \frac{1 + exp (β_{0} + β_{x} x)}{1 + exp (β_{0} + β_{x} x + β_{w} + β_{x w} x)} + γ_{0} + γ_{x} x .

In what follows, we denote with $g_{y} (x)$ the $log \frac{P (W = 1 | Y = y, X = x)}{P (W = 0 | Y = y, X = x)}$ , that is,

\begin{array}{ll} g_{y} (x) & = log \frac{P (W = 1 | Y = y, X = x)}{P (W = 0 | Y = y, X = x)} \\ & = y (β_{w} + β_{x w} x) + log \frac{1 + exp (β_{0} + β_{x} x)}{1 + exp (β_{0} + β_{x} x + β_{w} + β_{x w} x)} + γ_{0} + γ_{x} x . \end{array}

Since ${(1 + exp g_{y} (x))}^{- 1}$ corresponds to $P (W = 0 | Y = y, X = x)$ , substituting in equation (3) for $w = 0$ , we find:

log \frac{P (Y = 1 | X = x)}{P (Y = 0 | X = x)} = log \frac{1 + exp g_{1} (x)}{1 + exp g_{0} (x)} + β_{0} + β_{x} x .
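Equation (6) can be checked numerically against a direct summation over W. The following is a minimal sketch in Python; the parameter values are hypothetical and chosen only for illustration, and the function names are ours.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def g(y, x, b0, bx, bw, bxw, c0, cx):
    """Equation (5): log odds of W = 1 given Y = y and X = x."""
    a = b0 + bx * x                 # linear predictor of Y when W = 0
    b = a + bw + bxw * x            # linear predictor of Y when W = 1
    return (y * (bw + bxw * x)
            + math.log((1 + math.exp(a)) / (1 + math.exp(b)))
            + c0 + cx * x)

def logit_closed_form(x, *theta):
    """Equation (6): marginal log odds of Y = 1 given X = x."""
    b0, bx = theta[0], theta[1]
    g1, g0 = g(1, x, *theta), g(0, x, *theta)
    return math.log((1 + math.exp(g1)) / (1 + math.exp(g0))) + b0 + bx * x

def logit_by_summation(x, b0, bx, bw, bxw, c0, cx):
    """Same quantity, obtained by summing P(Y = 1 | X, W) P(W | X) over W."""
    pw = expit(c0 + cx * x)
    py = (1 - pw) * expit(b0 + bx * x) + pw * expit(b0 + bx * x + bw + bxw * x)
    return math.log(py / (1 - py))

# Hypothetical values for (beta_0, beta_x, beta_w, beta_xw, gamma_0, gamma_x).
theta = (-1.0, 0.5, 1.2, -0.4, -0.3, 0.8)
for x in (-2.0, 0.0, 1.5):
    print(x, logit_closed_form(x, *theta), logit_by_summation(x, *theta))
```

The two columns printed for each x should coincide up to floating-point error, since equation (6) is an exact identity rather than an approximation.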

Denoting with

{RR}_{W | Y, X = x} = \frac{P (W = 1 | Y = 1, X = x)}{P (W = 1 | Y = 0, X = x)},

the relative risk of W for varying Y in the distribution of $X = x$ , we have

{RR}_{W | Y, X = x} = \frac{exp g_{1} (x) \{1 + exp g_{0} (x)\}}{exp g_{0} (x) \{1 + exp g_{1} (x)\}} .

Analogously, letting $\bar{W} = 1 - W$ , then

{RR}_{\bar{W} | Y, X = x} = \frac{1 + exp g_{0} (x)}{1 + exp g_{1} (x)} .

It then follows that equation (6) can be rewritten as

log \frac{P (Y = 1 | X = x)}{P (Y = 0 | X = x)} = β_{0} + β_{x} x - log {RR}_{\bar{W} | Y, X = x},

or, alternatively, as

log \frac{P (Y = 1 | X = x)}{P (Y = 0 | X = x)} = β_{0} + β_{x} x + β_{w} + β_{x w} x - log {RR}_{W | Y, X = x} .
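To illustrate that equations (7) and (8) are two expressions of the same marginal log odds, the sketch below computes both relative risks from Bayes' rule, without going through $g_{y} (x)$, and evaluates the two right-hand sides. The parameter values used in the example call are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_logit_two_ways(x, b0, bx, bw, bxw, c0, cx):
    """Return the marginal log odds of Y from equations (7) and (8)."""
    pw1 = expit(c0 + cx * x)                      # P(W = 1 | X = x)
    py_w0 = expit(b0 + bx * x)                    # P(Y = 1 | W = 0, X = x)
    py_w1 = expit(b0 + bx * x + bw + bxw * x)     # P(Y = 1 | W = 1, X = x)
    # Bayes' rule: P(W = 1 | Y = y, X = x) for y = 1 and y = 0.
    pw1_y1 = py_w1 * pw1 / (py_w1 * pw1 + py_w0 * (1 - pw1))
    pw1_y0 = (1 - py_w1) * pw1 / ((1 - py_w1) * pw1 + (1 - py_w0) * (1 - pw1))
    rr_w = pw1_y1 / pw1_y0                        # relative risk of W = 1
    rr_wbar = (1 - pw1_y1) / (1 - pw1_y0)         # relative risk of W = 0
    eq7 = b0 + bx * x - math.log(rr_wbar)
    eq8 = b0 + bx * x + bw + bxw * x - math.log(rr_w)
    return eq7, eq8

# Hypothetical parameter values; the two outputs should agree.
print(marginal_logit_two_ways(1.0, -1.0, 0.5, 1.2, -0.4, -0.3, 0.8))
```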

Notice that conditioning on a set of covariates $C = (C_{1}, \dots, C_{p})$ does not substantially alter the structure of equations (6) and (8). We here offer an example for $p = 1$ , with both additive and interaction effects up to the second order in both models for Y and W. After the marginalization over W, we obtain the marginal model for Y given X and C as:

log \frac{P (Y = 1 | X = x, C = c)}{P (Y = 0 | X = x, C = c)} = β_{0} + β_{x} x + β_{c} c + β_{x c} x c - log {RR}_{\bar{W} | Y, X = x, C = c}

with

{RR}_{\bar{W} | Y, X = x, C = c} = \frac{1 + exp g_{0} (x, c)}{1 + exp g_{1} (x, c)},

and

\begin{array}{ll} g_{y} (x, c) & = y (β_{w} + β_{x w} x + β_{c w} c) \\ & + log \frac{1 + exp (β_{0} + β_{x} x + β_{c} c + β_{x c} x c)}{1 + exp (β_{0} + β_{x} x + β_{c} c + β_{w} + β_{x w} x + β_{x c} x c + β_{c w} c)} \\ & + γ_{0} + γ_{x} x + γ_{c} c + γ_{x c} x c . \end{array}

See Online Appendix A (Supplementary material for this article is available online) for a general formulation that includes more covariates and possibly nonlinear link functions.

With reference to the cultural consumption data, 15 classes for a total of 294 students, all aged between 15 and 18, from three different schools, were randomly assigned at baseline (March/April 2014) to the three treatment levels (X). At the second occasion, after two months, researchers collected the entry tickets to record student visits (W). Finally, after six months, they collected the concluding questionnaire with general information on visits to other museums (Y). A questionnaire with information on background characteristics of the students and their families was also administered. Among all the covariates, only one appears to be relevant in the model for the outcome, that is, the binary variable C taking value 1 for students considering themselves mainly interested in mathematics/science and 0 if they are mainly interested in humanities. At follow-up, 28 students were absent, so the final sample included 266 students. Data are reported in Table 1 and are publicly available at https://www-tandfonline-com-s.web.bisu.edu.cn/doi/abs/10.1080/07350015.2019.1647843 as supplementary material of Forastiere et al. (2019).

Table 1.

Contingency Tables for ( $Y, X, W, C$ ) for the Cultural Consumption Experiment.

		Y				Y
$W = 0$	$C = 0$	0	1	$W = 0$	$C = 1$	0	1
X	1	19	2	X	1	48	17
	2	14	21		2	14	28
	3	3	3		3	23	21
		Y				Y
$W = 1$	$C = 0$	0	1	$W = 1$	$C = 1$	0	1
X	1	0	0	X	1	1	2
	2	1	0		2	6	3
	3	1	9		3	19	11

Table 2 contains the output of the ML estimation of the logistic regression models for the outcome and for the mediator. We use the subscript ${2, 1}$ to denote the contrast of level 2 (Flyer + Presentation) versus level 1 (Flyer) and ${3, 1}$ for the contrast of level 3 (Flyer + Presentation + Reward) versus level 1. Notice that, in the outcome equation, the interaction term $β_{x_{{2, 1}} w}$ is significant at the 5 percent level (p value = .0118), while $β_{x_{{3, 1}} w}$ is significant only at the 10 percent level (p value = .0893).

Table 2.

Maximum Likelihood Estimates of the Two Logistic Models for Y and W for the Cultural Consumption Experiment.

Parameter	Estimate	Standard Error	95 Percent Confidence Interval		p Value
	$Y \sim β_{0} + β_{x} X + β_{c} C + β_{w} W + β_{x w} X W + β_{c w} C W$
$β_{0}$	−1.6186	0.3857	−2.3746	−0.8626	.0000
$β_{x_{{2, 1}}}$	1.9345	0.3676	1.2139	2.6550	.0000
$β_{x_{{3, 1}}}$	1.1329	0.3865	0.3754	1.8904	.0034
$β_{c}$	0.4597	0.3540	−0.2342	1.1536	.1941
$β_{w}$	4.3290	1.5427	1.3053	7.3527	.0050
$β_{x_{{2, 1}} w}$	−3.7077	1.4725	−6.5937	−0.8217	.0118
$β_{x_{{3, 1}} w}$	−2.2708	1.3365	−4.8903	0.3488	.0893
$β_{c w}$	−2.4770	0.9255	−4.2910	−0.6630	.0074
$W \sim γ_{0} + γ_{x} X$
$γ_{0}$	−3.3557	0.5873	−4.5069	−2.2046	.0000
$γ_{x_{{2, 1}}}$	1.3145	0.6767	−0.0118	2.6409	.0521
$γ_{x_{{3, 1}}}$	3.1326	0.6245	1.9087	4.3565	.0000

Ignoring for now the sampling errors, we see that the $g_{y} (x, c)$ function can be formed by plugging the ML estimates of the parameters into equation (10). The function expresses the log odds of W after conditioning on X, Y, and the covariate C.

We now present a definition for the total, direct, and indirect effects in the situation with no covariates. When covariates C are present, the parametric formula of $TE (x)$ and its decomposition, for both continuous and discrete X, vary with the level of C. Notice that the direct and indirect effects so defined do not sum to the total effect, but a residual term remains. This term is zero only under some specific conditions that we are going to discuss.

Total effect: Let $TE (x)$ be the effect of X on Y, on the log odds scale, in the distribution of $(Y | X)$ obtained after marginalization over W. For continuous X, with equation (6) differentiable in x, the total effect is defined as the derivative of equation (6) with respect to x. For discrete X, the total effect is defined as the difference between equation (6) evaluated at two different levels of X.

Indirect effect: Let $IE (x)$ be the indirect effect of X on Y on the log odds scale. The indirect effect is defined as the part of the total effect of X on Y through W only. It is evaluated after setting, in the total effect, the coefficients of X in the model for Y to zero, that is, $IE (x) = TE (x) |_{β_{x} = β_{x w} = 0}$ , so that $X ╨ Y | W$ and the effect of X on Y is mediated by W (see Figure 1B).

Direct effect: Let $DE (x)$ be the direct effect of X on Y on the log odds scale. The direct effect is defined as the part of the total effect due to X only. It is obtained after imposing, in the total effect, the coefficients of W in the model for Y equal to zero, that is, $β_{w} = β_{x w} = 0$ so that $W ╨ Y | X$ (see Figure 1C). In other words, we have $DE (x) = TE (x) |_{β_{w} = β_{x w} = 0}$ . This definition is aligned with the collapsibility of odds ratio, as explained by Xie, Ma, and Geng (2008, Corollary 3). It is important to notice that the direct effect can also be seen as the effect of X on Y keeping $W = 0$ .

Residual effect: Let $RES (x)$ be the residual effect of X on Y on the log odds scale defined as $TE (x) - DE (x) - IE (x)$ . Clearly, by construction

TE (x) = DE (x) + IE (x) + RES (x) .

Notice that this residual term is always null in linear models. In this context, the total effect can be decomposed into the sum of the direct and indirect effect. Provided that these two are positive, it is therefore meaningful to look at the ratio between the indirect effect and the total effect, as it gives an indication of the proportion of total effect due to the mediator W (i.e., the proportion mediated). When a residual effect is present, the ratio between the indirect and total effect can still provide information on the weight of the indirect effect on the total effect, though with a less clear interpretation (see Continuous Case and Discrete Case subsections).

In what follows, we study in detail the decomposition of the total effect for the simple case without covariates, where X can be either continuous or discrete. Addition of covariates can be done in a straightforward manner.

Continuous Case

We first look at the case of continuous X, with the marginal log odds in equation (6) differentiable in x. Let

TE (x) = \frac{d}{d x} log \frac{P (Y = 1 | X = x)}{P (Y = 0 | X = x)} .

It is possible to show that

\begin{array}{ll} TE (x) & = β_{x} \{1 - Δ_{y} (x) Δ_{w} (x)\} \\ & + β_{x w} \{P (W = 1 | Y = 1, X = x) - Δ_{w} (x) P (Y = 1 | W = 1, X = x)\} \\ & + γ_{x} Δ_{w} (x), \end{array}

where

\begin{array}{ll} Δ_{y} (x) & = P (Y = 1 | W = 1, X = x) - P (Y = 1 | W = 0, X = x) \\ & = \frac{exp (β_{0} + β_{x} x + β_{w} + β_{x w} x)}{1 + exp (β_{0} + β_{x} x + β_{w} + β_{x w} x)} - \frac{exp (β_{0} + β_{x} x)}{1 + exp (β_{0} + β_{x} x)}, \end{array}

and

\begin{array}{ll} Δ_{w} (x) & = P (W = 1 | Y = 1, X = x) - P (W = 1 | Y = 0, X = x) \\ & = \frac{exp g_{1} (x)}{1 + exp g_{1} (x)} - \frac{exp g_{0} (x)}{1 + exp g_{0} (x)}, \end{array}

with $g_{y} (x)$ as in equation (5). Equation (12) confirms the well-known fact that the marginal logistic model is nonlinear in x and provides its explicit expression. Notice that, as shown in Stanghellini and Doretti (2019), all terms in curly brackets are bounded between 0 and 1, while $Δ_{w} (x)$ is bounded between −1 and 1. Notice further that $Δ_{w} (x)$ and $Δ_{y} (x)$ share the same sign, and they are both zero whenever $W ╨ Y | X$ .
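As a numerical sanity check of equation (12), the sketch below codes the closed-form total effect and can be compared with a finite-difference derivative of the marginal log odds. The parameter values are hypothetical, and the finite-difference comparison is our own device for illustration, not part of the derivation.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def g(y, x, b0, bx, bw, bxw, c0, cx):
    """Equation (5): log odds of W = 1 given Y = y and X = x."""
    a = b0 + bx * x
    b = a + bw + bxw * x
    return (y * (bw + bxw * x)
            + math.log((1 + math.exp(a)) / (1 + math.exp(b)))
            + c0 + cx * x)

def total_effect(x, b0, bx, bw, bxw, c0, cx):
    """Equation (12): derivative of the marginal log odds of Y in x."""
    theta = (b0, bx, bw, bxw, c0, cx)
    q0 = expit(b0 + bx * x)                   # P(Y = 1 | W = 0, X = x)
    q1 = expit(b0 + bx * x + bw + bxw * x)    # P(Y = 1 | W = 1, X = x)
    p1 = expit(g(1, x, *theta))               # P(W = 1 | Y = 1, X = x)
    p0 = expit(g(0, x, *theta))               # P(W = 1 | Y = 0, X = x)
    d_y = q1 - q0                             # Delta_y(x)
    d_w = p1 - p0                             # Delta_w(x)
    return bx * (1 - d_y * d_w) + bxw * (p1 - d_w * q1) + cx * d_w

def marginal_logit(x, b0, bx, bw, bxw, c0, cx):
    """Marginal log odds of Y = 1 given X = x, by summation over W."""
    pw = expit(c0 + cx * x)
    py = (1 - pw) * expit(b0 + bx * x) + pw * expit(b0 + bx * x + bw + bxw * x)
    return math.log(py / (1 - py))

theta = (-1.0, 0.5, 1.2, -0.4, -0.3, 0.8)     # hypothetical values
h = 1e-5
for x in (-1.0, 0.0, 2.0):
    numeric = (marginal_logit(x + h, *theta) - marginal_logit(x - h, *theta)) / (2 * h)
    print(x, total_effect(x, *theta), numeric)
```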

Indirect effect: Following the definition, we evaluate the total effect assuming $β_{x} = β_{x w} = 0$ , that is,

IE (x) = TE (x) |_{β_{x} = β_{x w} = 0} = γ_{x} Δ_{w}^{*} (x),

where $Δ_{w}^{*} (x)$ is $Δ_{w} (x)$ evaluated at $β_{x} = β_{x w} = 0$ . The indirect effect of X on Y through W depends on the value of x and is zero if either $γ_{x}$ or $β_{w}$ is zero. It can be shown that, for all x, $Δ_{w}^{*} (x)$ and $β_{w}$ share the same sign and, therefore, the sign of the indirect effect is that of the $γ_{x} β_{w}$ product (see Online Appendix B). However, the magnitude of the effect varies with x.

Direct effect: Following the definition, we evaluate the direct effect after assuming $β_{w} = β_{x w} = 0$ , that is,

DE (x) = TE (x) |_{β_{w} = β_{x w} = 0} = β_{x} .

Notice that equation (14) follows as $Δ_{y} (x)$ and $Δ_{w} (x)$ are zero when $β_{w} = β_{x w} = 0$ .

Residual effect: Finally, the residual effect is given by difference as follows

\begin{array}{ll} RES (x) & = TE (x) - IE (x) - DE (x) \\ & = - β_{x} \{Δ_{y} (x) Δ_{w} (x)\} \\ & + β_{x w} \{P (W = 1 | Y = 1, X = x) - Δ_{w} (x) P (Y = 1 | W = 1, X = x)\} \\ & + γ_{x} \{Δ_{w} (x) - Δ_{w}^{*} (x)\} . \end{array}

It is therefore apparent that the effect above vanishes whenever $β_{x} = β_{x w} = 0$ or $β_{w} = β_{x w} = 0$ . As a matter of fact, in the former case, we have $Δ_{w} (x) = Δ_{w}^{*} (x)$ , whereas in the latter case, $Δ_{w} (x) = Δ_{y} (x) = 0$ . Notice that the latter case coincides with the condition of collapsibility of odds ratio (see Xie et al. 2008, Corollary 3). Since all terms in curly brackets are bounded, expression (15) also highlights that the sign of the residual effect depends on the relative magnitude of the coefficients and is not linked to the logistic regression coefficients in a clear way. As a matter of fact, even if the direct and indirect effects share the same sign, the sign of the residual effect may be either positive or negative. Thus, as mentioned in the second section, for a fixed level of x, the ratio between the indirect effect and the total effect may provide an indication of the relative strength of the indirect effect, though with a less clear interpretation.
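The continuous-case decomposition can be sketched in a few lines by evaluating the total effect formula three times: once as is, and once for each set of zeroed coefficients. In the sketch below the conditional probabilities of W given Y are obtained from Bayes' rule; the dictionary keys and the parameter values are hypothetical choices of ours.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_effect(x, th):
    """Total effect on the log odds scale for th = {b0, bx, bw, bxw, c0, cx}."""
    a = th["b0"] + th["bx"] * x
    q0 = expit(a)                                   # P(Y = 1 | W = 0, X = x)
    q1 = expit(a + th["bw"] + th["bxw"] * x)        # P(Y = 1 | W = 1, X = x)
    pw = expit(th["c0"] + th["cx"] * x)             # P(W = 1 | X = x)
    p1 = q1 * pw / (q1 * pw + q0 * (1 - pw))        # P(W = 1 | Y = 1, X = x)
    p0 = (1 - q1) * pw / ((1 - q1) * pw + (1 - q0) * (1 - pw))
    d_y, d_w = q1 - q0, p1 - p0
    return th["bx"] * (1 - d_y * d_w) + th["bxw"] * (p1 - d_w * q1) + th["cx"] * d_w

def decompose(x, th):
    """Return TE, DE, IE and RES at x by zeroing path-specific coefficients."""
    te = total_effect(x, th)
    ie = total_effect(x, {**th, "bx": 0.0, "bxw": 0.0})   # X -> Y arrow removed
    de = total_effect(x, {**th, "bw": 0.0, "bxw": 0.0})   # W -> Y arrow removed
    return {"TE": te, "DE": de, "IE": ie, "RES": te - de - ie}

# Hypothetical parameter values.
th = {"b0": -1.0, "bx": 0.5, "bw": 1.2, "bxw": -0.4, "c0": -0.3, "cx": 0.8}
print(decompose(0.5, th))
print(decompose(0.5, {**th, "bx": 0.0, "bxw": 0.0}))   # RES should be about zero
```

Calling `decompose` on a grid of x values is one way to see how the residual term changes sign and size with x for a given set of coefficients.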

Some cases of interest: Reformulating the total effect by definition as in equation (11), we study in detail the decomposition of the total effect into indirect (equation [13]), direct (equation [14]), and residual (equation [15]) effects for some cases of interest.

Case (i) When the recursive logistic models can be depicted as in Figure 1B, that is, $X ╨ Y | W$ , it follows from the definition above that

TE (x) |_{β_{x} = β_{x w} = 0} = IE (x),

and both $DE (x)$ and $RES (x)$ are zero.

Case (ii) When the recursive logistic models can be depicted as in Figure 1C, that is, $W ╨ Y | X$ , it follows from the definition above that

TE (x) |_{β_{w} = β_{x w} = 0} = DE (x),

and both $IE (x)$ and $RES (x)$ are zero.

Case (iii) A noticeable situation arises after imposing $β_{x w} = 0$ . In this case, the total effect is

TE (x) |_{β_{x w} = 0} = DE (x) + IE (x) + RES (x) |_{β_{x w} = 0},

where

RES (x) |_{β_{x w} = 0} = - β_{x} \{Δ_{y} (x) Δ_{w} (x)\} |_{β_{x w} = 0} + γ_{x} \{Δ_{w} (x) - Δ_{w}^{*} (x)\} |_{β_{x w} = 0} .

Notice that this assumption does not itself reflect into any conditional independence. If we further assume $γ_{x} = 0$ , then some simplifications arise. Thus,

TE (x) |_{β_{x w} = γ_{x} = 0} = DE (x) + RES (x) |_{β_{x w} = γ_{x} = 0},

where

RES (x) |_{β_{x w} = γ_{x} = 0} = - β_{x} \{Δ_{y} (x) Δ_{w} (x)\} |_{β_{x w} = γ_{x} = 0} .

It is possible to see that under this condition, $| TE (x) | \leq | β_{x} |$ , in line with results obtained by Neuhaus and Jewell (1993) in a more general context.

Case (iv) When the recursive logistic models can be depicted as in Figure 1D, that is, $W ╨ X$ , it follows from the definition above that

TE (x) |_{γ_{x} = 0} = DE (x) + RES (x) |_{γ_{x} = 0},

where

\begin{array}{ll} RES (x) |_{γ_{x} = 0} & = - β_{x} \{Δ_{y} (x) Δ_{w} (x)\} |_{γ_{x} = 0} \\ & + β_{x w} \{P (W = 1 | Y = 1, X = x) - Δ_{w} (x) P (Y = 1 | W = 1, X = x)\} |_{γ_{x} = 0} . \end{array}

In this case, there is an effect modification due to conditioning on an additional variable, in line with well-known results on noncollapsibility of parameters of logistic regression models (see Xie et al. 2008). In addition, we notice that even in this simple case, the linearity of X in the marginal model is lost. Furthermore, if $β_{x}$ and $β_{x w}$ are both positive (negative), the marginal effect is also positive (negative), thereby recovering the finding in Cox and Wermuth (2003) on the condition to avoid the effect reversal (i.e., the marginal and conditional effects having opposite signs).

Discrete Case

Without loss of generality, we here assume that X is binary. The total effect of X on Y can be derived by taking the first difference of either equation (7) or equation (8). We here opt for differencing equation (7). Then,

TE (x) = β_{x} + log {RR}_{\bar{W} | Y, X = 0} - log {RR}_{\bar{W} | Y, X = 1},

which explicitly becomes

TE (x) = β_{x} + log \frac{1 + exp g_{1} (1)}{1 + exp g_{0} (1)} - log \frac{1 + exp g_{1} (0)}{1 + exp g_{0} (0)},

with $g_{y} (x)$ as in equation (5). Notice that, in order to make the extension to X discrete straightforward, we maintain the x notation. Obviously, in the case of a binary X, $TE (x) = cpr (Y, X)$ , a constant term corresponding to the cross-product ratio (on the log scale) of the marginal table for Y and X.

Indirect effect: Following the definition above, we evaluate the total effect assuming $β_{x} = β_{x w} = 0$ , that is,

IE (x) = TE (x) |_{β_{x} = β_{x w} = 0} = log \frac{1 + exp g_{1}^{*} (1)}{1 + exp g_{0}^{*} (1)} - log \frac{1 + exp g_{1} (0)}{1 + exp g_{0} (0)},

where $g_{y}^{*} (x)$ is $g_{y} (x)$ evaluated at $β_{x} = β_{x w} = 0$ . Notice that $g_{y}^{*} (0) = g_{y} (0)$ , see equation (5). In parallel with the continuous case, the indirect effect is null if either $β_{w}$ or $γ_{x}$ are zero. It can be shown with some algebra that it is concordant with the product of the two coefficients.

Direct effect: Following the definition of the direct effect, we evaluate the direct effect after assuming in the total effect $β_{w} = β_{x w} = 0$ , that is,

DE (x) = TE (x) |_{β_{w} = β_{x w} = 0} = β_{x} .

Residual effect: By definition, the remaining effect is evaluated by difference, that is,

RES (x) = TE (x) - IE (x) - DE (x) = log \frac{1 + exp g_{1} (1)}{1 + exp g_{0} (1)} - log \frac{1 + exp g_{1}^{*} (1)}{1 + exp g_{0}^{*} (1)},

with $g_{y}^{*} (x)$ as above. It can easily be seen that $RES (x)$ is zero as soon as $β_{x} = β_{x w} = 0$ , leading to $g_{y}^{*} (1) = g_{y} (1)$ , or $β_{w} = β_{x w} = 0$ , leading to $g_{1} (x) = g_{0} (x)$ and $g_{1}^{*} (x) = g_{0}^{*} (x)$ (see equation [5]). The latter situation coincides with the condition of collapsibility of odds ratio (see Xie et al. 2008, Corollary 3). However, like in the continuous case, there is no clear relationship between the sign of this effect and the logistic regression coefficients.
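For a binary X, the closed-form expressions of the total, direct, indirect, and residual effects given above can be coded directly from $g_{y} (x)$ and $g_{y}^{*} (x)$. The following is a minimal sketch with hypothetical parameter values; the function names are ours.

```python
import math

def g(y, x, b0, bx, bw, bxw, c0, cx):
    """Equation (5): log odds of W = 1 given Y = y and X = x."""
    a = b0 + bx * x
    b = a + bw + bxw * x
    return (y * (bw + bxw * x)
            + math.log((1 + math.exp(a)) / (1 + math.exp(b)))
            + c0 + cx * x)

def log_ratio(x, *theta):
    """log[(1 + exp g_1(x)) / (1 + exp g_0(x))]."""
    return math.log((1 + math.exp(g(1, x, *theta)))
                    / (1 + math.exp(g(0, x, *theta))))

def decompose_binary(b0, bx, bw, bxw, c0, cx):
    """Return (TE, DE, IE, RES) for the contrast X = 1 versus X = 0."""
    theta = (b0, bx, bw, bxw, c0, cx)
    theta_star = (b0, 0.0, bw, 0.0, c0, cx)      # beta_x = beta_xw = 0
    te = bx + log_ratio(1, *theta) - log_ratio(0, *theta)
    ie = log_ratio(1, *theta_star) - log_ratio(0, *theta)
    de = bx
    res = log_ratio(1, *theta) - log_ratio(1, *theta_star)
    return te, de, ie, res

# Hypothetical parameter values.
print(decompose_binary(-1.0, 0.5, 1.2, -0.4, -0.3, 0.8))
print(decompose_binary(-1.0, 0.5, 0.0, 0.0, -0.3, 0.8))   # W indep. of Y given X
```

In the second call the W to Y arrow is removed, so the indirect and residual effects should both be zero and the total effect should reduce to $β_{x}$.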

Some cases of interest: Following the definition, we reformulate the total effect of a binary X on a binary Y as in equation (11) and we study in detail the decomposition of the total effect into the indirect (equation [17]), direct (equation [18]), and residual (equation [19]) effects for some cases of interest.

Case (i) When the recursive logistic models can be depicted as in Figure 1B, that is, $X ╨ Y | W$ , it follows from the definition above that,

TE (x) |_{β_{x} = β_{x w} = 0} = IE (x),

and both $DE (x)$ and $RES (x)$ are zero.

Case (ii) When the recursive logistic models can be depicted as in Figure 1C, that is, $W ╨ Y | X$ , it follows from the definition above that

TE (x) |_{β_{w} = β_{x w} = 0} = DE (x),

as all other terms are zero.

Case (iii) After imposing $β_{x w} = 0$ , the total effect is

TE (x) |_{β_{x w} = 0} = DE (x) + IE (x) + RES (x) |_{β_{x w} = 0},

in which

RES (x) |_{β_{x w} = 0} = log \frac{1 + exp g_{1} (1)}{1 + exp g_{0} (1)} |_{β_{x w} = 0} - log \frac{1 + exp g_{1}^{*} (1)}{1 + exp g_{0}^{*} (1)} .

Notice that $g_{y}^{*} (x)$ is $g_{y} (x)$ evaluated at $β_{x} = β_{x w} = 0$ . If we further assume $γ_{x} = 0$ , after some algebra, it is possible to show that under this condition $| TE (x) | \leq | β_{x} |$ in line with results obtained by Neuhaus and Jewell (1993).

Case (iv) When the recursive logistic models can be depicted as in Figure 1D, that is, $W ╨ X$ , it follows from the case above

TE (x) |_{γ_{x} = 0} = DE (x) + RES (x) |_{γ_{x} = 0}

where

RES (x) |_{γ_{x} = 0} = log \frac{1 + exp g_{1} (1)}{1 + exp g_{0} (1)} |_{γ_{x} = 0} - log \frac{1 + exp g_{1}^{*} (1)}{1 + exp g_{0}^{*} (1)} |_{γ_{x} = 0} .

After some algebra, it is possible to show that this condition is sufficient to avoid effect reversal as proved by Cox and Wermuth (2003) in a more general context.

With reference to the cultural consumption data, Table 3 reports the decomposition of the total effect of moving from level 1 to the other levels of X. Notice that this decomposition is based on the estimated parameters of Table 2 and does not require estimating the marginal model of Y against X and C only, thereby avoiding the issue of comparing parameters coming from two logistic models with unequal variance. The 95 percent confidence intervals and p values are calculated using the approximate standard errors evaluated via the delta method (Oehlert 1992).

Table 3.

Estimates (Est.), Standard Errors (SEs), 95 Percent Confidence Intervals (CIs), and p Values of the Effects for the Cultural Consumption Experiment.

	$C = 0$					$C = 1$
Effect	Est.	SE	95 Percent CI		p Value	Est.	SE	95 Percent CI		p Value
${DE}_{{2, 1}}$	1.934	.368	1.214	2.655	.000	1.934	.368	1.214	2.655	.000
${IE}_{{2, 1}}$	0.364	.192	−0.011	0.740	.057	0.176	.139	−0.096	0.449	.205
${RES}_{{2, 1}}$	−0.476	.197	−0.862	−0.089	.016	−0.475	.227	−0.919	−0.031	.036
${TE}_{{2, 1}}$	1.822	.348	1.141	2.506	.000	1.635	.341	0.968	2.303	.000
${DE}_{{3, 1}}$	1.133	.386	0.375	1.890	.003	1.133	.386	0.375	1.890	.003
${IE}_{{3, 1}}$	1.475	.316	0.856	2.094	.000	0.795	.477	−0.141	1.731	.096
${RES}_{{3, 1}}$	−0.846	.300	−1.435	−0.257	.005	−1.057	.567	−2.168	0.054	.062
${TE}_{{3, 1}}$	1.762	.369	1.038	2.486	.000	0.871	.340	0.205	1.538	.010
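The point estimates in Table 3 can be recomputed from the coefficients in Table 2 alone. The sketch below does so by summing over W and zeroing the relevant coefficients; treating level 1 of X as the reference category and zeroing $β_{c w}$ together with $β_{w}$ and $β_{x w}$ for the direct effect are our reading of the definitions, and the data structures are hypothetical. Standard errors are not reproduced, since the delta method requires the covariance matrix of the estimates, which is not reported here.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# ML estimates transcribed from Table 2 (level 1 of X is the reference).
BETA = {"0": -1.6186, "x": {1: 0.0, 2: 1.9345, 3: 1.1329}, "c": 0.4597,
        "w": 4.3290, "xw": {1: 0.0, 2: -3.7077, 3: -2.2708}, "cw": -2.4770}
GAMMA = {"0": -3.3557, "x": {1: 0.0, 2: 1.3145, 3: 3.1326}}
ZERO = {1: 0.0, 2: 0.0, 3: 0.0}

def marginal_logit(x, c, beta, gamma):
    """Log odds of Y = 1 given X = x and C = c after summing over W."""
    pw = expit(gamma["0"] + gamma["x"][x])
    lin0 = beta["0"] + beta["x"][x] + beta["c"] * c          # W = 0
    lin1 = lin0 + beta["w"] + beta["xw"][x] + beta["cw"] * c  # W = 1
    py = (1 - pw) * expit(lin0) + pw * expit(lin1)
    return math.log(py / (1 - py))

def contrast(x, c, beta):
    """Difference in marginal log odds between level x and level 1."""
    return marginal_logit(x, c, beta, GAMMA) - marginal_logit(1, c, beta, GAMMA)

def decompose(x, c):
    te = contrast(x, c, BETA)
    ie = contrast(x, c, {**BETA, "x": ZERO, "xw": ZERO})
    de = contrast(x, c, {**BETA, "w": 0.0, "xw": ZERO, "cw": 0.0})
    return {"DE": de, "IE": ie, "RES": te - de - ie, "TE": te}

for c in (0, 1):
    for x in (2, 3):
        effects = decompose(x, c)
        print(c, x, {k: round(v, 3) for k, v in effects.items()})
```

Up to rounding of the Table 2 inputs, the printed values should agree with the Est. columns of Table 3.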

In the upper part of Table 3, the decomposition for the ${2, 1}$ contrast is reported. For both levels of C, the total and direct effects are positive and statistically significant, while the indirect effect, also positive, is moderately significant in $C = 0$ (p value = .057) and nonsignificant for $C = 1$ (p value = .205). As for the ${3, 1}$ contrast, in the lower part of Table 3, we see instead that all the direct, indirect, and total effects are positive and statistically significant for both $C = 0$ and $C = 1$ (though the indirect effect for $C = 1$ is only moderately significant). Although, for each contrast, the total, direct, and indirect effects are all positive, their interpretation in terms of proportion mediated is not possible given the presence of a negative residual effect (see the second section). Such an effect is rather large in magnitude, possibly due to a large interaction coefficient (see Table 2).

In summary, the direct and total effects of moving from level 1 to level 2 of X are positive and statistically significant in all groups of students, while the indirect effect, also positive, is significant only for students mainly interested in humanities. When moving from level 1 to level 3 of X, all effects are positive and significant. We believe that this is an important message on how to design incentives to increase museum attendance of high school students, which cannot be easily derived by simply looking at the estimated coefficients in Table 2.

Our results are aligned with the ones in the original studies. Lattarulo et al. (2017) estimated an average causal effect based on the mean difference and the difference in difference methods, marginally with respect to W. Instead, in the study of Forastiere et al. (2019), the authors performed a decomposition of the total effect based on counterfactual entities using the principal stratification method.

Effect Decomposition on the Probability Scale

So far, we have considered effect decompositions operating on the logistic scale. However, sociologists and econometricians are quite often concerned with effects on the probability scale (also called partial effects; see Wooldridge 2010, chap. 15), for which effect decompositions in specific contexts, typically on the additive scale, have also been proposed (Breen et al. 2013; Karlson et al. 2012). We here denote with $η (x)$ the RHS of equation (6), that is,

\log \frac{P (Y = 1 | X = x)}{P (Y = 0 | X = x)} = η (x) = \log \frac{1 + \exp g_{1} (x)}{1 + \exp g_{0} (x)} + β_{0} + β_{x} x .

For the continuous $X$ case, the probability effect is defined as the derivative with respect to $x$ of the probability function. The total probability effect ( $TPE (x)$ ) is therefore defined as:

TPE (x) = \frac{d}{d x} P (Y = 1 | X = x) = P (Y = 1 | X = x) {1 - P (Y = 1 | X = x)} TE (x),

where $TE (x)$ corresponds to the total effect on the logistic scale as defined in equation (12). The result follows after taking the derivative of $expit {η (x)} = exp η (x) / {1 + exp η (x)}$ with respect to its argument $η (x)$ .
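The chain-rule identity above can be checked numerically. The following is only an illustrative sketch: it assumes that the single-mediator system of the second section has the form logit P(Y = 1 | X = x, W = w) = β_0 + β_x x + β_w w + β_xw xw and logit P(W = 1 | X = x) = γ_0 + γ_x x, it obtains the marginal probability by summing over W rather than through equation (6), and it uses arbitrary parameter values and finite-difference derivatives. The function names are ours.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_prob(x, b0, bx, bw, bxw, g0, gx):
    """P(Y = 1 | X = x), summing the assumed logistic models over W."""
    pw = expit(g0 + gx * x)                      # P(W = 1 | X = x)
    py_w0 = expit(b0 + bx * x)                   # P(Y = 1 | X = x, W = 0)
    py_w1 = expit(b0 + bx * x + bw + bxw * x)    # P(Y = 1 | X = x, W = 1)
    return (1.0 - pw) * py_w0 + pw * py_w1

def eta(x, *params):
    """Marginal log odds of Y given X = x."""
    p = marginal_prob(x, *params)
    return math.log(p / (1.0 - p))

def num_deriv(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Hypothetical parameter values (b0, bx, bw, bxw, g0, gx) and evaluation point.
params = (-2.0, 0.9, 2.0, 0.5, -2.0, 2.0)
x0 = 0.3

p = marginal_prob(x0, *params)
te = num_deriv(lambda t: eta(t, *params), x0)                # TE(x0), log odds scale
tpe_chain = p * (1.0 - p) * te                               # right-hand side of the identity
tpe_direct = num_deriv(lambda t: marginal_prob(t, *params), x0)  # d/dx P(Y = 1 | X = x)

print(tpe_chain, tpe_direct)
```

The two printed numbers should agree up to the finite-difference error, since the second is the derivative of expit{η(x)} computed directly.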

On the other hand, for X binary or discrete, the total effect on the probability scale can be defined by simply taking the difference across levels of X of the marginal probability. For the binary X, this becomes:

TPE (x) = P (Y = 1 | X = 1) - P (Y = 1 | X = 0) = expit η (1) - expit η (0) .

In analogy with the approach of the second section, the direct probability effect $(DPE (x))$ and indirect probability effect $(IPE (x))$ are defined by zeroing the corresponding coefficients in $TPE (x)$ . To obtain an additive decomposition, we also define the residual probability effect (RPE(x)) by difference as:

RPE (x) = TPE (x) - DPE (x) - IPE (x) .

With particular reference to the continuous case, this amounts to:

\begin{array}{rl} DPE (x) & = TPE (x) |_{β_{w} = β_{x w} = 0} = \mathrm{expit} (β_{0} + β_{x} x) \{1 - \mathrm{expit} (β_{0} + β_{x} x)\} β_{x}, \\ IPE (x) & = TPE (x) |_{β_{x} = β_{x w} = 0} = \mathrm{expit} \{η^{*} (x)\} [1 - \mathrm{expit} \{η^{*} (x)\}] IE (x), \end{array}

where $IE (x)$ is as in equation (13) and $η^{*} (x)$ is $η (x)$ evaluated at $β_{x} = β_{x w} = 0$ . Like the effects on the logistic scale, all these probability effects are local measures, since they depend on the specific value x. Global measures can be defined by averaging the aforementioned local quantities. For instance, given a population of N units (i = 1,…, N), one could define the average total probability effect (ATPE) as:

ATPE = \frac{1}{N} \sum_{i = 1}^{N} TPE (x_{i}),

with the average direct probability effect (ADPE) and the average indirect probability effect (AIPE) defined analogously. However, these average effects should be taken with caution when there is a strong variation across values of x.
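As a rough illustration of the local and averaged quantities for continuous X, the sketch below evaluates TPE(x), DPE(x), IPE(x), and RPE(x) on a handful of x values and then averages them. It rests on assumptions that are not part of the text above: the two logistic models are taken to be logit P(Y = 1 | X = x, W = w) = β_0 + β_x x + β_w w + β_xw xw and logit P(W = 1 | X = x) = γ_0 + γ_x x, the derivatives are approximated by finite differences instead of the closed forms, and the parameter values and the x values are made up.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_prob(x, b0, bx, bw, bxw, g0, gx):
    """P(Y = 1 | X = x) under the assumed logistic models, summing over W."""
    pw = expit(g0 + gx * x)
    return (1.0 - pw) * expit(b0 + bx * x) + pw * expit(b0 + bx * x + bw + bxw * x)

def slope(x, params, h=1e-5):
    """Central finite-difference derivative of P(Y = 1 | X = x) in x."""
    return (marginal_prob(x + h, *params) - marginal_prob(x - h, *params)) / (2.0 * h)

def local_effects(x, b0, bx, bw, bxw, g0, gx):
    """Local probability effects at x, zeroing coefficients as in the definitions."""
    tpe = slope(x, (b0, bx, bw, bxw, g0, gx))
    dpe = slope(x, (b0, bx, 0.0, 0.0, g0, gx))   # beta_w = beta_xw = 0
    ipe = slope(x, (b0, 0.0, bw, 0.0, g0, gx))   # beta_x = beta_xw = 0
    return {"TPE": tpe, "DPE": dpe, "IPE": ipe, "RPE": tpe - dpe - ipe}

params = (-2.0, 0.9, 2.0, 0.5, -2.0, 2.0)   # hypothetical (b0, bx, bw, bxw, g0, gx)
xs = [-1.5, -0.5, 0.0, 0.4, 1.2, 2.0]       # hypothetical observed x_i

local = [local_effects(x, *params) for x in xs]
# Averages over units: ATPE, ADPE, AIPE, and the averaged residual.
averages = {"A" + name: sum(e[name] for e in local) / len(xs)
            for name in ("TPE", "DPE", "IPE", "RPE")}

for name, value in averages.items():
    print(name, round(value, 4))
```

Printing the entries of `local` next to the averages makes the variation across x visible, which is the situation in which the averaged effects should be read with caution.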

For the cultural consumption data, the effects on the probability scale are summarized in Table 4, which has the same structure as Table 3. As expected, results are in line with the ones on the log odds scale. Notice that averaging the effects may not be appropriate in applications with a strong variation across levels of x, as in this case, especially with reference to the indirect effects.

Table 4.

Estimates (Est.) of the Effects on the Probability Scale for the Cultural Consumption Experiment, with Standard Errors (SEs), 95 Percent Confidence Intervals (CIs), and p Values.

	$C = 0$					$C = 1$
Effect	Est.	SE	95 Percent CI		p Value	Est.	SE	95 Percent CI		p Value
${DPE}_{{2, 1}}$	.413	.069	.279	.547	.000	.446	.074	.301	.591	.000
${IPE}_{{2, 1}}$	.063	.031	.001	.124	.046	.035	.028	−.020	.090	.215
${RPE}_{{2, 1}}$	−.073	.032	−.135	−.010	.023	−.099	.047	−.190	−.008	.034
${TPE}_{{2, 1}}$	.403	.068	.269	.537	.000	.382	.072	.240	.523	.000
${DPE}_{{3, 1}}$	.216	.067	.084	.347	.001	.255	.082	.094	.415	.002
${IPE}_{{3, 1}}$	.317	.078	.164	.470	.000	.176	.119	−.058	.410	.141
${RPE}_{{3, 1}}$	−.144	.074	−.289	.000	.049	−.236	.154	−.539	.067	.127
${TPE}_{{3, 1}}$	.388	.091	.210	.566	.000	.194	.098	.003	.386	.046

Simulation Study

In this section, we present results of a simulation study to investigate how well the relative amount of indirect effect is recovered, also in comparison with existing methods. In particular, Karlson et al. (2012) and Breen et al. (2013) derive a decomposition of the total effect that can be applied to the case of a continuous mediator W, when the response model for Y is either a logistic or a probit one with no interaction terms. In this context, the total effect, as measured by the marginal coefficient of X on Y, is the sum of the direct and indirect effects as in linear models. When all effects are positive, it is therefore meaningful to compute the proportion mediated as the ratio between the indirect and total effect (see Equation [21] in Breen et al. 2013). The authors present a method, known as the Karlson, Holm, and Breen (KHB) method, to sidestep the well-known issue of unequal variances. They also propose to adapt it to the binary mediator case by postulating a linear probability model for W.

We here postulate a logistic model for W and analyze, through simulations, the finite-sample behavior of the KHB method and of the proposed method. For X binary, we compare the KHB method with the ratio between the estimates of the effects IE(x) and TE(x) in the second section, obtained by plugging the ML estimates of the parameters into the corresponding expressions. For X continuous, we notice that the KHB measure should be interpreted as the proportion mediated on the probability scale in the same fashion as discussed in the Effect Decomposition on the Probability Scale subsection. For this reason, we proceed as follows. Given a sample of n units, an estimate of the corresponding effect is formed by averaging the corresponding unit-level quantities across units. For instance, the estimated ATPE is formed as:

\hat{ATPE} = \frac{1}{n} \sum_{i = 1}^{n} \hat{TPE} (x_{i}),

where $\hat{TPE} (x_{i})$ is the estimated total probability effect of unit i, obtained by plugging the ML estimates into the corresponding expression. The estimated $AIPE$ is formed accordingly, and the ratio between the two quantities is then taken. We recall from the second section that, due to the presence of the residual effect, this measure should not be interpreted as a proportion mediated in the usual way. The above measure is then compared with the KHB measure.

We consider a basic setting with no covariates, where the outcome Y and the mediator W are generated according to Equations (1) and (2), respectively. Though our method can accommodate the treatment–mediator interaction $β_{x w}$ in the outcome equation, for a fair comparison, it is here set to zero. The remaining parameters are set to $β_{0} = γ_{0} = - 2$ and $β_{w} = γ_{x} = 2$, while $β_{x}$ varies in $\{0.4, 0.9, 1.8\}$ in order to explore different relative amounts of indirect effect.
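For X binary, the true ratio IE/TE implied by these parameter values can be computed directly. The sketch below assumes that Equations (1) and (2) are the logistic models logit P(Y = 1 | X = x, W = w) = β_0 + β_x x + β_w w + β_xw xw and logit P(W = 1 | X = x) = γ_0 + γ_x x, and it marginalizes over W by direct summation instead of using the closed-form expressions; the function names are ours. Under these assumptions, the computed ratios agree to three decimals with the true values .716, .532, and .364 reported in Table 5.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_logit(x, b0, bx, bw, bxw, g0, gx):
    """logit P(Y = 1 | X = x) after summing the assumed models over W."""
    pw = expit(g0 + gx * x)
    p = (1.0 - pw) * expit(b0 + bx * x) + pw * expit(b0 + bx * x + bw + bxw * x)
    return math.log(p / (1.0 - p))

def ie_over_te(b0, bx, bw, bxw, g0, gx):
    """Ratio IE/TE for binary X (difference between x = 1 and x = 0)."""
    te = (marginal_logit(1, b0, bx, bw, bxw, g0, gx)
          - marginal_logit(0, b0, bx, bw, bxw, g0, gx))
    # Indirect effect: total effect evaluated at beta_x = beta_xw = 0.
    ie = (marginal_logit(1, b0, 0.0, bw, 0.0, g0, gx)
          - marginal_logit(0, b0, 0.0, bw, 0.0, g0, gx))
    return ie / te

# Simulation settings: beta_0 = gamma_0 = -2, beta_w = gamma_x = 2, beta_xw = 0.
ratios = {bx: ie_over_te(-2.0, bx, 2.0, 0.0, -2.0, 2.0) for bx in (0.4, 0.9, 1.8)}

for bx, ratio in ratios.items():
    print(f"beta_x = {bx}: IE/TE = {ratio:.3f}")
```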

We consider three sample sizes, that is, $n \in \{250, 500, 1000\}$. In the binary treatment case, X is sampled from a Bernoulli distribution with probability equal to 0.5. In the continuous treatment case, we first generate a large pseudo-population of size $N_{pop} = 150{,}000$ from a Normal distribution with zero mean and variance equal to 2 and then create X by extracting a random sample of size n from it. In this way, the true value of the $AIPE / ATPE$ ratio is computed on the pseudo-population and does not vary with the sample size. Once X is obtained, for each n, $N = 2{,}000$ replications of $(W, Y)$ are drawn and estimation is performed.
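The binary-treatment scenario with $β_{x} = 0.9$ and $n = 1000$ could be sketched as follows. This is not the authors' simulation code. It assumes the generating models logit P(W = 1 | X = x) = γ_0 + γ_x x and logit P(Y = 1 | X = x, W = w) = β_0 + β_x x + β_w w (no interaction), fits both by a bare Newton-Raphson routine written with NumPy, and forms the plug-in ratio by marginalizing over W by direct summation. The seed, the reduced number of replications (200 rather than 2,000, to keep the sketch fast), and all function names are our own choices. The KHB estimator is not reproduced.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(design, y, n_iter=25):
    """ML estimates of a logistic regression via Newton-Raphson."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ coef)
        score = design.T @ (y - p)
        info = design.T @ (design * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(info, score)
    return coef

def ie_te_ratio(b0, bx, bw, g0, gx):
    """IE/TE for binary X, marginalizing over W by direct summation."""
    def marg_logit(x, bx_used):
        pw = expit(g0 + gx * x)
        p = (1.0 - pw) * expit(b0 + bx_used * x) + pw * expit(b0 + bx_used * x + bw)
        return np.log(p / (1.0 - p))
    te = marg_logit(1.0, bx) - marg_logit(0.0, bx)
    ie = marg_logit(1.0, 0.0) - marg_logit(0.0, 0.0)   # beta_x set to zero
    return ie / te

def one_replication(n, rng, b0=-2.0, bx=0.9, bw=2.0, g0=-2.0, gx=2.0):
    """Draw (X, W, Y), fit the two models, and return the plug-in IE/TE."""
    x = rng.binomial(1, 0.5, size=n).astype(float)
    w = rng.binomial(1, expit(g0 + gx * x)).astype(float)
    y = rng.binomial(1, expit(b0 + bx * x + bw * w)).astype(float)
    ones = np.ones(n)
    g_hat = fit_logit(np.column_stack([ones, x]), w)
    b_hat = fit_logit(np.column_stack([ones, x, w]), y)
    return ie_te_ratio(b_hat[0], b_hat[1], b_hat[2], g_hat[0], g_hat[1])

rng = np.random.default_rng(12345)   # arbitrary seed
estimates = np.array([one_replication(1000, rng) for _ in range(200)])
print("average:", estimates.mean(), "variance:", estimates.var())
```

With these settings, the average of the estimates can be compared with the true value .532 reported in Table 5; the Monte Carlo noise is larger than in the study because fewer replications are used.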

Table 5 summarizes the simulation results. Notice that for X binary, the dependence of $IE$ and $TE$ on x is removed. As expected, the proposed estimators always approach the true values as the sample size grows, with smaller root mean squared error (RMSE) in all scenarios considered. Conversely, the KHB estimator is biased, with RMSE increasing with the value of $β_{x}$ . Furthermore, while the bias appears moderate in the binary case, it becomes substantial in the continuous case.

Table 5.

True Value and Simulation Average, Variance, and Root Mean Squared Error (RMSE) for the KHB Method and the Proposed Method (RSD).

n			250	500	1,000	250	500	1,000	250	500	1,000
$β_{x}$	True Value	Method	Average			Variance			RMSE
	$IE / TE$ (X binary)
0.4	.716	KHB	.732	.683	.669	.144	.033	.014	.379	.184	.129
0.4	.716	RSD	.757	.732	.724	.064	.024	.011	.256	.154	.107
0.9	.532	KHB	.475	.468	.462	.020	.008	.004	.153	.111	.092
0.9	.532	RSD	.544	.539	.535	.020	.009	.004	.141	.094	.064
1.8	.364	KHB	.301	.300	.297	.004	.002	.001	.092	.079	.074
1.8	.364	RSD	.367	.367	.364	.007	.003	.002	.082	.058	.040
	$AIPE / ATPE$ (X continuous)
0.4	.590	KHB	.531	.521	.513	.028	.013	.006	.178	.133	.108
0.4	.590	RSD	.561	.589	.580	.010	.004	.002	.103	.064	.042
0.9	.437	KHB	.316	.317	.310	.008	.004	.002	.149	.135	.134
0.9	.437	RSD	.411	.458	.440	.002	.002	.001	.055	.046	.026
1.8	.351	KHB	.180	.178	.180	.002	.001	.001	.178	.177	.173
1.8	.351	RSD	.328	.343	.352	.001	.001	.000	.042	.026	.017

Note: AIPE = average indirect probability effect; ATPE = average total probability effect; RSD = Raggi Stanghellini Doretti.

Extension to Multiple Binary Mediators

The proposed definitions of direct, indirect, and residual effects, together with their parametric formulations, extend naturally to the situation where multiple binary mediators are present. Suppose there are k mediators and that a full ordering among the variables $(Y, W_{1}, \dots, W_{k - 1}, W_{k}, X)$ is available such that each variable is a potential response variable for the subsequent ones. The system can be represented via a DAG (see Figure 2). We assume that each response model is a hierarchical logistic model. In the following, if we set the regression coefficient of one covariate to zero, all higher order interaction terms involving this covariate are implicitly set to zero as well. For brevity, we denote with $W_{> j}$ the set of all $W_{r}$ such that $r > j$ . The coefficients of X and of $W_{j}$ and of their interactions in the logistic regression of Y are denoted with $β$ in a self-explanatory fashion.

Figure 2.

Directed acyclic graph with k mediators.

In analogy with equation (7), the logistic model for Y given X, obtained after marginalization upon the k mediators, is

\log \frac{P (Y = 1 | X = x)}{P (Y = 0 | X = x)} = β_{0} + β_{x} x - \sum_{j = 1}^{k} \log {RR}_{{\bar{W}}_{j} | Y, X = x, W_{> j} = 0} ,

where

{RR}_{{\bar{W}}_{j} | Y, X = x, W_{> j} = w_{> j}} = \frac{1 + \exp g_{0}^{(w_{< j})} (x, w_{> j})}{1 + \exp g_{1}^{(w_{< j})} (x, w_{> j})} .

In Online Appendix C (Supplementary material for this article is available online), the relevant expressions are given.

In line with what was done for the single-mediator case, we here generalize the definitions of the total, direct, indirect, and residual effects to the situation of k mediators.

Total effect: Let $TE (x)$ be the total effect of X on Y on the log odds scale, after marginalization on k binary mediators. For X continuous and differentiable, the total effect is defined as the derivative of equation (21) with respect to x. It follows that

\begin{array}{rl} TE (x) & = β_{x} + \sum_{j = 1}^{k} \frac{\exp g_{1}^{(w_{< j})} (x, w_{> j} = 0)}{1 + \exp g_{1}^{(w_{< j})} (x, w_{> j} = 0)} \frac{d}{d x} g_{1}^{(w_{< j})} (x, w_{> j} = 0) \\ & \quad - \sum_{j = 1}^{k} \frac{\exp g_{0}^{(w_{< j})} (x, w_{> j} = 0)}{1 + \exp g_{0}^{(w_{< j})} (x, w_{> j} = 0)} \frac{d}{d x} g_{0}^{(w_{< j})} (x, w_{> j} = 0) . \end{array}

For X discrete, the total effect is defined as the difference between equation (21) evaluated at two different levels of X. Without loss of generality, we here assume X binary and take the difference for $x = 1$ and $x = 0$ . Then,

TE (x) = β_{x} + \sum_{j = 1}^{k} \log \frac{{RR}_{{\bar{W}}_{j} | Y, X = 0, W_{> j} = 0}}{{RR}_{{\bar{W}}_{j} | Y, X = 1, W_{> j} = 0}} .

Equation (23) can also be written as follows:

TE (x) = β_{x} + \sum_{j = 1}^{k} \log \frac{1 + \exp g_{1}^{(w_{< j})} (1, w_{> j} = 0)}{1 + \exp g_{0}^{(w_{< j})} (1, w_{> j} = 0)} - \sum_{j = 1}^{k} \log \frac{1 + \exp g_{1}^{(w_{< j})} (0, w_{> j} = 0)}{1 + \exp g_{0}^{(w_{< j})} (0, w_{> j} = 0)} .

Direct effect: Let $DE (x)$ be the direct effect of X on Y on the log odds scale. Let ${\bar{β}}_{W}$ be the set of all $β$ regression coefficients of each mediator in the model for Y, including also the interaction terms both between mediators and between mediators and X. The direct effect is evaluated in $TE (x)$ after imposing ${\bar{β}}_{W} = 0$ , thereby $Y ╨ W_{1}, \dots, W_{k} | X$ , that is,

DE (x) = TE (x) |_{{\bar{β}}_{W} = 0} = β_{x} .

Global indirect effect: The global indirect effect $GIE (x)$ can be defined in analogy with the previous definitions, as the total effect evaluated when the direct effect of X, $β_{x}$ , is zero. Since we only deal with hierarchical models, this implies that all interaction terms between X and the mediators are also zero. Therefore,

GIE (x) = TE (x) |_{β_{x} = 0} .

Residual effect: Let $RES (x)$ be the residual effect evaluated by difference, that is,

RES (x) = TE (x) - GIE (x) - DE (x) .

From previous derivations, nonzero residual effects are induced by graphs having more than one arrow pointing to Y. Therefore, we can state that the residual effect is zero whenever one of the two following graphical conditions holds: (i) there is no direct path from X to Y or (ii) there is a direct path from X to Y and no other arrow is pointing to Y. For instance, the model corresponding to the DAG in Figure 3A has a nonzero $RES (x)$ , as there is a direct arrow from X to Y and two other arrows are pointing to Y, while models corresponding to DAGs as in Figure 3B and C are such that $TE (x) = GIE (x)$ and $RES (x) = 0$ .
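The two graphical conditions can be illustrated numerically for $k = 2$ and X binary. The sketch below is built on assumptions of ours rather than on the expressions of the text: main-effects logistic models for Y given $(X, W_{1}, W_{2})$, for $W_{1}$ given $(X, W_{2})$, and for $W_{2}$ given X, with hypothetical coefficient values, and a brute-force marginalization over the four mediator configurations in place of the closed-form expression for the marginal log odds.

```python
import math
from itertools import product

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_logit(x, beta, gamma1, gamma2):
    """logit P(Y = 1 | X = x) after summing over (W1, W2)."""
    b0, bx, bw1, bw2 = beta      # Y  | X, W1, W2 (main effects only)
    g10, g1x, g12 = gamma1       # W1 | X, W2
    g20, g2x = gamma2            # W2 | X
    p_y1 = 0.0
    for w1, w2 in product((0, 1), repeat=2):
        p2 = expit(g20 + g2x * x)
        p1 = expit(g10 + g1x * x + g12 * w2)
        weight = (p2 if w2 else 1.0 - p2) * (p1 if w1 else 1.0 - p1)
        p_y1 += weight * expit(b0 + bx * x + bw1 * w1 + bw2 * w2)
    return math.log(p_y1 / (1.0 - p_y1))

def decomposition(beta, gamma1, gamma2):
    """TE, DE, GIE, and RES for binary X (contrast x = 1 versus x = 0)."""
    b0, bx, bw1, bw2 = beta
    te = marginal_logit(1, beta, gamma1, gamma2) - marginal_logit(0, beta, gamma1, gamma2)
    beta_no_x = (b0, 0.0, bw1, bw2)            # GIE: total effect at beta_x = 0
    gie = (marginal_logit(1, beta_no_x, gamma1, gamma2)
           - marginal_logit(0, beta_no_x, gamma1, gamma2))
    return {"TE": te, "DE": bx, "GIE": gie, "RES": te - gie - bx}

gamma1, gamma2 = (-2.0, 2.0, 0.5), (-1.0, 1.0)   # hypothetical values

full = decomposition((-2.0, 0.9, 2.0, 1.0), gamma1, gamma2)         # all arrows present
no_direct = decomposition((-2.0, 0.0, 2.0, 1.0), gamma1, gamma2)    # condition (i)
only_direct = decomposition((-2.0, 0.9, 0.0, 0.0), gamma1, gamma2)  # condition (ii)

for label, res in (("full", full), ("no direct", no_direct), ("only direct", only_direct)):
    print(label, {name: round(value, 4) for name, value in res.items()})
```

In the first configuration the residual effect is nonzero, while in the other two it vanishes up to rounding error, in agreement with conditions (i) and (ii).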

Figure 3.

Directed acyclic graphs with $k = 2$ mediators when (A) no conditional independences hold; (B) $X ╨ Y | {W_{1}, W_{2}}$ , $W_{2} ╨ W_{1} | X$ , and $X ╨ W_{2}$ ; and (C) $Y ╨ {X, W_{2}} | W_{1}$ .

In a setting with multiple mediators, one is also interested in a path-specific indirect effect, that is, the effect that is due to some mediators only and is null whenever one arrow along the pathway is deleted. Notice that, in this setting, other research questions are also of interest, such as the path-specific indirect effects when some mediators are marginalized over. They are addressed in the fifth section.

Path-specific indirect effect: Let A be one of the $2^{k} - 1$ ordered subsets of $(W_{1}, W_{2}, \dots, W_{k})$ containing at least one element of W. Let $i_{A}$ be the ordered set of indices j such that $W_{j} \in A$ . The path-specific indirect effect ${PSIE}_{A} (x)$ is obtained from the total effect after imposing that:

$β_{x} = 0$ ;

$β_{w_{j}} = 0$ for $j \in \{1, 2, \dots, k\}$ , $j \neq \min i_{A}$ , where $\min i_{A}$ is the smallest index in $i_{A}$ ;

$γ_{r, j} = 0$ with $W_{r} \in A$ , $j > r$ , $j \neq ℓ_{r}$ , where $ℓ_{r}$ is the index following r in $i_{A}$ ; and

$γ_{r, x} = 0$ with $W_{r} \in A$ , $r \neq \max i_{A}$ , where $\max i_{A}$ is the largest index in $i_{A}$ .

In this way, each path-specific indirect effect contains only the parameters pertaining to the path (including the intercepts). It then follows that the indirect effect ${PSIE}_{A} (x)$ is null whenever one of the following conditions holds:

$β_{w_{j}} = 0$ with $j = min {i_{A}}$ ;

$γ_{r, ℓ_{r}} = 0$ with $W_{r} \in A$ ; and

$γ_{r, x} = 0$ with $r = max {i_{A}}$ .

Notice that each of the conditions above implies deleting one arrow in the DAG corresponding to the model of interest.

As an instance, let $k = 6$ , $i_{A} = {2, 3, 5}$ . The path-specific indirect effect is obtained from $TE (x)$ after imposing that:

$β_{x} = β_{x w_{1}} = \dots = β_{x w_{6}} = 0$ ;

$β_{w_{1}} = β_{w_{3}} = β_{w_{4}} = β_{w_{5}} = β_{w_{6}} = 0$ ;

$γ_{2, 4} = γ_{2, 5} = γ_{2, 6} = γ_{3, 4} = γ_{3, 6} = γ_{5, 6} = 0$ ; and

$γ_{2, x} = γ_{3, x} = 0$

where $γ_{j, x}$ and $γ_{j, i}$ are, in order, the coefficients of X and $W_{i}$ in the equation of $W_{j}$ against its parent nodes in the corresponding DAG. The above constraints leave only one path from X to Y, namely $X \to W_{5} \to W_{3} \to W_{2} \to Y$ . The corresponding effect is null whenever $γ_{5, x}$ , $γ_{3, 5}$ , $γ_{2, 3}$ , or $β_{w_{2}}$ is zero.
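The bookkeeping behind these constraints can be mechanized. The sketch below uses a hypothetical helper, `psie_constraints`, that lists the coefficients set to zero for a given k and index set $i_{A}$, following the rules stated in the definition; the coefficient $β_{x}$ and, by hierarchy, all interactions between X and the mediators are always zero and are therefore not listed. Only indices are returned, so no model is fitted or evaluated.

```python
def psie_constraints(k, i_a):
    """Indices of the coefficients set to zero to obtain PSIE_A(x).

    k   : number of mediators W_1, ..., W_k
    i_a : indices j such that W_j belongs to A
    """
    i_a = sorted(i_a)
    following = dict(zip(i_a, i_a[1:]))   # l_r: index following r in i_A
    return {
        # beta_{w_j} = 0 for every j except the smallest index in i_A
        "beta_w": [j for j in range(1, k + 1) if j != i_a[0]],
        # gamma_{r, j} = 0 for W_r in A, j > r, j different from l_r
        "gamma": [(r, j) for r in i_a for j in range(r + 1, k + 1)
                  if j != following.get(r)],
        # gamma_{r, x} = 0 for W_r in A, except the largest index in i_A
        "gamma_x": [r for r in i_a if r != i_a[-1]],
    }

def remaining_path(i_a):
    """The single direction-preserving path left from X to Y."""
    nodes = ["X"] + [f"W{j}" for j in sorted(i_a, reverse=True)] + ["Y"]
    return " -> ".join(nodes)

constraints = psie_constraints(6, [2, 3, 5])
print(constraints["beta_w"])      # [1, 3, 4, 5, 6]
print(constraints["gamma"])       # [(2, 4), (2, 5), (2, 6), (3, 4), (3, 6), (5, 6)]
print(constraints["gamma_x"])     # [2, 3]
print(remaining_path([2, 3, 5]))  # X -> W5 -> W3 -> W2 -> Y
```

For $k = 6$ and $i_{A} = {2, 3, 5}$ the output matches the list of constraints given in the example above.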

Notice that the definition of path-specific indirect effects allows only for direction-preserving paths, that is, paths with all arrows pointing in the same direction. As a matter of fact, only ordered subsets of W are allowed to form A. This choice is justified by the fact that these are the only subsets with a nonzero path-specific indirect effect. To clarify the issue, see the graph in Figure 4. The path $X \to W_{3} \to W_{1} \leftarrow W_{2} \to Y$ is not admitted as it gives rise to $(W_{3}, W_{1}, W_{2})$ , which is not an ordered subset of W. However, as $W_{1}$ is a collider node, the path between X and Y is blocked by $(W_{3}, W_{1}, W_{2})$ and the corresponding path-specific effect is zero (see Pearl 2009, chaps. 1 and 3).

Figure 4.

Directed acyclic graph with $W_{1}$ acting as a collider node in the path $X \to W_{3} \to W_{1} \leftarrow W_{2} \to Y$ .

It is also important to notice that, in parallel to the single-mediator case, $DE (x)$ coincides with the effect of X on Y keeping $W_{1} = W_{2} = \dots = W_{k} = 0$ . However, ${PSIE}_{A} (x)$ does not in general coincide with the indirect effect after keeping the mediators not in A equal to zero. To see this, notice that in Figure 4, the path-specific indirect effect for $A = (W_{3}, W_{2})$ is evaluated after imposing that $β_{x} = β_{w_{1}} = 0$ . This effect does not coincide with the one obtained after conditioning on $W_{1} = 0$ (see Elwert and Winship 2014).

Notice that the sum of all the path-specific indirect effects in general is not equal to the global indirect effect. This is true even when there is just one path from X to Y. This is due to the different ways to deal with the effects induced in noncollapsible subgraphs. These are subgraphs involving three random variables, $(W_{i}, W_{j}, W_{r})$ , or $(W_{i}, W_{j}, Y)$ , $i > j > r$ , such that there are two arrows pointing to the inner node, that is, W_r or Y. In this case, W_j acts as a mediator between W_i and W_r , or Y, and there is a nonzero residual effect (see the second section). Specifically, the global indirect effect includes all residual effects, whereas path-specific indirect effects do not.

As an example, consider the models with DAG as in Figure 5A and B. In both DAGs, there is just one indirect path leading from X to Y, that is, $X \to W_{3} \to W_{1} \to Y$ , with $A = (W_{3}, W_{1})$ . Both models have $GIE (x) \neq {PSIE}_{A} (x)$ . The model corresponding to Figure 5A has two noncollapsible subgraphs, namely, the ones induced by $(W_{3}, W_{2}, W_{1})$ and $(W_{2}, W_{1}, Y)$ , while the model corresponding to Figure 5B has one noncollapsible subgraph, namely, the one induced by $(W_{2}, W_{1}, Y)$ . Notice the different meaning of the parameters attached to the arrow $W_{3} \to W_{1}$ in the two effects: in the $GIE (x)$ , it is the total effect of $W_{3}$ on $W_{1}$ , while in the path-specific ${PSIE}_{A} (x)$ , it is the direct effect.

Figure 5.

Directed acyclic graph with $k = 3$ mediators (A) $Y ╨ {X, W_{3}} | {W_{1}, W_{2}}$ , $W_{1} ╨ X | {W_{2}, W_{3}}$ , and $W_{2} ╨ X$ and (B) $Y ╨ {X, W_{3}} | {W_{1}, W_{2}}$ , $W_{1} ╨ {X, W_{2}} | W_{3}$ , and $W_{2} ╨ X$ .

Other Path-specific Indirect Effects of Interest

Suppose now that the research question involves path-specific effects in the model obtained after marginalization over some mediators while others are kept in the model. First of all, the parameters of the marginal model of interest should be obtained and then the path-specific indirect effects can be evaluated. Two different situations may arise. The first one involves marginalization over an inner mediator, and therefore equation (21) can be used in a straightforward manner. The second one involves marginalization over one intermediate/outer node and more technicalities are necessary. We here present an instance of both situations.

Suppose that the research question involves investigation of the path-specific indirect effects in the model obtained after marginalization over $W_{1}$ of a model corresponding to the DAG in Figure 6A. In Figure 6B, the DAG corresponding to the marginal model of interest is presented, with the red arrows corresponding to parameters that change due to the marginalization over $W_{1}$ . The expressions for these parameters are reported in Online Appendix D. The only nonzero path-specific indirect effects are for $A = \{W_{2}\}$ and $A = \{W_{3}\}$ . They can be evaluated by making use of the parameters of the marginal model. Notice that, as expected, the $GIE (x)$ in the marginal model is equal to the $GIE (x)$ in the original model.

Figure 6.

(A) Marginalization over the inner mediator $W_{1}$ and (B) quantifying the parameters (in red the parameters that change).

Quantification of effects in models obtained after marginalization over intermediate or outer nodes involves repeated use of the derivations here presented. We here detail the steps to be followed for the case with $k = 2$ mediators. Generalizations to more complex models can be derived after repeatedly applying the procedure here proposed.

Suppose that we wish to evaluate the indirect effect in the model with $W_{1}$ as the unique mediator, that is, the model obtained after marginalization over $W_{2}$ (see Figure 7). This implies deriving the parametric formulation of

\begin{array}{rl} \log \frac{P (Y = 1 | X = x, W_{1} = w_{1})}{P (Y = 0 | X = x, W_{1} = w_{1})} & = \log \frac{P (Y = 1 | X = x, W_{1} = w_{1}, W_{2} = 0)}{P (Y = 0 | X = x, W_{1} = w_{1}, W_{2} = 0)} \\ & \quad + \log \frac{P (W_{2} = 0 | Y = 0, X = x, W_{1} = w_{1})}{P (W_{2} = 0 | Y = 1, X = x, W_{1} = w_{1})} , \end{array}

in which the second term of the RHS of the equation is to be determined. From repeated use of the derivations in Online Appendix A (Supplementary material for this article is available online), we have

\log \frac{P (Y = 1 | X = x, W_{1} = w_{1})}{P (Y = 0 | X = x, W_{1} = w_{1})} = β_{0} + β_{x} x + β_{w_{1}} w_{1} + β_{x w_{1}} x w_{1} + \log \frac{1 + \exp h_{1, w_{1}} (x)}{1 + \exp h_{0, w_{1}} (x)} ,

with the expression of $h_{y, w_{1}}$ given in Online Appendix E. The values of the marginal parameters are straightforward to obtain (see, e.g., Online Appendix D).
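The first identity used in this derivation, which writes the log odds of Y given $(X, W_{1})$ as the log odds at $W_{2} = 0$ plus a correction term involving the distribution of $W_{2}$ given Y, holds for any joint distribution and can be verified by enumeration. The sketch below assumes, for illustration only, logistic models for Y given $(X, W_{1}, W_{2})$, for $W_{1}$ given $(X, W_{2})$, and for $W_{2}$ given X with hypothetical coefficients; it does not use the functions $h_{y, w_{1}}$ of Online Appendix E, which are not reproduced here.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients.
BETA = {"0": -2.0, "x": 0.9, "w1": 2.0, "w2": 1.0, "xw1": 0.3}   # Y  | X, W1, W2
GAMMA1 = {"0": -2.0, "x": 2.0, "w2": 0.5}                        # W1 | X, W2
GAMMA2 = {"0": -1.0, "x": 1.0}                                   # W2 | X

def logit_y(x, w1, w2):
    return (BETA["0"] + BETA["x"] * x + BETA["w1"] * w1
            + BETA["w2"] * w2 + BETA["xw1"] * x * w1)

def joint(y, w1, w2, x):
    """P(Y = y, W1 = w1, W2 = w2 | X = x) from the recursive factorization."""
    p2 = expit(GAMMA2["0"] + GAMMA2["x"] * x)
    p1 = expit(GAMMA1["0"] + GAMMA1["x"] * x + GAMMA1["w2"] * w2)
    py = expit(logit_y(x, w1, w2))
    return ((p2 if w2 else 1.0 - p2) * (p1 if w1 else 1.0 - p1)
            * (py if y else 1.0 - py))

def both_sides(x, w1):
    """Left- and right-hand side of the identity at (x, w1)."""
    j = {(y, w2): joint(y, w1, w2, x) for y in (0, 1) for w2 in (0, 1)}
    # Left: log odds of Y given (X, W1), with W2 summed out.
    lhs = math.log((j[1, 0] + j[1, 1]) / (j[0, 0] + j[0, 1]))
    # Right: log odds at W2 = 0 plus log of P(W2 = 0 | Y = 0) / P(W2 = 0 | Y = 1).
    p_w2_0_y0 = j[0, 0] / (j[0, 0] + j[0, 1])
    p_w2_0_y1 = j[1, 0] / (j[1, 0] + j[1, 1])
    rhs = logit_y(x, w1, 0) + math.log(p_w2_0_y0 / p_w2_0_y1)
    return lhs, rhs

checks = {(x, w1): both_sides(x, w1) for x in (0, 1) for w1 in (0, 1)}
for key, (lhs, rhs) in checks.items():
    print(key, round(lhs, 6), round(rhs, 6))
```

The two sides coincide for every combination of x and $w_{1}$; the parametric work then consists of expressing the correction term through the model coefficients.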

Figure 7.

(A) Marginalization over the outer mediator $W_{2}$ and (B) quantifying the parameters (in red the parameters that change).

Causal Interpretation of Total, Direct, and Indirect Effects

In the counterfactual framework, many approaches to mediation analysis exist; a review is given in Huber (2019). In a single-mediator context, VanderWeele and Vansteelandt (2010) define the counterfactual notion of direct and indirect effects when the outcome is binary, thereby focusing on the log odds scale. Within a regression analysis context with a continuous mediator, the authors present an approximate parametric formulation of the effects that holds under the rare outcome assumption for Y. Valeri and VanderWeele (2013) address the same problem when the mediator is also binary, again modeled under the rare outcome assumption. It is therefore worth exploring the links between the effects introduced here and these causal effects defined in a formal counterfactual framework. Since the latter are contrasts expressed, possibly after a logarithmic transformation, by a difference, this parallel holds for X binary. Notice that, unlike the approaches cited above, we here present a decomposition based on the exact formulation of the effects on the log odds scale.

Under the assumption that the recursive system of equations is structural in the sense of Pearl (2009, chap. 7), one can give the total effect and some of its components a causal interpretation. Saying that the recursive system of equations is structural implies that the DAG is a causal graph that satisfies a set of axioms, namely, composition, effectiveness, and reversibility (see also Steen and Vansteelandt 2018).

With a single binary mediator, a parallelism between the structural definition of a DAG and the sequential ignorability assumption of Imai, Keele, and Yamamoto (2010) exists (see Pearl 2012; Shpitser and VanderWeele 2011). Under the assumption of no unmeasured confounders of the treatment–outcome relationship, possibly after conditioning on a set of pretreatment covariates C, the total effect of X on Y here presented corresponds to the total causal effect as defined by VanderWeele and Vansteelandt (2010). Similarly, assuming that there are no unmeasured confounders of the treatment–outcome relationship, possibly after conditioning on a set of pretreatment covariates C, and no unmeasured confounders of the mediator–outcome relationship, after conditioning on the treatment X and possibly some pretreatment covariates C, the direct effect can be seen as the controlled direct effect (VanderWeele and Vansteelandt 2010) after an external intervention to fix $W = 0$ is performed (see Discrete Case subsection, Case (ii)). Less obvious are the parallelisms in terms of the natural effects of VanderWeele and Vansteelandt (2010). It is possible to show that the pure natural indirect effect can be seen as the total effect after assuming $β_{x} = β_{x w} = 0$ (i.e., the indirect effect; see Discrete Case subsection, Case (i)) and that the pure natural direct effect corresponds to the total effect after assuming $γ_{x} = 0$ , that is, $X ╨ W$ (see Discrete Case subsection, Case (iv)). More details are in Doretti, Raggi, and Stanghellini (2021).

When multiple causally ordered mediators are present, several possible effects are of interest (see Daniel et al. 2015; Steen et al. 2017). However, in this case, the definition of natural direct and indirect effects is more cumbersome, and sometimes effects of interest are nonidentifiable (see, e.g., the situation described in Avin, Shpitser, and Pearl [2005]). Again, if no unobserved confounders exist, some parallelisms continue to hold. For instance, the direct effect can be seen as the controlled direct effect of X on Y after an external intervention to fix the mediators $W_{1} = W_{2} = \dots = W_{k} = 0$ is performed (see also VanderWeele and Vansteelandt 2014). Our approach further allows one to assess the total and controlled direct effect of X on Y when some mediators are marginalized over, while others are kept in the system. We believe that this is an important research question in many applied studies.

Conclusions

Logistic regression is by far the most widely used model for a binary response. Beyond the mediation context, the models proposed here may arise in longitudinal studies with a binary outcome measured on different occasions.

With reference to a single mediator, we have proposed a novel decomposition of the total effect into direct and indirect effects that is more appropriate for the nonlinear case and that, under certain conditions, reduces to the classical definition in the linear case. Additionally, this decomposition overcomes the issue of unequal variance when fitting two nested models. As an illustrative example, we have reanalyzed data based on an encouragement program to stimulate students’ attitude to visit museums. We have shown how the decomposition of the total effect can avoid erroneous conclusions on the direct and indirect effects and provide additional information that cannot be found by just looking at the results of the separate regression analyses. Although the total, direct, and indirect effects are all positive, a substantial residual effect, possibly due to a large interaction coefficient, hinders the interpretation in terms of proportion mediated.

Additional important results concern the extension of the definitions to the multiple-mediator context. Repeated use of the decomposition of the total effect makes it possible to address complex issues such as quantifying the total, direct, and indirect effects when a subset of mediators is marginalized over. Links to the causal effects have also been established.

Supplemental Material

Supplemental Material, sj-docx-1-smr-10.1177_00491241211031260 for Path Analysis for Binary Random Variables by Martina Raggi, Elena Stanghellini and Marco Doretti in Sociological Methods & Research

Supplemental Material, sj-r-1-smr-10.1177_00491241211031260 for Path Analysis for Binary Random Variables by Martina Raggi, Elena Stanghellini and Marco Doretti in Sociological Methods & Research

Supplemental Material, sj-r-2-smr-10.1177_00491241211031260 for Path Analysis for Binary Random Variables by Martina Raggi, Elena Stanghellini and Marco Doretti in Sociological Methods & Research

Footnotes

Authors’ Note

The author Martina Raggi is also affiliated with Université de Paris, INSERM U1153, Epidemiology of Ageing and Neurodegenerative Diseases, Paris, France.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Martina Raggi

Elena Stanghellini

References

Alwin

D. F.

Hauser

R. M.

. 1975. “The Decomposition of Effects in Path Analysis.” American Sociological Review 40(1):37–47.

Avin

Shpitser

Pearl

. 2005. “Identifiability of Path-specific Effects.” Pp. 357–63 in Proceedings of International Joint Conference on Artificial Intelligence. Edinburgh, Schotland.

Baron

R. M.

Kenny

D. A.

. 1986. “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51(6):1173.

Bollen

K. A.

1987. “Total, Direct, and Indirect Effects in Structural Equation Models”. Sociological Methodology 17:37–69.

Breen

Karlson

K. B.

Holm

. 2013. “Total, Direct, and Indirect Effects in Logit and Probit Models.” Sociological Methods & Research 42(2):164–91.

Breen

Karlson

K. B.

Holm

. 2018. “A Note on a Reformulation of the KHB Method.” Sociological Methods & Research. (https://doi.org/10.1177/0049124118789717). Accessed October 1, 2020.

Cochran

W. G.

1938. “The Omission or Addition of an Independent Variate in Multiple Linear Regression.” Supplement to the Journal of the Royal Statistical Society 5(2):171–6.

Cox

Wermuth

. 2003. “A General Condition for Avoiding Effect Reversal after Marginalization.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(4):937–41.

Daniel

De Stavola

Cousens

Vansteelandt

. 2015. “Causal Mediation Analysis with Multiple Mediators.” Biometrics 71(1):1–14.

10.

Doretti

Raggi

Stanghellini

. 2021. “Exact Parametric Causal Mediation Analysis for a Binary Outcome with a Binary Mediator.” Statistical Methods & Applications. doi: 10.1007/s10260-021-00562-w

11.

Elwert

2013. “Graphical Causal Models.” Pp. 245–73 in Handbook of Causal Analysis for Social Research, edited by Morgan

S. L.

. Berlin, Germany: Springer.

12.

Elwert

Winship

. 2014. “Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable.” Annual Review of Sociology 40:31–53.

13.

Forastiere

Lattarulo

Mariani

Mealli

Razzolini

. 2019. “Exploring Encouragement, Treatment, and Spillover Effects using Principal Stratification, with Application to a Field Experiment on Teens’ Museum Attendance.” Journal of Business & Economic Statistics 39(1): 244-258.

14.

Huber

2019. “A Review of Causal Mediation Analysis for Assessing Direct and Indirect Treatment Effects.” Technical report, Université de Fribourg, Fribourg.

15.

Imai

Keele

Yamamoto

. 2010. “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects.” Statistical Science 25(1):51–71.

16.

Karlson

K. B.

Holm

Breen

. 2012. “Comparing Regression Coefficients between same-Sample Nested Models Using Logit and Probit: A New Method.” Sociological Methodology 42(1):286–313.

Lattarulo, Mariani, and Razzolini. 2017. “Nudging Museums Attendance: A Field Experiment with High School Teens.” Journal of Cultural Economics 41(3):259–77.

Lauritzen S. L. 1996. Graphical Models, Volume 17. Oxford, England: Clarendon Press.

Lupparelli. 2019. “Conditional and Marginal Relative Risk Parameters for a Class of Recursive Regression Graph Models.” Statistical Methods in Medical Research 28(10–11):3466–86.

MacKinnon D. P., Lockwood C. M., Brown C. H., Wang, and Hoffman J. M. 2007. “The Intermediate Endpoint Effect in Logistic and Probit Regression.” Clinical Trials 4(5):499–513.

Neuhaus J. M. and Jewell N. P. 1993. “A Geometric Approach to Assess Bias Due to Omitted Covariates in Generalized Linear Models.” Biometrika 80(4):807–15.

Oehlert G. W. 1992. “A Note on the Delta Method.” The American Statistician 46(1):27–9.

Pearl. 2001. “Direct and Indirect Effects.” Pp. 411–20 in Proceedings of the 17th International Conference on Uncertainty in Artificial Intelligence, UAI’01, edited by Jack Breese and Daphne Koller. San Francisco, CA: Morgan Kaufmann.

Pearl. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. New York: Cambridge University Press.

Pearl. 2012. The Mediation Formula: A Guide to the Assessment of Causal Pathways in Nonlinear Models. Hoboken, NJ: Wiley Online Library.

Shpitser and VanderWeele T. J. 2011. “A Complete Graphical Criterion for the Adjustment Formula in Mediation Analysis.” The International Journal of Biostatistics 7(1):16.

Stanghellini and Doretti. 2019. “On Marginal and Conditional Parameters in Logistic Regression Models.” Biometrika 106(3):732–39.

Steen, Loeys, Moerkerke, and Vansteelandt. 2017. “Flexible Mediation Analysis with Multiple Mediators.” American Journal of Epidemiology 186(2):184–93.

Steen and Vansteelandt. 2018. “Mediation Analysis.” Pp. 423–56 in Handbook of Graphical Models, edited by Maathuis, Wainwright, Drton, and Lauritzen. Boca Raton, FL: CRC Press.

Valeri and VanderWeele T. J. 2013. “Mediation Analysis Allowing for Exposure–Mediator Interactions and Causal Interpretation: Theoretical Assumptions and Implementation with SAS and SPSS Macros.” Psychological Methods 18(2):137–50.

VanderWeele T. J. and Vansteelandt. 2010. “Odds Ratios for Mediation Analysis for a Dichotomous Outcome.” American Journal of Epidemiology 172(12):1339–48.

VanderWeele T. J. and Vansteelandt. 2014. “Mediation Analysis with Multiple Mediators.” Epidemiologic Methods 2(1):95–115.

Winship and Mare R. D. 1983. “Structural Equations and Path Analysis for Discrete Data.” American Journal of Sociology 89(1):54–110.

Wooldridge J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.

Xie and Geng. 2008. “Some Association Measures and Their Collapsibility.” Statistica Sinica 18(3):1165–83.

Supplementary Material

Please find the following supplemental material available below.

