Abstract
One interesting idea in social network analysis is the directionality test, which uses the directions of social ties to help identify peer effects. The null hypothesis of the test is that if contextual factors are the only force that affects peer outcomes, the estimated peer effects should not differ when the directions of social ties are reversed. In this article, I statistically formalize this test and investigate its properties under various scenarios. In particular, I point out that the validity of the test depends on whether peer selection, sampling error, and simultaneity bias are present. I also outline several methods that can help provide causal estimates of peer effects in social networks.
Introduction
The complexity in social network data makes it very difficult to identify causal peer effects (An 2011a; Anagnostopoulos, Kumar, and Mahdian 2008; Christakis and Fowler 2007, 2013; Cohen-Cole and Fletcher 2008a, 2008b; Fowler and Christakis 2008; Lyons 2011; Noel and Nyhan 2011; Shalizi and Thomas 2011; VanderWeele 2011; VanderWeele, Ogburn, and Tchetgen 2012). One interesting idea in social network analysis is the directionality test that utilizes the directions of social ties to identify peer effects. Duncan, Haller, and Portes (1968) may be the first study that examines how peer effects vary by the directions of social ties. They found that the influence of friend on ego appeared to be stronger than that of ego on friend and argued this might be because friend constituted a significant figure for ego while the converse might not be true.
Christakis and Fowler (2007) also found that friend nominees influenced the nominators but the latter had no statistically significant influence on the former. They argued that since exposure to unobserved contextual factors should have an equal influence on actors, unequal peer effects by the friendship directions should have been generated by something other than contextual factors, for example, the social esteem that an individual held toward perceived friends. Similarly, Anagnostopoulos et al. (2008) formulated an edge-reversal test of peer effects. They argued that since the effects of contextual factors were independent of social ties, reversing the edges of social ties should not significantly change the estimated peer effects, whereas it would change them if peer influence were the driving force.
In short, the directionality test of peer effects can be summarized as follows. If contextual factors are the only force that affects peers’ outcomes, we should not obtain different estimates of peer effects if the friendship directions are reversed. Otherwise, “something” other than contextual factors (e.g., peer influence) is at work to generate the different estimates. However, this argument may not hold, because the estimated peer effects when friendship directions are reversed can differ for alternative reasons, such as peer selection, sampling error, and simultaneity bias. Thus, observing unequal peer effects does not necessarily reject the null (i.e., excluding contextual effects). Furthermore, rejecting the null does not necessarily prove the existence of peer influence.
Critiques of the directionality test have been presented previously. Shalizi and Thomas (2011) showed that the directionality test could break down because of homophily on latent traits. They used simulations to demonstrate this point. But their simulations did not consider contextual effects and so slightly misrepresented the setting of the directionality test. Moreover, the latent homophily explanation they provided seems to be inaccurate. Homophily (i.e., actors tend to affiliate with others with similar attributes) does not necessarily produce asymmetric peer effects. As shown analytically later in this article, it is only a special type of peer selection that will lead to this result, namely, when egos tend to choose a homogeneous group of alters as friends. The simulations in Shalizi and Thomas (2011) worked precisely because they specified that individuals befriend acquaintances who were more like a median type of person (Shalizi and Thomas 2011:219).
Lyons (2011) also commented on the directionality test. He pointed out the differences in the estimated peer effects shown in Christakis and Fowler (2007) were not statistically significant because their 95% confidence intervals overlapped. He also used a simple example to show that homophily or shared environment rather than social influence can cause asymmetric peer effects. In this article, I provide a rigorous formalization of the directionality test and analytically show that it is not necessarily homophily but a special kind of peer selection that causes asymmetry in social influence.
The main motivation of this article is to show that under simple, tractable, and reasonable models for peer effects, the directionality test may fail if there are alternative explanations. Among the three possible explanations (peer selection, sampling error, and simultaneity bias), the argument of sampling error has never been raised before in the literature. For the other two arguments, which have been discussed in the literature, I present them analytically rather than just through simulations or simple examples and so am able to reveal more details. The purpose of this article is not to undervalue the novelty of the directionality test but rather to provide a more rigorous examination of the conditions under which it is likely to hold or not to hold. To that end, I first formalize the directionality test and examine its statistical properties under various situations. Then I outline several methods for estimating peer effects. Since the main focus of this article is not to review these methods and relevant reviews are available elsewhere (An 2011a; VanderWeele and An 2013; VanderWeele et al. 2012), I will be brief in introducing these methods. Finally, I conclude.
The Directionality Test
Table 1 shows a small data set with some typical features of social network data. The first column shows the IDs of six egos, while the second column shows the IDs of the friends the egos nominate. For simplicity, each ego is allowed to nominate only one friend. Note that actors 1 and 2 are mutual friends because they nominate each other as a friend. Also note that actor 5 has been nominated by both actors 3 and 4 as a friend. Bear in mind that these two features of the network data have a profound impact on the identification and estimation of peer effects.
Table 1. An Example Showing the Typical Data Structure for Estimating Peer Effects.
There are two kinds of peer effects that may be sociologically interesting. The first one is the effects of nominated friends on egos. The other is the effects of the egos on the friends. In other words, the egos and the friends may affect one another simultaneously. For estimating these two types of peer effects, we can specify a pair of regression equations.
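$$E = \alpha_1 F + \beta_1 X + \gamma_1 U + v \tag{1}$$

$$F = \alpha_2 E + \beta_2 X + \gamma_2 U + \varepsilon \tag{2}$$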
where E and F denote the outcomes of the egos and the friends, respectively; X includes both egos’ and friends’ (exogenous) covariates; U contains unobserved contextual factors shared by the actors; v and ε are independent error terms. The effects from peers’ outcomes may be called endogenous peer effects, as measured by α1 and α2, respectively. The effects of peers’ characteristics (like demographics and family background) may be called exogenous peer effects, as measured by β1 and β2, respectively. The goal of this model is to estimate endogenous peer effects while controlling for exogenous peer effects.
The directionality test argues that if contextual factors are the only force that generates the correlation in peers’ outcomes (conditioning on possible covariate effects), then the estimates for peer effects should be equal regardless of the directions of social ties. Formally, the test assumes the absence of peer effects and the following data generation process:

$$E = \beta_1 X + \gamma_1 U + v \tag{3}$$

$$F = \beta_2 X + \gamma_2 U + \varepsilon \tag{4}$$
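With this assumption, the directionality test argues that the estimates for α1 and α2 in equations (1) and (2) should be equal. The ordinary least squares (OLS) estimates of α1 and α2 in equations (1) and (2) are as follows (covariates are omitted for conciseness):

$$\hat{\alpha}_1 = \frac{\mathrm{Cov}(E, F)}{\mathrm{Var}(F)}, \qquad \hat{\alpha}_2 = \frac{\mathrm{Cov}(E, F)}{\mathrm{Var}(E)}$$

Given equations (3) and (4), the OLS estimates can be reexpressed as follows:

$$\hat{\alpha}_1 = \frac{\gamma_1 \gamma_2 \mathrm{Var}(U)}{\mathrm{Var}(F)}, \qquad \hat{\alpha}_2 = \frac{\gamma_1 \gamma_2 \mathrm{Var}(U)}{\mathrm{Var}(E)}$$

The estimated coefficients share the same numerator, so they are equal only if the variances of the outcomes of the egos and the friends are equal, namely, Var(E) = Var(F).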
VanderWeele et al. (2012) suggested using peers’ lagged outcomes instead of current outcomes to estimate peer effects. This strategy helps to get around a simultaneity issue (see subsequently) and arguably is more robust to contextual confounding. But the model cannot estimate contemporaneous peer effects. In addition, the model requires longitudinal data, which are generally more difficult to obtain. Perhaps more importantly for the purpose of this study, the directionality test is almost irrelevant in the lagged peer effects model, because the two estimated peer effects are not necessarily equal even under the null hypothesis and in the absence of competing explanations. For these reasons, I will focus on the contemporaneous peer effects model but will discuss the implications of the findings for the lagged peer effects model when applicable. In particular, as shown subsequently, lagged outcomes may be used as instruments for current outcomes to estimate contemporaneous peer effects.
Peer Selection
The first alternative reason for which the variances of the outcomes of egos and friends can differ is peer selection. If egos tend to select a homogeneous group of alters as friends, then the nominated friends may have a smaller variation in their outcomes, namely, Var(F) < Var(E). For example, actors may be more likely to nominate those who are fitter as friends. So the variance in the weight status (e.g., as measured by body mass index) of the friends will be smaller than that of the egos. Actors may also be more likely to nominate smokers as friends, so that the variance in the smoking status of the friends is much smaller than that of the egos.
In these cases, when we use the friends’ outcomes to predict the egos’ outcomes as in equation (1), we may obtain a statistically significant and substantively large estimate of peer effects. But when we use the egos’ outcomes to predict the friends’ outcomes as in equation (2), due to the larger variation in the egos’ outcomes, we may obtain a smaller and statistically insignificant estimate of peer effects. Hence, due to peer selection alone, we may obtain different estimates of peer effects. Thus, insufficiently accounting for peer selection can endanger the validity of the directionality test.
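The peer selection argument can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not part of the original analysis: outcomes are a shared context effect plus noise, there is no peer influence at all, and each ego nominates, among a few randomly inspected contextmates, the one whose outcome is closest to zero. The function name `directional_ols` and all parameter values are hypothetical.

```python
import random


def cov(x, y):
    """Sample covariance; cov(x, x) is the sample variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def directional_ols(n_contexts=500, size=20, n_candidates=1, seed=1):
    """Simulate ego-friend pairs with NO peer influence.

    Assumed design (illustrative only): each actor's outcome is U + noise,
    where U is shared by everyone in the same context. Each ego inspects
    `n_candidates` random contextmates and nominates the one whose outcome
    is closest to 0. With n_candidates=1 nomination is random; larger
    values make the nominated friends more homogeneous.

    Returns (alpha1_hat, alpha2_hat, var_E, var_F), where alpha1_hat is the
    OLS slope of E on F and alpha2_hat is the OLS slope of F on E.
    """
    rng = random.Random(seed)
    E, F = [], []
    for _ in range(n_contexts):
        u = rng.gauss(0, 1)                       # shared contextual factor
        y = [u + rng.gauss(0, 1) for _ in range(size)]
        for i in range(size):
            others = [j for j in range(size) if j != i]
            candidates = rng.sample(others, n_candidates)
            j = min(candidates, key=lambda k: abs(y[k]))
            E.append(y[i])
            F.append(y[j])
    c, var_e, var_f = cov(E, F), cov(E, E), cov(F, F)
    return c / var_f, c / var_e, var_e, var_f


if __name__ == "__main__":
    print("random nomination:   ", directional_ols(n_candidates=1))
    print("selective nomination:", directional_ols(n_candidates=3))
```

Under this design, random nomination should yield two similar slopes, while selective nomination shrinks Var(F) relative to Var(E) and so pushes the slope of E on F above the slope of F on E, even though contextual factors are the only source of correlation.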
Note that in a special case where only mutual friendships are allowed (either due to symmetrization of friend nominations or due to constraints in survey designs that only elicit mutual friendships), peer influence is symmetric (i.e.,
Sampling Error
Now suppose there is no peer selection: Friends are randomly drawn with replacement by egos from all actors. Typically, there will be repetition in the drawn friends, just as in network data some actors tend to be repeatedly nominated by others as friends. For example, as shown in Table 1, actor 5 is nominated by both actors 3 and 4 as a friend.
The random friend nominations lead to equal (in expectation) sample variances of the outcomes of the friends and the egos. But the distributions of the two sample variances may differ substantially. I conduct simulations to convey this point more vividly. Basically, I generate 300 observations from a standard normal distribution as the sampled outcomes for the egos. Then from the egos’ outcomes, I resample with replacement 300 observations as the friends’ outcomes. To mimic the fact that some actors tend to be repeatedly listed as friends, I specify the resampling probability at .03 for 5 percent of the observations and split the remaining probability evenly among the remaining observations. Then I calculate the sample variances of the outcomes of the egos and the friends, respectively. I repeat this process 500 times to obtain 500 sample variances for each group. Figure 1 shows the density plots of the sample variances across the 500 samples. The distributions of both sample variances concentrate on the designed unit variance. But the sample variances of the friends’ outcomes are much more dispersed than the sample variances of the egos’ outcomes. Specifically, the variance of the sample variances of the friends’ outcomes is about five times as large as the variance of the sample variances of the egos’ outcomes (.035 vs. .007). Perhaps more importantly, in 142 (or 28 percent) of the 500 samples, the difference in the sample variances between the two groups exceeds ±2 standard deviations of the sample variance of the egos’ outcomes.

Figure 1. Density plots of the sample variances of the outcomes of the egos and the friends.
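The simulation described above can be sketched as follows. This is an illustrative reimplementation rather than the original code: it assumes that each of the 5 percent of frequently nominated observations receives a resampling probability of .03, and the function names, the seed, and the use of Python's standard library are my own choices. Exact numbers will vary with the random seed.

```python
import random


def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_variances(n=300, reps=500, hub_share=0.05, hub_prob=0.03, seed=0):
    """Return (ego_vars, friend_vars), one sample variance per replication.

    Egos' outcomes are n standard normal draws. Friends' outcomes are n
    draws with replacement from the egos' outcomes, where a share
    `hub_share` of the observations each has probability `hub_prob` of
    being drawn and the remaining probability is split evenly.
    """
    rng = random.Random(seed)
    n_hubs = round(n * hub_share)
    rest = (1.0 - n_hubs * hub_prob) / (n - n_hubs)
    weights = [hub_prob] * n_hubs + [rest] * (n - n_hubs)
    ego_vars, friend_vars = [], []
    for _ in range(reps):
        egos = [rng.gauss(0, 1) for _ in range(n)]
        friends = rng.choices(egos, weights=weights, k=n)
        ego_vars.append(variance(egos))
        friend_vars.append(variance(friends))
    return ego_vars, friend_vars


def share_beyond_two_sd(ego_vars, friend_vars):
    """Share of replications in which the two sample variances differ by
    more than 2 standard deviations of the egos' sample variances."""
    sd = variance(ego_vars) ** 0.5
    hits = sum(abs(f - e) > 2 * sd for e, f in zip(ego_vars, friend_vars))
    return hits / len(ego_vars)


if __name__ == "__main__":
    ev, fv = sample_variances()
    print("variance of ego sample variances:   ", variance(ev))
    print("variance of friend sample variances:", variance(fv))
    print("share beyond 2 SD:", share_beyond_two_sd(ev, fv))
```

Raising `hub_prob` or `hub_share` increases the repetition in friend nominations and should widen the spread of the friends' sample variances further.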
This finding suggests that if we conduct multiple studies, in many of them Var(F) ≠ Var(E) is likely to occur by chance alone. So the two estimated peer effects will differ even if the true peer effects are equal. The magnitude of the effect of the sampling error depends on the extent of repetition in friend nominations. If repetition is rare, then the effect may be small. Otherwise, the effect may be large enough to affect the validity of the directionality test. Hence, to properly use the directionality test, it may be better to remove the repeated friend nominations too.
Simultaneity Bias
As shown earlier, the assumption of the directionality test implies the absence of peer effects. However, rejection of the null hypothesis does not necessarily imply the existence of peer effects. In addition, even when both peer selection and sampling errors can be ruled out, the test may not be used for causal estimation of peer effects. This is because the estimated peer effects in equations (1) and (2) may suffer from simultaneity bias.
Note that equations (1) and (2) essentially comprise a pair of linear simultaneous equations. Here we focus on linear simultaneous equations because of their relative advantages over nonlinear simultaneous equations. When the outcomes are binary, as in the case of Christakis and Fowler (2007), at first glance, nonlinear simultaneous models like simultaneous probit (Heckman 1978; Maddala 1983) or simultaneous logit (Schmidt and Strauss 1975) seem more appropriate. The problem is that these nonlinear models are unidentified unless peer effects are assumed to be equal or one of them is zero (Maddala 1983; Schmidt and Strauss 1975). Thus, these nonlinear models can lead to artificial directional differences or indifferences in estimated peer effects. These models also tend to be computationally unstable because of their complicated and often unrealistic distributional assumptions. For these reasons, linear simultaneous models are generally preferred, even when the outcomes are binary (Angrist and Pischke 2008:147-48). See Thomas (2013) and An (2015) for more detailed reviews on the properties of these simultaneous models.
In equations (1) and (2), friends’ outcomes (F) and egos’ outcomes (E) are endogenous variables because by construction F (as a function of E) is correlated with U and v and E (as a function of F) is correlated with U and ε. Thus, simply estimating the two equations by OLS or any other method without accounting for these correlations will result in biased estimates of peer effects. This is the so-called simultaneity bias (Wooldridge 2009:207). To see this point more clearly, we can reexpress the estimated coefficients as follows:
I replace the endogenous variables with their compositions, for example,
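A potential solution to the simultaneity problem is to regress the outcomes only on the exogenous variables, namely, through the so-called reduced-form regressions. In other words, we can reexpress the regression equations (1) and (2) as follows:

$$E = \frac{(\beta_1 + \alpha_1 \beta_2) X + (\gamma_1 + \alpha_1 \gamma_2) U + v + \alpha_1 \varepsilon}{1 - \alpha_1 \alpha_2}$$

$$F = \frac{(\beta_2 + \alpha_2 \beta_1) X + (\gamma_2 + \alpha_2 \gamma_1) U + \varepsilon + \alpha_2 v}{1 - \alpha_1 \alpha_2}$$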
When the contextual factors (U) are unobserved, fitting the two equations will yield only two estimated coefficients (for the covariate X in each equation). But there are six parameters to estimate. So the model is underidentified. The model remains underidentified even when the contextual factors (U) are observed. In that case, we will obtain four estimated coefficients, which are still fewer than the number of parameters in the model. The model is still underidentified even if we assume equal peer effects (i.e., α1 = α2) and equal contextual effects (i.e., γ1 = γ2), because as the number of parameters reduces to four, the number of estimated coefficients shrinks to three. In short, the model is underidentified, regardless of what assumptions are imposed on the parameters. Prior research (Manski 1993) has termed this the reflection problem: it is impossible in this type of model to separate the estimates of endogenous peer effects from the estimates of other effects.
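A small numerical check may make the underidentification concrete. The sketch below assumes the structural equations take the form E = α1F + β1X + γ1U + v and F = α2E + β2X + γ2U + ε with a scalar covariate X, and it computes the reduced-form coefficients on X obtained by substituting one equation into the other. The function name and the parameter values are hypothetical.

```python
import math


def reduced_form_x_coefs(alpha1, alpha2, beta1, beta2):
    """Reduced-form coefficients on X in the E and F equations.

    Derived by solving the assumed pair
        E = alpha1*F + beta1*X + ...,   F = alpha2*E + beta2*X + ...
    for E and F in terms of X alone.
    """
    denom = 1.0 - alpha1 * alpha2
    if denom == 0:
        raise ValueError("alpha1 * alpha2 must differ from 1")
    pi_e = (beta1 + alpha1 * beta2) / denom
    pi_f = (beta2 + alpha2 * beta1) / denom
    return pi_e, pi_f


# Two very different structural worlds ...
no_peer_effects = reduced_form_x_coefs(alpha1=0.0, alpha2=0.0, beta1=1.0, beta2=1.0)
strong_peer_effects = reduced_form_x_coefs(alpha1=0.5, alpha2=0.5, beta1=0.5, beta2=0.5)

# ... imply the same reduced-form coefficients, so the reduced-form
# regressions alone cannot tell them apart.
same = all(math.isclose(a, b) for a, b in zip(no_peer_effects, strong_peer_effects))
print(no_peer_effects, strong_peer_effects, same)
```

Because a world with no endogenous peer effects and a world with sizable ones map to identical reduced-form coefficients, additional information (such as instruments) is needed to separate them.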
Simultaneity is only one basic form of network dependence. There may be other higher-order network dependence (e.g., transitivity, clustering, and preferential attachment) as well. For example, in a transitive triad, the friend of a friend is also a friend. Thus, the correlation in a pair of actors’ outcomes may also be produced by sharing a common friend. To account for this fact would require, to say the least, controlling for the outcome of the common friend.
In addition, most network surveys only allow each ego to nominate a limited number of friends (typically fewer than 10 and sometimes just 1). Thus, there may be a substantial number of unobserved network ties. The effects of these unobserved ties are somewhat equivalent to the unobserved contextual effects. Hence, sometimes there may appear to be only a small degree of network dependence in an observed network, but the effects of unobserved network ties could be substantial. Without accounting for these effects, the estimated peer effects may suffer from omitted variable bias too.
Methods for Estimating Peer Effects
If the goal is to estimate peer effects rather than to evaluate the directionality test, then the issue becomes more complicated. As pointed out by Christakis and Fowler (2007), there are three possible reasons for peers’ outcomes to be correlated: peer influence, homophily, and contextual confounding. Thus, to properly estimate the effects of peer influence, it is critical to tease out the effects of competing causes. This article and prior research like An (2011a) further point out that simultaneity can be another issue.
Assuming adequate control has been made for homophily and contextual confounding, VanderWeele et al. (2012) suggested using the lagged peer effects model to solve the simultaneity problem. As they pointed out, the assumption that competing causes have been adequately controlled for can be too strong and the model is only good for testing the null of no peer effects. In addition, if the goal is to estimate contemporaneous peer effects, then the lagged peer effects model is insufficient.
One method to provide consistent estimates of contemporaneous peer effects is to use instrumental variables (IVs). The following shows a pair of linear simultaneous equations with IVs:
$Z_E$ and $Z_F$ are IVs for the outcomes of egos and friends, respectively. They are assumed to be correlated with E and F, respectively, but uncorrelated with the unobserved contextual factors U and either of the error terms.
The basic idea of the IV method is to utilize the exogenous variations in the instruments to help identify the effects of the endogenous variables. In a two-stage least squares (2SLS) framework, the IV estimation amounts to fitting OLS on the following equations:
where
where covariates are omitted for conciseness, and
Comparing the IV method with the lagged peer effects model, both require more information than what cross-sectional data can offer. The IV method requires having good instruments. The lagged peer effects model requires collecting longitudinal data. Given the availability of good instruments, however, the IV method does not need to control for homophily or contextual confounding (O’Malley et al. 2014). This is a great feature of the IV method, as in practice neither homophily nor contextual confounding can easily be controlled for. The IV method also helps to tease out higher-order network dependence like transitivity because the instruments are likely to be orthogonal to the outcomes of shared contacts. Thus, it is not necessary to control for the outcomes of shared contacts, except for efficiency concerns.
However, the abovementioned arguments are not meant to claim the superiority of the IV method over alternative methods, because they are useful in different contexts. There are many occasions where good instruments may be difficult to obtain or assess. Alternative methods may work better in these scenarios. In fact, there is some connection we may draw between the IV method and the lagged peer effects model. For example, the lagged outcomes may be used as instruments for current outcomes to estimate contemporaneous peer effects. This would work if homophily depends only on current outcomes and not on past outcomes, and if contextual confounding operates only at the current time. Under these conditions (which are very demanding), the lagged outcomes are uncorrelated with either homophily or contextual confounding and can be treated as exogenous.
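To illustrate how instruments can address simultaneity, the following sketch simulates a pair of linear simultaneous equations with a shared unobserved factor and compares OLS with a simple IV estimator. Everything here is an assumption made for illustration: the data-generating process, the unit coefficients on the instruments and on U, the true values α1 = 0.3 and α2 = 0.1, and the function names. With a single instrument and no other covariates, 2SLS reduces to the ratio Cov(Z, outcome) / Cov(Z, regressor), which is what `iv_slope` computes.

```python
import random


def cov(x, y):
    """Sample covariance; cov(x, x) is the sample variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate_pairs(n=20000, alpha1=0.3, alpha2=0.1, seed=2):
    """Draw (E, F, Z_E, Z_F) from the assumed system
        E = alpha1*F + Z_E + U + v
        F = alpha2*E + Z_F + U + eps
    by solving the two equations for E and F.
    """
    rng = random.Random(seed)
    d = 1.0 - alpha1 * alpha2
    E, F, ZE, ZF = [], [], [], []
    for _ in range(n):
        u, v, eps = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        ze, zf = rng.gauss(0, 1), rng.gauss(0, 1)
        a = ze + u + v        # exogenous part of the ego equation
        b = zf + u + eps      # exogenous part of the friend equation
        E.append((a + alpha1 * b) / d)
        F.append((b + alpha2 * a) / d)
        ZE.append(ze)
        ZF.append(zf)
    return E, F, ZE, ZF


def ols_slope(y, x):
    """OLS slope from regressing y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(y, x, z):
    """Just-identified IV slope of y on x using instrument z."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    E, F, ZE, ZF = simulate_pairs()
    print("alpha1: OLS", ols_slope(E, F), "IV", iv_slope(E, F, ZF))
    print("alpha2: OLS", ols_slope(F, E), "IV", iv_slope(F, E, ZE))
```

In this setup OLS should overstate both peer effects because F and E are correlated with U and with the error terms, while the IV estimates should land near the assumed true values. The result hinges on the instruments being independent of U and of both errors, which is exactly the condition that is hard to verify in practice.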
Some notable applications of the IV method to estimating peer effects include An (2015), Bramoullé, Djebbari, and Fortin (2009), Duncan et al. (1968), and O’Malley et al. (2014). For example, Bramoullé et al. (2009) used a side subject in an intransitive triad as an instrument for the middle subject to study the effect of the middle subject on the other side subject. Creative as it is, this approach faces several constraints. First, there must be a large number of intransitive triads in a network to provide sufficient statistical power. Second, there must be no unmeasured connections between the two side subjects. Third, there must be no contextual factors that affect the subjects simultaneously. In reality, none of the above are easily satisfied.
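As a rough illustration of the first constraint, the sketch below enumerates intransitive triads in a directed nomination network. It reflects one reading of the design, in which ego i nominates j, j nominates k, and there is no observed tie in either direction between i and k, so that k could serve as the instrument for j. The function name and the toy network are hypothetical and are not taken from any of the cited studies.

```python
def intransitive_triads(nominations):
    """List triads (i, j, k) where i nominates j, j nominates k, and
    i and k share no observed tie in either direction.

    `nominations` maps each ego to the set of friends it nominates.
    """
    def tied(a, b):
        return b in nominations.get(a, set()) or a in nominations.get(b, set())

    triads = []
    for i, friends in nominations.items():
        for j in friends:
            for k in nominations.get(j, set()):
                if k != i and k != j and not tied(i, k):
                    triads.append((i, j, k))
    return sorted(triads)


# Hypothetical toy network: a -> b -> c -> a forms a cycle, and d -> b.
toy = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"b"}}
print(intransitive_triads(toy))
```

In the toy network only the triad (d, b, c) qualifies, because every pair within the a-b-c cycle is connected by some tie. Counting such triads in real data gives a quick sense of whether the approach has enough usable observations.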
Besides the IV method, other methods for estimating causal peer effects include experiments with a partial treatment design (An 2011b) and the stochastic actor-oriented model (Steglich, Snijders, and Pearson 2010). None of the methods offers a universal solution. Each has its own advantages and disadvantages. See An (2011a) and VanderWeele and An (2013) for detailed reviews on these methods.
Conclusion
Because of the complicated dependence in social network data, identifying and estimating causal peer effects is a challenging task. The directionality test aims to utilize the directions of social ties to facilitate identification. In this article, I statistically formalize the test and examine its properties under various scenarios. I point out three factors that may invalidate the test, namely, peer selection, sampling error, and simultaneity bias. I also outline several methods that can help provide more robust causal estimates of peer effects, including instrumental variables methods, experimental methods, and dynamic network models. In conclusion, I applaud the novelty of the directionality test but also want to alert researchers to scrutinize its conditions when putting it into practice.
Acknowledgments
I thank the editor and the anonymous reviewers for their helpful feedback to improve this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
