A Tale of Twin Dependence: A New Multivariate Regression Model and an FGLS Estimator for Analyzing Outcomes With Network Dependence

Abstract

In this article, I present a new multivariate regression model for analyzing outcomes with network dependence. The model can account for two types of outcome dependence: the mean dependence, which allows the outcome to depend on selected features of a known dependence network, and the error dependence, which allows the outcome to be additionally correlated based on patterned connections in the dependence network (e.g., according to whether the ties are asymmetric, mutual, or triadic). For example, when predicting a group of students’ smoking status, the outcome can depend on the students’ positions in their friendship network and also be correlated among friends. I show that analyses ignoring the mean dependence can lead to severe bias in the estimated coefficients while analyses ignoring the error dependence can lead to inefficient inferences and failures in recognizing unmeasured social processes. I compare the new model with related models such as multilevel models, spatial regression models, and exponential random graph models and show their connections and differences. I propose a two-step, feasible generalized least squares estimator for the model that is computationally fast and robust. Simulations show the validity of the new model (and the estimator) while four empirical examples demonstrate its versatility. The associated R package “fglsnet” is available for public use.

Keywords

network analysis, multivariate regression, multilevel modeling, FGLS

In many settings, outcomes tend to be correlated across units of analysis. For example, the academic achievement of students in a classroom or a school may be correlated. The number of friends a person has is likely correlated among the egos and the alters. Perceived risk may be correlated among residents living in the same neighborhood. Regardless of the cause for the correlation, it is important to account for the correlation to make proper statistical estimates and inferences.

Several approaches have been used in the past to account for such correlations. For example, clustered standard errors aim to adjust for the correlation among units in the same cluster. But this approach usually allows only one cluster while multiway clustering is still rarely used (Cameron, Gelbach, and Miller 2011; Zeileis 2006). Another approach is multilevel modeling, which can accommodate multiple clusters (Guo and Zhao 2000; Mustillo and Mustillo 2012; Rabe-Hesketh and Skrondal 2008; Raudenbush and Bryk 2002; Sampson, Raudenbush, and Earls 1997; Snijders and Bosker 2012). But one problem with multilevel modeling (and also clustered standard errors) is that it assumes outcome correlations exist evenly within a cluster, whereas in reality, the outcomes in the same cluster may not be correlated evenly and outcomes may be correlated across clusters.

In this article, I advocate collecting social network data to more accurately represent and model correlated outcomes. Specifically, given a network that depicts the dependence structure across units, I present a new multivariate regression model that can incorporate two types of outcome dependence: (1) the mean dependence that allows the outcome to depend on selected features of the dependence network and (2) the error dependence that allows the outcome to be additionally correlated according to patterned connections in the dependence network. For example, when predicting a group of students’ smoking status, the outcome can depend on the students’ positions in their friendship network and also be correlated among friends. Overall, the two types of dependence help provide more accurate and comprehensive modeling of correlated outcomes, whereas past models usually account for only one type of outcome dependence. In particular, one unique strength of the new model is that it can account for various forms of error dependence based on patterned connections in the dependence network (e.g., according to whether the ties are asymmetric, mutual, or triadic).

I will compare the new model with related models, including multilevel models, spatial regression models, and exponential random graph models (ERGMs), and present both differences and connections between them. I will also show that the feasible generalized least squares (FGLS) estimator that I propose for the new model is computationally fast and robust.

To show the versatility of the new model, I will present four empirical examples. The first example examines students’ attitude toward smoking in a health intervention and shows the new model helps disentangle the direct treatment effect from the treatment diffusion effect while accounting for extra outcome correlations. This example also shows the new model can be used to account for multilevel correlations besides network-based dependence. The second example examines how social ties and political power affect wealth accumulation among a group of noble Florentine families in the Renaissance period. This example shows how to combine multiple dependence networks to better model error dependence and how the new model differs from traditional spatial regression models. The third example studies advising relations among a group of corporate managers. This example shows that significant error dependence may indicate that important variables have been omitted from the model. The last example examines popularity among students in a Southern U.S. school. It shows that the new model provides a new approach (compared to the ERGMs) to modeling selected network features while also accounting for dependence in tie formation.

The article proceeds as follows. First, I will describe the basic model and show how it can be used to characterize two types of outcome dependence, namely, the mean dependence and the error dependence. To put the new model in context and show its distinct features, I will also compare it with related models. Second, I will introduce the FGLS estimator to estimate the new model and discuss its properties in relation to other estimation methods. Third, I will present simulations to evaluate the performance of the new model and the FGLS estimator. Fourth, I will present four empirical examples for illustration. Finally, I will conclude and discuss possible improvements of the model and the FGLS estimator.

Model and Method

The Basic Model

Assume M is a matrix representing the dependence network across units. For example, it can be a friendship network in a school (Resnick et al. 1997) or a business collaboration network in a city (Padgett and Ansell 1993). If two units are connected in the network, then the corresponding cell in M is coded as one and otherwise zero. For now, let’s assume M is binary and undirected. Later I will relax these restrictions. Below I introduce a multivariate normal regression model for modeling a continuous outcome Y with network dependence.

Y = X β + F γ + ε, ε \sim MN (0, σ^{2} Ω) .

In the model, X represents covariates and F represents selected features of the dependence network. I call the dependence of Y on F the mean dependence in order to distinguish it from the error dependence to be introduced below. In practice, F typically includes individual centrality measures in M : (1) degree centrality, which measures the number of connections a unit has in M (Freeman 1978); (2) closeness centrality, which measures how fast a unit can reach others in M , defined as $\frac{\sum_{j} d_{i j}^{- 1}}{n - 1}$ , where $d_{i j}$ is the length of the shortest path between units i and j (Gil and Schmidt 1996); and (3) betweenness centrality, which measures the brokerage power of a unit in M, defined as $\sum_{j k} \frac{g_{j k}^{i}}{g_{j k}}$ , where $g_{j k}$ is the number of shortest paths between units j and k and $g_{j k}^{i}$ is the number of those paths that pass through i (Butts 2008). For example, when studying student smoking behavior, the number of friends a student has may be a predictor (Suh, Shi, and Brashears 2017). Of course, depending on the context, the network features can also be at the group level, for example, network density, or even include interactions between individual-level and group-level network statistics. More network measures can be found in Wasserman and Faust (1994).
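As an informal illustration of the first two centrality measures (not taken from the article or from the fglsnet package), the following Python sketch computes degree and closeness for a toy undirected network. The function names and the toy matrix are hypothetical. Unreachable units are assumed to contribute zero to closeness, in line with the inverse-distance definition above. Betweenness is omitted for brevity.

```python
from collections import deque

def degree_centrality(M):
    """Number of ties each unit has in a binary, undirected adjacency matrix."""
    return [sum(row) for row in M]

def closeness_centrality(M):
    """Sum of inverse shortest-path lengths to all other units, divided by n - 1."""
    n = len(M)
    scores = []
    for i in range(n):
        # Breadth-first search gives shortest-path lengths from unit i.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if M[u][v] == 1 and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Units never reached are absent from dist and so contribute zero.
        total = sum(1.0 / d for node, d in dist.items() if node != i)
        scores.append(total / (n - 1))
    return scores

# Hypothetical toy network: a path 0 - 1 - 2.
M = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
degree = degree_centrality(M)
closeness = closeness_centrality(M)
```

The resulting vectors could serve as columns of F in the outcome model.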

The error term $ε$ is assumed to follow a multivariate normal distribution with a zero mean and a covariance $σ^{2} Ω$ . The diagonal of $Ω$ contains ones. Its off-diagonal entries equal $ρ$ if two units are connected in M and zero otherwise. This specification helps model extra outcome correlations caused by unobserved factors shared by the connected units.

Ω_{i j} = \begin{cases} 1 & \text{for } i = j, \\ ρ & \text{for } i \neq j \text{ and } M_{i j} = 1, \\ 0 & \text{for } i \neq j \text{ and } M_{i j} = 0. \end{cases}

When the dependence network is directed, one approach is to symmetrize it by removing nonmutual ties or by converting nonmutual ties to mutual ones. In the former case, $Ω_{i j} = ρ$ only if $M_{i j} = M_{j i} = 1$ for $i \neq j$ . In the latter case, $Ω_{i j} = ρ$ if $M_{i j} = 1$ or $M_{j i} = 1$ for $i \neq j$ .
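The construction of $Ω$ under the two symmetrization rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_omega`, the `rule` argument, and the toy network are hypothetical and are not part of the fglsnet package.

```python
def build_omega(M, rho, rule="either"):
    """Error correlation matrix implied by a binary adjacency matrix M.

    rule="either": i and j are linked if a tie exists in either direction
                   (nonmutual ties are converted to mutual ones).
    rule="mutual": i and j are linked only if the tie is reciprocated
                   (nonmutual ties are removed).
    """
    n = len(M)
    omega = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                omega[i][j] = 1.0
            elif rule == "either" and (M[i][j] == 1 or M[j][i] == 1):
                omega[i][j] = rho
            elif rule == "mutual" and M[i][j] == 1 and M[j][i] == 1:
                omega[i][j] = rho
    return omega

# Hypothetical directed network: 0 <-> 1 is mutual, 1 -> 2 is one-way.
M = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
omega_either = build_omega(M, 0.3, rule="either")
omega_mutual = build_omega(M, 0.3, rule="mutual")
```

For an undirected network the two rules coincide, since every tie is recorded in both directions.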

For a directed dependence network, one can also differentiate the error dependence by relational distance or structural equivalence between units (Leenders 2002) or the types of ties between units as introduced below. Inspired by social network analysis, this article presents three forms of error dependence that represent unmeasured social processes such as hierarchy, reciprocity, and transitivity: (1) asymmetric dependence between units with asymmetric ties in M , (2) mutual dependence between units with mutual ties in M , and (3) triadic dependence between units who not only have mutual ties between themselves but also have h ( $\geq 1$ , a threshold that may vary in different contexts) common neighbors in M . To differentiate the three forms of error dependence, one can adjust the covariance matrix $Ω$ as follows:

Ω_{i j} = \begin{cases} 1 & \text{for } i = j, \\ ρ_{t} & \text{for } i \neq j \neq k \text{ and } M_{i j} = M_{j i} = 1 \text{ and } \sum_{k} I_{M_{i j k} = Δ} \geq h, \\ ρ_{m} & \text{for } i \neq j \neq k \text{ and } M_{i j} = M_{j i} = 1 \text{ and } \sum_{k} I_{M_{i j k} = Δ} < h, \\ ρ_{a} & \text{for } i \neq j \text{ and } M_{i j} \neq M_{j i}, \\ 0 & \text{for } i \neq j \text{ and } M_{i j} = M_{j i} = 0, \end{cases}

where $M_{i j k} = Δ$ indicates that units i, j, and k form a triangle in the dependence network, and $\sum_{k} I_{M_{i j k} = Δ}$ counts the number of triangles in which both units i and j are embedded.
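A rough Python sketch of how dyads might be sorted into the three forms of error dependence is given below. It rests on an assumption that the text does not settle: a unit k is counted as closing a triangle with i and j whenever k has a tie in either direction with both of them. The function names, the dictionary of correlations, and the toy network are hypothetical.

```python
def classify_dyads(M, h=1):
    """Label each tied dyad (i < j) as 'triadic', 'mutual', or 'asymmetric'."""
    n = len(M)

    def tied(a, b):
        # Assumption: a tie in either direction counts toward a triangle.
        return M[a][b] == 1 or M[b][a] == 1

    labels = {}
    for i in range(n):
        for j in range(i + 1, n):
            if M[i][j] == 1 and M[j][i] == 1:
                common = sum(1 for k in range(n)
                             if k != i and k != j and tied(i, k) and tied(j, k))
                # Triadic dependence takes priority over mutual dependence.
                labels[(i, j)] = "triadic" if common >= h else "mutual"
            elif M[i][j] != M[j][i]:
                labels[(i, j)] = "asymmetric"
    return labels

def build_omega_by_type(M, rho, h=1):
    """Fill Omega with the correlation that matches each dyad's label."""
    n = len(M)
    omega = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), kind in classify_dyads(M, h).items():
        omega[i][j] = omega[j][i] = rho[kind]
    return omega

# Hypothetical network: a mutual triangle (0, 1, 2), a one-way tie 2 -> 3,
# and a mutual tie 3 <-> 4 with no common neighbor.
M = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
labels = classify_dyads(M, h=1)
omega = build_omega_by_type(
    M, {"triadic": 0.02, "mutual": 0.01, "asymmetric": 0.005})
```

A stricter definition of a triangle (for example, requiring all three ties to be mutual) would only change the `tied` helper.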

In principle, this model can also include error dependence formed upon indirect ties or other kinds of social processes. Some of these have been explored in Leenders (2002), but more variants can be specified following how endogenous tie dependence is specified in ERGMs (Morris, Handcock, and Hunter 2008).

Comparisons With Related Models

Below I compare the new model with related models and point out their differences and connections as well as their relative advantages and disadvantages. Before going into details, I should point out there is an R package “tnam” (not available in the current version of R but available in the R archive) that can account for mean dependence on centrality measures (Leifeld and Cranmer 2017). The package does not account for error dependence.

Spatial regression models

Spatial regression models include several variants. The spatial autoregressive model (i.e., the spatial lag model) aims to account for interdependence among the outcomes of network neighbors (Anselin et al. 1996; O’Malley et al. 2014; O’Malley and Marsden 2008). Suppose M is the adjacency matrix and W is the row-normalized version of M . The spatial autoregressive model posits that the ith unit’s outcome is a function of the weighted average outcome of its network neighbors $(Y_{- i} = \sum_{j \neq i} w_{i j} Y_{j} = \frac{\sum_{j \neq i} m_{i j} Y_{j}}{\sum_{j \neq i} m_{i j}})$ . Sometimes the average outcome of network neighbors is simply represented by the best friend’s outcome (An 2015). The model is also called the social influence model or the social contagion model (O’Malley and Marsden 2008).

Y = λ W Y + X β + ε .
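To make the spatial lag term W Y concrete, the following Python sketch row-normalizes a small adjacency matrix and computes each unit's neighbor-average outcome. The function names and data are hypothetical, and leaving the rows of isolated units at zero is an assumption made here for convenience.

```python
def row_normalize(M):
    """Divide each row of M by its row sum; isolates keep an all-zero row."""
    W = []
    for row in M:
        total = sum(row)
        W.append([m / total if total > 0 else 0.0 for m in row])
    return W

def neighbor_average(W, Y):
    """Compute W Y, the weighted average outcome of each unit's neighbors."""
    return [sum(w * y for w, y in zip(row, Y)) for row in W]

# Hypothetical star network: unit 0 is tied to units 1 and 2.
M = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
Y = [2.0, 4.0, 6.0]
W = row_normalize(M)
WY = neighbor_average(W, Y)
```

Here unit 0's lag term is the mean of its two neighbors' outcomes, while units 1 and 2 each inherit unit 0's outcome.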

In the spatial autocorrelation model (also called the spatial error model), the error terms are assumed to be spatially correlated, $ε = ρ W ε + ν$ , or equivalently, $ε = (I_{n} - ρ W)^{- 1} ν$ (Anselin et al. 1996). The model assumes that the ith unit’s error term depends on a weighted average of the error terms of its network neighbors.

Y = X β + ε .

ε = ρ W ε + ν .

The spatial autoregressive and autocorrelation model combines the spatial autoregressive model and the spatial autocorrelation model by including both direct outcome dependence and error dependence. Note that the two types of dependence do not have to be based on the same dependence network.

Y = λ W Y + X β + ε .

ε = ρ W ε + ν .

There are three major differences between the new model and the spatial regression models. First, the types of network dependence that are modeled are different. The spatial autoregressive and autocorrelation model (the most comprehensive version of the spatial regression models) can account for autoregressive (outcome) dependence and error dependence while the new model can account for mean dependence (which is usually absent in the spatial regression models) and error dependence. In the spatial regression models with a spatial autoregressive process, the parameter $λ$ represents social contagion (O’Malley et al. 2014; VanderWeele and An 2013). The new model does not directly model social contagion, but it may still be used to inform social contagion. Suppose there is indeed social contagion. Then, ignoring social contagion will cause the error terms of the network neighbors to be correlated. Hence, testing error dependence among network neighbors in the new model will provide a hint of social contagion. In addition, by including mean dependence, the new model can shed light on another form of social contagion: treatment diffusion. Imagine there is an intervention in which the intervention information can be shared among friends. Then, a person’s outcome can depend on whether the person receives the intervention directly and also on whether the person has exchanged treatment information with others. By including features of the treatment diffusion network (viewed as the dependence network in this example) in the model, one can examine the effect of social contagion that is mediated by treatment diffusion.

Second, the error dependence structure is specified differently. In the spatial regression models with an autocorrelation process, the error terms are usually assumed to be $ε = (I_{n} - ρ W)^{- 1} ν$ . This specification makes it hard to understand how two units’ outcomes are correlated. In contrast, in the new model, two units’ outcomes are correlated as long as the two are connected in the dependence network. This specification appears to be more intuitive and also requires no row normalization of the dependence network. Moreover, the new model can represent multiple forms of error dependence at the same time according to the type of ties (e.g., asymmetric vs. mutual vs. triadic) in the dependence network. To note, Leenders (2002) also differentiates several forms of error dependence but mostly according to relational distance or structural equivalence between units. Another difference between this article and Leenders (2002) is that this article provides statistical procedures to estimate and analyze the error dependence.

Finally, the estimation methods are different. The spatial regression models are typically estimated by the ordinary least squares (OLS) combined with the maximum likelihood estimation. The new model is estimated by the FGLS, which is computationally faster and more robust.

Despite the differences, it is possible to combine the two types of models. For example, one may revise the spatial autoregressive and autocorrelation model by including the mean dependence and the varying error dependence as specified in the new model.

ERGMs

ERGMs aim to model network formation (Hunter and Handcock 2006; Robins et al. 2007; Snijders et al. 2006; Wasserman and Faust 1994). The probability of observing a network w is assumed to be $P (W = w | θ, X) = \frac{\exp \{θ^{'} S (w, X)\}}{K (θ, W)}$ , where $S (w, X)$ includes the network terms/statistics that represent selected tie formation processes and $K (θ, W)$ is a normalizing factor ensuring that the probabilities sum to one. An ERGM is equivalent to a conditional logit model (Hunter and Handcock 2006; Robins, Pattison, and Wang 2009). Namely, conditional on the remaining network ( $w_{i j}^{c}$ ), the log odds of observing a tie ( $w_{i j}$ ) depend on changes in the network statistics ( $δ^{i j} (w, X)$ ) that are caused by the presence of the tie.

\operatorname{logit} [P (w_{i j} = 1 | w_{i j}^{c})] = θ^{'} δ^{i j} (w, X) .

ERGMs are typically estimated by Markov chain Monte Carlo methods, except when pseudolikelihood estimation is used (Strauss and Ikeda 1990; Wasserman and Pattison 1996; Yamaguchi 2003). As a result, the estimation process is usually computationally intensive and slow, especially for big networks.

Unlike ERGMs, the new model assumes the dependence network is given and uses it to model the correlations in another outcome. That said, the different forms of error dependence presented in this article resemble how endogenous tie formation processes are modeled in ERGMs. Hence, other forms of tie dependence that have been studied in ERGMs may be adopted in the new model to configure the error dependence structure (Morris et al. 2008).

The new model can be used to model selected network features (e.g., indegree) rather than the network itself. Because of the resemblance between the error dependence structure and endogenous tie formation processes, the estimated error dependence structure (e.g., mutual and triadic correlations) may be informative of corresponding endogenous tie formation processes (e.g., reciprocity and transitivity). Hence, the new model may serve as an alternative but computationally faster approach to understanding endogenous tie formation processes.

Estimation

I utilize an FGLS estimator to estimate the model (Greene 2008). This is a two-step procedure. First, I estimate the model by OLS and use the residuals of network neighbors to estimate the correlation coefficient $ρ$ . Second, I use the estimated $\hat{ρ}$ to construct $\hat{Ω}$ and estimate the model as follows:

\hat{θ} = (Z' {\hat{Ω}}^{- 1} Z)^{- 1} Z' {\hat{Ω}}^{- 1} Y,

Var (\hat{θ}) = {\hat{σ}}^{2} {(Z' {\hat{Ω}}^{- 1} Z)}^{- 1}, with {\hat{σ}}^{2} = \frac{e' e}{n - k - 1},

where $Z = (X, F)$ , $θ = (β, γ)$ , e is the vector of residuals, and k is the number of parameters besides the intercept. Given the estimates of the model, one can recalculate the residuals and repeat the estimation process. Such iterations are often used in practice in the hope of increasing the efficiency of the estimates (Greene 2008). Note that the generalized least squares approach to autocorrelation problems has been presented in the literature (Hanushek and Jackson 1977; O’Malley and Marsden 2008). But this approach is rarely used in social network modeling.
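The second step of the procedure can be sketched in plain Python as follows. This is a minimal illustration rather than the fglsnet implementation: the helper names (`solve`, `matmul`, `gls`) and the toy data are hypothetical, $\hat{Ω}$ is assumed to be invertible, and the residuals entering $\hat{σ}^{2}$ are assumed to be those from the generalized least squares fit.

```python
def solve(A, B):
    """Solve A X = B (A is n x n, B is n x m) by Gauss-Jordan elimination."""
    n = len(A)
    aug = [list(map(float, A[r])) + list(map(float, B[r])) for r in range(n)]
    for c in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def gls(Z, Y, omega):
    """Return theta_hat = (Z' O^-1 Z)^-1 Z' O^-1 Y, its covariance, residuals."""
    n, p = len(Z), len(Z[0])      # p includes the intercept, so n - p = n - k - 1
    Zt = [list(col) for col in zip(*Z)]
    A = matmul(Zt, solve(omega, Z))                   # Z' Omega^-1 Z
    b = matmul(Zt, solve(omega, [[y] for y in Y]))    # Z' Omega^-1 Y
    theta = [row[0] for row in solve(A, b)]
    resid = [y - sum(z * t for z, t in zip(zrow, theta))
             for zrow, y in zip(Z, Y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    cov = [[sigma2 * v for v in row] for row in solve(A, identity)]
    return theta, cov, resid

# Hypothetical data: an intercept plus one covariate, and an estimated Omega
# in which units (0, 1) and (2, 3) are tied with correlation 0.3.
Z = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [1.0, 2.0, 2.0, 4.0]
omega_hat = [[1.0, 0.3, 0.0, 0.0],
             [0.3, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.3],
             [0.0, 0.0, 0.3, 1.0]]
theta_hat, cov_hat, resid = gls(Z, Y, omega_hat)
```

Passing an identity matrix as `omega` reduces the function to OLS, which corresponds to the first step of the procedure.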

The correlation coefficient $ρ$ may be estimated by the regression below (Durbin 1970).

s_{I} = ρ s_{J} + τ .

where $s_{I}$ and $s_{J}$ are the standardized residuals of the units connected in M . Standardizing the residuals means that the regression slope directly gives the correlation coefficient. Standardization occurs within nominators (i.e., the I group) and nominees (i.e., the J group), separately. Whether the dependence network is directed or undirected, each tie of any type is used only once in the estimation. For example, for a mutual tie between i and j, the pair $e_{i}$ and $e_{j}$ is used only once in the regression. Similarly, units with triadic dependence also appear only once (not three times) in the regression. Simulations show this is crucial for correct estimation. As such, one can estimate the correlation coefficients for asymmetric, mutual, and triadic error dependence by using the corresponding standardized residuals. Given a correct specification, the FGLS estimator can provide unbiased estimates of the correlation coefficients, which is sufficient to ensure the consistency of the estimated coefficients in the outcome model (Greene 2008).
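The residual regression can be illustrated with a short Python sketch. It is a simplified reading of the procedure under stated assumptions, not the fglsnet code: the function names and toy data are hypothetical, the standard deviations are computed with divisor n, and each tied dyad is listed once with the nominator first (for mutual ties the order is arbitrary).

```python
from math import sqrt

def tie_pairs(M):
    """List each tied dyad exactly once as (nominator, nominee)."""
    n = len(M)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if M[i][j] == 1:
                pairs.append((i, j))      # i nominates j, or the tie is mutual
            elif M[j][i] == 1:
                pairs.append((j, i))      # only j nominates i
    return pairs

def standardize(values):
    """Center and scale to unit variance (fails if all values are equal)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def estimate_rho(resid, pairs):
    """Slope from regressing s_I on s_J, standardized within each group."""
    s_i = standardize([resid[i] for i, _ in pairs])
    s_j = standardize([resid[j] for _, j in pairs])
    return sum(a * b for a, b in zip(s_i, s_j)) / sum(b * b for b in s_j)

# Hypothetical undirected path 0 - 1 - 2 - 3 and first-step residuals.
M = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
resid = [0.5, -1.0, 1.5, -1.0]
pairs = tie_pairs(M)
rho_hat = estimate_rho(resid, pairs)
```

To estimate separate correlations for asymmetric, mutual, and triadic dependence, the same function could be applied to the subset of pairs of each type.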

To make inferences on the estimated error correlation coefficients, one may use the Wald test in the corresponding residual regressions. For example, for equation (11), one can simply use the estimate and its standard error to conduct hypothesis testing. Another approach is to use the Breusch–Godfrey test (Breusch 1978; Godfrey 1978), which is typically used for assessing serial correlations in panel data. Under this test, $(n - 1) R^{2}$ approximately follows a χ² distribution with one degree of freedom, where n and R² are the number of observations and the R² statistic in a residual regression. The null of no error dependence is rejected if $(n - 1) R^{2} > χ_{1, 1 - p}^{2}$ , where p is the desired significance level.
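The test statistic and its p value can be computed without special libraries, as in the Python sketch below. The function name and the numbers are hypothetical. The p value uses the fact that a χ² variable with one degree of freedom is a squared standard normal, so its upper tail probability at x equals erfc(√(x/2)).

```python
from math import erfc, sqrt

def lm_test(r_squared, n):
    """Return (n - 1) * R^2 and its upper-tail chi-square(1) p value."""
    stat = (n - 1) * r_squared
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical residual regression with 101 tied pairs and R^2 = 0.04.
stat, p_value = lm_test(0.04, 101)
reject_at_5_percent = p_value < 0.05
```

In a residual regression with a single standardized predictor, R² is simply the squared estimated correlation, so the statistic can be formed directly from $\hat{ρ}$.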

The downside of these two approaches is that they ignore heteroscedasticity (when the variances of the error terms in the residual regression differ across units) and clustering (when ego units are connected to multiple alters). To address this issue, one may use standard errors clustered by ego units (Cameron and Miller 2015) or even by egos and alters jointly. To note, if the number of clusters is relatively small compared to the number of units within clusters, conventional clustered standard errors may not work as effectively (Hansen 2007).

Resampling methods can also be useful. One is bootstrapping, that is, resampling the residuals with replacement (or even within ego blocks) and refitting the residual regression many times. Then, the estimates can be used to form a sampling distribution to construct standard errors and perform hypothesis testing. Another approach is the permutation test. Basically, one resamples the predictor without replacement and refits the residual regression many times. The resulting estimates form a null distribution of no error dependence against which the originally estimated correlations can be benchmarked.
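A minimal Python sketch of the permutation test described above follows. The function names, the number of permutations, the seed, and the toy residuals are all hypothetical choices. The p value is two-sided and adds one to the numerator and denominator so that it is never exactly zero, which is a common convention rather than something specified in the text.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation, which equals the standardized regression slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(e_i, e_j, n_perm=2000, seed=0):
    """Shuffle the predictor residuals to build a null of no dependence."""
    rng = random.Random(seed)
    observed = pearson(e_i, e_j)
    shuffled = list(e_j)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)              # resample without replacement
        if abs(pearson(e_i, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical residual pairs for 20 ties with a strong positive association.
e_i = [float(v) for v in range(20)]
e_j = [float(v) for v in range(20)]
observed, p_value = permutation_p_value(e_i, e_j)
```

A bootstrap version would instead draw tie pairs with replacement and recompute the correlation in each draw.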

However, my simulations and prior research (Dow, Burton, and White 1982; Langford, Schwertman, and Owens 2001) show that all these procedures hold only approximately. My simulations further show that the inferences behave relatively well under two conditions: (1) the true correlation coefficients are small and (2) the dependence network is sparse (Leung 2020). To note, when the true correlation coefficient $ρ$ is large, sometimes the covariance structure imposed by equation (2) may be theoretically impossible. For example, if units i and j are not connected but both are connected with unit k, namely, these three units form an open triangle, then the correlation between $e_{i}$ and $e_{j}$ will be bounded between $2 ρ^{2} - 1$ and 1. When $ρ$ is big enough (say, $> 0.71$ ), the lower bound $2 ρ^{2} - 1$ is positive, so the correlation between $e_{i}$ and $e_{j}$ cannot be zero. On the one hand, this implies that the proposed new model is legitimate only for relatively small $ρ$ (say, $< 0.71$ ). On the other hand, this limitation might be checked empirically by examining the residual correlation between the unconnected units in open triangles. If the correlation is big and statistically significant, then this might be a sign that the model is ill specified or that there is a missing tie or other undetected contextual confounding between the two unconnected units. Regardless of the cause, one may address this problem by specifying a separate correlation coefficient for the errors of the unconnected units in open triangles. As to the second condition, the good news is that in practice, the dependence network is usually sparse. This is particularly true when the number of outgoing ties in a dependence network is capped by design. Because of the capping, the dependence network will be sparse and clustering will be limited (Leung 2020). As a result, the Wald test in the residual regression usually performs well.
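The open-triangle restriction can be verified numerically. The Python sketch below builds the 3 × 3 correlation matrix for units ordered (i, k, j), with correlation $ρ$ on the two ties and zero between the unconnected pair, and checks positive definiteness through its leading principal minors. The function names are hypothetical. The determinant works out to $1 - 2 ρ^{2}$ , which turns negative once $ρ$ exceeds about 0.71.

```python
def det3(A):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def open_triangle_is_valid(rho):
    """True if the implied correlation matrix is positive definite."""
    omega = [[1.0, rho, 0.0],     # unit i: tied to k, not to j
             [rho, 1.0, rho],     # unit k: tied to both i and j
             [0.0, rho, 1.0]]     # unit j: tied to k, not to i
    minor2 = omega[0][0] * omega[1][1] - omega[0][1] * omega[1][0]
    # Sylvester's criterion: all leading principal minors must be positive.
    return omega[0][0] > 0 and minor2 > 0 and det3(omega) > 0

valid_small = open_triangle_is_valid(0.5)   # determinant 1 - 2(0.25) = 0.5
valid_large = open_triangle_is_valid(0.8)   # determinant 1 - 2(0.64) = -0.28
```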

One may use Fisher’s (1921) transformation method to test the difference between correlation coefficients. Namely, for two error correlation coefficients, one can obtain a p value based on the z-score $\frac{z_{1} - z_{2}}{\sqrt{1 / (n_{1} - 3) + 1 / (n_{2} - 3)}}$ , where $z_{j} = \frac{1}{2} ln \frac{1 + ρ_{j}}{1 - ρ_{j}}$ represents a transformation of the estimated error correlation coefficient, and $n_{j}$ is the number of ties in the dependence network that fall into the corresponding form of error dependence. Note that this testing procedure is also approximate because it assumes independence between different types of error dependence.
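The z-score above translates directly into code. The Python sketch below is illustrative only: the function name and the input values are hypothetical, and the two-sided p value is taken from the standard normal distribution.

```python
from math import erfc, log, sqrt

def fisher_z_test(rho1, n1, rho2, n2):
    """Compare two correlation coefficients via Fisher's transformation."""
    z1 = 0.5 * log((1 + rho1) / (1 - rho1))
    z2 = 0.5 * log((1 + rho2) / (1 - rho2))
    z = (z1 - z2) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p_value = erfc(abs(z) / sqrt(2.0))     # two-sided normal p value
    return z, p_value

# Hypothetical estimates: mutual correlation 0.5 over 103 ties versus
# asymmetric correlation 0.0 over 103 ties.
z, p_value = fisher_z_test(0.5, 103, 0.0, 103)
```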

The model may simply be estimated by OLS. As is known in the literature, the OLS estimates will be unbiased and consistent (Greene 2008). But there are two drawbacks of doing so. First, if the multivariate model is true, then in large samples, the OLS estimates will be inefficient, with larger standard errors. Note that in small samples, there is no guarantee that the FGLS will be more efficient than the OLS. Second, sometimes the error dependence may be the center of interest that can inform important social processes, but the OLS ignores it.

Sometimes, the covariance matrix $Ω$ may be noninvertible. One may compute the nearest positive definite matrix as an approximation (Higham 2002). Alternatively, one may estimate the model by the OLS and then just test the error dependence based on the residuals (without using them to estimate the outcome model). In this regard, the model estimation is robust.

Additional Comments

Below I discuss possible extensions of the new model and address concerns about its assumptions. First, the model may be extended to incorporate different types of dependence networks. (1) Weighted dependence. In a weighted dependence network, the tie values are continuous or categorical. One option is to binarize the network and then apply the methods outlined above. Another option is to use the weighted versions of asymmetric, mutual, and triadic dependence. For example, mutual dependence occurs when the tie values between two units are equal or close. Triadic dependence happens when two units have mutual dependence not only between themselves but also with other units. In addition, weighted centrality measures (Butts 2008) can be easily included in the regression to represent the mean dependence. (2) Multiplex dependence. When multiple dependence networks are available, one may create a single binary dependence network in which there is a tie between two units as long as there is a tie between them in any of the original networks. Alternatively, one may combine the multiple networks into a weighted one. To account for the mean dependence, one can use centrality measures in the original dependence networks or in the combined network. (3) Multilevel dependence. Outcome correlations may also occur in nested groups such as classes, grades, and schools. The first empirical example below will show how the new model can account for such multilevel dependence and network dependence simultaneously.

Second, the model assumptions may be relaxed in future development. (1) The model assumes the error dependence structure is known. However, this may not be the case. As in any model selection problem, I suggest using theories to guide the specification of the error dependence structure. For directed dependence networks, one may start with more complicated forms of error dependence and gradually reduce them to simpler forms. For example, if triadic dependence is found to be statistically indistinguishable from mutual dependence, then the two forms of dependence may be combined. If the error dependence structure is misspecified, the FGLS estimate of the error dependence structure will be biased. Although the estimates in the outcome model will still be unbiased, their standard errors and the associated testing and inferences may be incorrect. This can also make FGLS less efficient than OLS.

In this article, I specify only three forms of error dependence. Future work may investigate other forms of error dependence such as cyclic or cliquish dependence that involves more units or indirectly connected units. Since the number of entries in the covariance matrix grows very quickly as n increases, there are usually sufficient degrees of freedom to estimate complex forms of error dependence. But for model conciseness, a few forms of error dependence usually suffice. Of course, for small networks, one should be careful not to specify too many error correlations.

(2) The model assumes that one unit can only be in one type of error dependence and prioritizes triadic dependence over mutual dependence and further over asymmetric dependence. For example, when a tie falls into both mutual and triadic error dependence, the model assigns the tie to triadic dependence. Future work may change such assignment.

(3) The dependence network may depend on the outcome, which will cause a reverse causality issue. For example, suppose the outcome is happiness at school and the dependence network is the friendship network. While friendship centrality can affect happiness, happiness may also affect centrality. One method that can help identify the causal effect of centrality is to use instrumental variables, namely, variables that can only affect happiness indirectly through their effects on centrality (e.g., some random assignment of classroom seats). More generally, through a two-stage method, one first uses the instrumental variables (and exogenous covariates) to predict the network measures representing the mean dependence and then uses the predicted network measures in the outcome model to estimate the causal effect of the mean dependence (Wooldridge 2010).

Simulations

Error Dependence Only

The first set of simulations focuses on examining patterns of error dependence. The number of units is set at 1,000. Half of the units are randomly assigned to a binary treatment ( $X_{1}$ ). A control variable is generated from a normal distribution: $X_{2} \sim N (0, 3)$ . The ties in the dependence network ( M ) are randomly formed with a probability of 0.01. The outcome ( Y ) follows a multivariate normal distribution as specified below. I conducted 500 simulations.

Y \sim MN (μ, σ^{2} Ω), where μ = 3 + 5 \times X_{1} + 7 \times X_{2}, σ^{2} = 4,

Ω_{i j} = \begin{cases} 1 & \text{for } i = j, \\ ρ = 0.02 & \text{for } i \neq j \text{ and } (M_{i j} = 1 \text{ or } M_{j i} = 1), \\ 0 & \text{for } i \neq j \text{ and } M_{i j} = M_{j i} = 0. \end{cases}

Panel A in Table 1 shows the simulation results. First, the FGLS provides unbiased estimates for both the covariate coefficients and the error correlation coefficient. Second, the mean standard errors of the estimates are very close to the standard deviations of the estimates. Hence, statistical inferences based on the FGLS are expected to be accurate. Indeed, the coverage rates of the 95 percent confidence intervals are all right at or around the target level.

Table 1.

Simulations With Error Dependence Only.

Variables	A. Homogeneous Correlation				B. Heterogeneous Correlation
Variables	Bias	SD	SE	Cov	Bias	SD	SE	Cov
Intercept	.003	.097	.097	.950	−.003	.105	.101	.922
X₁	.002	.124	.126	.956	−.002	.125	.126	.944
X₂	.000	.022	.021	.944	.000	.021	.021	.952
Error dependence	−.002	.010	.010	.958
Triadic correlation					−.003	.046	.046	.962
Mutual correlation					−.002	.014	.015	.938
Asymmetric correlation					−.002	.003	.003	.954

Note: N = 1,000. Bias is the mean of the estimates minus the true parameter value, SD is the standard deviation of the estimates, SE is the mean of the standard errors, and Cov is the coverage rate of the 95 percent confidence intervals. Number of simulations = 500.

I also conducted additional simulations (shown in Online Appendix Tables A1 and A2 [which can be found at http://smr.sagepub.com/supplemental/]) to compare OLS and FGLS. For the estimated coefficients (except the intercept) in the outcome model, the FGLS is slightly more efficient than the OLS. There is actually no bias in the OLS estimates, and the statistical inferences are also approximately correct. But the OLS estimates ignore the error correlation that may indicate some significant aspect of the data generation process.

In the second set of simulations, I allow the error correlation coefficients to vary by the forms of error dependence. I specify the correlation coefficients for triadic, mutual, and asymmetric error dependence at 0.02, 0.01, and 0.005, respectively. The ties in the dependence network ( M ) are randomly formed with a higher probability of 0.1.

$$
\Omega_{ij} = \begin{cases}
1 & \text{for } i = j, \\
\rho_{t} = 0.02 & \text{for } i \neq j \neq k \text{ and } M_{ij} = M_{ji} = 1 \text{ and } \sum_{k} I_{M_{ijk} = \Delta} \geq 1, \\
\rho_{m} = 0.01 & \text{for } i \neq j \neq k \text{ and } M_{ij} = M_{ji} = 1 \text{ and } \sum_{k} I_{M_{ijk} = \Delta} < 1, \\
\rho_{a} = 0.005 & \text{for } i \neq j \text{ and } M_{ij} \neq M_{ji}, \\
0 & \text{for } i \neq j \text{ and } M_{ij} = M_{ji} = 0.
\end{cases}
$$
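The heterogeneous structure requires each dyad to be classified before a correlation is assigned. The sketch below shows one way to do this. It rests on an assumption about what counts as a triangle: a mutual dyad (i, j) is treated as triadic when at least one third unit k is tied, in either direction, to both i and j. The article's exact definition of the triangle indicator may differ, and the function names are hypothetical.

```python
def tied(M, a, b):
    """True if there is a tie between a and b in either direction."""
    return M[a][b] == 1 or M[b][a] == 1

def dyad_type(M, i, j):
    """Classify the dyad (i, j) as self, none, asymmetric, mutual, or triadic."""
    if i == j:
        return "self"
    if M[i][j] == 0 and M[j][i] == 0:
        return "none"
    if M[i][j] != M[j][i]:
        return "asymmetric"
    # Mutual tie: triadic if some third unit is tied to both i and j (assumed definition).
    shared = any(k not in (i, j) and tied(M, i, k) and tied(M, j, k)
                 for k in range(len(M)))
    return "triadic" if shared else "mutual"

# Correlations used in the second set of simulations.
RHO = {"self": 1.0, "triadic": 0.02, "mutual": 0.01, "asymmetric": 0.005, "none": 0.0}

def heterogeneous_omega(M):
    n = len(M)
    return [[RHO[dyad_type(M, i, j)] for j in range(n)] for i in range(n)]

# Toy network: 0<->1 and 0<->3 are mutual; unit 2 sends ties to 0 and 1.
M = [
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
omega = heterogeneous_omega(M)
```

In this toy network the dyad (0, 1) is triadic because unit 2 is tied to both members, the dyad (0, 3) is mutual only, and the dyads involving unit 2 are asymmetric.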

Panel B in Table 1 shows the simulation results. Once again, the FGLS provides unbiased estimates of the coefficients and also proper statistical inferences on the estimates. The average standard errors of the estimates are close to the actual standard deviations of the estimates, and the coverage rates of the 95 percent confidence intervals are all near the nominal level.

Both Error Dependence and Mean Dependence

In these simulations, I allow selected features of the dependence network to affect the outcomes. To mimic a setting in which some units are more likely than others to form dependence ties, the probabilities for treated units ( $X_{1} = 1$ ) and control units ( $X_{1} = 0$ ) to send out a dependence tie are set at 0.2 and 0.1, respectively. The outcomes are generated by a multivariate normal distribution as follows:

$$
Y \sim MN(\mu, \sigma^{2}\Omega), \quad \text{with } \mu = 3 + 5 \times X_{1} + 7 \times X_{2} + 4 \times F_{1} + 2 \times F_{2}, \quad \sigma^{2} = 4.
$$

The mean dependence is represented by $F_{1}$ and $F_{2}$ , namely, the indegree (the number of incoming ties) and outdegree (the number of outgoing ties) in the dependence network. The covariates, $Ω$ , and Y are generated in the same way as in the second set of simulations shown above.
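The two mean dependence terms are simple row and column sums of the dependence network. The sketch below assumes the convention that `M[i][j] = 1` means unit i sends a tie to unit j; the function names and the toy network are hypothetical, while the coefficients in `mean_outcome` are those of the data-generating model stated above.

```python
def outdegree(M):
    """Number of outgoing ties per unit (row sums)."""
    return [sum(row) for row in M]

def indegree(M):
    """Number of incoming ties per unit (column sums)."""
    n = len(M)
    return [sum(M[i][j] for i in range(n)) for j in range(n)]

def mean_outcome(x1, x2, f1, f2):
    """Mean of Y under the simulation design: 3 + 5*X1 + 7*X2 + 4*F1 + 2*F2."""
    return 3 + 5 * x1 + 7 * x2 + 4 * f1 + 2 * f2

# Toy directed network with four units.
M = [
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
F1 = indegree(M)     # incoming ties
F2 = outdegree(M)    # outgoing ties

# Mean outcome for unit 0 if it is treated (X1 = 1) and has X2 = 0.
mu_0 = mean_outcome(x1=1, x2=0, f1=F1[0], f2=F2[0])
```

With tie probabilities of 0.1 to 0.2 among 1,000 units, degrees are on the order of 100 or more, so the degree terms contribute a large share of the mean.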

Table 2 shows the simulation results. The most striking finding is that when the mean dependence is ignored in the regressions, the estimates of all parameters are severely biased. The estimated coefficients for the covariates and the error correlations are both far from the target values. The standard errors are also much larger than the standard deviations of the estimates. As a result, the coverage rates of the 95 percent confidence intervals all miss the target level, being either 0 or 1.

Table 2.

Simulations With Both Error and Mean Dependence.

Variables	A. Ignoring Mean Dependence				B. Including Mean Dependence
Variables	Bias	SD	SE	Cov	Bias	SD	SE	Cov
Intercept	846.958	.142	3.527	0.000	.005	.712	.697	.946
X₁	406.974	.139	5.321	0.000	−.003	.397	.409	.960
X₂	−0.722	.022	0.885	1.000	−.002	.020	.021	.952
Mean dependence
Indegree					.000	.008	.008	.944
Outdegree					.000	.007	.008	.964
Error dependence
Triadic correlation	−0.014	.003	0.110	1.000	−.008	.096	.091	.940
Mutual correlation	−0.028	.001	0.019	1.000	−.002	.019	.019	.956
Asymmetric correlation	−0.007	.000	0.004	1.000	−.002	.004	.004	.948

In contrast, the regressions including the mean dependence provide proper estimates. The estimated coefficients for both the covariates and the error correlations are close to the target values. The average standard errors of the estimates are also close to the standard deviations of the estimated coefficients. The coverage rates of the 95 percent CIs are all around the target level. Overall, the simulation results show the importance of accounting for both the mean dependence and the error dependence and the capability of the new model to achieve this.

Examples

Estimating Causal Treatment Effect With Treatment Diffusion

The data for this example are from a smoking prevention intervention the author conducted in 2010–2011 in six middle schools in China. Selected classrooms from each school were assigned to one of four conditions, including control, random intervention targeting random students, central intervention targeting central students who could connect to most other classmates via friendship ties (Borgatti 2006), and group intervention targeting students and their closest friends in their classroom. In each treated classroom, a quarter of students were selected for the intervention. The intervention included distributing brochures to and holding a workshop for the treated students. In total, 3,445 students participated in the experiment.

All students took part in two surveys: one before the intervention and one after. In both surveys, students were asked to respond to 10 questions on attitude toward smoking. Each response not supporting smoking scores five points. The aggregate score for each student is used to measure the student’s attitude toward smoking. The goal is to study whether the intervention has an effect of enhancing treated students’ attitude against smoking. One problem is that students may share the intervention information with others. As such, simply comparing the treated with the untreated may provide a biased estimate of the treatment effect. In addition, students’ attitude may be correlated among those who have shared the intervention information or who are friends because of their communications or sharing unobserved common traits. The traditional way to model the data would be to use a multilevel model that assumes equal correlations among students in the same class, the same grade, or the same school. This assumption is likely inaccurate in reality.

In contrast, the new model provides a more effective way to model the data. Specifically, I use a lagged dependent variable model to analyze the data with the attitude score at the outcome survey as the dependent variable and the baseline attitude score as one of the covariates. Other baseline covariates include sex (1 = boy; 0 = girl), smoking status (1 = yes; 0 = no), academic ranking (1 = ranked top 10 in the class; 0 = otherwise), personality (1 = optimistic; 0 = not optimistic), family economic condition (1 = good; 0 = not good), friendship indegree (i.e., number of incoming friendship ties), friendship outdegree (i.e., number of outgoing friendship ties), and indicators for missing the outcome survey and for grades and schools. To account for nonrandom treatment assignment, I also control for students’ propensity score for receiving the intervention, where I use a logit model to predict the propensity of a student receiving the intervention based on covariates (see Online Appendix Table A3 [which can be found at http://smr.sagepub.com/supplemental/]). In the outcome survey, students were asked to report from which schoolmates they have seen and read the prevention brochure. The responses are used to construct a treatment diffusion network in each school.

I specified three models. In the first one, the error terms are correlated among treatment diffusion partners. I distinguish triadic, mutual, and asymmetric error dependence. In the second specification, to tease out the direct treatment effect from the treatment diffusion effects, I additionally control for two measures based on the treatment diffusion network: (1) information outdegree (i.e., number of students from whom a student has obtained the intervention information) and (2) information indegree (i.e., number of students with whom a student has shared the intervention information). In the third specification, the error terms are correlated between treatment diffusion partners as well as friends.

$$
\text{Attitude}_{2} = \alpha \times \text{Attitude}_{1} + \text{Treatment Diffusion} \times \beta + X\gamma + \varepsilon, \quad \varepsilon \sim MN(0, \sigma^{2}\Omega).
$$

Table 3 shows the results. (1) The baseline attitude is positively and significantly correlated with the outcome attitude ( $p < .001$ ). This is true across all three models. (2) The intervention enhanced treated students’ attitude against smoking, but the effect is not statistically significant at the 5 percent level. After accounting for treatment diffusion, the effect size increases from 0.14 in model 1 to 0.31 and 0.43 in the latter two models. Such increases (although not statistically significant) hint that the estimated treatment effect may be biased downward if treatment diffusion is not accounted for. (3) The treatment diffusion measures in the outcome models represent the mean dependence. Receiving treatment information from each additional student increases a student’s attitudinal score by about 0.3 points ( $p < .01$ ). Hence, although the direct treatment effect is not statistically significant, the indirect treatment effect generated by treatment diffusion turns out to be statistically significant, probably because peer-passed information carries more social pressure for conformity. (4) In the first two models, the estimated correlation coefficient for the asymmetric error dependence (but not for the triadic and the mutual error dependence) is statistically significant (est. = 0.06, $p < .001$ ). The third model shows that all three forms of error dependence are statistically significant (all $p < .001$ ). This difference shows the importance of accounting for additional error dependence that is generated by friendship networks (beyond treatment diffusion networks). The mutual error dependence is the largest (0.10), followed by the triadic error dependence (0.08) and then the asymmetric error dependence (0.04). The difference between the mutual and the triadic error dependence is statistically insignificant ( $p = .42$ ), but both differ significantly from the asymmetric error dependence (both $p < .01$ ). Overall, the analyses show that treatment diffusion matters significantly for students’ attitude, and students’ attitude is significantly correlated even after covariates and treatment diffusion have been controlled for.

Table 3.

Estimating Causal Treatment Effect on Attitude With Treatment Diffusion.

Variables	(1) FGLS		(2) FGLS		(3) FGLS		(4) CSE			(5) HLM		(6) FGLS
Variables	Coef.	SE	Coef.	SE	Coef.	SE	Coef.	SE		Coef.	SE	Coef.	SE
Baseline attitude	0.18	0.02***	0.18	0.02***	0.17	0.02***	0.18	0.02***		0.17	0.02***	0.17	0.02***
Treatment status	0.14	0.55	0.31	0.60	0.43	0.60	0.22	0.49		0.29	0.59	0.45	0.60
Treatment propensity	−0.97	1.42	−1.64	1.58	−1.52	1.58	−1.54	1.42		−1.47	1.56	−1.64	1.57
Treatment conditions
T1: Control	−0.59	0.65	−0.46	0.66	−0.55	0.69	−0.61	0.90		−0.63	0.88	−0.65	0.88
T3: Central	0.10	0.57	0.19	0.58	0.20	0.61	0.11	0.84		0.18	0.81	0.24	0.81
T4: Group	−0.66	0.57	−0.62	0.57	−0.61	0.61	−0.74	0.90		−0.75	0.80	−0.63	0.81
Covariates
Smoking	−1.86	0.65**	−1.93	0.65**	−1.81	0.66**	−1.99	0.69**		−2.06	0.64**	−1.91	0.65**
Boy	−0.54	0.44	−0.40	0.44	−0.58	0.48	−0.42	0.52		−0.39	0.42	−0.56	0.46
Ranking	4.09	0.50***	4.14	0.50***	4.01	0.51***	4.17	0.46***		4.22	0.49***	4.11	0.50***
Personality	−0.25	0.38	−0.26	0.38	−0.43	0.38	−0.26	0.33		−0.35	0.38	−0.40	0.38
Family condition	0.39	0.68	0.32	0.68	0.24	0.68	0.27	0.64		0.24	0.68	0.19	0.67
Friendship indegree	0.08	0.05	0.08	0.05	0.08	0.05	0.08	0.05		0.09	0.05	0.08	0.05
Friendship outdegree	0.06	0.08	0.03	0.08	0.05	0.08	0.03	0.09		0.10	0.08	0.08	0.08
No second survey	−35.13	0.58***	−34.54	0.61***	−34.49	0.62***	−34.45	0.68***		−34.28	0.61***	−34.32	0.60***
Grade 8	0.76	0.49	0.63	0.49	0.72	0.54	0.55	0.72
Grade 9	0.81	0.53	0.66	0.53	0.87	0.59	0.63	0.65
School 2	−4.29	0.72***	−4.35	0.72***	−4.34	0.80***	−4.22	1.03***
School 3	−1.27	0.65*	−1.34	0.65*	−1.43	0.72*	−1.32	0.91
School 4	−4.90	0.71***	−4.81	0.71***	−4.89	0.79***	−4.74	1.24***
School 5	−2.96	0.75***	−3.08	0.75***	−3.29	0.83***	−3.38	0.98***
School 6	−1.42	0.68*	−1.54	0.68*	−1.58	0.75*	−1.65	0.97
Treatment diffusion
Information outdegree (receiver)			0.29	0.09**	0.27	0.09**	0.29	0.10**		0.27	0.09**	0.27	0.09**
Information indegree (spreader)			−0.04	0.09	−0.03	0.09	0.01	0.08		−0.01	0.09	−0.03	0.09
Error dependence
Triadic	0.08	0.07	0.09	0.07	0.08	0.02***						0.11	0.02***
Mutual	0.04	0.04	0.04	0.04	0.10	0.02***						0.12	0.02***
Asymmetric	0.06	0.01***	0.06	0.01***	0.04	0.01***						0.06	0.01***
P(Triadic = mutual)	0.62		0.50		0.42							0.68
P(Triadic = asymmetric)	0.74		0.67		0.01							0.00
P(Mutual = asymmetric)	0.69		0.58		0.00							0.00
School										2.53	1.75**	0.02	0.00***
Grade										0.00	0.00	0.02	0.00***
Class										3.40	1.04***	0.04	0.00***
Variance(residual)					122.95					119.20		125.85
Hausman Test (χ², p)					1.95	1.00

Note: N = 3,445. In models 1 and 2, outcomes are assumed to be correlated between diffusion partners. In model 3, the correlation also occurs between friends. Hausman test compares model 3 with the OLS estimates. Model 4 has fixed effects for grades and schools, and standard errors are clustered by class. Model 5 is a hierarchical linear model with random effects at the school, grade, and class levels (their variances are shown). Model 6 combines model 3 and model 5 by including correlations between students who are not friends or diffusion partners but are in the same school, grade, or class, respectively. For conciseness, the intercepts are not shown. FGLS = feasible generalized least squares; OLS = ordinary least squares. Abbreviation: CSE, clustered standard errors; HLM, hierarchical linear model.

*p < .05. **p < .01. ***p < .001.

To compare the new model with traditional models, I also fit two other models. The first one (model 4 of Table 3) has fixed effects at the grade and school levels and standard errors clustered at the class level. The second one (model 5 of Table 3) is a multilevel model with random effects at the school, grade, and class levels. I compare models 4 and 5 with model 3 to illustrate the major differences. First, the estimated coefficients and their standard errors differ across the models. The differences can go in either direction, and the differences in the estimated coefficients and in their standard errors do not necessarily go in the same direction at the same time. None of the differences seems to be statistically significant, but this result cannot be generalized to other cases, where the differences may be large enough to alter statistical inferences. For example, the coefficients for academic ranking are larger in both models 4 and 5 than in model 3, while the standard errors are both smaller. As a result, the null hypothesis is more likely to be rejected in either of these two models. Second, the interpretations of the models are different. Model 4 assumes that students’ attitude is correlated within classroom only, and that the correlation is the same between any two students who are in the same classroom. This seems unrealistic both for friends who are in the same classroom and for friends who are in different classrooms but the same school. Model 5 assumes that students’ attitude is correlated by school, grade, and class, respectively, and that the correlation is the same for any two students who are in the same cluster. For example, model 5 shows that students’ attitude in the same school and the same class has a covariance of 2.53 ( $p < .01$ ) and 3.40 ( $p < .001$ ), respectively. However, students’ attitudes in the same school or in the same classroom are not necessarily correlated, or correlated to the same extent. Hence, the estimated covariances may be only rough approximations of the true correlations. In fact, as shown in model 3, the attitudes of students who are friends or diffusion partners are statistically significantly correlated, regardless of whether they are in the same classroom or the same grade. Multiplying the variance of the residuals by the error correlation coefficients in model 3, we can construct the corresponding covariances to compare with those shown in model 5. My analyses show that the attitudes of the students who are triadic friends, mutual friends, and asymmetric friends have covariances of 10, 12.12, and 4.39, respectively. The magnitudes of the covariances are notably larger than those shown in model 5, indicating much stronger correlations in friends’ attitudes.
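The conversion from error correlations to covariances is a single multiplication, shown below with the rounded values printed for model 3 in Table 3. The dictionary names are hypothetical. Because the table reports the correlation coefficients to two decimals only, the products differ somewhat from the covariances quoted in the text, which were presumably computed from unrounded estimates.

```python
# Rounded values as printed for model 3 in Table 3.
residual_variance = 122.95
error_correlation = {"triadic": 0.08, "mutual": 0.10, "asymmetric": 0.04}

# Covariance between two connected students = residual variance x error correlation.
covariance = {form: residual_variance * rho
              for form, rho in error_correlation.items()}

# Random-effect variances printed for model 5 in Table 3, for comparison.
hlm_variance = {"school": 2.53, "grade": 0.00, "class": 3.40}

for form, value in covariance.items():
    print(f"{form}: {value:.2f}")
```

Even with the rounded inputs, each product exceeds the largest cluster-level variance in model 5 (3.40 for class), which is the comparison the text draws.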

One way to incorporate features of the multilevel model into the new model is to add correlation coefficients to represent correlations between students who are not friends or diffusion partners but are in the same school, grade, or classroom. See Figure 1 for an illustration. In model 6 of Table 3, I add three additional correlation coefficients to represent correlations between nonfriends or nondiffusion partners who are in the same classroom, in the same grade but a different classroom, and in a different grade but the same school, respectively. The results remain largely the same as in model 3. But patterns of the error dependence are quite different. Model 6 shows that the attitudes of friends or diffusion partners are significantly correlated (all $p < .001$ ) and at a higher level (although not significantly different from those in Model 3). The attitudes of nonfriends or nondiffusion partners who are in the same school, grade, and classroom are also statistically significantly correlated (all $p < .001$ ). As expected, the correlations are substantially smaller than the correlations between friends or diffusion partners.

Figure 1.

The new model with various forms of error dependence. Note: In this example, the new model includes correlation coefficients that can account for triadic, mutual, and asymmetric error dependence ( $ρ_{t}$ , $ρ_{m}$ , and $ρ_{a}$ ) that are based on patterned connections in friendship networks. It also incorporates features of the multilevel model by including correlation coefficients ( $ρ_{1}$ , $ρ_{2}$ , and $ρ_{3}$ ) to represent correlations between students who are not connected but are still in the same classroom, grade, and school.

The new model can be applied to modeling other similar outcomes as well, for example, students’ test scores. The test scores can depend on students’ positions in their friendship network (e.g., because central students may get more help from peers and teachers). The test scores can also be correlated among friends as well as among nonfriend students who are in the same classroom, the same grade, or the same school.

Wealth and Network Centrality Among Florentine Families

The data for this example are about wealth and social networks among 16 Florentine families in the Renaissance period collected by John Padgett (Padgett and Ansell 1993). The goal is to study whether family wealth is determined by the political power and the network centrality held by the families and whether wealth is additionally correlated among families with social ties. The data do not allow one to pin down causality, but are useful for illustrating how to incorporate multiple dependence networks and comparing the new model with spatial regression models.

The dependent variable is family net wealth in 1427 (in thousands of lira). Political power is measured by the number of seats on the civic council held by the families. The two social networks are undirected and consist of business ties and marriage ties among the 16 families. From the business network, I extract indegree centrality and eigenvector centrality. From the marriage network, I extract indegree centrality, betweenness centrality, and eigenvector centrality. Indegree measures the number of ties a family has in a network (because the networks are undirected, it is equivalent to degree, the label used in Table 4). Betweenness centrality measures the brokerage power of a family, and eigenvector centrality measures the extent to which a family is connected to important others (Wasserman and Faust 1994). Note that, in general, one should be careful about including multiple centrality measures in the same model simultaneously because of potential multicollinearity among the centrality measures.

I specified two models to predict family wealth. In the first model, the dependence network is based only on the business network. Thus, the wealth of two families is correlated if they have a business tie. Because wealth can also spread across families by marriage, in the second model, I also allow family wealth to be correlated if two families have a marriage tie. Therefore, two families’ wealth is correlated if they have a tie in either network. In the second model, I also distinguish two forms of error dependence: triadic versus mutual. Because the two dependence networks are undirected, there is no asymmetric error dependence.

$$
\text{Wealth} = \text{Business Net Centrality} \times \alpha + \text{Marriage Net Centrality} \times \beta + \gamma \times \text{Political Power} + \varepsilon.
$$

Models 1 and 2 in Table 4 show the results. In both models, the more business ties a family has, the wealthier the family is ( $p < .05$ ). However, eigenvector centrality in the business network is significantly negatively correlated with family wealth ( $p < .05$ ). Hence, with everything else being equal, families connected to more central families in the business network are relatively poorer. This could result from a disassortative mixing process where families with fewer business collaborations (and so less wealth) are more likely to seek connections with families with more business collaborations. Degree centrality in the marriage network is positively correlated with family wealth ( $p < .1$ ). This could indicate that marriage ties can increase family wealth or that wealthier families are more attractive marriage partners. Betweenness centrality in the marriage network is negatively and statistically significantly ( $p < .05$ ) correlated with family wealth. As noted in prior research (Breiger and Pattison 1986; Padgett and Ansell 1993), these families were engaging in “faction” conflicts. Hence, this result could arise because the wealthy families tended to marry within their own factions, which made the less wealthy families more likely to marry across factions and thus to have higher betweenness in the marriage network.

Table 4.

Wealth and Network Centrality Among Florentine Families.

Variables	(1) FGLS		(2) FGLS		(3) SAA		(4) SAA
Variables	Coef.	SE	Coef.	SE	Coef.	SE	Coef.	SE
Political power (council seats)	0.29	0.39	0.29	0.37	0.65	0.21***	0.42	0.17**
Business network centrality
Degree	42.97	11.79***	47.13	15.55**			40.30	7.27***
Eigenvector	−75.01	18.75***	−80.27	26.54**			−68.09	12.29***
Marriage network centrality
Degree	60.07	26.75*	48.60	24.78*			47.47	14.47***
Betweenness	−68.31	22.34**	−63.43	21.37**			−65.10	11.56***
Eigenvector	−31.86	27.36	−20.97	25.62			−18.58	12.83
Error dependence (ρ)	−0.36	0.25			−1.17	0.20***	−0.99	0.26***
Triadic			−0.40	0.20*
Mutual			−0.22	0.56
Outcome dependence (λ)					0.43	0.27	−0.32	0.23
N	16		16		16		16

Note: Betweenness and eigenvector centrality measures are standardized. Models 3 and 4 are spatial autoregressive and autocorrelation models. For conciseness, the intercepts are not shown. FGLS = feasible generalized least squares. Abbreviation: SAA, Spatial autoregressive and autocorrelation model.

*p < .1. **p < .05. ***p < .01.

The error dependence is not statistically significant even at the 10 percent level in the first model, whereas in the second model, the triadic error dependence (but not the mutual error dependence) is statistically significant at the 10 percent level. Hence, the multiplex dependence network better captures the unobserved wealth correlation across the families. The error correlations are negative, which may be a result of disassortative mixing by which families with different amounts of wealth are more likely to be connected through social ties.

For comparison, I also fit two spatial autoregressive and autocorrelation models. The weight matrix W is a row-normalized version of the dependence network in which two families have a tie if they have either a business tie or a marriage tie. The first one (model 3 in Table 4), as is typically done in the literature, includes only covariates but not centrality measures of the dependence network. The second one (model 4) includes the centrality measures. In both models, there is significant evidence for error dependence ( $p < .01$ ), but there is no significant evidence for autoregressive outcome dependence ( $p > .1$ ). Compared to the model in this article, the spatial regression models typically (1) omit the mean dependence (i.e., omitting the centrality measures of the dependence network) and (2) assume only one form of error dependence. In contrast, the new model can account for both the mean dependence and the error dependence and also allows for multiple forms of error dependence. The downside of the new model is that it does not model autoregressive outcome dependence.
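The weight matrix described above can be sketched in two steps: take the union of the two networks, then divide each row by its sum. The function names and the toy four-family networks are hypothetical, and leaving the row of an isolated family at zero is an assumption, since the text does not say how isolates are handled.

```python
def union_network(A, B):
    """Tie present if the two units are tied in either network (no self-ties)."""
    n = len(A)
    return [[1 if i != j and (A[i][j] == 1 or B[i][j] == 1) else 0
             for j in range(n)] for i in range(n)]

def row_normalize(M):
    """Divide each row by its sum so that nonzero rows sum to 1."""
    W = []
    for row in M:
        total = sum(row)
        # Assumption: an isolate (row sum of zero) keeps a row of zeros.
        W.append([value / total if total else 0.0 for value in row])
    return W

# Toy undirected networks among four families; family 3 is an isolate.
business = [[0, 1, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
marriage = [[0, 0, 1, 0],
            [0, 0, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 0, 0]]

W = row_normalize(union_network(business, marriage))
```

Row normalization gives every connected family the same total weight regardless of how many ties it has, which is one reason the spatial specification treats all ties as a single form of dependence.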

Are Central Managers More Likely to Be Sought for Advice?

The third example aims to show that the error dependence may be an indication of the effects of omitted variables. The data were collected by Krackhardt (1987). They include two directed social networks among 21 managers in a company. In the advising network, there is a tie from A to B if A has asked B for advice. In the friendship network, there is a tie from A to B if A has nominated B as a friend. Managers’ attributes used in this article include the length of tenure in the company, level in the corporate hierarchy (1 = manager, 2 = vice president, and 3 = CEO), and departmental affiliation (1–4 with CEO being recoded into 1).

The goal of this example is to study whether managers with many friends are more likely to be sought after for advice. I treat indegree in the advising network as the dependent variable and indegree in the friendship network as one of the predictors. Past research has also shown that people with brokerage positions are more likely to have innovative ideas (Burt 2004). Thus, I also include betweenness centrality in the friendship network as a predictor. I specify two models. In the first model, I use tenure, level, and friendship indegree and betweenness to predict the advising indegree. The error terms are assumed to be correlated among friends. In the second model, I additionally control for the managers’ departmental affiliations.

$$
\text{Advising Indegree} = \alpha \times \text{Friendship Indegree} + \beta \times \text{Friendship Betweenness} + X\gamma + \varepsilon.
$$

Table 5 presents the results. The first model shows that, compared to the reference level (level 1 in Table 5), the vice presidents are significantly more likely to provide advice to others ( $p < .05$ ). Managers with more friends are significantly less likely to be the advice provider ( $p < .001$ ), whereas managers with a higher betweenness centrality are significantly more likely to be sought after for advice ( $p < .001$ ). After these effects are accounted for, advising actions among friends are still significantly correlated (est. = 0.55, $p < .001$ ). The correlation may have been driven by some unmeasured factors that the friends have in common. Indeed, in the second model, once departmental affiliations are controlled for, the error dependence is no longer statistically significant ( $p > .05$ ). Hence, a significant error dependence can sometimes be a signal that important covariates have been omitted from a model.

Table 5.

Advising and Friendship Among Corporate Managers.

Variables	(1) FGLS		(2) FGLS
Variables	Coef.	SE	Coef.	SE
Tenure	0.01	0.01	0.03	0.02
Level (1)
2	0.78	0.27*	1.37	0.23***
3	1.52	1.05	0.34	0.59
Friendship network
Indegree	−0.37	0.04***	0.04	0.15
Betweenness	0.60	0.14***	−0.21	0.15
Dept (1)
2			−1.15	0.20***
3			−0.40	0.32
4			0.41	0.27
Error dependence	0.55	0.11***	−0.25	0.13
N	21		21

Note: All centrality measures are standardized. For conciseness, the intercepts are not shown. FGLS = feasible generalized least squares.

*p < .05. **p < .01. ***p < .001.

Predicting Popularity Among High School Students

The data for the last example include an undirected friendship network of 1,461 students in a U.S. high school and student covariates such as sex (0 = female; 1 = male), race (0 = black; 1 = white; 2 = others), and grade (7–12; Goodreau et al. 2008; Handcock et al. 2003; Resnick et al. 1997). The goal of this example is to predict students’ indegree and closeness centrality in the friendship network, respectively. The error terms are assumed to be correlated among friends, and mutual and triadic error dependence are differentiated. The specification of the error dependence allows the new model to shed light on endogenous tie formation processes.

Model 1 in Table 6 predicts friendship indegree. The results show that female students, white students, and students in the eighth grade have significantly more friends than the reference groups (all $p < .01$ ). The estimate of the triadic error dependence is 0.27 ( $p < .001$ ) and differs significantly ( $p < .001$ ) from the estimate of the mutual error dependence (estimate = 0.04, $p > .05$ ). The significant triadic error dependence suggests that the number of friends a student has is significantly correlated with the number of friends the student’s friend’s friend has.

Table 6.

Predicting Popularity Among School Students.

Variables	(1) Indegree		(2) Closeness		Variables	(3) ERGM
Variables	Coef.	SE	Coef.	SE	Variables	Coef.	SE
Male	−0.22	0.05***	−0.12	0.05*	Male	−0.18	0.05***
Race (black)					Race (black)
White	0.39	0.07***	0.39	0.07***	White	0.39	0.06***
Other	0.11	0.10	0.18	0.10	Other	0.13	0.11
Grade (7)					Grade (7)
8	0.28	0.10**	0.44	0.10***	8	0.26	0.10**
9	−0.14	0.09	−0.35	0.09***	9	−0.10	0.09
10	−0.03	0.09	0.01	0.09	10	0.04	0.07
11	0.07	0.09	−0.01	0.09	11	0.05	0.09
12	−0.10	0.10	−0.14	0.10	12	−0.08	0.10
Error dependence					Network structure
Triadic	0.27	0.05***	0.98	0.01***	GWESP	2.19	0.08***
Mutual	0.04	0.04	0.97	0.01***	GWDSP	−0.35	0.05***
P(Triadic = Mutual)	0.00		0.51		GWDEGREE	−1.14	0.17***
Model time (seconds)	4.47		3.88			55.24
N	1,461		1,461

Note: Indegree and closeness are standardized. Model 3 models tie formation processes. For conciseness, the coefficients for the intercepts (and edges) are not shown. ERGM = Exponential random graph model. Abbreviation: GWESP, Geometrically weighted edge-wise shared partners; GWDSP, Geometrically weighted dyad-wise shared partners; GWDEGREE, Geometrically weighted degree distribution.

*p < .05. **p < .01. ***p < .001.

Model 2 in Table 6 predicts closeness centrality. Female students, white students, and students in the eighth grade are socially closer to peers in their school than their counterparts are (all $p < .05$ ), while, compared to students in the seventh grade, students in the ninth grade are socially more distant from peers ( $p < .001$ ). Triadic and mutual error dependence are both substantial (with values close to unity) and statistically significant (both $p < .001$ ), and the two types of error dependence do not differ significantly ( $p = .51$ ). The substantially larger and more significant error dependence in model 2 is expected because, by construction, closeness centrality is more strongly correlated among friends than indegree centrality is.
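The "by construction" point can be made concrete: in an undirected network, the distances from two friends to any third student differ by at most one step, so their closeness scores are mechanically similar. The sketch below computes closeness with a breadth-first search on a toy path network. The definition used here (number of reachable others divided by the sum of distances to them) is an assumption, since the article does not state which variant it uses, and the function name is hypothetical.

```python
from collections import deque

def closeness(adj, source):
    """Closeness of `source`: reachable others divided by the sum of distances to them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:                      # breadth-first search over the friendship ties
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Toy undirected path network: 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = [closeness(path, student) for student in path]
print(scores)
```

Adjacent students on the path have equal or neighboring closeness values, whereas their degree (1 for the end points, 2 for the middle students) is determined by each student's own ties.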

Model 3 in Table 6 shows the results of an ERGM that is fit on the friendship network. Similar to model 1, this model shows that female students, white students, and students in the eighth grade have significantly more friends than their counterparts (all $p < .01$ ). The positive and significant coefficient for “geometrically weighted edge-wise shared partners (GWESP)” ( $p < .001$ ) indicates that friendship ties are significantly more likely to be transitive, that is, friends of friends are more likely to be friends. The negative and significant coefficient for “GWDSP” ( $p < .001$ ) indicates that friendship ties that do not close a triangle are significantly less likely to occur. The results on “GWDSP” and “GWESP” together provide strong evidence that friendship ties tend to be transitive. According to Hunter (2007) and Lusher, Koskinen, and Robins (2013), a negative coefficient for “geometrically weighted degree distribution (GWDEGREE)” (as it is in this case, est. = −1.14, $p < .001$ ) indicates preferential attachment, namely, popular students tend to attract disproportionately more friendship ties than others.

In short, the new model reveals the same set of covariates that significantly predict student popularity as the ERGM does. The significant triadic error dependence in model 1 is also indicative of friendship transitivity. However, the new model and ERGMs differ in some fundamental ways. The new model is developed to model selected macro features of a social network, while ERGMs specifically aim at modeling micro tie formation processes. The former takes a vector as the dependent variable, while the latter takes a whole network. The new model is computationally faster and more stable but lacks the ERGM’s capability of modeling covariate effects on incoming and outgoing ties simultaneously (for directed networks) and of modeling more nuanced endogenous tie formation processes. Nonetheless, the new model and ERGMs can be complementary to one another. When a vector of network features, rather than the whole network, is more suitable as the dependent variable, the new model is more useful.

Conclusion and Discussion

In this study, I introduce a new multivariate regression model for analyzing outcomes with network dependence. I highlight the importance of accounting for both mean dependence and error dependence. Ignoring mean dependence may lead to severe bias in the estimated covariate effects. Ignoring error dependence does not impair the consistency of the estimated covariate effects, but inferences on the estimates may be less efficient, and outcome correlations that may be informative of important data generation processes go unmodeled.
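The efficiency point can be made concrete with a small calculation. The following Python sketch is an illustration under assumptions that are not taken from the article: a ring-shaped dependence network, a single error correlation `rho` between directly tied units, unit error variance, and one standard-normal covariate. It compares the exact sampling variance of OLS with that of GLS when the error correlation matrix is known, alongside the variance OLS would report if the errors were wrongly treated as independent.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 200, 0.4                      # illustrative values, not from the article

# Hypothetical ring network: unit i is tied to units i-1 and i+1.
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = 1
A = A + A.T

# Assumed error correlation matrix: rho between tied units, 0 otherwise.
R = np.eye(n) + rho * A

# Design matrix with an intercept and one covariate.
X = np.column_stack([np.ones(n), rng.normal(size=n)])

XtX_inv = np.linalg.inv(X.T @ X)
var_ols_true = XtX_inv @ X.T @ R @ X @ XtX_inv       # actual variance of OLS
var_ols_naive = XtX_inv                              # variance if independence is assumed
var_gls = np.linalg.inv(X.T @ np.linalg.solve(R, X)) # variance of GLS with known R

print("slope variance, OLS (actual):", var_ols_true[1, 1])
print("slope variance, OLS (naive): ", var_ols_naive[1, 1])
print("slope variance, GLS:         ", var_gls[1, 1])
```

Both estimators are unbiased in this setup because the errors have mean zero given the covariates. By the Gauss-Markov (Aitken) result, the GLS slope variance cannot exceed the actual OLS slope variance, and the naive OLS variance generally differs from the actual one, which is the sense in which ignoring error dependence affects inference rather than consistency.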

I propose an FGLS estimator to estimate the new model. The estimator is shown to be computationally fast and robust. I also show the distinct features of the new model in comparison to previous models such as multilevel models, spatial regression models, and ERGMs. In particular, one may view the current model as a modification of the spatial error model, with the error structure specified in a more intuitive and flexible way. To reiterate, the new model is not meant to replace prior models, as each has different strengths depending on the context.
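To make the two-step logic of feasible generalized least squares concrete, the following Python sketch implements a generic textbook version. It is not the estimator implemented in the “fglsnet” package. The function name `fgls_tied_pairs`, the ring network, and the error structure (one correlation `rho` shared by all directly tied pairs, zero correlation otherwise) are assumptions made only for illustration.

```python
import numpy as np

def fgls_tied_pairs(y, X, A):
    """Two-step FGLS with a single error correlation among tied units.

    Assumed error structure (hypothetical, for illustration only):
    Cov(e) = sigma^2 * (I + rho * A), where A is a symmetric 0/1
    adjacency matrix with a zero diagonal.
    """
    n, k = X.shape
    # Step 1: OLS, then estimate rho from residual cross-products of tied pairs.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - k)
    rows, cols = np.nonzero(np.triu(A, k=1))    # each tied pair counted once
    rho = np.mean(resid[rows] * resid[cols]) / sigma2
    # Step 2: GLS using the estimated error correlation matrix.
    R = np.eye(n) + rho * A
    if np.linalg.eigvalsh(R).min() <= 0:
        raise ValueError("estimated correlation matrix is not positive definite")
    Ri_X = np.linalg.solve(R, X)
    XtRiX = X.T @ Ri_X
    beta = np.linalg.solve(XtRiX, Ri_X.T @ y)
    e = y - X @ beta
    s2 = e @ np.linalg.solve(R, e) / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(XtRiX)))
    return beta, rho, se

# Simulated data on a hypothetical ring network (all values are illustrative).
rng = np.random.default_rng(0)
n, true_rho = 400, 0.25
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = 1
A = A + A.T
X = np.column_stack([np.ones(n), rng.normal(size=n)])
L = np.linalg.cholesky(np.eye(n) + true_rho * A)   # induces the assumed correlation
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)

beta_hat, rho_hat, se_hat = fgls_tied_pairs(y, X, A)
print("coefficients:", beta_hat)
print("estimated error correlation:", rho_hat)
print("standard errors:", se_hat)
```

The sketch stops with an error when the estimated correlation matrix is not positive definite. A practical implementation would need a principled way to handle that case, and would also need to allow for several types of error dependence (for example, asymmetric, mutual, and triadic), which this single-parameter version does not attempt.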

Simulation results suggest that the new model and method provide unbiased estimates and correct inferences on both the covariate effects and the error dependence. The four empirical examples illustrate the wide applicability of the new model. To facilitate future research, I have incorporated the model and method into an R package “fglsnet” for public use (see the author’s website for more information).

Of course, the new model and method can be extended in several ways. I have discussed some of the extensions when introducing the model above. Here I would like to point out two more. First, more work is needed to check the robustness and consistency of the model and method in different scenarios. Second, data on the dependence network may sometimes be unavailable. In that case, one may need to locate data that can approximate the dependence network. For example, one may use a friendship network to approximate a treatment diffusion network, assuming that diffusion tends to follow friendship ties. One can also use the latent space model (Hoff, Raftery, and Handcock 2002) to impute the dependence network based on similarity between units.

Supplemental Material

Supplemental Material, sj-docx-1-smr-10.1177_00491241211031263 for A Tale of Twin Dependence: A New Multivariate Regression Model and an FGLS Estimator for Analyzing Outcomes With Network Dependence by Weihua An in Sociological Methods & Research

Footnotes

Acknowledgments

The author would like to thank the Graduate School of Arts and Sciences, the Multidisciplinary Program in Inequality and Social Policy, the Fairbank Center for Chinese Studies, and the Institute for Quantitative Social Science, all at Harvard University, for the financial support for data collection. The author would also like to thank the reviewers for their helpful comments.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Weihua An

Supplemental Material

The supplemental material for this article is available online.

References

An, Weihua. 2015. “Instrumental Variables Estimates of Peer Effects in Social Networks.” Social Science Research 50:382–94.

Anselin, Luc, Anil K. Bera, Raymond Florax, and Mann J. Yoon. 1996. “Simple Diagnostic Tests for Spatial Dependence.” Regional Science and Urban Economics 26:77–104.

Borgatti, Stephen P. 2006. “Identifying Sets of Key Players in a Network.” Computational, Mathematical and Organizational Theory 12(1):21–34.

Breiger, Ronald L. and Philippa E. Pattison. 1986. “Cumulated Social Roles: The Duality of Persons and Their Algebras.” Social Networks 8:215–56.

Breusch, T. S. 1978. “Testing for Autocorrelation in Dynamic Linear Models.” Australian Economic Papers 17:334–55.

Burt, Ronald S. 2004. “Structural Holes and Good Ideas.” American Journal of Sociology 110:349–99.

Butts, C. T. 2008. “Social Network Analysis with SNA.” Journal of Statistical Software 24:1–51.

Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller. 2011. “Robust Inference with Multiway Clustering.” Journal of Business & Economic Statistics 29:238–49.

Cameron, A. Colin and Douglas L. Miller. 2015. “A Practitioner’s Guide to Cluster-robust Inference.” Journal of Human Resources 50:317–72.

Dow, Malcolm M., Michael L. Burton, and Douglas R. White. 1982. “Network Autocorrelation: A Simulation Study of a Foundational Problem in Regression and Survey Research.” Social Networks 4:169–200.

Durbin. 1970. “Testing for Serial Correlation in Least-Squares Regressions when Some of the Regressors Are Lagged Dependent Variables.” Econometrica 38:410–21.

Fisher, R. A. 1921. “On the Probable Error of a Coefficient of Correlation Deduced from a Small Sample.” Metron 1:3–2.

Freeman, L. C. 1978. “Centrality in Social Networks: Conceptual Clarification.” Social Networks 1:215–39.

Gil-Mendieta and Schmidt. 1996. “The Political Network in Mexico.” Social Networks 18(4):355–81.

Godfrey, L. G. 1978. “Testing against General Autoregressive and Moving Average Error Models when the Regressors Include Lagged Dependent Variables.” Econometrica 46:1293–301.

Goodreau, Steven M., Mark S. Handcock, David R. Hunter, Carter T. Butts, and Martina Morris. 2008. “A Statnet Tutorial.” Journal of Statistical Software 24(9):1–26.

Greene, William H. 2008. Econometric Analysis. 6th ed. Hoboken, NJ: Pearson Prentice Hall.

Guo, Guang and Hongxin Zhao. 2000. “Multilevel Modeling for Binary Data.” Annual Review of Sociology 26:441–62.

Handcock, M. S., D. R. Hunter, C. T. Butts, S. M. Goodreau, and Morris. 2003. “statnet: Software Tools for the Statistical Modeling of Network Data.” Retrieved July 30, 2021 (http://statnet.org).

Hansen, Christian B. 2007. “Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data when T Is Large.” Journal of Econometrics 141:597–620.

Hanushek, Eric A. and John E. Jackson. 1977. “Models with Discrete Dependent Variables.” Pp. 179–216 in Statistical Methods for Social Scientists, edited by Peter H. Rossi. Cambridge, MA: Academic Press.

Higham, Nicholas J. 2002. “Computing the Nearest Correlation Matrix–A Problem from Finance.” IMA Journal of Numerical Analysis 22:329–43.

Hoff, Peter D., Adrian E. Raftery, and Mark S. Handcock. 2002. “Latent Space Approaches to Social Network Analysis.” Journal of the American Statistical Association 97:1090–98.

Hunter, David R. 2007. “Curved Exponential Family Models for Social Networks.” Social Networks 29:216–30.

Hunter, David R. and Mark S. Handcock. 2006. “Inference in Curved Exponential Family Models for Networks.” Journal of Computational and Graphical Statistics 15(3):565–83.

Krackhardt, David. 1987. “Cognitive Social Structures.” Social Networks 9:109–34.

Langford, Eric, Neil Schwertman, and Margaret Owens. 2001. “Is the Property of Being Positively Correlated Transitive?” The American Statistician 55:322–25.

Leenders, Roger Th. A. J. 2002. “Modeling Social Influence through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks 24:21–47.

Leifeld, Philip and Skyler J. Cranmer. 2017. “TNAM: Temporal Network Autocorrelation Models.” Retrieved July 30, 2021 (https://cran.r-project.org/src/contrib/Archive/tnam/).

Leung, Michael P. 2020. “Treatment and Spillover Effects under Network Interference.” Review of Economics and Statistics 102:368–80.

Lusher, Dean, Johan Koskinen, and Garry Robins. 2013. Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications. New York: Cambridge University Press.

Morris, Martina, Mark S. Handcock, and David R. Hunter. 2008. “Specification of Exponential-family Random Graph Models: Terms and Computational Aspects.” Journal of Statistical Software 24:1548.

Mustillo, Thomas and Sarah A. Mustillo. 2012. “Party Nationalization in a Multilevel Context: Where’s the Variance?” Electoral Studies 31:422–33.

O’Malley, A. James, Felix Elwert, Niels Rosenquist, Alan M. Zaslavsky, and Nicholas A. Christakis. 2014. “Estimating Peer Effects in Longitudinal Dyadic Data Using Instrumental Variables.” Biometrics 70:506–15.

O’Malley, James A. and Peter V. Marsden. 2008. “The Analysis of Social Networks.” Health Services and Outcomes Research Methodology 8(4):222–69.

Padgett, John F. and Christopher K. Ansell. 1993. “Robust Action and the Rise of the Medici, 1400-1434.” American Journal of Sociology 98:1259–319.

Rabe-Hesketh, Sophia and Anders Skrondal. 2008. Multilevel and Longitudinal Modeling Using Stata. College Station, TX: STATA Press.

Raudenbush, Stephen W. and Anthony S. Bryk. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks, CA: Sage.

Resnick, M. D., P. S. Bearman, R. W. Blum, K. E. Bauman, K. M. Harris, Jones, Tabor, Beuhring, R. E. Sieving, Shew, Ireland, L. H. Bearinger, and J. R. Udry. 1997. “Protecting Adolescents from Harm: Findings from the National Longitudinal Study on Adolescent Health.” Journal of the American Medical Association 278:823–32.

Robins, Pattison, and Wang. 2009. “Closure, Connectivity and Degrees: New Specifications for Exponential Random Graph (p*) Models for Directed Social Networks.” Social Networks 31(2):105–17.

Robins, Snijders, Wang, Handcock, and Pattison. 2007. “Recent Developments in Exponential Random Graph (p*) Models for Social Networks.” Social Networks 29(2):192–215.

Sampson, Robert J., Stephen W. Raudenbush, and Felton Earls. 1997. “Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy.” Science 277:918–24.

Snijders, Tom A. B. and Roel J. Bosker. 2012. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. 2nd ed. Thousand Oaks, CA: Sage.

Snijders, Tom A. B., P. E. Pattison, Garry L. Robins, and Mark S. Handcock. 2006. “New Specifications for Exponential Random Graph Models.” Sociological Methodology 36:99–153.

Strauss, David and Michael Ikeda. 1990. “Pseudolikelihood Estimation for Social Networks.” Journal of the American Statistical Association 85(409):204–12.

Suh, Chan S., Yongren Shi, and Matthew E. Brashears. 2017. “Negligible Connections? The Role of Familiar Others in the Diffusion of Smoking among Adolescents.” Social Forces 92:423–48.

VanderWeele, Tyler J. and Weihua An. 2013. “Social Networks and Causal Inference.” Pp. 353–74 in Handbook of Causal Analysis for Social Research, edited by Morgan. New York: Springer.

Wasserman, Stanley and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge, MA: Cambridge University Press.

Wasserman, Stanley and Philippa E. Pattison. 1996. “Logit Models and Logistic Regressions for Social Networks: I. An Introduction to Markov Graphs and p*.” Psychometrika 61(3):401–25.

Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: The MIT Press.

Yamaguchi, Kazuo. 2003. “A Liang–Zeger Method for Modeling Dyadic Interdependence in the Analysis of Social Networks.” Sociological Methodology 33:343–80.

Zeileis, Achim. 2006. “Object-oriented Computation of Sandwich Estimators.” Journal of Statistical Software 16:1–16.
