Comparing Egocentric and Sociocentric Centrality Measures in Directed Networks

Abstract

Egocentric networks represent a popular research design for network research. However, to what extent and under what conditions egocentric network centrality measures can serve as reasonable substitutes for their sociocentric counterparts are important questions to study. The answers to these questions are uncertain simply because of the large variety of networks. Hence, this paper aims to provide exploratory answers to these questions by analyzing both empirical and simulated data. Through analyses of various empirical networks (including some classic albeit small ones), this paper shows that egocentric betweenness approximates sociocentric betweenness quite well (the correlation is high across almost all the networks being examined) while egocentric closeness approximates sociocentric closeness only reasonably well (the correlation is a bit lower on average with a larger variance across networks). Simulations also confirm this finding. Analyses further show that egocentric approximations of betweenness and closeness seem to work well in different types of networks (as characterized by network size, density, centralization, reciprocity, transitivity, and geodistance). Lastly, the paper briefly presents three ideas to help improve egocentric approximations of centrality measures.

Keywords

egocentric networks, measurement error, sampling, betweenness, closeness

Introduction

An egocentric network is a sub-network centered on a focal unit that typically includes the ties between the ego and the alters and sometimes also the ties among the alters. Egocentric networks have been used widely in prior research, for the efficiency in data collection, the need to have more details on ego ties, and other reasons (An and Western 2019; Bian 1997; Bian and Li 2012; Burt 1984; Desmond and An 2015; Dowd and Pinheiro 2013; Lin 2002; Marsden 1990; McPherson, Smith-Lovin, and Brashears 2006; Paik and Sanchagrin 2013; Perry 2011; Perry and Pescosolido 2012, 2015; Perry, Pescosolido, and Borgatti 2018; Pescosolido 1991, 1992, 2006; Pinheiro and Dowd 2009; Song 2011; Small 2007; Small and Sukhu 2016; Song and Lin 2009; Yuan and An 2017). With the rise of social media such as Facebook and Twitter, egocentric network data also have become more available than before. However, to what extent and under what conditions egocentric network centrality measures can resemble their sociocentric counterparts are important questions to study in order to better understand the advantages and the limitations of using egocentric network data for social network research.

In a classic paper, Marsden (2002) studies this issue in binary, undirected networks. As the author points out, degree centrality is the same in both egocentric and sociocentric networks and closeness centrality is uninformative in egocentric networks because the ego is connected to all alters in binary, undirected egocentric networks. Through analyzing various datasets, Marsden further shows that there is a high level of correlation (ranging from 0.83 to 0.99) between egocentric betweenness and sociocentric betweenness, which implies that the former may be a reliable approximation to the latter. Marsden (2002) also shows that nodes having high hub centrality (i.e., nodes that connect to many peripheral ones) or high bridge centrality (i.e., nodes connecting to a few central nodes) tend to have more divergence between their egocentric betweenness and sociocentric betweenness. What remains unknown is whether this finding will hold in directed networks and what macro network features are correlated with the degree of correspondence between egocentric and sociocentric centrality.

Everett and Borgatti (2005) is another important study on this issue, still in undirected networks. Based on simulations and analyses of selected empirical datasets, the authors find that network density has a curvilinear relationship with the correlation between egocentric and sociocentric betweenness. As network density rises, the correlation decreases and then increases. In their results, the relation between network size and the correlation is less clear-cut. The research by Everett and Borgatti (2005) may be extended in a few directions. First, one may study the issue in directed networks (including directed, weighted networks). Second, one may examine how multiple network features simultaneously moderate the egocentric approximation. More network features besides network size and density can be examined and it is more appropriate to include them all together to predict the quality of egocentric approximations, as it is possible that some of the patterns based on network size and density may be driven by other network features, such as reciprocity and transitivity (Hinne 2011). Third, one may examine the issue using more empirical networks in order to cross-validate the results.

In this paper, I study this topic in directed networks. Although prior studies have examined the topic in undirected networks, it is unclear whether the findings will hold in directed networks. This is especially important considering that many undirected networks are actually converted from directed networks by symmetrizing directed ties (to resolve contradictions or inconsistencies in reporting) or arise because the name generators do not solicit directed ties (even though the actual relations are directed in nature). In addition, with directed networks, one may examine many variants of betweenness, such as those based on endpoints (Brandes 2008) or Borgatti's proximal source betweenness (Borgatti and Everett 2006). Furthermore, in directed networks closeness centrality is no longer uninformative.

Mathematically speaking, the correspondence between egocentric and sociocentric centrality measures can go to either extreme, from perfect correlation (e.g., in cycles) to complete independence (e.g., in paths). However, it is still important to examine the correspondence using a diverse set of empirical social networks in order to gain a realistic sense of the correspondence. Of course, the answers based on such exercises will be contingent on the sample of networks being examined and so may not apply universally.

The paper proceeds as follows. First, I will introduce and compare sociocentric and egocentric network designs. Then I will introduce mathematical definitions of betweenness and closeness centrality in directed networks. In section 4, I will compute and compare egocentric and sociocentric centrality measures using various empirical networks. In section 5, I will investigate how network features are associated with the quality of egocentric approximations to sociocentric centrality measures using both empirical networks and simulated networks. Lastly, I will conclude and briefly discuss three ideas to improve the egocentric approximations.

Sociocentric vs. Egocentric Network Designs

In a sociocentric network design, typically members of a group report their social ties to all other members within the same group. The reported ties are usually directed, e.g., if A nominates B as a friend it does not necessarily indicate B also nominates A as a friend. The resulting network has a clearly defined boundary and is usually used to study group dynamics within a particular group.

Egocentric networks are usually collected by asking individuals from a population (i.e., egos) to report their social ties to alters while reports from the alters are typically not solicited. Egos in egocentric networks are usually randomly sampled from a population while it is usually not the case in sociocentric network designs. In the latter case, random sampling tends to operate at the group level in order to randomly sample networks, which is more difficult to do. Sociocentric network designs may also entail higher respondent burden because of the larger number of alters to be enumerated and reported on.

Hence, in practice, egocentric network data have been used as an economical alternative to sociocentric network data without much consideration of the impact of missing ties (including ties initiated by alters and indirect ties). A question then is under what conditions egocentric network measures (e.g., centrality measures) can approximate their sociocentric counterparts. The answer to this question first depends on the specific features of egocentric network designs.

Figure 1 presents an example of a sociocentric network. As shown in Figures 2–4, depending on tie directionality, whether reports of alter ties are allowed, and how network boundary is drawn, egocentric networks can take several forms. In Figure 2A, egos (with node A being an example) are only required to report their own ties to alters. This design ignores ties from and among the alters. Figure 2B is the same as Figure 2A except that it ignores tie directionality and assumes that all ties are undirected.

Figure 1.

An example of a sociocentric network.

Figure 2.

Ego reports of self-ties.

Figure 3.

Ego reports of both self-ties and alter ties.

Figure 4.

Cognitive ego networks.

In Figure 3A, egos are required to report not only their own ties to alters but also ties from and among the reported alters. Figure 3B is the same as Figure 3A except that it ignores tie directionality. There could also be some intermediate cases here. For example, in Figure 3C node A is required to report its own directed ties to alters but undirected ties from the alters because the directionality of the alter-sending ties might be too difficult for the ego to recall accurately. In Figure 3D node A is required to report directed ties that are pertaining to A (including those from alters to the ego) but undirected ties among the alters because A may be more aware of the directionality of the former ties than of the latter ones. Note that because the alter-sending ties are reported based on the ego's perception, the accuracy of these reports is not guaranteed.

In Figure 4A, the network boundary is explicitly set, for example, with a roster of names provided to the egos. Then node A is requested to report its own ties to alters and also its perception of the ties from and among the alters. This design elicits an ego-perceived social network (Krackhardt 1987). The strength of this design is that it can include alters who are not nominated by the ego but in the ego's perception nominate the ego as a contact, for example node C. Figure 4B is the same as Figure 4A except that it ignores tie directionality.

Now going back to Figure 1, Node A's full ego network (FEN) is circled by the dashed line. The FEN can be obtained based on an ego-perceived social network or as a sampled ego network from the full sociocentric network. The FEN retains the most complete ego network information including tie directionality and ties from both egos and alters. Hence, studying the FEN provides an upper bound on how well egocentric network measures can approximate their sociocentric counterparts. If the quality of approximations is low for the FEN, then it will not be better in other types of ego networks. In particular, if alter-initiated ties are unavailable, then the indegree information will be missing and in theory betweenness and closeness will be either unavailable or far from their sociocentric counterparts.
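To make the FEN concrete, the following is a minimal sketch of how a full ego network could be extracted from a directed edge list. It assumes that alters are all nodes that either receive a tie from or send a tie to the ego, and that every directed tie among the ego and these alters is retained. The function name `full_ego_network` and the toy edge list are hypothetical and are not the network in Figure 1.

```python
def full_ego_network(edges, ego):
    """Return (nodes, ties) of the full ego network of `ego`.

    `edges` is an iterable of directed (source, target) pairs.
    Assumption: alters are nodes that send a tie to, or receive a tie
    from, the ego.
    """
    edges = set(edges)
    out_alters = {t for s, t in edges if s == ego}
    in_alters = {s for s, t in edges if t == ego}
    nodes = (out_alters | in_alters | {ego})
    # Keep every directed tie whose two endpoints are inside the ego network.
    ties = {(s, t) for s, t in edges if s in nodes and t in nodes}
    return nodes, ties


# Hypothetical toy network for illustration only.
toy_edges = [("A", "B"), ("C", "A"), ("B", "C"), ("C", "D"), ("D", "E")]
fen_nodes, fen_ties = full_ego_network(toy_edges, "A")
print(sorted(fen_nodes))
print(sorted(fen_ties))
```

In this toy case the ties C to D and D to E fall outside A's ego network, which is exactly the kind of information an egocentric design would miss.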

One may argue that egocentric networks and sociocentric networks often serve different purposes in practice (Crossley et al. 2015; Marsden 2005, 2011). For example, one may be interested in studying family internal communication networks and select a family member to report family internal communications. The data reported by the selected family member can be used to form an ego-perceived communication network. One problem with this approach is that many of the reported communications lack confirmations from alters and so may suffer from reporting bias. It may also miss communication ties among other family members. Similarly, an ego may be asked to report contacts with whom they have discussed important affairs and also discussion ties among the reported alters. Such ego networks are useful for studying how ego-perceived personal networks affect ego behaviors. But still, the reported ties lack alter confirmations and may miss many actual ties. As a result, the predictive power of the ego network measures may decline or vanish once corresponding sociocentric network information is accounted for. Hence, it is important to examine the validity of ego network measures that are based on alter-initiated ties, indirect ties, or macro network features.

Centrality Measures in Directed Networks

Given a directed network with n nodes, let $w_{i j} = 1$ if there is a tie from i to j and $w_{i j} = 0$ if there is no such tie. Betweenness centrality measures a node's brokerage power or, roughly speaking, the number of times a node serves as a bridge between others (Butts 2008; Freeman 1979). In one standard version, it is defined as $B_{i} = \sum_{j k} (g_{j k}^{i} / g_{j k})$ , where $g_{j k}$ is the number of the shortest paths from node j to k and $g_{j k}^{i}$ is the number of such paths passing through unit i. For other versions of the betweenness measure, please consult Borgatti and Everett (2006) and Brandes (2008). Closeness centrality measures how fast a unit can reach others in the network. In one version (Gil-Mendieta and Schmidt 1996), it is defined as $C_{i} = \sum_{j} d_{i j}^{- 1} / (n - 1)$ , where $d_{i j}$ is the length of the shortest path from i to j and n is the number of nodes in the network. When two nodes are unconnected, their distance is treated as infinite, so the pair contributes zero to the sum. A nice property of this version of the closeness centrality is that it is well defined even in fragmented, disconnected networks.
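The two definitions above can be sketched in a few lines of Python. This is only an illustrative implementation under stated assumptions: the network is binary and directed, the sum in $B_{i}$ runs over ordered pairs (j, k) with j, k, and i all distinct, and unreachable pairs contribute zero to $C_{i}$. The function names and the toy network are hypothetical.

```python
from collections import deque


def bfs_counts(adj, source):
    """Shortest-path lengths and shortest-path counts from `source`."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma


def centralities(nodes, edges):
    """Return (betweenness, closeness) dicts for a binary directed network."""
    adj = {u: set() for u in nodes}
    for s, t in edges:
        adj[s].add(t)
    info = {u: bfs_counts(adj, u) for u in nodes}
    n = len(nodes)
    betw, close = {}, {}
    for i in nodes:
        dist_i, sigma_i = info[i]
        # C_i: sum of inverse distances to reachable nodes, divided by n - 1.
        inv = sum(1.0 / d for j, d in dist_i.items() if j != i)
        close[i] = inv / (n - 1) if n > 1 else 0.0
        # B_i: share of shortest j -> k paths that pass through i.
        total = 0.0
        for j in nodes:
            dist_j, sigma_j = info[j]
            if j == i or i not in dist_j:
                continue
            for k in nodes:
                if k in (i, j) or k not in dist_j or k not in dist_i:
                    continue
                if dist_j[i] + dist_i[k] == dist_j[k]:
                    total += sigma_j[i] * sigma_i[k] / sigma_j[k]
        betw[i] = total
    return betw, close


# Hypothetical toy network: C -> A, A -> B, A -> D.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("C", "A"), ("A", "B"), ("A", "D")]
betw, close = centralities(toy_nodes, toy_edges)
print(betw["A"], round(close["A"], 2))
```

In the toy network, A lies on the only shortest paths from C to B and from C to D, and it reaches two of the three other nodes in one step.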

Borgatti and Everett (2006) point out that both betweenness and closeness centrality measure a node's walking capability in a network and both summarize a node's contribution to the cohesiveness of the network. They show that closeness centrality is a radial measure that assesses walks radiating from or to a given node and summarizes the node's capability of connecting with other nodes in the network. Betweenness centrality is a medial measure that is based on the number of shortest walks passing through a given node and measures the level of network fragmentation that results from removing the node. Because of the differences in the two measures, a node with a high betweenness centrality does not necessarily have a high closeness centrality. For example, a node serving as the bridge between different subgroups can have a high betweenness score, and yet can be marginal in each subgroup. Schoch, Valente, and Brandes (2017) show that in more centralized (undirected) networks, the two centrality measures will likely be more closely correlated. Borgatti and Everett (2006) also point out that network structures (e.g., a core-periphery structure) can affect the interpretability of closeness and betweenness centrality. For example, closeness makes less sense in a network with multiple cores while betweenness may still be interpretable and meaningful in such networks.

If the network is weighted, let $w_{i j}$ indicate the strength of the tie from i to j. The adjacency matrix needs to be transformed (e.g., by taking the inverse of $w_{i j}$ for $w_{i j} \neq 0$ ) so that a stronger tie corresponds to a shorter distance between nodes. Then betweenness and closeness centrality can be calculated as in the binary case (Butts 2008; O’Malley and Marsden 2008; Wasserman and Faust 1994; Yang, Keller, and Zheng 2016).
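As a rough illustration of the weighted case, the sketch below converts tie strengths to lengths by taking $1 / w_{i j}$ and then computes closeness with Dijkstra's algorithm. It assumes all nonzero weights are positive. The function name `weighted_closeness` and the toy weights are hypothetical. Note that with inverted weights the resulting closeness scores are no longer bounded by 1.

```python
import heapq


def weighted_closeness(nodes, weights):
    """Closeness from tie strengths.

    `weights` maps (i, j) -> strength w_ij of the directed tie i -> j.
    Assumption: nonzero weights are positive; each tie gets length 1 / w_ij.
    """
    adj = {u: [] for u in nodes}
    for (s, t), w in weights.items():
        if w != 0:
            adj[s].append((t, 1.0 / w))   # stronger tie = shorter step
    n = len(nodes)
    result = {}
    for src in nodes:
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                   # stale heap entry
            for v, length in adj[u]:
                nd = d + length
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        # Unreachable nodes are absent from `dist` and so contribute zero.
        inv = sum(1.0 / d for v, d in dist.items() if v != src)
        result[src] = inv / (n - 1) if n > 1 else 0.0
    return result


# Hypothetical strengths on a 1-10 scale.
toy_weights = {("A", "B"): 10, ("B", "C"): 5, ("A", "C"): 2}
wc = weighted_closeness(["A", "B", "C"], toy_weights)
print({k: round(v, 2) for k, v in wc.items()})
```

Here the indirect route from A to C through two strong ties (length 0.1 + 0.2) is shorter than the direct weak tie (length 0.5), which is the behavior the inverse transformation is meant to produce.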

I again use Figure 1 for illustration. Let's focus on node A, whose egocentric network is circled by the dashed line. A's betweenness centrality score is 2 (being the bridge from C to B and from B to E) in both the full network and the egocentric network. A's closeness centrality score is 0.5 in the full network and 0.67 in its egocentric network. In fact, we can perform the same calculations for all other nodes. Table 1 shows the results. In this simple example, the betweenness centrality and the closeness centrality in the egocentric networks are strongly correlated with their sociocentric counterparts, despite some numerical differences in closeness centrality. The correlation between the betweenness centrality measures is 1 and the correlation between the closeness centrality measures is 0.96. However, given the simplicity of this example, we cannot generalize the results too broadly.

Table 1.

Egocentric and Sociocentric Centrality Measures in the Example Network.

	A. Betweenness		B. Closeness
Nodes	(1) Sociocentric	(2) Egocentric	(3) Sociocentric	(4) Egocentric
A	2	2	0.5	0.67
B	0	0	0.5	1
C	0	0	0.75	1
D	0	0	0	0
E	0	0	0	0
Correlation	1.00		0.96
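The correlation row of Table 1 can be checked directly from the tabulated scores. The short sketch below transcribes the four columns of Table 1 (nodes A to E) and computes the Pearson correlation by hand; the helper name `pearson` is introduced here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Values transcribed from Table 1, in node order A, B, C, D, E.
socio_betw = [2, 0, 0, 0, 0]
ego_betw = [2, 0, 0, 0, 0]
socio_close = [0.5, 0.5, 0.75, 0, 0]
ego_close = [0.67, 1, 1, 0, 0]

r_betw = pearson(socio_betw, ego_betw)
r_close = pearson(socio_close, ego_close)
print(round(r_betw, 2), round(r_close, 2))
```

Rounded to two decimals, the results agree with the 1.00 and 0.96 reported in the table.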

In general, one may expect egocentric betweenness to be smaller than the corresponding sociocentric betweenness. This is because egocentric betweenness ignores a node's brokerage roles between nodes outside of the ego network. For most nodes that are not involved in any bridges between clusters, their ego networks tend to already contain most (if not all) of their brokerage roles. Hence, for them the two betweenness measures are likely very close. A major difference may arise for nodes that connect different clusters in a network. Therefore, as long as the number of such cross-cluster connectors is small, the correlation between egocentric and sociocentric betweenness should be reasonably high. Conversely, if the network is polycentric (which implies there are many clusters in the network and the network tends to be decentralized) or if the network is big (which makes it more likely that there are more cross-cluster connectors), then the correlation between egocentric and sociocentric betweenness may be relatively low.

In terms of closeness, one may expect egocentric closeness to be larger than sociocentric closeness. This is because an ego is usually closer to nodes in its neighborhood than to nodes outside of its neighborhood. The difference will likely be larger if there are more nodes outside of the ego's neighborhood that are indirectly connected to the ego. This implies that any network features that can affect the extent of indirect connections may affect the magnitude of the difference. For example, the difference may be larger in denser networks or networks with more reciprocal ties (which make shortest paths more likely to exist). Because this difference applies to most nodes, one may expect the correlation between egocentric and sociocentric closeness to be smaller than that between egocentric and sociocentric betweenness.

The above conjectures are mostly based on the mathematical definitions of the centrality measures. One may also use simulations to study the issue (Costenbader and Valente 2003; Everett and Borgatti 2005). Simulations are useful because one may manipulate different network parameters (e.g., size and density) to produce networks with different topologies. However, simulations also face two limitations. First, simulated networks do not necessarily have realistic network topologies or represent the most prevalent types of real-world networks. Second, because there are numerous combinations of the various parameters that govern the network generation process, it is difficult to simulate all the different combinations. Hence, findings from simulated networks largely depend on the specific simulation set-up and it is unclear to what extent they are applicable to real-world networks. In light of these issues, this study will employ both empirical data and simulated data in order to provide a more complete understanding of the subject.

Comparing Egocentric and Sociocentric Centrality Measures in Directed Networks

Ideally, one would sample networks from the population of networks and then use the sampled, representative networks to study the research question. However, the universe of networks is hardly ever defined. Hence, following the precedent in the literature (Marsden 2002), in this paper I will first compare egocentric and sociocentric network centrality measures in directed networks using various empirical networks, some of which are small but classic networks that have been studied repeatedly in the literature. Specifically, I will compare the two types of centrality measures by examining their differences, correlations, and predictive power.

The first few datasets are based on the Adolescent Smoking and Network Research (ANSR) that the author conducted in 2010–2011 in 90 classes of six middle schools in China. The first dataset is the friendship network data at wave one that was collected in late 2010, in which each student was asked to report up to ten of their closest friends in school. To help increase the number of units of analysis, I have sub-set the friendship network at the class level and will mostly focus on comparing centrality measures in the class friendship networks.

Figure 5 shows the difference between egocentric and sociocentric centrality measures by class. As expected, egocentric betweenness severely under-estimates sociocentric betweenness (by 50 points at the median) while egocentric closeness greatly over-estimates sociocentric closeness (by 0.4 at the median). Hence, it may be inappropriate to simply substitute egocentric centrality measures for their sociocentric counterparts. However, two sets of measures can still be highly correlated even though their values differ greatly. Hence, as in Marsden (2002), below I will analyze the Pearson correlations between the two types of centrality measures.

Figure 5.

Differences between egocentric and sociocentric centrality measures in the class friendship networks at wave 1.

The first row in Table 2 shows the correlations between egocentric and sociocentric centrality measures in the six school friendship networks (with 682 nodes on average). The correlations in the betweenness measures have a mean of 0.685 and the correlations in the closeness measures have a mean of 0.626. Both correlations are reasonably high. Row 2 shows the results for the class friendship networks at wave 1. The average network size is 51. The correlations in the betweenness measures and in the closeness measures have means of 0.725 (SD = 0.143) and 0.691 (SD = 0.155), respectively. Hence, the correlations between egocentric and sociocentric centrality measures are also relatively high in the class friendship networks at wave 1.

Table 2.

Correlations of Egocentric and Sociocentric Centrality Measures in Directed Networks.

		(1) Nodes	(2) Betweenness		(3) Closeness
ID	Dataset	Mean	Pearson	Kendall	Pearson	Kendall
1	ANSR: Friendship (6 Schools, Wave 1)	682	0.685	0.601	0.626	0.257
2	ANSR: Friendship (90 Classes, Wave 1)	51	0.725	0.692	0.691	0.409
3	ANSR: Friendship (90 Classes, Wave 2)	51	0.775	0.749	0.886	0.566
4	ANSR: Friendship (90 Classes, Wave 2, Weighted)	51	0.758	0.738	0.806	0.535
5	ANSR: Cigarette Exchange (90 Classes, Wave 1)	51	0.992	0.999	0.934	0.980
6	ANSR: Information Exchange (90 Classes, Wave 2)	51	0.915	0.946	0.857	0.813
7	Bank Wiring Room: Helping	14	0.830	0.767	0.798	0.439
8	Bank Wiring Room: Job Trading (Weighted)	14	NA	1.000	0.604	0.904
9	Krackhardt Managers: Friendship	21	0.804	0.839	0.921	0.406
10	Krackhardt Managers: Advising	21	0.991	0.886	0.950	0.854
11	Lazega Lawyers: Advice	71	0.925	0.784	0.742	0.494
12	Lazega Lawyers: Cowork	71	0.966	0.855	0.721	0.169
13	Lazega Lawyers: Friendship	71	0.951	0.809	0.895	0.438
14	Newcomb Fraternity (Week 1)	17	0.846	0.664	0.477	0.444
15	Newcomb Fraternity (Week 2)	17	0.873	0.820	0.262	0.142
16	Newcomb Fraternity (Week 3)	17	0.715	0.590	0.270	0.212
17	Newcomb Fraternity (Week 4)	17	0.870	0.778	0.455	0.355
18	Newcomb Fraternity (Week 5)	17	0.871	0.788	0.111	0.016
19	Sampson Monastery (Wave 1, Weighted)	18	0.633	0.823	0.053	−0.088
20	Sampson Monastery (Wave 2, Weighted)	18	0.732	0.487	0.187	0.074
21	Sampson Monastery (Wave 3, Weighted)	18	0.527	0.527	0.311	0.247

Note: Some cigarette exchange networks in the ANSR are sparse or even empty. The resulting betweenness or closeness scores are zero or close to zero. The job trading network in the bank wiring room is sparse; all the betweenness scores are zero and so the Pearson correlation is unavailable.

I also provide similar information for four other datasets collected in the ANSR, including friendship networks in the 90 classes at wave 2 (collected in early 2011), weighted friendship networks at wave 2 where a tie indicates the strength of friendship (ranging from 1 to 10 with 10 being the strongest tie), cigarette exchange networks at wave 1 where a tie indicates one student reported having exchanged cigarettes with another, and information exchange networks at wave 2 where a tie indicates a student reported having obtained information from another about a smoking prevention intervention implemented between the two waves of surveys. The mean correlations in the betweenness centrality range from 0.76 to 0.99 while the mean correlations in the closeness centrality are between 0.81 and 0.93 (rows 3 to 6 of Table 2). Also see Figure 6 for summary information on selected networks.

Figure 6.

Correlations between egocentric and sociocentric centrality measures in the class friendship networks.

A few preliminary patterns are worth noting. (1) The correlations seem to be negatively associated with network density. As network density becomes smaller, the correlation gets larger. In the data, the wave-2 friendship networks are relatively sparser than the wave-1 friendship networks and the exchange networks are even sparser than the friendship networks. Accordingly, we observe that the correlations in the two types of centrality measures are relatively larger in the wave-2 friendship networks and even larger in the exchange networks. (2) Compared to the binary friendship networks at wave 2, the correlations in the weighted friendship networks are similar. Hence, it appears that weights do not matter much for the correlations, at least in this example. (3) The correlations between egocentric and sociocentric centrality measures appear to be negatively associated with network size. For example, for the wave-1 friendship networks, the average correlation in betweenness centrality in the school friendship networks is 0.685, which is smaller than the counterpart in the class friendship networks. Similarly, the average correlation in the closeness centrality in the school friendship networks is 0.626, which is also smaller than the counterpart in the class friendship networks.

I provide similar analyses for 15 other datasets collected in various contexts, including the helping and job trading networks in a bank wiring room (Roethlisberger and Dickson 1939), the friendship and advice-seeking networks among 21 managers in a high-technology company (Krackhardt 1987), the advice, co-working, and friendship networks among 71 lawyers (Lazega 2001), friendship networks in a fraternity for five weeks (Newcomb 1961), and the liking network among a group of monks in a monastery (Sampson 1969).¹ All these networks are directed. Across these 15 networks, the average correlation between egocentric and sociocentric betweenness is 0.82 with a range (0.53, 0.99) and the average correlation between egocentric and sociocentric closeness is 0.54 with a range (0.05, 0.95). Therefore, the correlation between the betweenness centrality measures is higher on average than that between the closeness centrality measures.

Schoch, Valente, and Brandes (2017) show that the Kendall rank correlation is an effective method for examining correlations between different types of centrality measures in the same network. This study aims to examine correlation in the same type of centrality measures between different networks and so the Kendall rank correlation method may not be equally applicable. But as a robustness check, I also report the Kendall rank correlations in Table 2. The Kendall rank correlation coefficients are smaller in most of the networks being studied here and present a larger divergence between egocentric and sociocentric measures. Nonetheless, we still observe a higher correlation between the betweenness measures than between the closeness measures.
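For readers unfamiliar with the rank-based alternative, the following is a minimal sketch of a Kendall rank correlation that adjusts for ties (the tau-b variant). Which variant underlies Table 2 is not stated in the text, so the choice of tau-b here is an assumption, and the function name and the example vectors are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall rank correlation with the tau-b adjustment for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                  # tied on both: drops out entirely
            elif dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + only_x_tied) * (untied + only_y_tied))
    if denom == 0:
        return float("nan")               # e.g., all scores identical
    return (concordant - discordant) / denom


# Hypothetical scores: ranks agree on two of three pairs.
print(kendall_tau_b([1, 2, 3], [1, 3, 2]))
```

Because the statistic depends only on the ordering of node pairs, many tied scores (common for betweenness, where numerous nodes score zero) reduce the number of informative pairs, which is one reason it can diverge from the Pearson correlation.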

Besides examining numerical differences and correlations between the two types of centrality measures, a third and new strategy to assess the correspondence between egocentric and sociocentric centrality measures is to compare their predictive power. To that end, I use the two types of centrality measures from the wave-1 friendship networks in the 90 classrooms to predict the sociocentric centrality measures in the wave-2 friendship networks in the ANSR through hierarchical linear models. To ease interpretation of the results, I have standardized the centrality measures. The models include a list of student covariates, including sex (1 = boy; 0 = girl), age, height, smoking status (1 = yes; 0 = no), academic ranking (1 = top ten in the class; 0 = otherwise), parental occupation (0 = both parents are farmers; 1 = otherwise), parental education (0 = neither parent has high school level of education; 1 = otherwise), personality (1 = optimistic; 0 = not optimistic), family economic condition (1 = good; 0 = not good), an indicator for whether a student missed the second survey (1 = missing; 0 = no), and the student's treatment status in the smoking prevention intervention implemented between the two surveys (1 = treated; 0 = untreated). Random effects for school, grade, and class are included to capture effects from unmeasured factors at these levels and to reflect the nesting feature of the data. As robustness checks, I also fitted two other models to capture correlations in the dependent variable across units. One model includes fixed effects specified at school and grade levels and standard errors clustered by class. The other model is the same as the fixed effects model except that the errors of two students are allowed to be correlated if the two students are directly connected in the school friendship network. The model is estimated by generalized least squares (GLS). 
In each of these models, I first use sociocentric network centrality at the first wave along with covariates to predict sociocentric network centrality at the second wave. Then for each of these models, I replace the sociocentric network centrality at the first wave by its egocentric counterpart at the first wave. If the two coefficients for the wave-1 sociocentric and wave-1 egocentric network centrality measures are similar, then it indicates that the two have similar predictive power.
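As a hedged sketch, the hierarchical linear model described above can be written as follows. The notation is introduced here for illustration and does not come from the original analysis; in particular, treating the random effects as random intercepts is an assumption.

$$y_{i}^{(2)} = \beta_{0} + \beta_{1} c_{i}^{(1)} + \mathbf{x}_{i}^{\top} \boldsymbol{\gamma} + u_{s(i)} + u_{g(i)} + u_{k(i)} + \varepsilon_{i}$$

Here $y_{i}^{(2)}$ is student i's standardized sociocentric centrality at wave 2, $c_{i}^{(1)}$ is the standardized wave-1 centrality (sociocentric in one fit and egocentric in the other), $\mathbf{x}_{i}$ collects the student covariates, and $u_{s(i)}$, $u_{g(i)}$, and $u_{k(i)}$ are random effects for the student's school, grade, and class. The comparison of predictive power then amounts to comparing the estimate of $\beta_{1}$ across the two fits.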

Panel A of Table 3 shows the results for predicting the wave-2 sociocentric betweenness. The predictive power of egocentric and sociocentric betweenness centrality at wave 1 is about the same. Increasing the wave-1 sociocentric or egocentric betweenness centrality by one standard deviation is associated with an increase in the wave-2 sociocentric betweenness by 0.17 or 0.16 standard deviations, respectively. The coefficients are close in size, both statistically significant at the 0.001 level, and statistically indistinguishable from each other (P > 0.05).

Table 3.

Predicting Wave-2 Centrality Measures Using Wave-1 Sociocentric or Egocentric Centrality Measures.

	A. Wave-2 Betweenness				B. Wave-2 Closeness
	(1) Sociocentric		(2) Egocentric		(3) Sociocentric		(4) Egocentric
Variables	Est.	SE	Est.	SE	Est.	SE	Est.	SE
Wave-1 Betweenness	0.17	0.01***	0.16	0.01***
Wave-1 Closeness					0.18	0.01***	0.08	0.01***
Boy	0.04	0.03	0.05	0.03	−0.11	0.02***	−0.15	0.02***
Age	0.00	0.02	0.01	0.02	0.02	0.01	0.02	0.01
Height	0.00	0.00	0.00	0.00	−0.00	0.00	−0.00	0.00
Smoking	−0.00	0.05	−0.00	0.05	−0.14	0.03***	−0.15	0.03***
Academic Ranking	0.12	0.03***	0.11	0.03***	0.07	0.02***	0.09	0.02***
Personality	−0.02	0.03	−0.02	0.03	0.03	0.02	0.03	0.02
Parental Occupation	0.01	0.03	0.01	0.03	−0.01	0.02	−0.00	0.02
Parental Education	0.07	0.03*	0.07	0.03*	−0.02	0.02	−0.01	0.02
Family Econ Condition	0.00	0.04	0.01	0.04	0.03	0.02	0.04	0.02
Missing 2nd Survey	−0.57	0.04***	−0.56	0.04***	−1.87	0.03***	−1.89	0.03***
Treated	0.07	0.04	0.07	0.04	0.05	0.02*	0.05	0.03
Intercept	−0.20	0.35	−0.33	0.36	0.13	0.23	0.14	0.23
N	4,094		4,094		4,094		4,094

Note: Hierarchical linear models are used with random effects specified at school, grade, and class levels. Centrality measures are standardized. Significance code: *, P < 0.05; **, P < 0.01; ***, P < 0.001.

Panel B shows that egocentric and sociocentric closeness centrality at wave 1 are both significant predictors of sociocentric closeness centrality at wave 2 (both P < 0.001). However, the sizes of the estimates differ notably. A one standard deviation increase in the wave-1 sociocentric closeness centrality is associated with a 0.18 standard deviation increase in the wave-2 sociocentric closeness centrality, whereas a one standard deviation increase in the wave-1 egocentric closeness centrality is associated with only a 0.08 standard deviation increase. The former estimate is more than twice as large as the latter, and the difference between the two is statistically significant at the 5% level. Overall, these results indicate that the difference between egocentric and sociocentric betweenness centrality measures may be negligible while the difference between the closeness centrality measures can be notable.
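As a rough check on these comparisons, the sketch below applies a simple z-test for the difference between two coefficients to the rounded wave-1 estimates and standard errors in Table 3. The test treats the two estimates as independent, which is an assumption made here for illustration only (both come from the same students, so the true standard error of the difference may differ), and it is not necessarily the test used for the results reported above. The function name `coef_diff_test` is hypothetical.

```python
import math

def coef_diff_test(b1, se1, b2, se2):
    """z-test for b1 - b2, ASSUMING the two estimates are independent."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Rounded wave-1 coefficients and standard errors from Table 3.
z_btw, p_btw = coef_diff_test(0.17, 0.01, 0.16, 0.01)  # betweenness
z_cls, p_cls = coef_diff_test(0.18, 0.01, 0.08, 0.01)  # closeness

print(f"betweenness: z = {z_btw:.2f}, p = {p_btw:.3f}")
print(f"closeness:   z = {z_cls:.2f}, p = {p_cls:.1e}")
```

Under this simplifying assumption, the betweenness coefficients cannot be distinguished at the 5% level (z is about 0.71), while the closeness coefficients differ clearly (z is about 7.07), in line with the pattern described above.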

Table A1 presents results based on the fixed effects model and the GLS model. The results are similar to the hierarchical linear model (HLM) results. A caveat is that the findings are contingent on the model specifications.

Predicting the Correlations in the Centrality Measures Based on Network Features

One remaining question is in what types of networks egocentric centrality measures can approximate their sociocentric counterparts well. To study this question, I use a series of network features to predict the correlations between egocentric and sociocentric centrality measures. I focus on analyzing the class friendship network data from the ANSR, for two reasons. First, the two waves of data provide 180 data points, so the sample size is large enough for regression analyses. Second, using data from one study ensures consistency in study design and in the measurement of variables. Otherwise, differences in measurement could themselves make it difficult to draw any conclusions.

Specifically, the dependent variable is the correlation between egocentric and sociocentric centrality measures in the 90 classes at two waves (180 data points). The network features include the following. (1) Network size (unit = 10), namely, the number of students in each class. (2) Network density, which measures the connectedness of a network, defined as the proportion of observed ties out of all possible ties: $D^{g} = \sum_{i j} \frac{w_{i j}}{n (n - 1)}$ , where g indicates the gth network. (3) Centralization, i.e., the extent to which ties are concentrated. It is defined as $C^{g} = \sum_{i} | \frac{M - C_{i}}{(n - 2) (n - 1)} |$ , where $C_{i}$ is the individual indegree and M the maximum indegree in a network. The centralization coefficient $C^{g}$ is bounded between 0 and 1 with larger values indicating more centralization. (4) Reciprocity, i.e., the proportion of mutual ties out of all observed ties, defined as $R^{g} = \frac{\sum_{i j} w_{i j} w_{j i}}{\sum_{i j} w_{i j}}$ for binary networks. (5) Transitivity, i.e., the proportion of transitive triangles out of all possible triangles, defined as $T^{g} = \frac{\sum_{i j k} w_{i j} w_{j k} w_{k i}}{\sum_{i j k} w_{i j} w_{j k}}$ . (6) Network diameter, measured as the maximum geodistance among the connected nodes. (7) Mean geodistance, measured as the average of the nonzero geodistance among the connected nodes. Please see Butts (2008) and Wasserman and Faust (1994) for more details on the definitions of these terms. The regression also includes an indicator for wave 2 to account for any difference in the outcomes across the two waves. To account for the fact that the correlation coefficients are constrained between −1 and 1, I use a Tobit regression (Greene 2008).
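As a rough illustration of features (2), (4), and (5), the definitions of density, reciprocity, and transitivity can be sketched in a few lines of plain Python. The sketch follows the formulas as written above for a binary adjacency matrix `w`. The function names and the four-node example network are hypothetical, and restricting the sums to distinct nodes (no self-ties) is an assumption.

```python
from itertools import permutations

def density(w):
    """Observed ties divided by the n(n - 1) possible directed ties."""
    n = len(w)
    ties = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def reciprocity(w):
    """Share of observed ties that are reciprocated (binary networks)."""
    n = len(w)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    ties = sum(w[i][j] for i, j in pairs)
    mutual = sum(w[i][j] * w[j][i] for i, j in pairs)
    return mutual / ties if ties else 0.0

def transitivity(w):
    """Closed two-paths over all two-paths, over distinct i, j, k."""
    closed = paths = 0
    for i, j, k in permutations(range(len(w)), 3):
        two_path = w[i][j] * w[j][k]
        paths += two_path
        closed += two_path * w[k][i]  # closing tie k -> i, as in the formula above
    return closed / paths if paths else 0.0

# Hypothetical network with ties 0->1, 1->0, 1->2, 2->0, 3->0.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(density(W), reciprocity(W), transitivity(W))
```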

Formally, the model is

$$y_{g}^{*} = x_{g}^{\prime} \beta + \varepsilon_{g}, \quad \varepsilon_{g} \sim N(0, \sigma^{2})$$

$$y_{g} = \begin{cases} y_{g}^{*} & \text{if } -1 < y_{g}^{*} < 1 \\ -1 & \text{if } y_{g}^{*} \leq -1 \\ 1 & \text{if } y_{g}^{*} \geq 1 \end{cases}$$

In the model, $y_{g}$ indicates the observed correlation between egocentric and sociocentric centrality measures in the gth network, $y_{g}^{*}$ a latent construct for $y_{g}$, $x_{g}$ the covariates plus a constant, and $\varepsilon_{g}$ a random error. The model includes the network features described above. To capture any curvilinear relations between these features and the outcome, it also includes the squared terms of these features. To provide a concise model, I use a stepwise regression procedure to include only those squared terms that are statistically significant at the 5% level.
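To make the censoring rule concrete, the following sketch evaluates the log-likelihood of a Tobit model with limits at −1 and 1, using the standard likelihood for a normal outcome censored on both sides. It is an illustration only, not the estimation code behind the results reported here; the function names, the toy data, and the parameter values are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_loglik(y, x, beta, sigma, lower=-1.0, upper=1.0):
    """Log-likelihood of a Tobit model censored at `lower` and `upper`."""
    total = 0.0
    for y_g, x_g in zip(y, x):
        mu = sum(b * v for b, v in zip(beta, x_g))  # linear predictor x_g' beta
        if y_g <= lower:
            # Probability that the latent outcome falls at or below the lower limit.
            total += math.log(norm_cdf((lower - mu) / sigma))
        elif y_g >= upper:
            # Probability that the latent outcome falls at or above the upper limit.
            total += math.log(norm_cdf((mu - upper) / sigma))
        else:
            # Normal density of an uncensored observation.
            total += math.log(norm_pdf((y_g - mu) / sigma) / sigma)
    return total

# Toy data: each row of x holds a constant and one covariate.
y = [0.85, 0.60, 1.00]
x = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.1]]
print(tobit_loglik(y, x, beta=[0.9, -0.5], sigma=0.1))
```

Estimation would then search for the coefficient vector and error standard deviation that maximize this function, typically with a numerical optimizer.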

Table 4 shows the regression results. Panel A suggests that the (latent) correlation between egocentric and sociocentric betweenness has a U-shaped relation with both network size and network density. As network size or density increases, the correlation first decreases and then increases (as indicated by the negative coefficients for the linear terms and the positive coefficients for the squared terms). The nonlinear effect of network size is unexpected, as the relation was expected to be linear. The finding on network density is consistent with Everett and Borgatti (2005). Centralization has an inverted U-shaped relation with the correlation: as centralization increases, the correlation first increases but then decreases after reaching a certain threshold. Both transitivity and mean geodistance have a statistically significant negative relation with the correlation. This means that the egocentric approximation of betweenness may be less accurate in polycentric networks that have multiple tightly-knit subnetworks and a few ties spanning across the subnetworks.
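The turning points implied by the quadratic terms can be worked out directly: for a fitted curve b·x + c·x², the slope changes sign at x = −b / (2c). The sketch below applies this to the rounded Panel A estimates in Table 4, so the resulting values are approximate. The helper name `turning_point` is hypothetical.

```python
def turning_point(linear, squared):
    """Value of x at which linear*x + squared*x**2 changes direction."""
    return -linear / (2.0 * squared)

# Rounded Panel A (betweenness) estimates from Table 4.
size_tp = turning_point(-0.20, 0.02)          # measured in units of 10 students
density_tp = turning_point(-2.63, 6.34)
centralization_tp = turning_point(2.66, -5.25)

print(f"size: {size_tp * 10:.0f} students")
print(f"density: {density_tp:.2f}")
print(f"centralization: {centralization_tp:.2f}")
```

With these rounded coefficients and the other features held fixed, the latent correlation for betweenness is lowest at roughly 50 students and at a density of about 0.21, and highest at a centralization of about 0.25.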

Table 4.

Predicting Correlations of Egocentric and Sociocentric Centrality Measures Based on Features of Empirical Networks.

	A. Betweenness		B. Closeness
Variables	Est.	SE	Est.	SE
Size (unit = 10)	−0.20	0.07**	−0.01	0.01
Size Square	0.02	0.01*
Density	−2.63	1.02*	−1.76	0.53**
Density Square	6.34	2.90*
Centralization	2.66	0.87**	0.08	0.24
Centralization Square	−5.25	2.09*
Reciprocity	0.92	0.55	−1.77	0.55**
Reciprocity Square
Transitivity	−0.37	0.16*	0.27	0.16
Transitivity Square
Diameter (Max Geodistance)	−0.02	0.01	−0.01	0.01
Diameter Square
Mean Geodistance	−0.08	0.04*	0.01	0.04
Mean Geodistance Square
Wave 2	0.02	0.02	0.19	0.02***
Intercept	0.90	0.50	2.20	0.50***
Observations	180		180

Note: The empirical networks are from the ANSR. Tobit models are used to account for the fact that the dependent variables (correlations) are constrained to lie in [−1, 1]. Squared terms with P values larger than 0.05 are dropped for model conciseness. Significance code: *, P < 0.05; **, P < 0.01; ***, P < 0.001.

Panel B of Table 4 shows the results for closeness centrality. It appears that the types of networks that allow good egocentric approximations of closeness are quite different from those for betweenness centrality. First, neither network size nor network centralization matters significantly, but both network density and reciprocity do. Networks with a low density and a low level of reciprocity tend to have better egocentric approximations of closeness centrality. This result makes intuitive sense. In sparse networks the connected alters for most nodes will tend to be local. Similarly, a low level of reciprocity helps restrict the span of the shortest paths, which helps localize the closeness centrality. Second, none of the squared terms is statistically significant at the 5% level, so all of them are omitted from the final model. Third, the coefficient for transitivity is statistically significant at the 10% level but not at the 5% level. As transitivity usually indicates that ties tend to cluster locally, this result suggests that the egocentric approximation of closeness centrality may work better in networks with stronger local clustering. Lastly, the egocentric approximation of closeness centrality is relatively better in the friendship networks at wave 2, as indicated by the positive and significant estimate for the wave-2 indicator.

I also conducted simulations to study how different network features moderate the correlations between egocentric and sociocentric centrality measures. On the one hand, the simulated networks are generated under very general conditions so that they may represent a broad range of networks. On the other hand, it should be recognized that these simulated networks are synthetic in nature. Therefore, we do not know to what extent they represent realistic networks. Although they may represent a broad range of networks, it is unclear whether the distribution of the simulated networks reflects the prevalence of these networks in reality. In addition, when using the features of the simulated networks to predict the correlations between egocentric and sociocentric centrality measures, it is assumed that the effects of the predictors are the same across different types of networks, which may not be true either. Given these considerations, the results based on the simulated networks may differ from those based on the empirical networks. Hence, to provide guidance for future research, a conservative approach is to look at the common patterns across these results.

Specifically, I simulate three sets of random networks that represent small, medium, and large (relatively speaking) networks, respectively. For the small networks, the number of nodes in each network is drawn from a uniform distribution [10, 30]. For the medium networks, it is drawn from a uniform distribution [40, 60], and for the large networks it is drawn from a uniform distribution [90, 110]. For each set of these networks, three levels of network density are specified in order to generate networks with sparse, medium, and dense connections. The low density parameter is drawn from a uniform distribution [0.1, 0.3], the medium density parameter from a uniform distribution [0.3, 0.5], and the high density parameter from a uniform distribution [0.5, 0.7]. Hence, there are nine broad types of networks in total, depending on the network size and density. For each of these nine types of networks, 200 networks are simulated, for a total of 1,800 networks. Because of the varying network size and density, there is also a sufficient degree of variation in other aspects of the network topology. The simulations show that the correlation between egocentric and sociocentric betweenness is very high, with a mean of .91 (SD = .06), while the correlation between egocentric and sociocentric closeness is a bit lower but still relatively high, with a mean of .81 and a larger variation (SD = .08). This is consistent with the prior result based on the empirical networks.
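The nine-cell design can be sketched as follows. This is a minimal illustration rather than the simulation code behind the results: it draws one network per cell instead of 200, assumes integer network sizes, and assumes that each directed tie is present independently with probability equal to the drawn density parameter (an Erdős–Rényi-style generator). All names are hypothetical.

```python
import random

SIZE_RANGES = {"small": (10, 30), "medium": (40, 60), "large": (90, 110)}
DENSITY_RANGES = {"low": (0.1, 0.3), "medium": (0.3, 0.5), "high": (0.5, 0.7)}

def simulate_network(size_range, density_range, rng):
    """Draw a size and a density parameter, then a random directed network."""
    n = rng.randint(*size_range)     # assumption: integer size, bounds inclusive
    p = rng.uniform(*density_range)  # density parameter for this network
    # Each off-diagonal tie is present independently with probability p.
    w = [[1 if i != j and rng.random() < p else 0 for j in range(n)]
         for i in range(n)]
    return w, p

def density(w):
    """Realized density: observed ties over n(n - 1) possible ties."""
    n = len(w)
    return sum(map(sum, w)) / (n * (n - 1))

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
networks = {}
for size_label, size_range in SIZE_RANGES.items():
    for dens_label, dens_range in DENSITY_RANGES.items():
        networks[(size_label, dens_label)] = simulate_network(
            size_range, dens_range, rng)

for (size_label, dens_label), (w, p) in networks.items():
    print(size_label, dens_label, len(w), round(p, 2), round(density(w), 2))
```

Repeating the inner draw 200 times per cell would reproduce the scale of the design described above; the realized density of each network varies around its drawn parameter.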

I then use the same network features and the same model as in Table 4 to predict the correlations between egocentric and sociocentric centrality measures. The regression results (Table 5) are quite different from those based on the empirical networks. Because neither the simulated networks nor the empirical networks are produced by random sampling, it is hard to tell which set of results is more legitimate. Hence, I will focus on the results that are consistent across the two tables (broadly speaking, focusing on the signs instead of statistical significance) in order to offer some concrete and probably conservative guidance for future research. First, across the two tables, the egocentric approximation of betweenness seems to have a U-shaped relationship with network size, a positive relationship with network centralization (for its lower range), and a negative relationship with transitivity (for its lower range), network diameter, and the mean geodistance (for its higher range). Second, the egocentric approximation of closeness seems to have a negative relationship with network size (for its lower range), density, reciprocity (for its lower range), and diameter (for its lower range), and a positive relationship with transitivity (for its higher range) and the mean geodistance (for its lower range). Overall, these results suggest that the types of networks that produce good egocentric approximations of betweenness and closeness are different. One implication is that a network that produces a good egocentric approximation of betweenness may not produce a good egocentric approximation of closeness at the same time.

Table 5.

Predicting Correlations of Egocentric and Sociocentric Centrality Measures Based on Features of Simulated Networks.

	A. Betweenness		B. Closeness
Variables	Est.	SE	Est.	SE
Size (unit = 10)	−0.0039	0.0020	−0.0116	0.0032***
Size Square	0.0004	0.0001**	0.0007	0.0002**
Density	0.2569	0.1660	−0.5163	0.1365***
Density Square	−0.5805	0.2318*
Centralization	0.0354	0.0263	−0.2685	0.0415***
Centralization Square
Reciprocity	−1.5899	0.1641***	−1.4205	0.2635***
Reciprocity Square	1.4281	0.1415***	0.8794	0.2261***
Transitivity	−0.1613	0.1323	−0.1550	0.1431
Transitivity Square	0.7831	0.2135***	1.0730	0.1675***
Diameter (Max Geodistance)	−0.0193	0.0032***	−0.0111	0.0131
Diameter Square			0.0046	0.0017**
Mean Geodistance	0.3809	0.0584***	0.6220	0.1075***
Mean Geodistance Square	−0.0895	0.0120***	−0.1623	0.0247***
Intercept	0.9342	0.0919***	0.8935	0.1463***
Observations	1,800		1,800

Note: Tobit models are used to account for the fact that the dependent variables (correlations) are constrained to lie in [−1, 1]. Squared terms with P values larger than 0.05 are dropped for model conciseness. Significance code: *, P < 0.05; **, P < 0.01; ***, P < 0.001.

Conclusion and Discussion

Egocentric networks represent a popular design for social network research that has been used widely in previous research. Often it is interesting and even critical to know to what extent network measures derived from egocentric networks resemble their sociocentric counterparts. Depending on the specific design of egocentric networks, some sociocentric network measures may be retained in their entirety in egocentric networks while others are not. For example, in the best case where the full ego network information is available (i.e., ties from both egos and immediate alters are available), network measures like degree, local network density, and local network composition will be the same in both egocentric and sociocentric networks. However, network measures that depend on macro network features or indirect ties of the network, such as betweenness centrality and closeness centrality, may be different. Marsden's seminal work (2002) shows that in undirected networks this problem may not be that concerning, as egocentric betweenness approximates sociocentric betweenness very well in the various networks he examined. Recall that in undirected egocentric networks, closeness centrality is uninformative because the ego is connected to all alters and the shortest path between the ego and any alter is always of length one.

This study extends the prior work to directed networks. Two major findings stand out. First, the quality of egocentric approximations differs greatly between betweenness centrality and closeness centrality. The correlation between egocentric and sociocentric betweenness is usually around or above 0.8 across the many empirical networks examined. However, the correlation between egocentric and sociocentric closeness is notably lower (with a mean of about 0.6 and a larger variance across networks) and can fall below 0.3. The same pattern is shown when using egocentric or sociocentric centrality measures in the ANSR friendship networks at wave 1 to predict their sociocentric counterparts at wave 2, where the predictive power of egocentric and sociocentric betweenness is about the same while the predictive power of egocentric and sociocentric closeness differs significantly. Simulations also show a similar pattern, except that the correlations for both betweenness and closeness are higher. Overall, the results suggest that egocentric centrality measures approximate their sociocentric counterparts quite well for betweenness and often (not always) well for closeness.

Second, egocentric approximations of betweenness and closeness seem to work well in different types of networks. For example, analyses of the empirical networks show that egocentric approximations work better for betweenness centrality in smaller networks, sparser networks (up to a certain point), and more centralized networks (up to a certain point) and for closeness centrality in sparser networks or networks with fewer reciprocal ties. However, results based on simulated networks show somewhat different patterns. Given that it is hard to conclude which set of results between the empirical and the simulated is more legitimate, I offered some guidance based on the common patterns of the two sets of results.

This study also has some limitations. First, the empirical findings are mostly based on small and face-to-face network data. In large networks, the correlations between egocentric and sociocentric betweenness and closeness might be smaller than in small networks because simply having more nodes in large networks might bring more “noise” or uncertainty in nodes’ sociocentric centrality. Second, we do not know whether the findings in this study will hold for networks that cover different populations, or different types of nodes or relationships. Third, this study only examined betweenness and closeness centrality while other network statistics like clustering coefficient could be interesting to look at as well. Fourth, future research may conduct more simulations to study how egocentric approximations perform in specific prototypes of networks (e.g., vertex-transitive graphs, in which all vertices are equicentral locally and globally, and threshold graphs in which rank-orders are the same globally and locally because neighborhoods are nested), beyond the type of Erdős–Rényi random networks this study has simulated.

In the end, I present three brief ideas that could help improve egocentric approximations (especially with regard to closeness centrality). First is to let respondents report their sense of closeness to alters. The drawback with this approach is that it still lacks alter confirmations (which can cause self-reporting bias) and may increase respondent burden. The second idea is to expand egocentric networks to include nodes (and ties) in the second- or higher-degree zones, namely, nodes that are indirectly connected to an ego. This approach helps get a closer measurement of sociocentric centrality while not having all the burden to collect full sociocentric network data. Third, if egocentric network data can be used to form a subnetwork (e.g., when egos are sampled from a social unit like a school), then researchers may use the subnetwork to compute quasi-sociocentric network centrality measures and use them to approximate the sociocentric ones. The approximation should be better because usually the subnetwork will provide more information about the underlying full network (Costenbader and Valente 2003).
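The second idea can be illustrated with a short sketch that extracts an ego network out to a chosen number of steps from a full adjacency matrix. It assumes that a neighbor is any node linked to the current node by a tie in either direction, which is only one of several possible conventions for directed networks. The function name `ego_network` and the five-node chain are hypothetical.

```python
from collections import deque

def ego_network(w, ego, radius=2):
    """Return the nodes within `radius` steps of `ego` and their subnetwork.

    Assumption: steps follow ties in either direction (i -> j or j -> i).
    """
    n = len(w)
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the chosen zone
        for v in range(n):
            if v not in dist and (w[u][v] or w[v][u]):
                dist[v] = dist[u] + 1
                queue.append(v)
    members = sorted(dist)
    # Keep all directed ties among the retained nodes.
    sub = [[w[i][j] for j in members] for i in members]
    return members, sub

# Hypothetical directed chain 0 -> 1 -> 2 -> 3 -> 4.
W = [[1 if j == i + 1 else 0 for j in range(5)] for i in range(5)]

first_zone, _ = ego_network(W, ego=0, radius=1)
second_zone, _ = ego_network(W, ego=0, radius=2)
print(first_zone, second_zone)
```

Centrality measures computed on the larger subnetwork use more of the indirect ties than the first-zone ego network does, at the cost of collecting data on more nodes.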

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author is grateful for the financial support for data collection provided by the Graduate School of Arts and Sciences, the Multidisciplinary Program in Inequality and Social Policy, the Fairbank Center for Chinese Studies, and the Institute for Quantitative Social Science, all at Harvard University.

ORCID iD

Weihua An

Author Biography

Dr. Weihua An is Associate Professor of Sociology and Quantitative Theory and Methods and associated faculty of The East Asian Studies Program, The Goizueta Business School, and The Rollins School of Public Health at Emory University. He received a PhD in Sociology and an AM in Statistics from Harvard University and was a doctoral fellow and a postdoc fellow at Harvard Kennedy School. His research advances theories and methods for network analysis and causal inference with applications to studying inequality and social policy, health, and organizations. He has published in top methodological and substantive journals in sociology and has created multiple R packages for statistical analysis.