Abstract
This study examines the evolution of social networking sites (SNSs) from a networked audience duplication perspective. Guided by social network theory, the theory of double jeopardy, and niche theory, this study proposes an integrated framework to explain the evolution of SNS choices of the US audience between 2016 and 2019. Shared traffic data were retrieved from comScore’s Media Metrix Multi-Platform database. The empirical results of the separable temporal exponential random graph model (STERGM) confirm that preferential attachment, audience size, and niche width significantly drive the likelihood of tie formation and dissolution in the evolving audience duplication network. These effects hold true even when other endogenous structural features and exogenous nodal attributes are taken into account. Theoretical implications for the networked media landscape are discussed.
Social networking sites (SNSs) allow audiences to construct service-specific profiles, establish and maintain social relationships, and spread user-generated content through their networks (boyd and Ellison, 2007). Since their emergence in the late 1990s, SNSs have gained considerable popularity among media users around the globe, and their socio-technical features and affordances have continued to shift. The changing media landscape has also spurred communication scholars to investigate the evolution of SNSs. This stream of research conceptualizes SNSs as an organizational form that shares common features and identities (Weber et al., 2016) and predicts that SNSs will be substituted by newer media forms that are functionally superior (Barnett, 2011). It is assumed that the emergence, development, and transformation of SNSs are a result of their interactions with audiences, advertisers, traditional media, investors, entrepreneurs, and other stakeholders in the media marketplace. Nonetheless, this implicit relational thinking has yet to be directly examined through network analytics. In addition, previous research overlooks the importance of the mutually constitutive relationship between SNSs and audiences in evolutionary processes. Despite the legitimate status of SNSs in the media industry, little is known about longitudinal changes in audience centralization and fragmentation within the market category. This issue is worthy of investigation because SNSs are related to each other through the audiences that select them.
This study investigates the evolution of SNSs from a networked audience duplication perspective. Audience duplication refers to the degree to which two media outlets share audiences. It is often conceptualized as the media-level projection of two-mode networks between media outlets and audiences (Ksiazek, 2011). In social network terminology, two-mode networks reflect the mutually constitutive relationship between two distinct sets of nodes (e.g. people and groups; Breiger, 1974). In the present case, the audience duplication network aggregates SNS choices of users at the media level. For instance, Facebook and Twitter are treated as network nodes, and a tie between them indicates the presence of shared audiences.
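The projection described above can be illustrated with a minimal sketch. It assumes a toy panel in which each user is mapped to the set of SNSs visited in a month; the function name, the user labels, and the visit data are hypothetical and are not drawn from the comScore dataset.

```python
from itertools import combinations


def project_audience_duplication(visits):
    """Project a two-mode (user -> sites) structure onto pairs of sites.

    Returns a dict mapping each unordered site pair (stored as an
    alphabetically sorted tuple) to its number of shared users.
    """
    shared = {}
    for sites in visits.values():
        # Every pair of sites visited by the same user gains one shared user.
        for a, b in combinations(sorted(set(sites)), 2):
            shared[(a, b)] = shared.get((a, b), 0) + 1
    return shared


# Hypothetical panel: each user maps to the SNSs visited in a given month.
toy_visits = {
    "user1": {"Facebook", "Twitter"},
    "user2": {"Facebook", "Twitter", "LinkedIn"},
    "user3": {"LinkedIn"},
}
duplication = project_audience_duplication(toy_visits)
```

In this toy panel, Facebook and Twitter share two users, so the pair would carry the heaviest duplication weight before any significance filtering is applied.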
The purpose of this study is to explain the formation and dissolution mechanisms of audience duplication among SNSs. Guided by social network theory (Barabási and Albert, 1999), the theory of double jeopardy (McPhee, 1963), and niche theory (Hannan and Freeman, 1977), this study proposes that tie changes in the audience duplication network result from both endogenous structural features and exogenous nodal attributes. Dynamic network modeling is employed to analyze the comScore dataset that captures SNS choices of the US audience across desktops, smartphones, and tablets between 2016 and 2019. The empirical results confirm that preferential attachment, audience size, and niche width are significant predictors of network evolution.
Literature review
Evolution of audience duplication: a network perspective
Communication scholars have conceptualized media systems as networks (Ognyanova and Monge, 2013). One stream of research has explored the evolution of media systems by tracking longitudinal changes in the global hyperlink (Barnett et al., 2016) and news (Barnett et al., 2015) networks. These studies uncover the structure of Internet infrastructure and content through a media-centric perspective. Another line of research advocates a user-centric approach and focuses on audience behavior in the current high-choice media environment. Social networks among media audiences are found to be associated with shared media experiences (Riles et al., 2018). Audience duplication data have also been treated as networks in media systems (Ksiazek, 2011; Mukerjee et al., 2018). Specifically, media outlets are represented as nodes. Tie presence indicates that the level of audience duplication between a pair of nodes exceeds random chance or a certain threshold. Empirical evidence has revealed that shared traffic among websites can be explained by cultural identity and political economy (Wu and Taneja, 2016), language and geographic similarity (Taneja and Webster, 2016), generation gaps (Taneja et al., 2018), and technological infrastructure (e.g. hyperlink networks; Taneja and Webster, 2016).
Prior work has analyzed the extent to which audiences of leading SNSs overlap with those of other websites (Taneja et al., 2018). Considering that most existing literature focuses on top web domains (e.g. the world’s 1000 most popular websites; Taneja and Webster, 2016; Wu and Taneja, 2016) and that there is much variation in audience size among SNSs, what remains largely unaddressed is how the patterns of audience duplication among SNSs are different from those among the most popular sites. Answering this question is important because structural configurations of audience duplication may vary from one website type to another (Webster and Taneja, 2018). Mapping these structural features can also help to directly assess levels of audience fragmentation and centralization in the evolving SNS landscape.
In addition, while the evolution of shared traffic has also been examined from a network perspective (Majó-Vázquez et al., 2017, 2019; Wu and Taneja, 2016), few attempts have been made to identify predictors of tie changes through inferential network analysis. Network evolution proceeds through two fundamental changes: (a) tie formation and (b) tie dissolution (Xu et al., 2020). From the perspective of media substitution theory (Lin, 2004; Nam and Barnett, 2010), tie formation in the audience duplication network signals that two media outlets supplement each other. By contrast, the dissolution of existing duplication ties implies the displacement of one outlet by another due to superior socio-technical features. The rate of evolution in the audience duplication network is low if existing ties are highly stable and new ties are rarely created. Dynamic network modeling can capture both formation and dissolution stages in evolutionary processes. As the factors that constrain tie formation can be different from those that constrain tie dissolution (Krivitsky and Goodreau, 2019), modeling both formation and dissolution mechanisms of audience duplication offers a more complete view of network evolution.
This study contributes to the existing literature by exploring the factors that drive the evolution of audience duplication among SNSs. Network scholars treat both endogenous structural features and exogenous nodal attributes as potential drivers of network evolution (Krivitsky and Goodreau, 2019; Monge and Contractor, 2003). Endogenous structural features refer to “characteristics of the relations within the network that are themselves used to explain the structural tendencies of that relation” (Monge and Contractor, 2003: 55). Previous research on audience duplication has yet to consider such network self-organization mechanisms (e.g. preferential attachment). Exogenous nodal attributes focus on how properties outside a focal network (in the present case, an SNS’s audience size and niche width) shape network formation and dissolution.
Theory of preferential attachment
Preferential attachment has been recognized as an underlying self-organization mechanism of network formation. The theory of preferential attachment (Barabási and Albert, 1999) states that already well-connected nodes have advantages over poorly connected nodes in attracting additional ties, which may result in an increasingly skewed distribution of degree centrality (measured as a node’s number of direct ties) over time. This rich-get-richer phenomenon has been observed in a wide range of empirical networks (Barabási and Albert, 1999). Communication scholars have also documented evidence of preferential attachment in political (Peng et al., 2016), organizational (Weber, 2012), and computer-mediated (Stephens et al., 2016) communication networks. However, it remains unclear whether the evolving audience duplication network exhibits the same pattern. Given abundant empirical evidence from network research, this study expects that preferential attachment constitutes a key mechanism in the emergence of audience duplication among SNSs. Thus, it is hypothesized as follows:
H1: Tie formation in the audience duplication network among SNSs follows the principle of preferential attachment, such that an SNS that already has many ties is more likely to form ties in the future.
Audience size, long tail, and the theory of double jeopardy
Audience size reveals the popularity of a media outlet (Taneja et al., 2018; Taneja and Webster, 2016). Prior work has demonstrated that the media market remains highly concentrated even with the increased abundance of media choices afforded by digital technology (Webster, 2014). Specifically, performance metrics (e.g. monthly reach) of discrete media outlets or offerings conform to long-tail patterns, such that top choices explain the overwhelming majority of the total traffic, while those in the tail receive limited audience attention (Webster, 2014; Webster and Ksiazek, 2012). However, the uneven distribution of popularity does not indicate the degree to which audience members move across available media options. Prior work has started to explore the relationship between audience size and audience duplication in the digital environment. For instance, Webster (2014) offered evidence of a “massively overlapping culture” characterized as very high levels of audience duplication among news and entertainment media, suggesting that “the audience of any given outlet, popular or not, will overlap with other outlets at a similar level” (Webster and Ksiazek, 2012: 49). Mukerjee et al. (2018) challenged this claim by demonstrating that the audience duplication network consisting entirely of news outlets exhibited a core-periphery structure in which “a small core of outlets that act as the primary source and then a much larger periphery of secondary sources, distributed across several layers of decreasing reach” (p. 43). They found that the pattern of massive overlap was observed only at the core of the network. These inconsistent results may be explained by differences in (a) website types included in the empirical analysis and (b) methodological approaches to constructing audience duplication networks (Webster and Taneja, 2018). 
This study contributes to this line of research by utilizing advanced statistical tools for inferential network analysis to test the relationship between an SNS’s audience size and its tendency to share traffic with others over an extended period of time. This leads to the following research question:
RQ1: What is the main effect of an SNS’s audience size on tie formation and dissolution in the audience duplication network?
Communication network scholars have emphasized that the impact of a nodal attribute on network change should be analyzed at both actor and dyadic levels (Monge and Contractor, 2003). In the present case, the likelihood of tie formation and dissolution among SNSs can be driven by their pairwise similarity in audience size. This hypothesis is guided by the theory of double jeopardy (McPhee, 1963) and its extension to the digital media marketplace (Elberse, 2013; Taneja, 2020). This theory states that consumers of brands with a small market share are less loyal than those of popular brands (McPhee, 1963). Consistent with this prediction, media researchers have refuted the myth of the “small but loyal” audiences of media products in the long tail (Napoli, 2003; Taneja, 2020). In fact, such alternatives have a double disadvantage in the media market: First, most people have little familiarity with these options; second, unpopular offerings are appreciated less than popular ones even when audiences develop knowledge of the former. Empirical evidence has confirmed that popular websites attract heavy users, and this popularity-based loyalty holds uniformly over diverse website categories (Taneja, 2020). More importantly, an item in the long tail is typically used in conjunction with a hit product. In other words, those who venture into the long tail are also consumers of items designed for mass appeal (Elberse, 2008, 2013). These findings suggest an increased probability of shared traffic between SNSs that are distant from each other in the popularity distribution. At the same time, consumers of unpopular items are hard to please because they are familiar with alternatives in a specific area (Webster, 2014). They tend to give significantly lower ratings to unpopular items than to best-selling items (Elberse, 2008, 2013). This gap in appreciation poses a threat to the stability of shared traffic between SNSs that differ significantly in popularity.
Thus, it is predicted as follows:
H2: An increased gap in audience size between SNSs results in an increased likelihood of (a) tie formation and (b) tie dissolution in the audience duplication network.
Niche theory and niche width
Niche theory defines an ecological niche as a position in the multidimensional resource space that supports the survival of a biological species (Hannan and Freeman, 1977). Communication scholars have applied this ecological term to explain organizational interdependence, segregation, positioning, and survival in the evolving media landscape (e.g. Dimmick, 2003; Weber et al., 2016). The concept of niche has also gained much legitimacy among media practitioners, as reflected in the widespread usage of phrases such as “niche media,” “niche content,” and “niche products” designed to target small or narrowly defined audiences (e.g. Anderson, 2006). In contrast to the mass, the niche is generally understood as a small market segment defined in terms of demographic, psychographic, and behavioral characteristics of target consumers. However, this distinction between the mass and the niche is conceptually problematic because even a mass-appeal producer occupies niches in the resource environment. According to niche theory, mass-appeal content providers rely on a wide range of niches in the environment, while the so-called niche offerings have narrowly bounded niches (Dimmick, 2003). To ensure conceptual clarity, ecological researchers typically use niche width (measured as the total number of niches an actor inhabits out of the total number of niches that exist) to capture the volume of the niche space covered by an actor (Carroll et al., 2002). As media organizations compete for audience attention in the marketplace (Napoli, 2003), the multidimensional niche space in the media industry can be reflected by salient audience characteristics such as age, occupation, needs, interests, and genre preferences (Dimmick, 2003; Weber et al., 2016). Generalism (niche broadness) and specialism (niche narrowness) are the two ideal types in the spectrum of niche width.
A media outlet that targets the so-called niche market is a specialist because it depends on a narrow range of niches for survival. By contrast, a generalist caters to the taste of the general public.
Media offerings in the long tail are often treated as “niche products” (Anderson, 2006; Webster, 2014). While many smaller, newer media outlets are at the tail end of the distribution, an outlet’s niche width is not necessarily associated with its audience size. Given the uneven distribution of resource richness in the niche space (Carroll et al., 2002), a narrowly defined target does not mean that a focal site cannot attract many users. For instance, LinkedIn is an employment-oriented specialist in the SNS industry (Weber et al., 2016) while gaining much popularity among working professionals. Thus, equating options in the long tail with niche products hinders our understanding of niche processes in the media market. To better capture the theoretical underpinnings of an ecological niche, this study tests the impact of niche width on the evolution of audience duplication among SNSs.
Communication scholars have demonstrated that niche width is a significant predictor of the creation of inter-organizational connections among non-governmental organizations (NGOs; Lee and Monge, 2011; Shumate and Lipp, 2008), but it remains unknown how far this finding generalizes beyond NGOs to other organizational types. While the existing media literature has confirmed that niche width (generalism vs specialism) well explains the coexistence of old and new media (Dimmick, 2003), the frequency of news coverage of local organizations (Lowrey and Kim, 2016), and audience appeal (Nelson, 2018), little attention has been paid to niche foundations of relational dynamics among media organizations. This study aims at exploring how a change in niche width will result in a difference in the probability of tie formation and dissolution in the audience duplication network among SNSs. The impact of niche width is estimated at both actor and dyadic levels, which is in line with Monge and Contractor’s (2003) guidelines for inferential network analysis. It is thus proposed as follows:
RQ2: What is the relationship between niche width and the tendency for SNSs to form or dissolve ties in the audience duplication network?
RQ3: How does pairwise similarity in niche width constrain the likelihood of tie formation and dissolution in the audience duplication network?
Method
Sample and procedure
The audience data were obtained from comScore’s Media Metrix (MMX) Multi-Platform, a panel-based product that captures the same person’s media usage across digital devices through advanced metering technology. ComScore aggregates audience data at the level of media outlets and provides monthly metrics such as reach and average views per visit/visitor. Current data subscribers can access media metrics for each of the last 48 calendar months. As of September 2020, comScore’s representative panel in the US market included 296,594 desktop users, 19,547 Android users, 5903 iPhone users, and 4783 iPad users (comScore, 2020). Compared with self-reported data and traditional observational data (e.g. desktop-only measurements), comScore’s MMX Multi-Platform offers “an unduplicated view of total audience behavior across desktops, smartphones and tablets” (comScore, n.d.) in an unobtrusive manner and thus better captures user activities. Empirical evidence has also confirmed that surveys and desktop-only observational metrics underestimate online media exposure (González-Bailón and Xenos, 2020).
Given that no single archival database offers a complete census of the SNS industry, this study generated a sample of SNSs from multiple sources. First, a list of 191 SNSs founded prior to 2009 was gathered from Weber et al.’s (2016) online supplements. Second, comScore assigns media outlets to the category of “social networking.” The company established the categorical listing in 2007 to help its subscribers monitor the relative performance of SNSs. The list has evolved over the years because of multiple cases of market entry and exit. All media outlets on the list during the 2016–2019 period were retrieved from comScore’s database. Category members were then manually checked, and those that did not incorporate any social networking features were removed (e.g. cant-not-tweet-this.com and revolvy.com). Third, several related terms (e.g. social networking site, social networking website, and social networking) were entered into Statista’s search engine (https://www.statista.com/). Statista is a leading global provider of business data. All SNSs mentioned in Statista’s industry studies and reports were recorded. The unduplicated list of SNSs from the above three sources was then supplemented by Wikipedia’s existing lists of SNSs (e.g. “List of social networking websites” and “American social networking websites”). Social networking services running in mobile apps only (e.g. Snapchat) were excluded from further analysis because the ability to offer web-based services is a defining feature of SNSs (boyd and Ellison, 2007; Weber et al., 2016).
This study analyzed audience duplication among SNSs at four discrete time points (September of 2016, 2017, 2018, and 2019). The choice of September is consistent with previous research using comScore data (e.g. Majó-Vázquez et al., 2017; Wu and Taneja, 2016). Despite a large panel size, duplication estimates from comScore are largely unreliable if a specific media outlet does not reach 0.01% of the total online population in a given month (Majó-Vázquez et al., 2019; Mukerjee et al., 2018). A total of 96 SNSs in the sample met the 0.01% threshold throughout the observation period (see the Supplemental File for the full list). The total number of shared audiences between these SNSs was then downloaded from the “Cross Visiting” tab in the comScore database. This metric provided a baseline for constructing the evolving audience duplication network. While Alexa data have also been used to compare Internet usage patterns (e.g. Barnett et al., 2011), Alexa’s audience overlap score does not reflect the actual size of shared audiences between sites.
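The sample restriction described above, which retains only outlets that meet the reach threshold at every time point, can be sketched as follows. The function name, the site labels, and the reach shares are hypothetical; reach is expressed as a proportion of the total online population, so 0.01% corresponds to 0.0001.

```python
def stable_sample(reach_share_by_site, threshold=0.0001):
    """Return outlets whose reach share meets the threshold at every time point.

    reach_share_by_site maps an outlet name to a list of reach shares
    (proportions of the total online population), one per observed month.
    """
    return sorted(
        site
        for site, shares in reach_share_by_site.items()
        if all(share >= threshold for share in shares)
    )


# Hypothetical reach shares for four yearly observations.
toy_reach = {
    "site_a": [0.4200, 0.4100, 0.4000, 0.3900],
    "site_b": [0.0003, 0.0002, 0.00005, 0.0002],  # below 0.01% at the third time point
    "site_c": [0.0001, 0.0001, 0.0002, 0.0001],
}
retained = stable_sample(toy_reach)
```

Requiring the threshold at every time point also yields a constant node set across the observed networks.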
Network construction
Dynamic network modeling requires that a focal network should have the same node set over time (Krivitsky and Goodreau, 2019). The audience duplication network dataset was organized in the form of four 96 × 96 matrices. In the four discretely observed networks, nodes refer to SNSs, and ties indicate that the degree of audience duplication between SNSs significantly deviated from randomness at a time point. Ties in audience duplication networks can be measured in different ways (Ksiazek, 2011; Majó-Vázquez et al., 2017). This study adopts the absolute duplication method to construct undirected network data. Absolute duplication refers to “the percentage of the total audience that is exposed to both outlets in a given pair” (Ksiazek, 2011: 240). This undirected version of pairwise audience duplication is desirable when researchers also analyze differences in audience size (Majó-Vázquez et al., 2017). A recent line of research has proposed methodological improvements to the construction of audience duplication networks by applying the filtering techniques that rely upon the phi coefficient (Majó-Vázquez et al., 2017; Mukerjee et al., 2018) or backbone extraction (Majó-Vázquez et al., 2019). These techniques assess the statistical significance of the observed duplication and eliminate non-significant duplication ties that result from random browsing behavior. The thresholding approach advocated by Majó-Vázquez et al. (2017: 291) and Mukerjee et al. (2018: 35) was employed to determine the significance of each tie in the four network matrices. As dichotomizing ties is a precondition for modeling network dynamics (Krivitsky and Goodreau, 2019), the significant (non-significant) duplication ties were then recoded as “1” (“0”). The final network dataset had a density of 0.51, 0.49, 0.48, and 0.41, and a clustering coefficient of 0.79, 0.77, 0.73, and 0.68 in 2016, 2017, 2018, and 2019, respectively. 
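A minimal sketch of how one pair of outlets might be evaluated is given below. It assumes the standard phi coefficient for a 2 × 2 exposure table and a t-test on that coefficient; the cutoff of 2.58, the function names, and the audience counts are illustrative assumptions and should not be read as the exact procedure or values used in this study.

```python
import math


def absolute_duplication(shared, n_total):
    """Percentage of the total audience exposed to both outlets in a pair."""
    return 100.0 * shared / n_total


def phi_test(shared, reach_a, reach_b, n_total, t_critical=2.58):
    """Return (phi, t, tie) for one pair of outlets.

    Assumes 0 < reach_a < n_total and 0 < reach_b < n_total.
    t_critical is an assumed cutoff, not a value taken from the study.
    """
    numerator = shared * n_total - reach_a * reach_b
    denominator = math.sqrt(
        reach_a * reach_b * (n_total - reach_a) * (n_total - reach_b)
    )
    phi = numerator / denominator
    if abs(phi) >= 1.0:
        # Perfect association: the t statistic is unbounded.
        t = math.copysign(math.inf, phi)
    else:
        t = phi * math.sqrt(n_total - 2) / math.sqrt(1.0 - phi ** 2)
    return phi, t, int(t > t_critical)


def density(adjacency):
    """Density of an undirected binary network given as a square 0/1 matrix."""
    n = len(adjacency)
    ties = sum(adjacency[i][j] for i in range(n) for j in range(i + 1, n))
    return ties / (n * (n - 1) / 2)


# Hypothetical pair: 1000 panelists, reaches of 200 and 100, 60 shared visitors.
pair_phi, pair_t, pair_tie = phi_test(60, 200, 100, 1000)
```

With independent browsing, the expected overlap in this hypothetical pair would be 200 × 100 / 1000 = 20 visitors, so 60 shared visitors yields a positive phi and a retained tie, whereas exactly 20 shared visitors yields a phi of zero and no tie.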
Figure 1 visualizes the audience duplication network among 96 SNSs at four different time points. Descriptive statistics of a node’s degree centrality are also reported.

Figure 1. A visualization of the evolution of the audience duplication network among 96 SNSs, 2016–2019.
Measures
Dependent variables
The dependent variable of a dynamic network model is a binary network (Krivitsky and Goodreau, 2019). In the present case, the audience duplication network observed in September of each year is treated as the dependent variable. Tie formation and tie dissolution refer to the addition of new ties and the deletion of existing ties, respectively.
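As an illustration of these two definitions, the sketch below compares two consecutive observations of an undirected network and separates newly formed, dissolved, and persisting ties. The function name and the edge lists are hypothetical.

```python
def tie_changes(edges_t, edges_t1):
    """Compare two undirected edge lists observed at consecutive time points."""
    # frozenset makes (A, B) and (B, A) the same undirected tie.
    previous = {frozenset(edge) for edge in edges_t}
    current = {frozenset(edge) for edge in edges_t1}
    return {
        "formed": current - previous,     # absent at t, present at t + 1
        "dissolved": previous - current,  # present at t, absent at t + 1
        "persisted": previous & current,  # present at both time points
    }


# Hypothetical ties among three outlets at two consecutive time points.
changes = tie_changes([("A", "B"), ("B", "C")], [("B", "A"), ("A", "C")])
```

Here the A–C tie is formed, the B–C tie is dissolved, and the A–B tie persists regardless of the order in which its endpoints are listed.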
Independent variables
Preferential attachment (H1) is measured as the geometrically weighted degree distribution (GWDegree) parameter in dynamic network modeling (Krivitsky and Goodreau, 2019). This structural parameter can be regarded as a negative indicator of preferential attachment, such that a negative and significant coefficient indicates a tendency for nodes with more existing ties to attract even more ties in the future (Hunter, 2007).
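To show why a negative GWDegree coefficient points toward centralization, the sketch below computes the geometrically weighted degree statistic in the form given by Hunter (2007) for a fixed decay parameter. The decay value of 0.5 and the two toy degree sequences are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def gwdegree(degrees, decay):
    """Geometrically weighted degree statistic with a fixed decay parameter.

    Computes exp(decay) * sum_k [1 - (1 - exp(-decay))^k] * D_k,
    where D_k is the number of nodes with degree k.
    """
    counts = Counter(degrees)
    base = 1.0 - math.exp(-decay)
    return math.exp(decay) * sum(
        (1.0 - base ** k) * n_k for k, n_k in counts.items() if k > 0
    )


# Two hypothetical four-node networks with three ties each.
star_degrees = [3, 1, 1, 1]  # one hub tied to all other nodes
path_degrees = [1, 2, 2, 1]  # ties spread more evenly
gw_star = gwdegree(star_degrees, decay=0.5)
gw_path = gwdegree(path_degrees, decay=0.5)
```

The star yields a lower statistic than the path even though both have the same number of ties, because each additional tie on an already well-connected node adds less to the statistic. A negative coefficient on this term therefore favors degree distributions concentrated on a few nodes.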
As a time-varying nodal attribute, audience size (RQ1 and H2, M = 3.23, SD = 0.93, Min = 1.47, Max = 5.37) is operationalized as the number of individuals who accessed a focal SNS through desktops, smartphones, or tablets at a given time point. This variable was estimated by comScore’s MMX Multi-Platform and was further log-transformed in statistical analysis.
Niche width (RQ2 and RQ3, M = 0.51, Min = 0, Max = 1) is a binary variable indicating an SNS’s generalism (coded as “1”) or specialism (coded as “0”) in the resource environment. After receiving the necessary training, two graduate students independently evaluated each SNS’s self-claimed target audience. An SNS was manually coded as a specialist if its target audience had a distinct lifestyle and interests (e.g. “music producers,” “artists,” and “gamers”) or certain demographic (e.g. “mothers and mothers-to-be”) and professional (e.g. “physicians and advanced practice clinicians”) characteristics. Otherwise, it was coded as a generalist (see also Lai, 2014; Weber et al., 2016). The two coders agreed on 88 of 96 (91.67%) cases. The remaining eight cases, on which the coders disagreed, were discussed, and the coding decision was finalized by the lead researcher.
Control variables
Multiple structural features and nodal attributes are controlled. Following the model-building procedure outlined by Robins et al. (2007), this study modeled the four most important network configurations: edges, 2-star, 3-star, and triangle. The edges parameter reflects a baseline propensity for tie formation or dissolution. A 2-star is a subset of three SNSs in which one SNS is linked to each of the other two. Similarly, a 3-star is a subset of four SNSs in which one SNS is connected to each of the other three. The 2-star and 3-star parameters consider the role of network centralization in driving the process of tie dissolution. They are not included as predictors of tie formation because neither of them contributes to model convergence and fit. Another reason is that the GWDegree parameter, defined as the higher order configuration of degree distribution, has already captured the tendency for network centralization. The GWDegree parameter is not included as a control when modeling tie dissolution due to its lack of interpretability (Levy, 2016). Finally, the triangle parameter treats network closure as an important self-organization mechanism in network evolution. In the present case, a triangle is a subset of three SNSs in which each SNS is tied to both of the others.
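The four configurations can be counted directly from an undirected edge list, as in the sketch below. The function name and the toy network are hypothetical; star counts follow from node degrees, since a node with degree d is the center of C(d, 2) 2-stars and C(d, 3) 3-stars.

```python
from itertools import combinations
from math import comb


def structural_counts(nodes, edges):
    """Count edges, 2-stars, 3-stars, and triangles in an undirected network."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degrees = [len(neighbors[node]) for node in nodes]
    # A triangle is a triple in which all three pairs are tied.
    triangles = sum(
        1
        for a, b, c in combinations(nodes, 3)
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]
    )
    return {
        "edges": sum(degrees) // 2,
        "two_stars": sum(comb(d, 2) for d in degrees),
        "three_stars": sum(comb(d, 3) for d in degrees),
        "triangles": triangles,
    }


# Hypothetical network: A is tied to B, C, and D; B and C are also tied.
counts = structural_counts(
    ["A", "B", "C", "D"],
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")],
)
```

In this toy network, node A alone accounts for three of the five 2-stars and the single 3-star, which is why star parameters are read as indicators of centralization.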
As for nodal attributes, this study controls for the effects of organizational age (e.g. Weber, 2012) and location (e.g. Shumate and Dewitt, 2008) on network evolution. Age (M = 11.73, SD = 11.73, Min = 1, Max = 24) varies with time and is measured as the total number of years since the founding of an SNS. Location is a categorical variable denoting where the main server of an SNS is located. About 71.88% (69 of 96) of SNSs host their servers in the United States. The remaining sites are from 15 countries including Canada, China, France, Germany, Japan, Korea, New Zealand, Russia, and the United Kingdom.
Analytical procedures
The separable temporal exponential random graph model (STERGM), an extension of the cross-sectional ERGM estimated through Markov chain Monte Carlo (MCMC) maximum likelihood, was employed to analyze the formation and dissolution of the audience duplication network among 96 SNSs. The STERGM and the stochastic actor-oriented model (SAOM) are the two most widely used tools for modeling network dynamics in discrete time. Unlike the SAOM, the STERGM does not assume that nodes have agency in changing ties or have full knowledge about the network (Leifeld and Cranmer, 2019). In the present case, the audience duplication network is a media-level projection of SNS choices of comScore’s panelists. Duplication ties are implicit in nature because they are constructed indirectly through the projection method. By contrast, ties in traditional media networks (e.g. hyperlink networks) are directly initiated by nodes. While SNSs can coordinate their activities to attract audiences, site owners have limited information about their connections in the audience duplication network. Thus, the assumption of the SAOM is likely to be violated. Another benefit of the STERGM is its ability to disentangle tie formation and dissolution. Specifically, the STERGM fits two separate models to explain network dynamics: one for tie formation, and the other for tie dissolution. It also allows the factors that drive formation to be different from those that drive dissolution (Krivitsky and Goodreau, 2019).
The STERGM treats network change as a function of parameters of (a) endogenous structural features and (b) exogenous nodal attributes. In the present case, endogenous parameters include GWDegree (H1), edges, 2-star, 3-star, and triangle. Exogenous parameters include an SNS’s audience size (RQ1 and H2), niche width (RQ2 and RQ3), age, and location. Location was dichotomized (1 = USA and 0 = others) so that correlations between the four nodal attributes could be run at each time point. Spearman’s rank correlation coefficient ranged from −.14 (the relationship between audience size and age in 2019) to .34 (the relationship between audience size and location in 2016) among the four variables. The STERGM also allows for the specification of attribute-based main (actor-level) and homophily (dyadic-level) effects. A positive (negative) and significant main effect indicates that SNSs with a certain attribute are more (less) likely to form or dissolve ties in the audience duplication network. The difference parameter (for non-categorical attributes) and the match parameter (for categorical attributes) can be used to test the homophily effect, that is, an increased likelihood of tie formation and dissolution between SNSs with similar characteristics. Both main and homophily terms of nodal attributes were included in model estimation. The final data analysis was performed using the statnet packages in R (Handcock et al., 2018).
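The dyad-level quantities behind the main, difference, and match parameters can be sketched as follows. The function names are hypothetical labels for the kinds of statistics described above, and the attribute values are invented for illustration.

```python
def main_effect(x_i, x_j):
    """Actor-level statistic for a dyad: the sum of both nodes' attribute values."""
    return x_i + x_j


def difference(x_i, x_j):
    """Dyadic statistic for non-categorical attributes: the absolute difference."""
    return abs(x_i - x_j)


def match(c_i, c_j):
    """Dyadic statistic for categorical attributes: 1 if both nodes share a category."""
    return int(c_i == c_j)


# Hypothetical dyad: log-transformed audience sizes and server locations.
size_sum = main_effect(5.0, 2.0)
size_gap = difference(5.0, 2.0)
same_location = match("USA", "USA")
```

Under this reading, a positive coefficient on the difference statistic indicates that dissimilar pairs are more likely to experience the modeled event, whereas a positive coefficient on the match statistic indicates that similar pairs are.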
Results
Table 1 presents the final results of the STERGMs predicting the evolution of the audience duplication network among 96 SNSs during the 2016–2019 period. The total amount of explained deviance was 33.01%, calculated as (9610 − 6438) / 9610, for the formation model (Model 1) and 35.14%, calculated as (9355 − 6068) / 9355, for the dissolution model (Model 2). Coefficient interpretation of a STERGM is similar to that of a logistic regression. A significant coefficient (log odds) indicates that a specific structural feature or nodal attribute (represented by an endogenous or exogenous parameter) produces an effect on the likelihood of tie formation or dissolution in the observed network. A parameter is statistically significant at the .05 level if the ratio of its coefficient (log odds) to its standard error is greater than 1.96 in absolute value.
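The two calculations in this paragraph can be reproduced with a short sketch. It assumes that the larger figure in each ratio is the deviance of a baseline model and the smaller figure is the deviance of the fitted model; the function names are hypothetical, and the p-value uses the normal approximation implied by the 1.96 cutoff.

```python
import math


def explained_deviance(baseline_deviance, model_deviance):
    """Proportion of the baseline deviance accounted for by the fitted model."""
    return (baseline_deviance - model_deviance) / baseline_deviance


def wald_test(coefficient, standard_error):
    """Return the z ratio and its two-sided p-value under a normal approximation."""
    z = coefficient / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


formation_fit = explained_deviance(9610, 6438)
dissolution_fit = explained_deviance(9355, 6068)
```

A ratio of exactly 1.96 corresponds to a two-sided p-value of approximately .05, which is the basis of the significance rule stated above.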
Table 1. The STERGMs predicting the formation and dissolution of the audience duplication network among 96 SNSs, 2016–2019.
AIC: Akaike information criterion; BIC: Bayesian information criterion; STERGM: separable temporal exponential random graph model; SNS: social networking site; GWDegree: geometrically weighted degree distribution.
*p < .05, **p < .01, ***p < .001.
Location was transformed into a binary variable (USA vs others) when evaluating its main effect on tie formation and dissolution. The match parameter was employed to detect the presence of location-based homophily in the observed network.
A positive and significant match (difference) parameter indicates the presence of a homophily (heterophily) effect.
H1 stated that tie formation in the audience duplication network would follow the principle of preferential attachment, such that an SNS that already had many ties would be more likely to form ties in the future. Model 1 revealed that the coefficient of the anti-preferential attachment parameter (GWDegree) was negative and significant (log odds = −2.44, SE = 0.85, p = .004). Because GWDegree captures anti-preferential attachment, a negative coefficient indicates preferential attachment. Thus, H1 was supported.
RQ1 asked about the main effect of an SNS’s audience size on network evolution. Model 1 showed that audience size was not a significant driver of tie formation (log odds = 0.002, SE = 0.03, p = .94). Model 2 revealed that audience size significantly increased the probability of tie dissolution (log odds = 0.43, SE = 0.02, p < .001). A one-unit increase in log-transformed audience size was associated with an increase in the odds of tie dissolution of 53.73% (exp(0.43) − 1).
H2 predicted that an increased gap in audience size between SNSs would result in an increased likelihood of tie formation (H2a) and dissolution (H2b) in the audience duplication network. The final results confirmed that the difference parameter of audience size was a positive and significant predictor of both events (tie formation: log odds = 0.32, SE = 0.04, p < .001; tie dissolution: log odds = 0.09, SE = 0.04, p = .04), supporting H2. Specifically, a one-unit increase in pairwise dissimilarity in log-transformed audience size was associated with an increase in the odds of tie formation and dissolution of 37.71% (exp(0.32) − 1) and 9.42% (exp(0.09) − 1), respectively.
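The percentage changes in odds quoted for RQ1 and H2 follow directly from exponentiating the log odds. As a minimal check, the sketch below applies that conversion to the three coefficients reported above; the function name `pct_change_in_odds` is illustrative.

```python
import math


def pct_change_in_odds(log_odds):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(log_odds) - 1) * 100


# Coefficients as reported in the text.
reported = {
    "audience size, dissolution": 0.43,
    "audience size difference, formation": 0.32,
    "audience size difference, dissolution": 0.09,
}

for label, log_odds in reported.items():
    print(f"{label}: {pct_change_in_odds(log_odds):.2f}%")
```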
RQ2 asked about the relationship between niche width and the tendency for SNSs to form or dissolve ties. Generalist SNSs established more new ties with others than did specialist SNSs (log odds = 0.14, SE = 0.04, p < .001). At the same time, the former tended to dissolve fewer existing ties than did the latter (log odds = −0.24, SE = 0.04, p < .001). These results answered RQ2.
RQ3 asked how pairwise similarity in niche width would constrain the likelihood of tie formation and dissolution. The empirical findings lent support to the homophily effect, as indicated by the negative difference parameters. Specifically, generalists (specialists) were more likely to have audience duplication with other generalists (specialists) (log odds = −0.19, SE = 0.06, p < .001). In addition, a generalist–generalist or specialist–specialist tie had a higher rate of dissolution than did a generalist–specialist tie (log odds = −0.27, SE = 0.01, p < .001).
MCMC diagnostic statistics showed that the joint p value (the higher, the better) equaled .87 and .78 for the formation and the dissolution models, respectively. The goodness-of-fit (GoF) test for model statistics revealed that the MC p values of the network parameters in Models 1 and 2 ranged from .76 to 1.00. As none of the MC p values was below the .05 threshold, it was concluded that these parameters well reproduced corresponding network properties in the observed data (Handcock et al., 2018). The GoF plots (see Figure 2) also showed that, with only a few exceptions, the observed distributions of degree and edgewise shared partners were within the 95% bounds of the simulated distributions. In other words, the two STERGMs provided a fairly good representation of the higher order structure of the observed network.
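As a rough illustration of how an MC p value for a model statistic can be read, the sketch below compares an observed statistic with values simulated from a fitted model. It assumes a two-sided convention (twice the smaller tail proportion, capped at 1); this convention, the function name `mc_p_value`, and all numbers are assumptions for illustration, not output from the models reported here.

```python
def mc_p_value(observed, simulated):
    """Two-sided Monte Carlo p value: twice the smaller tail share, capped at 1."""
    n = len(simulated)
    lower = sum(s <= observed for s in simulated) / n
    upper = sum(s >= observed for s in simulated) / n
    return min(1.0, 2 * min(lower, upper))


# Hypothetical simulated edge counts, for illustration only.
simulated_edges = [1498, 1502, 1510, 1495, 1505, 1500, 1507, 1493, 1501, 1499]

p_central = mc_p_value(1501, simulated_edges)  # observed value near the center
p_extreme = mc_p_value(1600, simulated_edges)  # observed value outside the range

print(p_central, p_extreme)
```

A high value means the observed statistic sits comfortably inside the simulated distribution, which is why higher MC p values are read as better fit.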

Figure 2. GoF plots for the (a) formation model and (b) dissolution model.
Discussion
The aim of this study is to examine the factors that drive longitudinal changes in audience duplication among SNSs from a dynamic network perspective. A series of hypotheses and research questions were developed to capture the impacts of both endogenous structural features and exogenous nodal attributes on the formation and dissolution of the audience duplication network. Shared traffic data among 96 SNSs from 2016 to 2019 were collected from comScore’s MMX Multi-Platform. The empirical results from the STERGMs confirmed that preferential attachment, audience size, and niche width exerted significant influences on network evolution over the 4-year period.
Preferential attachment reflects the tendency for already well-connected nodes to build a cumulative advantage in attracting additional ties (Barabási and Albert, 1999). While preferential attachment has been observed in many forms of communication networks (e.g. Peng et al., 2016; Stephens et al., 2016; Weber, 2012), it remains unclear whether the formation of the audience duplication network also follows this principle. This study fills this knowledge gap by demonstrating that this network self-organization mechanism explains the emergence of audience duplication among SNSs, as evidenced by the negative and significant anti-preferential attachment parameter in the STERGM. This finding is not in conflict with the decreasing standard deviation of degree centrality over time (see Figure 1). Preferential attachment does not take concurrent tie-dissolution processes into account and thus does not necessarily result in an increasingly uneven distribution of degree centrality (Barabási and Albert, 1999). Despite the cumulative advantage of well-connected nodes in forming new ties, the 3-star structural configuration dissolved at a high rate during the observation period (see Table 1), which partially explained the reduced skewness. It is also worth noting that a large (small) audience size does not translate into a high (low) degree centrality (measured as the number of direct ties) in the focal network. For instance, Facebook and Twitter are among the most popular SNSs in the US market, but their 4-year averages of degree centrality equaled 8.50 and 28.25, respectively. These numbers were far below the average degree centrality in the audience duplication network in each year (see Figure 1). In addition, Pearson’s correlation coefficient between an SNS’s log-transformed audience size and degree centrality was .31, .26, .38, and .41 in the 4 consecutive years, revealing that the two variables were only moderately correlated.
Previous research has produced mixed findings about the relationship between a media outlet’s audience size and its tendency to share traffic with other outlets (Mukerjee et al., 2018; Webster, 2014). This study adds to this stream of literature by running the STERGM to empirically test the main effect of audience size (an indicator of popularity) on tie changes in the audience duplication network among SNSs. The final results demonstrated that SNSs, regardless of their huge differences in audience size, tended to form new ties at a similar rate. At the same time, popular SNSs had a higher risk of losing existing ties than did unpopular SNSs (e.g. Facebook vs Virb), implying that the former became increasingly disconnected in network evolution. The above results, as well as the network’s high average clustering coefficient, offer little evidence of the popularity-based core–periphery structure where a high degree of audience duplication is observed only at the core (Mukerjee et al., 2018). While there was no main effect of audience size on tie creation, descriptive statistics showed that one SNS, on average, was connected to nearly half of the other sites. Nodes with the lowest degree centrality had only two direct neighbors in the network. These structural patterns are not in line with the “massively overlapping culture” (Webster, 2014; Webster and Ksiazek, 2012), stating that almost every media outlet shares a high proportion of the audience with each other.
Drawing upon the theory of double jeopardy and its extension to the media industry (Elberse, 2013; Taneja, 2020), this study further theorizes the mechanism through which pairwise dissimilarity in audience size constrains the probability of tie formation and dissolution when the main effect of audience size is held constant. The final results revealed that SNS choices of the US audience followed a pattern of heterophily, that is, a tendency for two SNSs with dissimilar levels of audience size (e.g. LinkedIn and Xing) to share traffic. Ties between SNSs that differed significantly in audience size were also unstable. Specifically, an increased gap in audience size led to an increased likelihood that an existing tie would disappear at a subsequent time point. These findings demonstrate the applicability of the framework of double jeopardy for understanding the evolution of the audience duplication network.
This study offers a theoretical reflection on the usage of niche-related terms in the media marketplace. The distinction between the mass and the niche can be misleading because it does not accurately reflect the theoretical underpinnings of niche (Dimmick, 2003). Both mass-appeal and the so-called niche media have niches, but the key difference is that the former cover a large volume of niches in the resource space, while the latter rely on a relatively narrow range of niches for survival. Guided by niche theory (Carroll et al., 2002; Hannan and Freeman, 1977), this study uses niche width to capture whether a media outlet’s target audience is broadly or narrowly defined. The empirical results of the STERGMs revealed that generalist SNSs tended to establish more new ties and dissolve fewer existing ties than did specialist SNSs, suggesting that generalists were increasingly connected in the observed network over time. Pairwise similarity in niche width also drove network evolution. Specifically, the rate of tie formation or dissolution was higher for a generalist–generalist (e.g. Facebook and Twitter) or specialist–specialist (e.g. CafeMom and DeviantArt) pair than for a generalist–specialist (e.g. Facebook and CafeMom) pair. In conclusion, this study extends the literature at the intersection of niche theory and organizational communication networks (Lee and Monge, 2011; Shumate and Lipp, 2008) to audience research and further confirms that niche width plays an important role in shaping dynamic changes in the audience duplication network.
Contributions
This study provides new insights into the evolution of SNSs (Barnett, 2011; Weber et al., 2016) by adopting a dynamic network approach to audience duplication. The observed network is a media-level projection of the mutually constitutive relationship between SNSs and users in the media ecosystem. This perspective on the evolution of SNS also contributes to building a communication research agenda that goes beyond user behaviors within a single SNS (Hill and Shaw, 2019).
This study advances our understanding of audience duplication among media outlets in two ways. First, it provides empirical evidence of preferential attachment in the evolution of the audience duplication network. Second, it distinguishes niche width from audience size and further evaluates their relative influences on tie formation and dissolution. This distinction addresses the issue of conceptual conflation and better captures the theoretical underpinnings of niche.
Methodologically, the direct application of the STERGM, an extension of the simulation-based ERGM, to audience research is also novel. While Taneja and Webster (2016) have acknowledged the usefulness of the ERGM, no empirical work has employed this inferential technique to identify predictors of audience duplication. The quadratic assignment procedure (QAP) and regression-based approaches can be utilized to estimate the influences of nodal attributes (e.g. audience size; Taneja and Webster, 2016) on tie presence, but these techniques do not have the ability to model endogenous mechanisms of network self-organization, which may result in biased estimates of exogenous nodal attributes (Monge and Contractor, 2003). By contrast, the ERGM or STERGM treats network change as a function of both endogenous and exogenous factors.
Limitations and directions for future research
While this study offers a nuanced network analysis of the evolution of audience duplication among SNSs, it still has several limitations that should be addressed by future research. For instance, this study focuses solely on SNS users in the United States because the author’s comScore account does not have permission to access representative audience data in other countries. Future work should collect additional data from comScore or other sources and examine global patterns of audience duplication among SNSs.
Since not all existing SNSs are included as nodes in the observed network, the findings of this study are not readily generalizable to SNSs that failed to reach at least 0.01% of the total online population in any given year during the observation period. However, this is an inherent limitation in comScore’s data. Despite the size and diversity of comScore’s panelists, duplication statistics of media outlets that do not have a minimum percentage reach of 0.01% are largely unreliable (Majó-Vázquez et al., 2019; Mukerjee et al., 2018).
Another limitation is the analysis of dichotomous ties in model estimation. There is no doubt that a valued network contains richer information about social structure than does a binary network (Mukerjee et al., 2018). However, the STERGM is designed for predicting binary networks only. One important future direction for network scientists is to extend the STERGM framework to valued networks.
Guided by existing definitions of SNSs (e.g. boyd and Ellison, 2007; Weber et al., 2016), the sample does not include social networking services that run in mobile apps only. Needless to say, the ongoing evolution of the media landscape calls for redefining SNSs, but even an updated definition of SNSs may soon become outdated because social media have faster rates of evolution than do previous media forms (Ellison and boyd, 2013). Future research should pay close attention to continuous socio-technical changes and address the question of whether audience preferences have shifted to a point where social networking apps need to be considered as category members of SNSs.
As a two-mode network (e.g. a media-by-users matrix) can be transformed into two one-mode networks (e.g. media-by-media and users-by-users matrices), a network of media duplication can be created through the user-level projection of media choices. In this network, nodes represent users, and tie strength indicates the extent to which two users share the same set of media outlets. However, user-level characteristics are typically unavailable in aggregate data provided by commercial measurement companies like comScore. An important direction for future research is to explore how the evolution of the media duplication network is determined by both endogenous structural features and exogenous nodal attributes.
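The projection described above can be sketched with a small binary media-by-users matrix. The outlets, users, and matrix entries below are hypothetical; the helper `project` multiplies a matrix by its transpose, so each off-diagonal cell counts shared users (media-level projection) or shared outlets (user-level projection).

```python
# Hypothetical media-by-users incidence matrix (1 = the user visits the outlet).
media = ["m1", "m2", "m3"]
users = ["u1", "u2", "u3", "u4"]
incidence = [
    [1, 1, 0, 1],  # m1
    [1, 0, 1, 1],  # m2
    [0, 0, 1, 0],  # m3
]


def project(rows):
    """One-mode projection: entry (i, j) is the dot product of rows i and j."""
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]


# Media-level projection: shared users between each pair of outlets.
media_by_media = project(incidence)

# User-level projection: shared outlets between each pair of users.
users_by_users = project([list(col) for col in zip(*incidence)])

print(media_by_media)
print(users_by_users)
```

The diagonal of each projected matrix holds a node's own total (users per outlet, or outlets per user) and is usually ignored when ties are defined.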
Conclusion
Audience duplication reflects the mutually constitutive relationship between media outlets and audiences in the media landscape. This study proposes an integrated framework to explain the evolution of audience duplication among SNSs. A dynamic network analysis was performed to examine SNS choices of the US audience during the 2016–2019 period. The empirical results highlight the importance of preferential attachment, audience size, and niche width in shaping changes in audience duplication over time.
Supplemental Material
Supplemental material, sj-pdf-1-nms-10.1177_1461444821993048 for Evolution of audience duplication networks among social networking sites: Exploring the influences of preferential attachment, audience size, and niche width by Yu Xu in New Media & Society
Footnotes
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
