From Strangers to Friends: Tie Formations and Online Activities in an Evolving Social Network

Abstract

The authors study how strangers become friends within an evolving online social network. By modeling the coevolution of individual users’ friendship tie formations (when and with whom) and their concurrent online activities, the authors uncover important drivers underlying individuals’ friendship decisions and, at the same time, quantify the resulting peer effects on individuals’ actions. They estimate their model using a novel data set capturing the continuous development of a network and users’ entire action histories within the network. The results reveal that similarity (homophily) with a potential friend, the properties of a potential friend's network, and the potential friend’s domain expertise all play a role in friendship formation. Via prediction exercises, the authors find that stimulating anime watching is the most effective sitewide intervention, which leads to the highest overall site traffic and the largest number of active users, and that recommending a friend of a friend as a potential friend is the most effective strategy in stimulating friendship tie formation. In contrast to the common finding for static networks, the results indicate that seeding to users with the most friends is not the most effective strategy to increase users’ activity levels in an evolving network.

Keywords

social network formation product adoption user-generated content peer effects

Online social networks have become an indispensable part of many individuals’ everyday lives. By 2020, there were over 3.6 billion social network users worldwide, with the average user having more than eight accounts across different platforms (Statista 2022a, b). The growing connectedness among users and the increasing number of interactions and activities individuals engage in via online social networks provide ample opportunities for social network companies to increase revenues and profits. In 2020, despite the COVID-19 pandemic, these companies made an estimated revenue of $41.5 billion from social media advertising alone (IAB 2021).

Executives at social network companies use a variety of methods, such as content intervention, friend recommendation, and seeding strategies, to increase network connectedness and user engagement within the network.¹ For example, Facebook suggests content and potential friends to users under “Pages You May Like,” “Suggested For You” posts in news feeds, “People You May Know,” or “Groups You Should Join”; Instagram's suggestions can be found in “Instagram Explore,” “Accounts You May Like,” and “IGTV Discover.” According to both companies’ websites, these personalized recommendations are generated by proprietary algorithms based on content that users have previously expressed interest in and actions that users have taken on the social media platform (for more detail, see Meta 2022a, b).

Despite wide industry practice, the proprietary nature of the recommendation algorithms means that little is known about their effectiveness, and academics have resorted to using simulations (based on model estimates) to evaluate the effectiveness of these strategies. Researchers have primarily studied these strategies in the empirical context of mature (static) social networks, where the number of friendship ties remains stable (e.g., Aral, Muchnik, and Sundararajan 2013; Hinz et al. 2011; Lanz et al. 2019; Trusov, Bodapati, and Bucklin 2010). Their findings may or may not hold for evolving networks, in which an intervention is likely to trigger the formation of new friendships and therefore have a cascading effect on other activities due to the increased network connectedness. This issue becomes especially concerning in fast-growing networks where users rapidly expand their friend connections. For example, Instagram users with more than 1,000 followers saw, on average, a growth of 13% to 16% in their number of followers during the first half of 2019 (Statista 2022d). As a result, to accurately evaluate the effectiveness of recommendation and seeding strategies in evolving networks, one needs to take the endogenous formation of new friendship ties into account; that is, friendship ties are not made randomly between pairs of users, but are rather the result of users’ deliberate decisions considering potential benefits and cost of each new friendship.

To fill this gap, we develop a framework for the coevolution of individual users’ friendship tie formations and their concurrent activities within an online social network and explicitly model the endogenous formation of new friendship ties. An intriguing aspect of making friends in many online social networks (including the one we study in this article) is that people often do not know each other's identities in real life, and their interactions are mostly confined to the online network, which is vastly different from the typical friend-making scenarios in the real life. Thus one might ask how strangers become friends in online networks characterized by anonymity in the first place.² Whereas extant research in the strategic network formation literature, including Christakis et al. (2010) and Snijders, Koskinen, and Schweinberger (2010), tends to explain the formation of friendship ties in real life using only static individual characteristics such as age or gender, it is very likely that an individual's behaviors and opinions (such as product adoptions and content generation) as observed by other people in the online environment are also significant drivers of the decision to form a friendship tie. By taking these factors into account when modeling tie formations in an online social network, our research complements and enriches the strategic network formation literature.

We obtain our data from a special-interest online community for anime (Japanese cartoons) called MyAnimeList.net. This website provides a gathering place for anime fans from all over the world to interact with each other and to form friendships. Because anime fandom is a special interest and anime fans are scattered around the world, the online channel naturally becomes the main venue through which anime fans interact with each other. Therefore, most users of MyAnimeList.net do not know each other before becoming friends online, and the actions they observe on the website are the main drivers of their friendship decisions, making this platform an ideal environment for our research inquiry. We take advantage of this novel data set that documents both the continuous development of the network (i.e., which individuals become friends with each other and when that happens) and all users’ entire activity history on the platform (i.e., anime watching and posting of user-generated content [UGC]). Access to this data set enables us to overcome several modeling constraints present in recent network coevolution models (e.g., Snijders, Steglich, and Schweinberger 2007) and in the strategic network formation literature (e.g., Christakis et al. 2010).³ Most importantly, it allows us to model the friendship network development without the need to simulate the state of the network at each point in time and, as a result, to quantify the effects of users’ time-varying activities on the probability that two individuals become friends.

We model the endogenous formation of a social network and the occurrence of two types of online activities, namely, product adoptions and content generation, over time. More specifically, using a utility-maximizing framework, we model an individual user's daily decisions in three areas: (1) with whom to become friends, (2) whether to watch any anime, and (3) whether to publish a UGC post. We model friendship tie formation between two individuals as noncooperative decisions. Drawing from the network formation and social psychology literature, we consider three types of drivers that impact individuals’ friendship formation utility: characteristics of the potential friend's current network state (referred to as “network properties” in the following), similarity (homophily) between the pair, and the domain expertise of the potential friend. Each individual maximizes their own friendship formation utility, and a friendship is formed if both users agree to it. A user's utilities of engaging in either product adoption or content generation are functions of the user’s past online activities and friends’ activities, which can affect the user’s actions through a direct peer effect (on the same type of activity) and a spillover effect (on the other type of activity). Finally, the three utility functions are connected in two ways: through observed variables and through correlated error terms.

Our results for friendship tie formation reveal that all three drivers (i.e., network properties, similarity, and expertise) affect a user's friendship formation decisions. A focal user is more likely to become friends with similar users, that is, users who watch the same animes and are similar in terms of demographic traits. Although this result echoes previous research studying more traditional friendship settings, such as friendship networks among students, our finding reveals that even in (anonymous) online networks, users who share similar (though unverifiable) demographics are more likely to become friends. Among the three demographics we study, age is the most important driver with the largest marginal effect.

We also find significant positive effects of both network property variables: a focal user is more likely to form a friendship tie with users who are popular and have many friends in common with the focal user. This result stands in contrast to that of Christakis et al. (2010), who find that students are less likely to become friends with popular students. We suspect that the disparate finding is due to our unique empirical context of an (anonymous) online social network, in which popular users are more likely to be “known” by other users, and due to the lower cost of initiating and maintaining many connections in such an environment. Furthermore, we find a significant effect of domain expertise; that is, a focal user is more likely to become friends with a user who publishes many posts. This finding is consistent with the notion that information sharing is the underlying mechanism that drives the expertise effect. Sharing comments and providing feedback about animes through UGC publication is the key to attract friendships using one's domain expertise.

Our results for in-site activities (i.e., product adoptions and UGC production) reveal significant positive peer effects on the focal user: although having more friends does not necessarily make a user more active, having more active friends does increase a user's activity level. Comparing the magnitude of the direct peer effect and the spillover effect, we find that the direct peer effect is the larger driver of a user's in-site activity levels.

Using predictive exercises, we uncover several novel findings regarding content/activity intervention, friend recommendation, and seeding strategies for evolving social networks. First, the most effective friend recommendation strategy is context specific. In other words, it depends on the objectives: while recommending a friend of a friend as a potential friend is the most effective strategy in stimulating friendship tie formation in an evolving network, recommending an active UGC creator as a potential friend works best in increasing anime watching, UGC publishing, and the overall level of in-site activities via friend recommendations. Second, seeding to users who make many UGC posts is, on average, more effective in increasing online activity levels in an evolving network than seeding to well-connected users. This result stands in contrast to the common finding for static networks that indicates that seeding to well-connected users is the most effective strategy in such an environment (see, e.g., Aral, Muchnik, and Sundararajan 2013; Hinz et al. 2011; Trusov, Bodapati, and Bucklin 2010). And, lastly, shutting down the endogenous network formation in an evolving network when assessing the effectiveness of seeding strategies leads, on average, to an underestimation of seeding effectiveness by 41%.

The contribution of this research is twofold. First, our study is the first systematic investigation in marketing that theorizes and quantifies the importance of various drivers of friendship formation decisions among strangers in an online environment. Our finding that all three friendship drivers (i.e., network properties, similarity, and expertise) matter provides strong support for the popular practice of recommending people with common friends, similar traits, and/or domain expertise as potential connections by large social networks, including Facebook and LinkedIn (see also Sun and Taylor 2020). In particular, by quantifying the effects of individuals’ time-varying activities, such as UGC production, on friendship formations through expertise variables, our research enriches the network coevolution and strategic network formation literature. This richer specification is much needed when describing the network development in an online environment, where many people encounter each other through the internet and become friends even though they might never meet in real life.

Second, to the best of our knowledge, this study is among the first ones to assess the effectiveness of various content intervention, friend recommendation, and seeding strategies in an evolving online social network. By modeling the interdependent dynamics between network formation and time-varying online activities (i.e., how tie formations, product adoptions, and UGC productions influence each other), we are able to account for the endogenous formation of friendship ties and, more importantly, the cascading effects of increased network connectedness on other in-site activities. We show that not accounting for the endogenous network formation leads to a severe underestimation of seeding effectiveness. Furthermore, we demonstrate that seeding to the most-connected users, the most effective seeding strategy in static networks, is no longer the best strategy in evolving networks. Therefore, it is crucial for managers to consider the interdependence of network formation and in-site activities when devising optimal stimulation and seeding strategies for an evolving network.

The remainder of this article is organized as follows: Next, we discuss the relevant literature and introduce our data. Then, we introduce our model, estimation approach, and identification strategy. Subsequently, we present our estimation approach and discuss the empirical results. Finally, we perform prediction exercises and conclude with a discussion of limitations and suggestions for future research.

Relevant Literature

In this section, we review the relevant streams of literature on network coevolution, drivers of friendship formation, the identification of peer effects, and seeding strategies in social networks and delineate our research vis-à-vis the findings from extant research.

Network Coevolution

Recent literature on network coevolution typically adopts a continuous-time framework in which network users decide whether to alter their network ties or to perform other actions at random instants in time (Greenan 2015; Lewis, Gonzalez, and Kaufman 2012; Snijders, Steglich, and Schweinberger 2007; Steglich, Snijders, and Pearson 2010). For example, Snijders, Steglich, and Schweinberger (2007) develop a continuous-time first-order Markov model that describes both the formation of a network from an individual's perspective and the incidences of individuals’ other action(s) within the network. In the model, both the network structure and individuals’ actions evolve in a dynamic fashion: individuals are selected at random rates, and each selected individual decides whether to make a change in friendship ties, to perform an action, or to do neither. The randomness in the timing of friendship and activity decisions is necessary in this class of models because of the data limitation of observing the network only at discrete moments (i.e., a few snapshots of the network). As a result, the timing and sequence of tie formations and other actions of users that take place between two observed discrete moments are unavailable to researchers. In an empirical context more closely related to ours, Bhattacharya et al. (2019) extend the work of Snijders, Steglich, and Schweinberger and apply the model to study the coevolution of users’ social network structure and content posting behaviors using monthly data from a major social networking site.

There are several limitations to the modeling approach introduced in Snijders, Steglich, and Schweinberger (2007) and adopted in a number of later studies (e.g., Bhattacharya et al. 2019; Greenan 2015; Lewis, Gonzalez, and Kaufman 2012; Steglich, Snijders, and Pearson 2010). First, since individuals cannot both change their ties and perform other actions at the same time, simultaneous incidences of tie formation and other actions cannot be accommodated in this framework. Second, because of the randomness in the decision timing within a time period (between two sequential snapshots of the network observed by researchers), the effects of time-varying behaviors that happen continuously within the time period cannot be identified, and, as a result, their effects cannot be properly assessed. Third, although Snijders, Steglich, and Schweinberger capture homophily by accounting for observed similarities among users when modeling tie formations, latent homophily (arising from the similarity among friends in their unobserved intrinsic preferences) remains a confounding factor that may bias the effect of friends’ influence.

In this article, we overcome these three limitations by proposing a coevolution model of individuals’ concurrent decisions to both form friendship ties and perform online activities (such as product adoptions and content generation) at each point in time. We are able to do so because we observe the continuous development of tie formations and all concurrent online activities in our novel data. In other words, our data set records the occurrence and timing of all tie formations and other online activities performed within the network. This allows us to capture the effects of any time-varying behaviors while studying simultaneous incidences of these decisions. Furthermore, we are able to account for the latent homophily by explicitly estimating individuals’ unobserved intrinsic preferences for actions absent of their friends’ influence and therefore provide a cleaner identification of peer effects. We are able to do so because we observe users’ actions both before and after they make friends in our data.

Friendship Formation

More broadly, this research is related to the network formation literature. Researchers study network formation using three main modeling approaches: nodal attribute models, exponential graph models, and strategic network formation models.⁴ The first category of models explains the existence of ties and the resulting network structure via similarities among pairs of individuals (e.g., Boguñá et al. 2004; Hoff, Raftery, and Handcock 2002). However, this modeling approach explains the status quo of a network, that is, who is or is not friends with whom, rather than its evolution. Exponential graph models explain the network development on the basis of structural patterns such as triangular connections or transitivity, but do not provide insights into the mechanisms that drive individuals’ tie formation decisions (e.g., Katona and Móri 2006; Mele 2017). These models are well suited for predictions, but not for causal inferences, and therefore not suited for counterfactual analysis. To overcome these shortcomings, strategic network formation models, the most recently developed modeling approach among the three, take the perspective of an individual actor's utility maximization when explaining the evolution of a network, and allow it to depend on the existing state of the network (e.g., Hanaki et al. 2007).⁵ This article falls into the last category of modeling approaches. Note that unlike the network coevolution models discussed in the previous section, strategic network formation models describe only friendship tie formation and not any other activities.

Two notable studies in this stream of literature are Christakis et al. (2010) and Snijders, Koskinen, and Schweinberger (2010). In both, the authors model future states of a network on the basis of characteristics of the existing network state (current friendship ties) and similarity in terms of demographic traits. However, the models proposed in these two studies are not able to capture the effects of users’ changing behaviors and activities on tie formation outcomes because of the simulated network states between (a limited number of) observed snapshots of the network. Christakis et al. (2010) apply their model to study a network of 669 students with 1,541 friendship ties and find that, while having common friends is important for friendship formation, people are less likely to become friends with popular individuals. They also find that people prefer to become friends with people of the same gender and age. Snijders, Koskinen, and Schweinberger (2010) use a data set that contains six snapshots of a network of 32 students from the same class. They find that network characteristics such as transitivity matter in friendship formation. In terms of demographic characteristics, they find that male students are more popular as friends, but having the same gender does not increase one's chance of making a friend.

Studies in the social psychology literature find that similarity (homophily) is an important driver in individuals’ friending decisions in social networks. In this literature, similarity is defined more broadly as a match between two persons’ interests, experiences, backgrounds, or personalities and is not limited to similarity in demographic traits. Similarity between two individuals increases the chance of them becoming friends (e.g., Berscheid and Reis 1998; McPherson, Smith-Lovin, and Cook 2001; Sun and Taylor 2020). To put it differently, “birds of a feather flock together.” In addition, expertise or knowledge in the subject domain increases a person’s desirability as a potential friend. The increased desirability is due to the potential gain from information sharing and learning from a friend who is knowledgeable about the subject domain (Brandtzæg and Heim 2009; Watson and Johnson 1972).

Building on the extant research in the strategic network formation and social psychology literature, we identify three main drivers of users’ friendship decisions in an online social network: (1) network properties, such as out-degrees (i.e., number of friendship ties) and transitivity (i.e., number of common ties); (2) homophily between two users, including similarity in their demographic traits as well as similarity in their interests, such as animes both users have watched; (3) expertise of a user, which is reflected in the current level of activities and captures domain expertise, such as the number of anime-related posts the user has published. Although the first two types of drivers are studied in the existing research, to the best of our knowledge no studies in the extant strategic network formation literature are able to quantify the impact of the last driver on network formation. This constraint is largely due to the data limitation that only one or a few snapshots of the network are available to researchers; therefore it is almost impossible to capture the effects of time-varying expertise variables.

Our research complements and enriches the strategic network formation literature in two important ways. First, by observing the continuous evolution of the network, we do not need to rely on assumptions to simulate the current network state. Second and more importantly, we observe each individual's entire activity history, enabling us to explicitly account for the effects of individuals’ time-varying behaviors in their friendship formation decisions. The latter improvement is especially important because we study network formation in an online environment. In such an environment, individual users’ behaviors and opinions as observed by other individuals are likely to be among the most important drivers of their friending decisions.

Peer Effects

This research is also related to the literature on the identification of peer effects. Making a causal inference of friends’ influence is a challenging task (Manski 1993). Multiple social phenomena can confound the identification of social influence (see Hartmann et al. 2008; Shalizi and Thomas 2011, and discussion in the “Identification” section). Among these phenomena, homophily is probably the most challenging one to account for. Because of similarity, friends may exhibit the same behavior without one necessarily influencing the other. Different approaches have been proposed to account for homophily, including the use of instrumental variables (e.g., Bramoullé, Djebbari, and Fortin 2009; Claussen, Engelstätter, and Ward 2014; De Giorgi, Pellizzari, and Redaelli 2010), the inclusion of correlated group effects (e.g., Lee 2007; Lee, Liu, and Lin 2010; Ma, Krishnan, and Montgomery 2014), the use of exogenous shocks to peers (e.g., Tucker 2008) or exogenous randomness (e.g., Sacerdote 2001), experiments (e.g., Aral and Walker 2011), the incorporation of individual-specific unobserved tastes/preferences (e.g., Ameri, Honka, and Xie 2019; Nair, Manchanda, and Bhatia 2010; Trusov, Bodapati, and Bucklin 2010), modeling the coincidence of network and outcome formation (Badev 2021; Snijders, Steglich, and Schweinberger 2007), and explicitly accounting for the selection bias that arises from endogenous network formation (Hsieh and Lee 2016).

Our approach is similar to that of Hsieh and Lee (2016), who simultaneously model the network formation and social interaction processes. By allowing the unobserved components in both processes to be correlated, they explicitly account for the selection bias that arises from endogenous network formation, and thereby correct biases on estimated peer effects in the social interaction model. In our research, we take a similar approach by allowing the stochastic error term in our network formation equation to be correlated with the error terms in the other two online activity equations in which peer effects are estimated. However, the employed models differ. In particular, Hsieh and Lee model unidirectional tie formation, which results in a simpler model than ours since mutual agreement is not necessary. Furthermore, in contrast to Hsieh and Lee, we are able to account for latent homophily by explicitly estimating individuals’ unobserved intrinsic preferences for actions absent of their friends’ influence. Therefore, we provide a cleaner identification of peer effects. We can do so because we observe users’ actions both before and after they make friends.

Seeding

This research is also related to the literature on seeding. Seeding refers to the determination of whom to target for motivational stimulation with the goal of triggering large information cascades, adoptions, or other types of actions. The most commonly studied seeding strategies are based on network metrics such as “in/out-degree centrality”⁶ or “betweenness centrality.”⁷ For example, Hinz et al. (2011) compare three seeding strategies—stimulating high-degree, low-degree, and high-betweenness individuals with random seeding—in terms of their effectiveness in increasing adoption. They find that seeding to well-connected individuals is the most successful strategy in increasing participation in viral marketing campaigns. Similarly, Aral, Muchnik, and Sundararajan (2013) examine the effectiveness of seeding to high-degree individuals, dense regions of the network, and hubs unlikely to adopt against the effectiveness of random seeding and find that high-degree and dense region targeting generally perform better. Katona (2013) studies seeding strategies in a theoretical framework and shows that highly connected influencers are valuable only if they cover consumers who are not connected to many other influencers. In a more recent empirical study of a music network, Lanz et al. (2019) also find that most music creators do not benefit from seeding to influencers.

Many of the previous studies focus on (one-time) adoption behavior. However, for the growing number of online social platforms, users’ continuous engagement with the website may be more important than one-time adoption behavior. Trusov, Bodapati, and Bucklin (2010) focus on the activity levels of users within an online platform instead of adoption behavior. The authors develop an approach to determine which of a focal user's friends have a significant influence on the focal user's activity level. They find that, on average, only 20% of a focal user's friends influence the focal user and suggest these friends for seeding.

Similar to Trusov, Bodapati, and Bucklin (2010), we add to the existing knowledge on the effectiveness of seeding strategies in increasing user engagement on online social platforms. Going beyond the work of Trusov, Bodapati, and Bucklin, we examine different seeding strategies that not only are based on network metrics, as is common in this literature, but also can depend on users’ activity levels on the website. Furthermore, by modeling the coevolution of friendship formations and users’ actions over time, we not only capture the immediate effect of friends’ influence on each other but also capture how that effect propagates into the development of future friendship ties and into future actions of users.

Data

Our data come from MyAnimeList.net. This website is a consumption-related online community where online interactions are based on shared enthusiasm for a specific consumption activity (Kozinets 1999). MyAnimeList.net was created to allow anime fans to gather and to share their excitement and opinions about animes. The website has developed into one of the most popular platforms for anime fans over the years. Users of the website create a profile page when they join the website. On their profile page, users can share some information about themselves (e.g., age, gender, location) and create a list of the animes they have watched or are watching (which we refer to as a “watch list” in this article).⁸ Users can write reviews of and recommendations for animes. The website also hosts a discussion forum where users can share information and exchange opinions about animes. We classify any post made by a user in the reviews, recommendations, or discussion forum sections of the website as UGC. In addition, users have the option to become friends, which makes it easier for them to access their friends’ pages and to be notified about their friends’ activities, similar to bookmarking and RSS functions in web browsers.⁹ The platform does not have a friend recommendation system.

Anime fandom is a special interest and not very common. As a result, fans use special-interest communities such as MyAnimeList.net to find and connect with other fans. This implies that most users of MyAnimeList.net meet other users for the first time on the website and their interactions happen within the website. Furthermore, this website is a worldwide community and attracts users from many different countries. About half of the users reveal their locations on their profile pages. We can see that users frequently become friends with other users from different countries.¹⁰ This observation further validates our assumption that meetings and interactions among users are mostly confined to the platform.

Estimation Sample

The website was started in 2004 as a private domain. In April 2006, it was moved to a public domain and began to take its current shape. At that time, the website had about 300 users. After its transfer to a public domain, the number of members started to grow quickly (see Figure 1). About a year later, on March 16, 2007, the function of forming friendships was added. At that time, the website had about 450 members, and this number grew rapidly to 2,700 at the beginning of July 2007 and to about 10,000 by the end of 2007.

Figure 1.

Dates Users Joined MyAnimeList.net.

We focus on users who joined the website in the second half of 2007, mainly for two reasons. First, users who joined the website before March 2007 are likely to add each other as friends on the basis of past interactions. To put it differently, had they had the option of adding friends before, they would have done so. Second, existing members might have taken some time to learn about this new feature. Therefore, we start our study period about three months after the introduction of the friending function.

Studying daily friendship formation among all users who joined between July and December 2007 is, however, computationally impossible because the data set would include over 7 billion pair-day observations. One potential solution is to shorten the observation period. However, this approach would result in insufficient variation in the dependent variables. Figure 2 shows the distributions of the number of days between activities of each type. In about 50% of the cases, users add a friend and publish a post more than a month after their last action of the same kind. In 40% of the cases, users watch an anime more than a month after the last watched anime.

Figure 2.

Number of Days Between Activities.

Our solution is to sample from the network.¹¹ We implement this solution via egocentric sampling:¹² We first draw a random sample of 400 users (“egos”) out of 7,419 users who joined the website in the second half of 2007. We then include all of the egos’ friends in our estimation sample. We term the users who are not egos themselves but are friends with an ego “alters” (986 users). Thus, our estimation sample contains 1,386 users (400 egos and 986 alters). Note that egos’ friends can either be egos or alters and that all friends of all egos are included in the estimation sample. Figure 3 illustrates our sampling strategy. As an example, user 2, who is an ego, has three friends: user 1, who is another ego, and users 3 and 4, who are alters.

Figure 3.

User Sampling Strategy.

In the estimation, we model all anime adoptions and all UGC production activities for both egos and alters. For friendship formation, we model all potential ties among egos (e.g., between users 1 and 2 in Figure 3), all potential ties among egos and alters (e.g., between users 1 and 3 in Figure 3), and all potential ties among alters (e.g., between users 3 and 4 in Figure 3). We do not model the friendship formation decisions between alters and users outside our estimation sample, but we do take their friendships into account when creating related explanatory variables.

Given that activities of users in the three areas can be correlated, missing a portion of the network formation for the 986 alters could lead to bias in our estimates related to friendship formation decisions. To alleviate this concern, we estimate separate coefficients for the 400 egos (for whom we model all their tie formations) and for the 986 alters (for whom we do not model the portion of the friendship network that includes users outside of our sample) in all three utility functions (i.e., friendship formation, anime adoption, and UGC production). We only use the coefficient estimates for egos to make inferences.¹³

The observation period is 184 days between July 1, 2007, and December 31, 2007. We have fewer observations for users who joined after July 1, 2007. On average, we observe each user for 140 days.

Representativeness

To evaluate whether our random sample of 400 egos is representative of the entire network of 7,419 users who joined the community in the second half of 2007, we calculate summary statistics of four network structure measures (degree centrality, betweenness centrality, eigenvalue centrality, and transitivity/clustering coefficient) for both groups. We then test whether the average network structure measures for the 400 egos and the 7,419 users are significantly different. The summary statistics and t-values of the mean comparisons are presented in Table 1. We find none of the differences to be significant at $p < .05$ .¹⁴

Table 1.

Network Representativeness.


	Complete Network	Estimation Sample	t-Value
Average degree	4.37	3.81	1.12
Average betweenness centrality	8,986.13	5,234.28	1.50
Average eigen centrality	.0073	.0068	.94
Transitivity	.1986	.1779	1.32

To further validate the representativeness of our estimation sample, we examine the degree distributions among the 400 egos and compare it with the degree distribution in the entire network of 7,419 users who joined the community in the second half of 2007. The distribution graphs are shown in Figure 4. The degree distributions for both groups follow a similar power law distribution with the majority of users having fewer than 20 friends.

Figure 4.

Degree Distributions.

Data Description

Within our sample of 1,386 users, we observe 5,038 ties out of 947,155 possible ties being formed during the observation period and about 68 million daily observations of possible pairs. Figure 5 shows the states of the network for snapshots taken at Days 1, 60, 120, and 184. The nodes represent individual users in the network, and the links between nodes represent friendships ties. Furthermore, the color of a node reflects the quantity of a user's UGC production, and the size of a node reflects the number of animes a user watched. The nodes become darker as users publish more posts, and the size of the nodes increases as users watch more animes. As expected, the nodes become darker, bigger, and more connected over time.

Figure 5.

Network Coevolution over Time.

Table 2 summarizes key statistics of our data. In terms of demographics, 78% of users report their age and are, on average, 19 years of age, and 93% of users report their gender, with 39% being female and 54% being male.

Table 2.

Descriptive Statistics.

	Mean	SD	Min	First Quartile	Median	Third Quartile	Max	N
Age	19.3	5.2	12	17	18.5	23	78	1,088
Gender (% female)	38.5
Gender (% male)	54.0
Gender (% not specified)	7.5
Egos
Number of active days	12.62	15.43	0	3	7	17	101	400
Number of friend adding days	2.55	4.69	0	0	1	3	59	400
Number of anime watching days	9.19	10.37	0	2	5	13	68	400
Number of content generating days	2.47	9.17	0	0	0	1	95	400
Percentage of active days	17.76	17.07	0	5.04	12.90	24.24	100	400
Percentage of friend adding days	3.80	6.19	0	0	1.63	4.60	40	400
Percentage of anime watching days	13.55	14.11	0	3.23	9.70	19.44	85.71	400
Percentage of content generating days	2.56	7.72	0	0	0	.82	61.69	400
Friend adding interval in days	44.26	36.05	1	15	34	64	156	12,414
Anime adding interval in days	25.64	28.20	1	5	14	37	138	20,680
Post adding interval in days	32.07	29.33	1	7	25	48	109	6,330
Alters
Number of active days	24.06	24.42	1	7	16	34	181	986
Number of friend adding days	5.93	6.40	1	2	4	7	48	986
Number of anime watching days	15.55	16.65	0	3	10	22	121	986
Number of content generating days	6.31	17.57	0	0	0	3	181	986
Percentage of friend adding days	22.49	18.36	.54	9.26	17.95	30.43	100	986
Percentage of anime watching days	6.85	8.31	.54	2.17	4.23	8.15	80	986
Percentage of content generating days	14.46	13.38	0	4.35	10.87	21.20	82.22	986
Percentage of content generating days	4.72	12.01	0	0	0	2.78	98.37	986
Friend adding interval in days	48.41	40.36	1	16	37	71	181	78,996
Anime adding interval in days	24.82	29.86	1	5	13	32	163	84,133
Post adding interval in days	47.98	48.96	1	7	28	78	182	40,287
Egos with friends
Number of friends on last day	4.72	7.83	1	1	2	6	94	320
Number of animes on last day	76.49	83.02	0	20	50.5	100.5	522	320
Number of posts on last day	6.13	22.50	0	0	0	2	228	320
Egos without friends
Number of friends on last day	0
Number of animes on last day	46.34	57.22	0	8	24	70.5	313	80
Number of posts on last day	.68	3.19	0	0	0	0	23	80

Figure 6 demonstrates how average activity levels of users who joined the website in the second half of 2007 change over time from the day they joined the website. Figure 6, Panel A, shows a decreasing trend in making new friendship ties. Because one benefit of having friends is the reduced costs associated with learning about the website and new animes, users are more likely to add friends shortly after joining the website. Motivated by this data pattern, we assume that users’ intrinsic propensity to make friends contains two components: a constant intrinsic propensity component and a second component that decreases with membership length. Panels B and C of Figure 6 show the activity trends for anime watching and content generation. Both graphs reveal a rather constant trend over time.¹⁵ These two data patterns prompt us to assume that the intrinsic propensities in anime watching and content generation are constant over time.

Figure 6.

Average Activity Levels of New Users over Time Since Joining.

Users can engage in multiple types of activities simultaneously. On average, egos have 2.6 active days in terms of friend adding, 9.2 active days in terms of anime watching, and 2.5 active days in terms of post writing (see Table 2). In total, they have 12.6 days in which they participate in at least one type of activity. To put it differently, on average, egos are active on about 18% of the days during the study period. We observe a similar pattern for alters albeit with higher average levels of all three activities. As shown on the bottom of Table 2, 20% of egos have no friends at the end of our study period. Egos with friends are more active in both anime watching and UGC production than egos without friends, suggesting the presence of peer influence from friends. Figure 7 shows a Venn diagram of the joint probabilities of each type of activity conditional on engaging in at least one type of activity. Users are active in only one area in 84.16% of the cases. Users are active in two and three of the areas of interest in 14.79% and 1.05% of cases, respectively. Figure 8 plots the percentages of egos engaging in zero, one, two, or three activities every calendar day and reveals constant levels during our study period.¹⁶

Figure 7.

Percentage of Observations with Certain Activities Conditional on Performing at Least One Activity.

Figure 8.

Percentages of Egos Engaging in Zero, One, Two, or Three Activities per Day.

Lastly, Figure 9 shows histograms of individual users’ daily activity intensities conditional on them being active. In more than 80% of the cases, users add only one friend on an active day. Similarly, in about 60% of the cases, users watch only one anime per active day. However, the content generation intensity is higher: users publish one post per active day in about half of the cases and publish two or three posts per active day in about 20% and 10% of the cases, respectively. From these data patterns, we make the simplifying assumption to model anime watching and content generation as binary indicator variables; that is, we model whether a user watches an anime or publishes a post, but not the number of animes watched or posts published by a user, in a given day.¹⁷

Figure 9.

Number of Activities in Each Area per Day.

Model

In this section, we describe how we jointly model a user's decisions to form friendship ties, adopt animes, and generate content.

Tie Formation

We start by describing how we model tie formations among users over time. In each time period (day), a user makes decisions whether to become friends with any other user with whom the user is not friends yet.¹⁸ Because there are usually many users with whom the individual is not friends yet, at each point in time, a user can become friends with multiple users. Note that we model a user's tie formation decisions for each possible friendship pair in each time period and not whether a user makes a friend in a time period.

Suppose that the website has $i = 1, \dots, N$ users and these users can become friends with other users during $t = 1, \dots, T$ time periods.¹⁹ Let $M$ denote the adjacency matrix of the network, which shows the status of ties between each pair of individuals $i$ and $j$ with $j = 1, \dots, N$ and $i \neq j$ . In the matrix, $m_{ijt}$ equals 1 if $i$ and $j$ are friends at time $t$ and 0 otherwise. Ties are bidirectional and symmetric; that is, $m_{ijt} = m_{jit}$ . Furthermore, both users have to agree to become friends. In our data, we do not observe users’ requests for friendship with other users, only the formation of ties on mutual agreement. Therefore our model describes the decision of both users to become friends regardless of who first initiated the friendship (see, e.g., Christakis et al. 2010).²⁰ In our estimation sample, $N = 1, 386$ ( $400$ egos and $986$ alters), $T = 184$ , and M is a 1,386 × 1,386 matrix. Note that the elements of $M$ change over time as ties are formed.

The decision of two users to become friends depends on the utilities both individuals derive from becoming friends (see, e.g., Christakis et al. 2010). User $i$ ’s utility of becoming friends with individual $j$ in time period $t$ , $U_{ijt}^{m}$ , is given by the following:²¹

U_{ijt}^{m} = f (N^{m}, R^{m}, K^{m}, C^{m}, ϵ^{m}),

(1)

where

N^{m}

describes the characteristics of the current network state of user

j

R^{m}

captures the similarity (homophily) between user

i

and user

j

K^{m}

describes the expertise of user

j

, and

C^{m}

is a set of control variables. Finally,

ϵ^{m}

captures the part of the utility of user

i

at time

t

that is observed by the user but not by the researcher.

We assume that individuals have myopic utilities, that is, individuals do not anticipate future states of the network and only care about the current state of the network when deciding whether to form a tie (Christakis et al. 2010; Snijders, Steglich, and Schweinberger 2007). They do not take future links of themselves or the other party into consideration when making the decision to become friends. The assumption of myopic utility is appropriate for large networks in which individuals can meet numerous other individuals at each point in time and the number of future states of the network increases exponentially. Furthermore, users are not limited in the number of ties they can make in an online friendship network. As a result, users independently and nonstrategically form ties if the utility of such ties is positive.

We model the tie formation between two users as a noncooperative decision (e.g., Boucher 2016; Christakis et al. 2010); that is, each pair's decision to become friends depends only on their own friendship utilities and is conditionally independent from friendship decisions of other pairs of users. A tie between user $i$ and user $j$ is formed if and only if both parties decide that it is beneficial for them to become friends; that is,

m_{ijt} = 1 iff U_{ijt}^{m} > 0 and U_{jit}^{m} > 0.

(2)

Recall that the platform does not have a friend recommendation system. A focal user's consideration set of potential friends consists of nonfriend users the focal user naturally encounters on the platform. We do not explicitly model individuals’ consideration sets of potential friends. This represents a limitation of our model. The random sampling is a means to approximate the consideration set in a scalable way, and we control for individuals’ visibility on the platform through observables in the friendship formation utility (see also footnote 25).

Product Adoption and Content Generation

Next, we describe how we model a user's activities on the website. We study the incidence of users’ activities in two broad areas, namely, product adoption and content generation. Let $U_{it}^{pa}$ denote user $i$ ’s utility from watching an anime at time $t$ and $U_{it}^{cg}$ denote user $i$ ’s utility from producing content at time $t$ . Then $U_{it}^{pa}$ and $U_{it}^{cg}$ are given by

\begin{aligned} U_{it}^{pa} = & g (F^{pa}, A^{pa}, C^{pa}, ϵ^{pa}) and \\ U_{it}^{cg} = & h (F^{cg}, A^{cg}, C^{cg}, ϵ^{cg}), \end{aligned}

(3)

respectively. Both utilities depend on a user's friendship network, denoted by

F^{pa}

and

F^{cg}

, respectively, to capture peer effects; a user's past actions denoted by

A^{pa}

and

A^{cg}

, respectively, to capture state dependence; and a set of control variables, denoted by

C^{pa}

and

C^{cg}

, respectively. Finally,

ϵ^{pa}

and

ϵ^{cg}

denote the parts of the utilities that are observed by the user but not by the researcher.

Integrating All Actions

We now present the full model integrating user $i$ ’s actions in all three areas:

\begin{aligned} U_{ijt}^{m} = & f (N^{m}, R^{m}, K^{m}, C^{m}, ϵ^{m}) \forall j = 1 \dots N, i \neq j \\ U_{it}^{pa} = & g (A^{pa}, F^{pa}, C^{pa}, ϵ^{pa}) \\ U_{it}^{cg} = & h (A^{cg}, F^{cg}, C^{cg}, ϵ^{cg}) . \end{aligned}

(4)

Some variables, unobserved by the researcher, might influence more than one type of decision an individual user makes.²² For example, a user might be traveling and, as a result, not spend any time on the website, that is, be inactive in all three areas. To accommodate the simultaneous co-occurrence of activities user

i

does at time

t

, we allow the three error terms in Equation 4 to be correlated, that is,

G = [\begin{matrix} 1 & ρ_{m, pa} & ρ_{m, cg} \\ ρ_{pa, m} & 1 & ρ_{pa, cg} \\ ρ_{cg, m} & ρ_{cg, pa} & 1 \end{matrix}] .

(5)

Utility Specifications

In this section, we present detailed utility specifications for the specific context of our data. We model the utility user $i$ receives from forming a tie with user $j$ as

U_{ijt}^{m} = {\tilde{α}}_{it}^{m} + δ^{m} N_{j, t - 1}^{m} + R_{ij, t - 1}^{m} + γ^{m} K_{j, t - 1}^{m} + λ^{m} C_{ijt}^{m} + ϵ_{it}^{m} .

(6)

In Equation 6,

{\tilde{α}}_{it}^{m}

captures user

i

’s intrinsic preference for making friends at time

t

, that is, the net of user

i

’s benefit and cost of making friends at that point in time. As revealed in Figure 6, users newly joining the website are more likely to add friends compared with users who have already been members of the website for a longer time. This is likely because having friends in the beginning reduces the learning costs associated with navigating the website. Note that

{\tilde{α}}_{it}^{m}

does not depend on

j

; it is identical across all potential friends. We model

{\tilde{α}}_{it}^{m}

as follows:

{\tilde{α}}_{it}^{m} = α_{i}^{m} + κ_{1}^{m} W_{it},

where

α_{i}^{m}

is user

i

’s time-invariant tendency to have few or many friends and follows a normal distribution with mean

{\bar{α}}^{m}

and standard deviation

σ_{α^{m}}

. The variable

W_{it}

denotes the length of time (in days) user

i

has been a member of the website, and

κ_{1}^{m}

captures how the net of benefit and cost of forming friendship ties changes with membership length.

Next, $N_{j, t - 1}^{m}$ captures the properties of the network concerning user $j$ , including the out-degree, operationalized as the cumulative number of friends user $j$ has, and transitivity, operationalized as the proportion of friends that user $j$ and user $i$ have in common.²³

The variable $R_{ij, t - 1}^{m}$ is tie-specific and captures the similarity (homophily) between individual $i$ and individual $j$ . We model $R_{ij, t - 1}^{m}$ as follows:

R_{ij, t - 1}^{m} = κ_{2}^{m} R_{ij, t - 1}^{m} + ζ_{ij}^{m},

where

R_{ij, t - 1}^{m}

captures observed similarity and includes the proportion of adopted animes that users

i

and

j

have in common, and demographic similarity between user

i

and user

j

in terms of age, gender, and country.²⁴ Providing such demographic information is optional in the network we study. Thus we also control for whether age, gender, and country information of both individual

i

and individual

j

are available using three dummy variables. Then,

ζ_{ij}^{m}

captures the unobserved similarity between users

i

and

j

. Note that

ζ_{ij}^{m} = ζ_{ji}^{m}

. The unobserved similarity

ζ_{ij}^{m}

follows a normal distribution with mean

0

and standard deviation

σ_{ζ^{m}}

. The variable

K_{j, t - 1}^{m}

describes user

j

’s expertise and depends only on

j

’s attributes. We operationalize

K_{j, t - 1}^{m}

as the cumulative number of UGC posts user

j

has published. This variable captures the utility gained from information sharing and learning from friends who are knowledgeable about animes (Brandtzæg and Heim 2009; Watson and Johnson 1972).

The term $C_{ijt}^{m}$ contains several variables whose effects we control for. First, we include a weekend dummy. Second, we control for the cumulative number of animes that user $j$ has adopted. Third, we also include a dummy variable indicating whether user $j$ was active on the platform during the previous week. This variable captures user $j$ ’s visibility.²⁵ Fourth, as mentioned previously, we include dummy variables indicating whether user $i$ provided demographic information. Fifth, we include time fixed effects to address common shocks that might result in correlated unobservables affecting friending decisions across users. We operationalize these time fixed effects as week dummies.²⁶ And, sixth, we include a dummy variable indicating whether user $i$ joined the website before July 2007. Lastly, we assume that $ϵ_{it}^{m}$ follows a normal distribution with a correlation matrix as specified in Equation 5.

User $i$ ’s utility from watching an anime, $U_{it}^{pa}$ , is given by

U_{it}^{pa} = α_{i}^{pa} + β^{pa} F_{i, t - 1}^{pa} + γ^{pa} A_{i, t - 1}^{pa} + λ^{pa} C_{t}^{pa} + ϵ_{it}^{pa},

(7)

where

α_{i}^{pa}

represents user

i

’s intrinsic tendency to watch animes and is assumed to follow a normal distribution with mean

{\bar{α}}^{pa}

and standard deviation

σ_{α^{pa}}

. The variable

F_{i, t - 1}^{pa}

captures the effects of user

i

’s friendship network on user

i

’s actions. It includes user

i

’s total number of friends by time

t - 1

, the number of animes watched by all of user

i

’s friends in time

t - 1

, and the number of posts written by all of user

i

’s friends in time

t - 1

.²⁷ Note that we use lagged versions of the variables in

F_{i, t - 1}^{pa}

to address simultaneity concerns in identifying peer effects (see also Ameri, Honka, and Xie 2019; Bollinger, Burkhardt, and Gillingham 2020; Trusov, Bodapati, and Bucklin 2010). Previous research shows that having more friends might directly affect the amount of social activities of network users (Shriver, Nair, and Hofstetter 2013; Toubia and Stephen 2013).²⁸ In addition, the number of animes watched by all of user

i

’s friends captures the direct influence of friends’ activities on user

i

’s product adoptions, while the number of posts written by all of user

i

’s friends reflects the spillover effect of friends’ activities in post publishing on user

i

’s activity in anime watching.

The term $A_{i, t - 1}^{pa}$ captures state dependence in anime watching and is operationalized as a dummy variable that equals $1$ if user $i$ watched an anime at $t - 1$ and $0$ otherwise.²⁹ Furthermore, $C_{t}^{pa}$ contains several controls, including a weekend dummy and week fixed effects. And, lastly, we assume that $ϵ_{it}^{pa}$ is normally distributed with a correlation matrix as specified in Equation 5.

Similarly, user $i$ ’s utility from writing a post, $U_{it}^{cg}$ , is given by

U_{it}^{cg} = α_{i}^{cg} + β^{cg} F_{i, t - 1}^{cg} + γ^{cg} A_{i, t - 1}^{cg} + λ^{cg} C_{t}^{cg} + ϵ_{it}^{cg},

(8)

where

α_{i}^{cg}

is user

i

’s intrinsic tendency to produce content and follows a normal distribution with mean

{\bar{α}}^{cg}

and standard deviation

σ_{α^{cg}}

. The variable

F_{i, t - 1}^{cg}

captures the effects of user

i

’s friendship network on user

i

’s actions and is defined in a similar manner as in Equation 7: it includes user

i

’s total number of friends by time

t - 1

, the number of animes watched by all of user

i

’s friends in time

t - 1

, and the number of posts written by all of user

i

’s friends in time

t - 1

Then, $A_{i, t - 1}^{cg}$ represents user $i$ ’s past activities and contains two variables: a dummy variable capturing state dependence in UGC posting behavior and the (cumulative) number of animes watched by user $i$ by time $t - 1$ . User $i$ ’s past anime watching behavior may influence user’s posting decisions because a user who watches more animes is likely to have more things to write about. As in the previous equation, $C_{t}^{cg}$ contains a weekend dummy and week fixed effects. And, lastly, $ϵ_{it}^{cg}$ follows a normal distribution with a correlation matrix as specified in Equation 5.

Estimation

We show details of the log likelihood derivation in Web Appendix A. The log likelihood of the model is given by

\begin{aligned} LL = \log \int_{- \infty}^{+ \infty} \int_{- \infty}^{+ \infty} \int_{- \infty}^{+ \infty} & \prod_{t = 1}^{T} \prod_{i = 1}^{N} {[Pr (A_{it}^{pa} = 1)]}^{A_{it}^{pa}} [1 - Pr (A_{it}^{pa} = 1)]^{1 - A_{it}^{pa}} \\ \cdot [Pr (A_{it}^{cg} = 1)]^{A_{it}^{cg}} [1 - Pr (A_{it}^{cg} = 1)]^{1 - A_{it}^{cg}} \\ \cdot \prod_{j = i + 1}^{N} {[{[Pr (m_{ijt} = 1)]}^{m_{ijt}} {[1 - Pr (m_{ijt} = 1)]}^{1 - m_{ijt}}]}^{1 - m_{ij, t - 1}} d ϵ d α d ζ, \end{aligned}

(9)

where

A_{it}^{pa}

and

A_{it}^{cg}

indicate the incidence of an activity—anime watching and UGC production, respectively—of user

i

in time period

t

. We estimate our model using simulated maximum likelihood estimation. To calculate standard errors of the parameter estimates, we use the Berndt–Hall–Hall–Hausman estimator, that is, the outer product of the gradient, instead of the numerical Hessian (Berndt et al. 1974). Here, we would like to make two comments on the estimation.

First, for computational reasons, the conventional approach of estimating a model via maximum likelihood estimation and simulated maximum likelihood estimation involves taking the logarithm of the model likelihood to convert an extremely small-value product of probabilities to a sum of the logarithms of these probabilities. This approach cannot be applied to the likelihood of our model for several reasons (see Web Appendix A for details); that is, we cannot convert the product of the probabilities into a sum of the logarithms of these probabilities (see Equation 9). This situation poses a problem for common computing technologies because the likelihood is the product of a very large number of probabilities and is too small in magnitude to be detected.³⁰ To make the likelihood estimation computationally tractable, we use a transformation of the logarithm of a sum of variables to a function of the logarithm of those variables. Details on the transformation and our estimation approach are presented in Web Appendix A.

Second, to speed up the estimation, we use tensorization of large matrices and parallel computing methods to estimate the model. Because of the large size of the data and parallelization, we cannot run the estimation code on conventional computing systems.³¹ We use several large-memory supercomputing servers, including the Texas Advanced Computing Center, the Pittsburgh Supercomputing Center (Nystrom et al. 2013; Towns et al. 2014), and Jetstream cloud computing (Stewart et al. 2015; Towns et al. 2014).³² Details on estimation using supercomputers are presented in Web Appendix B.

Identification

The set of parameters to be estimated is given by ${{\bar{α}}^{m}, {\bar{α}}^{pa}, {\bar{α}}^{cg}, Σ^{α}, σ_{ζ}^{m}, κ_{1}^{m}, κ_{2}^{m}, δ^{m}, β^{pa}, β^{cg}, λ^{m}, λ^{pa}, λ^{cg}, γ^{m}, γ^{pa}, γ^{cg}, G}$ . The identification of ${κ_{1}^{m}, κ_{2}^{m}, δ^{m}, β^{pa}, β^{cg}, λ^{m}, λ^{pa}, λ^{cg}, γ^{m}, γ^{pa}, γ^{cg}}$ is standard. In the following, we first informally discuss the identification of ${{\bar{α}}^{m}, {\bar{α}}^{pa}, {\bar{α}}^{cg}, Σ^{α}, σ_{ζ}^{m}, G}$ and then discuss our identification approach for peer effects.

The mean intrinsic propensities, ${\bar{α}}^{m}$ , ${\bar{α}}^{pa}$ , and ${\bar{α}}^{cg}$ , are identified by average user behavior in each of the three areas across users and across time (see Figure 6). The diagonal covariance matrix of the user random effects, $Σ^{α}$ , is identified by variation in average activity levels across users (see Figure 6). These individual-specific random effects capture unobserved, time-invariant behavior such as a user's tendency to accept friendships or to watch animes.

The correlation matrix of the error terms, $G$ , is identified by variation in the simultaneous co-occurrence of activities in a day (see Figure 8). The term $σ_{ζ}^{m}$ captures the standard deviation of the unobserved similarity between user $i$ and user $j$ and is identified by variation in friendship formation across different pairs of users. And, lastly, conditional on $G$ , the three utilities are separately identified since each action is modeled as a function of other actions in the previous time period.

Identifying influence in social settings is a challenging task (Manski 1993). To be able to identify peer effects, one has to address three factors that can also result in correlated behavior: correlated unobservables, simultaneity, and endogenous group formation. We address correlated unobservables by including week fixed effects and simultaneity by including lagged versions of variables capturing friends’ activities. We address the issue of endogenous group formation by explicitly modeling how users become friends and by accounting for observed and unobserved homophily.

Recall that homophily refers to friends behaving in a similar manner because of their similar preferences and not because of one influencing the other. Similarity in unobserved preferences, if unaccounted for, can lead to correlated errors, which, in turn, lead to upward-biased estimates of friends’ influence. We address this issue by incorporating unobserved time-invariant components, $α_{i}^{m}$ , $α_{i}^{pa}$ , and $α_{i}^{cg}$ , in a user's decisions to form ties, to adopt animes, and to generate content (similar to the approach in Ameri, Honka, and Xie 2019; Nair, Manchanda, and Bhatia 2010; Trusov, Bodapati, and Bucklin 2010). Because we model the incidence of users’ actions and not the specific taste for which product to adopt or what type of content to generate, homophily plays a role only in the frequency of users’ actions, that is, whether they perform an action on each day. For example, two friends are similar to each other if both tend to publish a lot of posts. In our model, this unobserved heterogeneity in the propensity to perform each of the three actions is captured by $α_{i}^{m}$ , $α_{i}^{pa}$ , and $α_{i}^{cg}$ . Motivated by the data patterns shown in Figure 6, we assume that $α_{i}^{m}$ , $α_{i}^{pa}$ , and $α_{i}^{cg}$ are time invariant.³³ Moreover, because many of the users are new to the network, the latent propensities are identified not only by the variation in behavior after any friendship formation but also by behavior before any ties are formed (i.e., when friends’ influence is absent).

Results

We present the estimation results for egos in Table 3. As discussed in the “Estimation” section, we infer separate coefficients for egos and alters in all three utility functions. The complete sets of results for both egos and alters are available in Web Appendix C. The majority of the estimates for egos and alters have the same sign and significance level, indicating that, overall, the explanatory variables have qualitatively similar effects on both groups of users.

Table 3.

Model Estimation Results for Egos.


	Independent	Homogeneous	Main
Friendship Formation
Network properties
User $j$ ’s number of friends by $t - 1$ ^a	.4434***	.4601***	.4512***
	(.0071)	(.0067)	(.0082)
Ratio of number of friends in common with user $j$ to user $i$ ’s number of friends by $t - 1$ ^a	.0727***	.0768***	.1147***
	(.0100)	(.0106)	(.0106)
Similarity
Ratio of number of animes in common with user $j$ to user $i$ ’s number of animes by $t - 1$ ^a	.1842***	.1828***	.1652***
	(.0051)	(.0084)	(.0311)
Dummy for whether user $i$ and user $j$ are within 5 years of age	.2030***	.1904***	.2526***
	(.0191)	(.0185)	(.0216)
Dummy for whether user $i$ and user $j$ have the same gender	.1394***	.1405***	.1263***
	(.0056)	(.0225)	(.0349)
Dummy for whether user $i$ and user $j$ are from the same country	.2240***	.0034	.2338***
	(.0058)	(.0107)	(.0327)
Standard deviation of pair-specific random effect	.0001	.0069	.0004
	(.0043)	(.0040)	(.0044)
Expertise
User $j$ ’s number of written posts by $t - 1$ ^a	.0413**	.0500***	.0540***
	(.0127)	(.0117)	(.0131)
Control variables
Number of membership days by $t$ ^a	−.5192***	−.5396***	−.5384***
	(.0027)	(.0024)	(.0028)
User $j$ ’s number of watched animes by $t - 1$ ^a	−.0198***	−.0154***	−.0202***
	(.0051)	(.0043)	(.0049)
Dummy for whether $t$ is a weekend	−.3105***	−.3038***	−.5239***
	(.0072)	(.0055)	(.0076)
Dummy for whether user $j$ was active from $t - 7$ to $t - 1$	.0801***	.0824***	.1126***
	(.0188)	(.0176)	(.0194)
Dummy for whether both user $i$ and user $j$ indicate their gender	−.2631***	−.2756***	−.2007***
	(.0047)	(.0101)	(.0341)
Dummy for whether both user $i$ and user $j$ indicate their age	−.3187***	−.3199***	−.3907***
	(.0051)	(.0113)	(.0336)
Dummy for whether both user $i$ and user $j$ indicate their country	−.3731***	−.3504***	−.7357***
	(.0069)	(.0154)	(.0260)
Constant	−1.6586***	−1.6650***	−1.5335***
	(.0071)	(.0136)	(.0209)
Dummy for user $i$ having joined before July 2007	.1364***	.1404***	.1538***
	(.0018)	(.0017)	(.0017)
Standard deviation of individual-specific random effect	.0303***		.290***
	(.0004)		(.0004)
Week dummies	Yes	Yes	Yes
Anime Watching
Number of friends by $t - 1$ ^a	−.0193**	−.0170	−.0088
	(.0072)	(.0134)	(.0229)
Number of animes watched by friends in $t - 1$ ^a	.1477***	.1396***	.1506***
	(.0064)	(.0206)	(.0361)
Number of posts published by friends in $t - 1$ ^a	−.0394***	−.0398**	−.0388
	(.0026)	(.0143)	(.0375)
Dummy for whether i watched an anime in $t - 1$	1.1275***	1.1236***	1.1169***
	(.0020)	(.0045)	(.0197)
Dummy for whether $t$ is a weekend	.1465***	.1479***	.1570***
	(.0039)	(.0097)	(.0477)
Constant	−1.8350***	−1.8094***	−1.9329***
	(.0046)	(.0178)	(.0219)
Standard deviation of individual-specific random effect	.0099***		.0099
	(.0016)		(.0336)
Week dummies	Yes	Yes	Yes
Content Generation
Number of friends by $t - 1$ ^a	.1701***	.1723***	.2038***
	(.0107)	(.0105)	(.0175)
Number of animes watched by friends in $t - 1$ ^a	.1278***	.1329***	.1101***
	(.0134)	(.0154)	(.0248)
Number of posts published by friends in $t - 1$ ^a	.1848***	.1872***	.1666***
	(.0152)	(.0146)	(.0237)
Dummy for whether $i$ published a post in $t - 1$	1.9848***	1.9013***	2.2277***
	(.0005)	(.0017)	(.0064)
Number of animes watched by $t - 1$ ^a	−.0526**	−.0276	−.0169
	(.0175)	(.0185)	(.0379)
Dummy for whether $t$ is a weekend	−.0049***	−.0008	.0765***
	(.0011)	(.0040)	(.0094)
Constant	−2.7046***	−2.6870***	−3.0452***
	(.0024)	(.0027)	(.0110)
Standard deviation of individual-specific random effect	.0142***		.0114
	(.0019)		(.0125)
Week dummies	Yes	Yes	Yes
Error Correlation Matrix
Correlation between friendship and adoption		.3245***	.3095***
		(.0028)	(.0032)
Correlation between friendship and UGC		−.0243**	−.0247
		(.0086)	(.0210)
Correlation between adoption and UGC		−.0036	.0007
		(.0192)	(.0472)
Model Summary Statistics
Number of observations	69,020,774	69,020,774	69,020,774
Akaike information criterion	359,515.60	359,501.20	359,180.20
Bayesian information criterion	361,698.39	361,683.99	361,411.14
Log-likelihood	−179,621.80	−179,614.60	−179,451.10

* $p < .05$ , ** $p < .01$ , *** $p < .001$ .

Measured on logarithmic scale.

Notes: Standard errors are in parentheses.

In the following, we focus on discussing the results for the egos. The first column in Table 3 contains the results for a model in which the decisions about the three types of actions of making friends, watching animes, and publishing posts are made independently of each other. The second column presents the parameter estimates for a model in which we allow these three decisions to be correlated, but there is no unobserved heterogeneity among users. The third column depicts the results for our full model, in which we allow for both correlated errors and unobserved heterogeneity among users.

The results across the three different specifications are consistent overall. We therefore focus on evaluating the results for our main model in the third column of Table 3. Potential simultaneous incidences of the three types of actions a user might engage in each day are captured through correlations of the error terms. We find a significant positive correlation between friendship formation and product adoption. We also find a significant coefficient for the standard deviation of the individual-specific random effect in the friendship formation utility, suggesting the presence of unobserved heterogeneity in the intrinsic propensity of making friends. And, as expected, user $i$ ’s net benefit from making friends declines with the length of the user’s membership on the platform. This result suggests that having friends helps reduce the learning costs associated with navigating the website.

Friendship Formation

We start by discussing the parameter estimates for the friendship formation utility. As expected, network properties matter. The effect of the number of friends user $j$ has is positive and significant (marginal effect of $.0003$ ). Given the bidirectional nature of friendships in our data and the operationalization of the out-degree variable as user $j$ ’s number of friends, the direction of the effect is not identified in our model; that is, it is possible that either (1) users with more friends are more attractive as potential friends or (2) users with more friends seek to form more friendships. However, institutional details and descriptive data patterns, which we discuss in detail in Web Appendix D, suggest that (1) is the more appropriate interpretation; that is, well-connected users are desirable candidates for friendship, in our empirical context. The most important pieces of evidence presented in Web Appendix D are that users had no monetary incentives to add more friends (such as influencers have nowadays) and that users mostly form friendships during the first 12 months of platform membership. Users with longer membership duration, even those with many friends, add few friends after the first year.

The result that users with many friends are more attractive as potential friends stands in contrast to the negative estimate found by Christakis et al. (2010) for the number of friends a potential friend has. We suspect that this result is due to the unique context of the online social network environment in which users come from many different countries and do not know one another in real life. In such an environment, popular users are more likely to be “known” by other users (i.e., visible to other users), which is a prerequisite for tie formation. In comparison, students attending the same school usually know each other, so popular students do not have an advantage in this regard. In addition, initiating and maintaining many connections may be less costly in the virtual world than in real life.

Next, the effect of the proportion of friends that user $i$ and user $j$ have in common is significant and positive (marginal effect of $.0008$ ), implying that users are more likely to connect with friends of friends. Two average users who have 50% of friends in common are 44% more likely to become friends than two average users who have no friends in common. This finding is in line with results in the literature (e.g., Aral, Muchnik, and Sundararajan 2009; Bhattacharya et al. 2019; Shalizi and Thomas 2011).

In terms of similarity, the proportion of common animes has a significant positive effect with a marginal effect of $.0008$ , suggesting that common interests increase the probability of forming a friendship. Further, similarity in the three demographic traits (i.e., age, gender, and country) also increases the likelihood of friendship formation, consistent with the finding in Christakis et al. (2010). Comparing the magnitudes of the effects of the three demographics (marginal effects in parentheses), we find that having similar age ( $.0019$ ) is the most important trait, followed by coming from the same country ( $.0014$ ) and then having the same gender ( $.0009$ ). Put together, two users who share one demographic trait are, on average, three times more likely to become friends as two users who do not share any demographic traits. Lastly, the coefficient for the standard deviation of the pair-specific random effect capturing latent homophily between individual $i$ and individual $j$ is positive but insignificant. This finding suggests that our flexible modeling approach, together with the observed variables included in our model, captures the similarity between two individuals well.

User $j$ ’s number of written posts represents this user’s domain expertise and has a significant positive effect on friendship formation, albeit a small one (marginal effect of $.0000$ ). Thus, UGC publishing is conducive to the formation of a social relationship.³⁴

Finally, we find significant effects for all our control variables. The coefficient for the dummy variable indicating whether user $j$ showed any activity during the previous week is positive and significant. One likely reason is that user $j$ ’s activities increased this user’s visibility and awareness among other platform users. User $j$ ’s (cumulative) number of watched animes has a significant negative effect on this user’s formation of a friendship tie with user $i$ , although the effect is very small.³⁵ Further, the weekend dummy has a significant negative coefficient, suggesting that users are less likely to form friendships on weekends.

Online Activities

Next, we discuss our results related to a user's anime watching decisions. The number of animes watched by friends has a significant positive effect on a user's anime watching behavior. For example, if a focal user's friends watch two animes the previous day, the probability that the focal user watches animes the next day increases by 15%. However, the number of friends a user has does not have a significant influence. These results indicate that having more active friends increases a user's activity level because of the presence of direct peer effects. We do not find a spillover effect of friends’ posting behavior on a user's anime watching: a user is not more likely to watch an anime if a friend made a post the previous day. Further, our results reveal positive state dependence in anime watching: a user is more likely to watch an anime if the user did so the previous day. Lastly, the coefficient for the weekend dummy is positive and significant, implying that users are more likely to watch animes on weekends.

We now describe our results related to content generation. Similar to our findings for anime watching, we observe evidence of a direct peer effect on a user's content generation: the number of posts published by friends during the previous day has a significant and positive effect on a user's content generation decision on the following day. We also note a positive spillover effect: the number of animes watched by friends increases a user's UGC production. In addition, the coefficient for the number of friends a user has is significant and positive, suggesting that having more friends makes a user more active in publishing content. Further, there is evidence of positive state dependence in content generation; that is, we observe a significant positive effect of a user's posting on the user’s posting behavior the following day. And, lastly, the weekend dummy has a significant positive effect.

Comparing the magnitudes of the direct peer effect versus the spillover effect using marginal effects, we observe the direct peer effect to have a larger impact than the spillover effect in increasing activity levels for both anime watching and UGC publishing.³⁶ For example, if a focal user's friends write two more posts the previous day, the probability that the focal user also writes a post the next day increases by 26%. In contrast, if a focal user's friends watch two more animes, the probability that the focal user writes a post the next day increases by only 17%.

Model Fit

We evaluate the fit of our models using two criteria: the Akaike information criterion and the Bayesian information criterion. Our main model outperforms the independent model and the homogeneous model on both criteria (see bottom of Table 3). We also compare the (in-sample) predictive performance of the three models using hit rates for three holdout periods: the last 30 days, the last 60 days, and the last 90 days of our sample period. The average hit rates across these three holdout periods for each of the three activities (i.e., friendship formation, anime watching, and UGC production) are presented in Table 4. The average hit rates are very similar across the three models with the main model performing slightly better than the other two models in predicting UGC posts.

Table 4.

Hit Rates.


	Independent Model	Homogeneous Model	Main Model
Friending actions	99%	99%	99%
Adoption actions	72%	72%	72%
UGC actions	94%	95%	96%

Prediction Exercises

For companies operating social networks, advertising revenue represents their primary source of income. In 2020, the industry earned revenues of over $40 billion through advertisements (Statista 2022c). Advertising revenues depend on site traffic: the more active users are, the more ads can be shown to them. In addition, having more active users can increase the appeal of the website to nonusers and lead to continuous growth of the user base. Therefore, it is in platform owners’ best interest to motivate users (or a subset of users if stimulating all users is not feasible) to increase their in-site activities.

Using our model estimates, we examine the effects of stimulating different types of users and different types of in-site activities through a series of prediction exercises. More precisely, we assume that the platform can trigger an increase in any of the three activities of making friends, watching animes, and generating content by, for example, posting a recommendation list on a user's page: the platform can recommend to a user to become friends with other users, to adopt specific animes, or to participate in forum discussions that are active and related to the user's past adoptions or posts. Although we do not observe the log-in or page view activities of a user and, as a result, cannot directly translate the changes in activity levels to changes in ad viewership, as long as users are not spending less time on each activity than they did before the stimulation, an increase in the total activity level will also lead to an increase in the time spent on the website. Furthermore, an increase in the activity level is observable by other users and nonusers of the website and therefore can lead to activity cascades and a growing user base.

Platform-Wide Stimulation

We compare the effects of different platform-wide stimulations on overall activity levels, that is, the sum of activities in all three areas. The overall activity levels serve as a proxy for the total site traffic or total time spent on the site. The predictive exercises are implemented as follows: in each scenario, we increase one type of activity (friendship tie formation, product adoption, or UGC generation) among all egos and all alters by the amount of that activity on a typical day for an average user. To put it differently, our stimulation doubles the amount of activity of a particular type on a given day for each ego and each alter. We do so on Days 30, 90, and 150 and predict users’ behavior going forward until Day 184.³⁷ When presenting our findings, we focus on findings for the egos.

The results are presented in Table 5. The first column shows the changes in the average number of active egos per day, that is, the mean number of egos who perform at least one activity in a day, and the second column depicts the changes in the number of total activities performed by egos. Out of the three types of stimulations the platform can implement (i.e., to recommend friends, animes, or forum discussion topics), stimulating users to watch more animes is the most effective intervention, resulting in the highest overall increase in the number of active users and in the level of in-site activities. This is because anime watching is the most frequent activity of the three in-site activities and because the peer effect of friends’ anime watching is a stronger driver than the spillover effect of users’ anime watching. The effects of stimulating users to make more friends or to post more UGC are very similar in terms of magnitude.

Table 5.

Prediction from a Platform-Wide Stimulation.


Stimulation Activity	Stimulation Day	Percentage Change in Active Users	Percentage Change in Activities
Friendship formation
	30	2.76	1.97
	90	3.23	.43
	150	12.66	2.63
	Average	6.22	1.68
Anime watching
	30	1.80	12.66
	90	4.03	2.36
	150	17.14	6.47
	Average	7.66	7.17
UGC production
	30	1.74	1.69
	90	.43	.44
	150	13.90	3.21
	Average	6.20	1.78

Friendship Recommendations

Friendship recommendation systems are a popular tool used by online social networks such as Facebook or LinkedIn to increase network connectedness (Sun and Taylor 2020). Here, we ask whom to recommend as a potential friend in an evolving social network, that is, what type of individual a platform should suggest for friendship.

We evaluate the effects of forming a friendship with five types of potential friends: (1) popular user, that is, the user with the largest number of friends; (2) friend of a friend, that is, the user who has the most common friends with the focal user; (3) active anime watcher, that is, the user who watches the most animes; (4) active UGC creator, that is, the user who publishes the most posts; and (5) user with similar interest, that is, the user who watches the largest number of common animes with the focal user. Furthermore, this predictive exercise relies on the assumption that a friend recommendation strategy is successful in convincing all egos and alters to accept one recommended user as a new friend on Day 150.³⁸ We then simulate users’ behavior going forward until Day 184, and compare the increases in each of the three activities as well as the overall activity level among all egos.

Among the five friendship recommendation strategies, recommending a friend of a friend is the most effective strategy in stimulating subsequent friendship formation within the evolving network.³⁹ This finding stands in contrast to the common result for static networks, for which recommending a popular user has been found to be the most effective strategy.⁴⁰ A potential reason for our different result is that a popular user does not necessarily share interests with or is not necessarily similar to the focal user. Thus, friends of the popular user are also less likely to be potential candidates for new friendships, thereby limiting the cascading effect. However, a friend of a friend is more likely to share interests with or be similar to the focal user. Thus, friends of a friend of a friend are more likely to be potential candidates for new friendships, thereby enhancing the cascading effect. Our data and estimation results support this explanation: friends of a friend have, on average, 1.5 common friends with the focal user, whereas friends of a popular user have, on average, .3 common friends with the focal user. Friends of a friend also watch a larger number of the same animes than friends of a popular user (9.3 vs. 8.4). And, lastly, we find significant and positive effects of similarity variables in the estimation (see the “Results” section).

Seeding

Previous studies find that users often have varying degrees of activity in different areas (e.g., Iyengar, Van den Bulte, and Valente 2011; Manchanda, Xie, and Youn 2008). Our data confirm this pattern (see Figure 5). For example, a user might make many friends or publish many posts but watch only a few animes. Consequently, carefully choosing whom to target and which type of activity to stimulate are crucial decisions for platform owners in achieving a desired outcome, such as an increase in UGC production.

In this set of prediction exercises, we examine the effectiveness of different seeding strategies in increasing tie formations, anime watching, and UGC production in an evolving social network. For these prediction exercises, we select the 15% most/least active egos among all egos on the basis of their activity levels in each of the three activities (“selection activity”) as our seeding targets.⁴¹ As a benchmark, we also randomly select 15% of egos as the initial seeds. Next, we increase the activity level of these selected egos in one of the three areas of activity (“seeding activity”) by the amount of that activity on a typical day for an average user. This is the same stimulation as in the “Platform-Wide Stimulation” section. We do so on Days 30, 90, and 150 (“stimulation date”) and simulate users’ behavior going forward until Day 184. To recap, we perform a total of 63 prediction exercises, including $3 \times 3 \times 3 = 27$ prediction exercises for the 15% most active egos and $3 \times 3 \times 3 = 27$ prediction exercises for the 15% least active egos, as well as $3 \times 3 = 9$ prediction exercises for our benchmark case of randomly selected users. When presenting our findings, we again focus on the results for egos.

We start by investigating the question of whether platform owners should seed to the most or least active users. We find that seeding to the 15% most active users is a more effective strategy than seeding to the 15% least active users when the goal is an increase in anime watching or in UGC production. However, for friendship formation, seeding to the 15% least active users is a more effective strategy than seeding to the 15% most active users. Seeding to 15% randomly selected users performs the worst for all three outcomes.⁴²

Next, inspired by results for static networks, we evaluate whether seeding to the most-connected users, that is, users with the most friends, is also the most effective seeding strategy in terms of the choice of selection activity in evolving networks. In other words, we examine whether the most-connected users are indeed better candidates for seeding than users who are most active in anime watching or UGC creation in an evolving social network. In answering this question, we focus on the 27 prediction exercises that use the most active users as initial seeds. Our results indicate that, on average, seeding to users who post the most UGC is the most effective seeding strategy, followed by seeding to the most-connected users, and lastly by seeding to users watching the most animes.⁴³ A similar pattern also holds for the individual prediction exercises: in 25 out of the 27 prediction exercises, seeding to the most-connected users is less effective than either seeding to users posting the most UGC or seeding to users watching the most animes. We conclude that, in contrast to the common finding for static networks, seeding to the most-connected users is not the most effective strategy in evolving networks.

Whereas the discussion in the previous paragraph focused on the choice of the best selection activity with the goal of increasing overall activity levels, here we investigate the most effective seeding strategy for each desired outcome in terms of choosing both a selection and a seeding activity. For all three potential desired outcomes of increasing the number of friendships, increasing anime watching, or increasing UGC posting, the most effective seeding activity is the desired activity itself. However, the selection activity does not follow the same pattern. If the goal is to increase the number of friendships, then choosing the users who watch the fewest animes and encouraging them to make more friends is the most effective strategy. To increase anime watching, selecting the most-connected users and encouraging them to watch even more animes is the best strategy. And, lastly, to increase UGC posting, the best strategy is to select users who create the most content and encourage them to post more content.

Accounting for Network Evolution

First, we examine by how much the effectiveness of seeding strategies is underestimated when the endogenous network formation is shut down. To do so, we rerun 54 of the 63 prediction exercises discussed in the “Seeding” section but do not allow users to form new friendships.⁴⁴ We find that not allowing for endogenous network formation leads, on average, to an underestimation of seeding effectiveness by 41%. This result underscores the practical importance of modeling the coevolution of individual users’ friendship tie formations and their concurrent in-site activities to correctly measure seeding effectiveness.

Second, we investigate whether we can replicate the common finding in the literature that seeding to the most-connected users is the most effective strategy to increase online activity levels (anime watching and UGC posting in our case) in static networks. To do so, we focus on the subset of 27 prediction exercises in which users are not allowed to form new friendships and in which the 15% most active users (in terms of friends, anime watching, and UGC posting) are used as initial seeds; that is, we center on the selection strategy. Consistent with the literature (e.g., Aral, Muchnik, and Sundararajan 2013; Hinz et al. 2011; Trusov, Bodapati, and Bucklin 2010), we find that, in static networks, seeding to the most-connected users is, on average and in all but two individual prediction exercises, the most effective strategy to increase online activity levels.

Lastly, we investigate whether our results for the case when both the selection and the seeding strategy can be chosen continue to hold when the endogenous network formation is shut down. For the two desired outcomes of increasing anime watching or increasing UGC posting, we indeed find the same two strategies to be most effective: to increase anime watching, selecting the most-connected users and encouraging them to watch even more animes is the best strategy. To increase UGC posting, the best strategy is to select users who create the most content and encourage them to post more content.

Conclusion

In this research, we study the coevolution of individuals’ friendship tie formations and their concurrent activities within an evolving online social network. Our findings shed light on the important factors that drive strangers to become friends in an online environment. Specifically, our results reveal that all three drivers (i.e., network properties, similarity, and expertise) matter for a user's friendship formation decisions. Further, although having more friends does not necessarily make a person more active, having more active friends does increase a user's activity levels in terms of both product adoptions and content generation through peer and spillover effects. Via prediction exercises, we find that stimulating product adoptions is the most effective sitewide intervention, leading to the highest overall site traffic and the largest number of active users, and that recommending a friend of a friend as a potential friend is the most effective strategy in stimulating friendship tie formation. Contrary to previous studies investigating static networks, we find that seeding to the most-connected users is not the best strategy to increase users’ overall activity levels in an evolving network.

We believe that our results are generalizable in two directions: First, MyAnimeList.net is only one example of an interest-based online community. There are many such online communities, including stand-alone ones (e.g., goodreads.com, boardgamegeek.com, fragrantica.com) and special-interest groups on large social media platforms (e.g., subreddits on reddit.com, groups on Facebook). Second, we believe that the insights about friending/recommendation/stimulation interventions (e.g., evolving vs. static networks) are useful for social networks in general, especially for those that are in a phase of fast growth.

This research has several limitations and presents opportunities for future research. First, we observe a friendship only if both users agree to become friends. In other words, we observe neither the friendship request nor the potential rejection of that request. This is a limitation of our data. As a result, we cannot separately identify whether an increase in a user's number of friends is due to that user's elevated preference to form friendships or due to the user’s increased desirability as a potential friend to other users. Second, our anime watching data are self-reported, and accuracy in the reporting is a potential concern. Although there are no explicit incentives for users of MyAnimeList.net to falsely report their true anime watching behavior, even without explicit incentives, users may misrepresent their watching behavior because of social desirability concerns or the need to appear knowledgeable on the platform. This too is a limitation of our data. Third, we do not observe churn, that is, users abandoning the platform. Again, this is a limitation of our data. Understanding drivers of churn and whether it is contagious is important for online platforms, which are commonly populated with a large number of inactive users.

Fourth, we do not explicitly model time-varying unobserved factors that may impact subsets of users. Some institutional features of the data and model (e.g., users of the website are spread out around the world; we include random effects and time fixed effects to control for individual preference and some unobserved correlated factors) alleviate this concern but do not completely eliminate it. Furthermore, a more robust solution incorporating group-time fixed effects is computationally infeasible given the model structure and the large number of required fixed effects (see Lee 2007; Lee, Liu, and Lin 2010; Ma, Krishnan, and Montgomery 2014). Fifth, we do not allow for heterogeneity in peer influence in our main model. Although we did not find significantly different effects of popular/unpopular friends and active/inactive friends in a robustness check, a more thorough investigation of this important question is left for future research.

Sixth, we model whether users post something on the website or whether they watch an anime, but not the topic or number of posts or which anime they watch. Studying the details of each action may shed further light on the coevolution process of users’ friendship formations and concurrent activities, which we leave for future research. Seventh, we do not consider the length or content of users’ posts. Longer or more detailed posts may imply that the writer is more knowledgeable. Studying the effects of such UGC characteristics would be an interesting extension of our current research. Lastly, we do not model platform growth given our relatively short observation period; that is, we do not model users’ joining behavior, and we assume that it is exogenous. However, in the long run, the popularity of a platform in terms of the size of its user base and volume and variety of its content can change the rate of users joining the website. We hope future research can relax the exogeneity assumption and provide further insights into this research question.

Supplemental Material

sj-pdf-1-mrj-10.1177_00222437221107900 - Supplemental material for From Strangers to Friends: Tie Formations and Online Activities in an Evolving Social Network

Supplemental material, sj-pdf-1-mrj-10.1177_00222437221107900 for From Strangers to Friends: Tie Formations and Online Activities in an Evolving Social Network by Mina Ameri, Elisabeth Honka and Ying Xie in Journal of Marketing Research

Footnotes

Acknowledgments

The authors would like to thank Bryan Bollinger, Randy Bucklin, Brett Hollenbeck, Sylvia Hristakeva, Dmitri Kuksov, Xiaolin Li, Xiao Liu, B.P.S. Murthi, Ram Rao, Brian Ratchford, Peter Rossi, and Upender Subramanian; seminar participants at Cornell University, Duke University, Jinan University, Johns Hopkins University, London Business School, New York University, Tsinghua University, University of Connecticut, University of Pittsburgh, University of Texas at Austin, University of North Carolina at Chapel Hill, University of Rochester, and Virginia Tech; and participants at the 2017 Texas Marketing Faculty Research Colloquium, 2018 INFORMS Marketing Science Conference, 2018 Carnegie Mellon University Conference on Digital Marketing and Machine Learning, 2019 International Forum of Marketing Science and Applications, and 2019 Marketing Dynamics Conference. The authors thank Paul Rodriguez for his assistance with optimization of our code, which was made possible through the Extreme Science and Engineering Discovery Environment (XSEDE) Extended Collaborative Support Service program (). The authors acknowledge the Texas Advanced Computing Center at the University of Texas at Austin for providing high-performance computing resources that contributed to the research results reported in this article. The authors acknowledge XSEDE, which is supported by National Science Foundation grant number ACI-1548562. The authors used the Bridges system, which is supported by National Science Foundation award number ACI-1445606, at the Pittsburgh Supercomputing Center. All errors are the authors’ own.

Associate Editor

Shrihari Sridhar

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Jetstream and Pittsburgh Supercomputing Center Bridges and Bridges-2 (grant numbers SBE170004, SBE180006).

ORCID iD

Elisabeth Honka

Online supplement:

Notes

References

Ameri

Mina

Honka

Elisabeth

Xie

Ying

(2019), “Word-of-Mouth, Observed Adoptions, and Anime Watching Decisions: The Role of the Personal Versus the Community Network,” Marketing Science, 38 (4), 567–83.

Aral

Sinan

Muchnik

Lev

Sundararajan

Arun

(2009), “Distinguishing Influence-Based Contagion from Homophily-Driven Diffusion in Dynamic Networks,” Proceedings of the National Academy of Sciences, 106 (51), 21544–49.

Aral

Sinan

Muchnik

Lev

Sundararajan

Arun

(2013), “Engineering Social Contagions: Optimal Network Seeding in the Presence of Homophily,” Network Science, 1 (2), 125–53.

Aral

Sinan

Walker

Dylan

(2011), “Creating Social Contagion Through Viral Product Design: A Randomized Trial of Peer Influence in Networks,” Management Science, 57 (9), 1623–39.

Badev

Anton

(2021), “Nash Equilibria on (Un)Stable Networks,” Econometrica, 89 (3), 1179–1206.

Banerjee

Abhijit

Chandrasekhar

Arun

Duflo

Esther

Jackson

Matthew

(2013), “The Diffusion of Microfinance,” Science, 341 (6144), doi.org/10.1126/science.1236498.

Berndt

Ernst

Hall

Bronwyn

Hall

Robert

Hausman

Jerry

(1974), “Estimation and Inference in Nonlinear Structural Models,” Annals of Economic and Social Measurement, 3/4 (1), 653–65.

Berscheid

Ellen

Reis

Harry

(1998), “Attraction and Close Relationships,” in The Handbook of Social Psychology, Gilbert

Daniel Todd

Fiske

Susan T.

Lindzey

Gardner

, eds. McGraw-Hill, 193–281.

Bhattacharya

Prasanta

Phan

Tuan

Bai

Xue

Airoldi

Edoardo

(2019), “A Coevolution Model of Network Structure and User Behavior: The Case of Content Generation in Online Social Networks,” Information Systems Research, 30 (1), 117–32.

10.

Boguñá

Marian

Pastor-Satorras

Romualdo

Díaz-Guilera

Albert

Arenas

Alex

(2004), “Models of Social Networks Based on Social Distance Attachment,” Physical Review E, 70 (5), 56–122.

11.

Bollinger

Bryan

Burkhardt

Jesse

Gillingham

Kenneth

(2020), “Peer Effects in Residential Water Conservation: Evidence from Migration,” American Economic Journal: Economic Policy, 12 (3), 107–33.

12.

Boucher

Vincent

(2016), “Conformism and Self-Selection in Social Networks,” Journal of Public Economics, 136, 30–44.

13.

Bramoullé

Yann

Djebbari

Habiba

Fortin

Bernard

(2009), “Identification of Peer Effects Through Social Networks,” Journal of Econometrics, 150 (1), 41–55.

14.

Brandtzæg

Petter Bae

Heim

Jan

(2009), “Why People Use Social Networking Sites,” in Online Communities and Social Computing, Lecture Notes in Computer Science, Vol. 5621, A. Ant Ozok and Panayiotis Zaphiris, eds. Springer, 143–52.

15.

Christakis

Nicholas

Fowler

James

Imbens

Guido

Kalyanaraman

Karthik

(2010), “An Empirical Model for Strategic Network Formation,” National Bureau of Economic Research Working Paper No. 16039.

16.

Claussen

Jörg

Engelstätter

Benjamin

Ward

Michael

(2014), “Susceptibility and Influence in Social Media Word-of-Mouth,” ZEW–Centre for European Economic Research Discussion Paper No. 14-129.

17.

De Giorgi

Giacomo

Pellizzari

Michele

Redaelli

Silvia

(2010), “Identification of Social Interactions Through Partially Overlapping Peer Groups,” American Economic Journal: Applied Economics, 2 (2), 241–75.

18.

Dubé

Jean-Pierre

Hitsch

Günter

Rossi

Peter

(2010), “State Dependence and Alternative Explanations for Consumer Inertia,” RAND Journal of Economics, 41 (3), 417–45.

19.

Gargiulo

Martin

Benassi

Mario

(2000), “Trapped in Your Own Net? Network Cohesion, Structural Holes, and the Adaptation of Social Capital,” Organization Science, 11 (2), 183–96.

20.

Gitler

Isidoro

Klapp

Jaime

, eds. (2016), High Performance Computer Applications: 6th International Conference, ISUM 2015, Mexico City, Mexico, March 9–13, 2015: Revised Selected Papers. Springer International Publishing.

21.

Greenan

Charlotte

(2015), “Diffusion of Innovations in Dynamic Networks,” Journal of the Royal Statistical Society: Series A, 178 (1), 147–66.

22.

Hanaki

Nobuyuki

Peterhansl

Alexander

Dodds

Peter

Watts

Duncan

(2007), “Cooperation in Evolving Social Networks,” Management Science, 53 (7), 1036–50.

23.

Hartmann

Wesley

Manchanda

Puneet

Nair

Harikesh

Bothner

Matthew

Dodds

Peter

Godes

David

, et al. (2008), “Modeling Social Interactions: Identification, Empirical Methods and Policy Implications,” Marketing Letters, 19 (3/4), 287–304.

24.

Heckman

James

(1981), “Statistical Models for Discrete Panel Data,” in Structural Analysis of Discrete Data with Econometric Applications, Manski

Charles F.

McFadden

Daniel

, eds. MIT Press, 114–78.

25.

Hinz

Oliver

Skiera

Bernd

Barrot

Christian

Becker

Jan

(2011), “Seeding Strategies for Viral Marketing: An Empirical Comparison,” Journal of Marketing, 75 (6), 55–71.

26.

Hoff

Peter

Raftery

Adrian

Handcock

Mark

(2002), “Latent Space Approaches to Social Network Analysis,” Journal of the American Statistical Association, 97 (460), 1090–98.

27.

Hsieh

Chih-Sheng

Lee

Lung

(2016), “A Social Interactions Model with Endogenous Friendship Formation and Selectivity,” Journal of Applied Econometrics, 31 (2), 301–19.

28.

IAB (2021), “IAB Releases Internet Advertising Revenue Report for 2020,” (April 7), https://www.iab.com/news/iab-internet-advertising-revenue/.

29.

Iyengar

Raghuram

Van den Bulte

Christophe

Valente

Thomas

(2011), “Opinion Leadership and Social Contagion in New Product Diffusion,” Marketing Science, 30 (2), 195–212.

30.

Jackson

Matthew

(2008), Social and Economic Networks. Princeton University Press.

31.

Jackson

Matthew

Rodriguez-Barraquer

Tomas

Tan

(2012), “Social Capital and Social Quilts: Network Patterns of Favor Exchange,” American Economic Review, 102 (5), 1857–97.

32.

Katona

Zsolt

(2013), “Competing for Influencers in a Social Network,” NET Institute Working Paper No. 13-06.

33.

Katona

Zsolt

Móri

Tamás

(2006), “A New Class of Scale Free Random Graphs,” Statistics & Probability Letters, 76 (15), 1587–93.

34.

Keane

Michael

(1997), “Modeling Heterogeneity and State Dependence in Consumer Choice Behavior,” Journal of Business and Economic Statistics, 15 (3), 310–27.

35.

Kozinets

Robert

(1999), “E-Tribalized Marketing? The Strategic Implications of Virtual Communities of Consumption,” European Management Journal, 17 (3), 252–64.

36.

Lanz

Andreas

Goldenberg

Jacob

Shapira

Daniel

Stahl

Florian

(2019), “Climb or Jump: Status-Based Seeding in User-Generated Content Networks,” Journal of Marketing Research, 56 (3), 361–78.

37.

Lee

Lung-fei

(2007), “Identification and Estimation of Econometric Models with Group Interactions, Contextual Factors and Fixed Effects,” Journal of Econometrics, 140 (2), 333–74.

38.

Lee

Lung-fei

Liu

Xiaodong

Lin

(2010), “Specification and Estimation of Social Interaction Models with Network Structures,” Econometrics Journal, 13 (2), 145–76.

39.

Lewis

Kevin

Gonzalez

Marco

Kaufman

Jason

(2012), “Social Selection and Peer Influence in an Online Social Network,” Proceedings of the National Academy of Sciences of the United States of America, 109 (1), 68–72.

40.

Lovett

Mitch

Staelin

Richard

(2016), “The Role of Paid and Earned Media in Building Entertainment Brands: Reminding, Informing, and Enhancing Enjoyment,” Marketing Science, 35 (1), 142–57.

41.

Liye

Krishnan

Ramayya

Montgomery

Alan

(2014), “Latent Homophily or Social Influence? An Empirical Analysis of Purchase Within a Social Network,” Management Science, 61 (2), 454–73.

42.

Manchanda

Puneet

Xie

Ying

Youn

Nara

(2008), “The Role of Targeted Communication and Contagion in Product Adoption,” Marketing Science, 27 (6), 961–76.

43.

Manski

Charles

(1993), “Identification of Endogenous Social Effects: The Reflection Problem,” Review of Economic Studies, 60 (3), 531–42.

44.

McPherson

Miller

Smith-Lovin

Lynn

Cook

James

(2001), “Birds of a Feather Flock Together: Homophily in Social Networks,” Annual Review of Sociology, 27, 415–44.

45.

Mele

Angelo

(2017), “A Structural Model of Dense Network Formation,” Econometrica, 85 (3), 825–50.

46.

Meta (2022a), “What Are Recommendations on Facebook?” https://www.facebook.com/help/1257205004624246.

47.

Meta (2022b), “What Are Recommendations on Instagram?” https://help.instagram.com/313829416281232.

48.

Nair

Harikesh

Manchanda

Puneet

Bhatia

Tulikaa

(2010), “Asymmetric Social Interactions in Physician Prescription Behavior: The Role of Opinion Leaders,” Journal of Marketing Research, 47 (5), 883–95.

49.

Nystrom

Nick

Welling

Joel

Blood

Phil

Goh

Eng

(2013), “Blacklight: Coherent Shared Memory for Enabling Science,” in Contemporary High Performance Computing: From Petascale Toward Exascale. Jeffrey S. Vetter, ed. Chapman and Hall/CRC, 421–39.

50.

Perry

Brea

Pescosolido

Bernice

Borgatti

Stephen

(2018), Egocentric Network Analysis: Foundations, Methods, and Models. Cambridge University Press.

51.

Sacerdote

Bruce

(2001), “Peer Effects with Random Assignment: Results for Dartmouth Roommates,” Quarterly Journal of Economics, 116 (2), 681–704.

52.

Seetharaman

P.B.

Ainslie

Andrew

Chintagunta

Pradeep

(1999), “Investigating Household State Dependence Effects Across Categories,” Journal of Marketing Research, 36 (4), 488–500.

53.

Shalizi

Cosma

Thomas

Andrew

(2011), “Homophily and Contagion Are Generically Confounded in Observational Social Network Studies,” Sociological Methods & Research, 40 (2), 211–39.

54.

Shriver

Scott

Nair

Harikesh

Hofstetter

Reto

(2013), “Social Ties and User-Generated Content: Evidence from an Online Social Network,” Management Science, 59 (6), 1425–43.

55.

Shum

Matthew

(2004), “Does Advertising Overcome Brand Loyalty? Evidence from the Breakfast Cereal Market,” Journal of Economics and Management Strategy, 13 (2), 241–72.

56.

Snijders

Tom

Koskinen

Johan

Schweinberger

Michael

(2010), “Maximum Likelihood Estimation for Social Network Dynamics,” Annals of Applied Statistics, 4 (2), 567–88.

57.

Snijders

Tom

Steglich

Christian

Schweinberger

Michael

(2007), “Modeling the Coevolution of Networks and Behavior,” in Longitudinal Models in the Behavioral and Related Sciences, van Montfort

Kees

Oud

Johan

Satorra

Albert

, eds. Routledge, 41–71.

58.

Snijders

Tom

van de Bunt

Gerhard

Steglich

Christian

(2010), “Introduction to Stochastic Actor-Based Models for Network Dynamics,” Social Networks, 32 (1), 44–60.

59.

Statista (2022a), “Average Number of Social Media Accounts per Internet User from 2013 to 2018,” https://www.statista.com/statistics/788084/number-of-social-media-accounts/.

60.

Statista (2022b), “Number of Social Media Users Worldwide from 2018 to 2022, with Forecasts from 2023 to 2027,” https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/.

61.

Statista (2022c), “Social Network Advertising Spending in the United States from 2016 to 2022,” https://www.statista.com/statistics/736971/social-media-ad-spend-usa/.

62.

Statista (2022d), “Worldwide Instagram Follower Growth Rate from January to June 2019, by Profile Size,” https://www.statista.com/statistics/307026/growth-of-instagram-usage-worldwide/.

63.

Steglich

Christian

Snijders

Tom

Pearson

Michael

(2010), “Dynamic Networks and Behavior: Separating Selection from Influence,” Sociological Methodology, 40 (1), 329–93.

64.

Stewart

Craig

Cockerill

Timothy

Foster

Ian

Hancock

David

Merchant

Nirav

Skidmore

Edwin

, et al. (2015), “Jetstream: A Self-Provisioned, Scalable Science and Engineering Cloud Environment,” in XSEDE ’15: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, 29, 1–8, doi.org/10.1145/2792745.2792774.

65.

Sun

Tianshu

Taylor

Sean

(2020), “Displaying Things in Common to Encourage Friendship Formation: A Large Randomized Field Experiment,” Quantitative Marketing and Economics, 18 (3), 237–71.

66.

Toivonen

Riitta

Kovanen

Lauri

Kivelä

Mikko

Onnela

Jukka-Pekka

Saramäki

Jari

Kaski

Kimmo

(2009), “A Comparative Study of Social Network Models: Network Evolution Models and Nodal Attribute Models,” Social Networks, 31 (4), 240–54.

67.

Toubia

Olivier

Stephen

Andrew

(2013), “Intrinsic vs. Image-Related Utility in Social Media: Why Do People Contribute Content to Twitter?” Marketing Science, 32 (3), 368–92.

68.

Towns

John

Cockerill

Timothy

Dahan

Maytal

Foster

Ian

Gaither

Kelly

Grimshaw

Andrew

, et al. (2014), “XSEDE: Accelerating Scientific Discovery,” Computing in Science & Engineering, 16 (5), 62–74.

69.

Trusov

Michael

Bodapati

Anand

Bucklin

Randy

(2010), “Determining Influential Users in Internet Social Networks,” Journal of Marketing Research, 47 (4), 643–58.

70.

Tucker

Catherine

(2008), “Identifying Formal and Informal Influence in Technology Adoption with Network Externalities,” Management Science, 54 (12), 2024–38.

71.

Watson

Goodwin

Johnson

David

(1972), Social Psychology: Issues and Insights. Lippincott.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.28 MB