Abstract
Information exchange among people via social network services has produced a mass of communication data, which has been widely used in research on user interaction and information propagation in virtual social networks. This paper investigates the multiplex power-law distributions and retweeting patterns on the Twitter platform. To this end, we analyze the power-law distributions in the relationship network, based on unidirectional and bidirectional follow connections, and in the interaction network, based on user and tweet entities, and we explain the features observed in each network. We also explore the emergent patterns of tweet retweeting paths and analyze their generative mechanisms. The results show that mining Twitter data from multiple angles can yield more interesting discoveries about social networks.
Introduction
Twitter has gradually become one of the world’s largest and most popular social network platforms since its launch in 2006. It provides a microblogging service through which users can publish and share brief messages of at most 140 characters, each known as a “tweet”. Twitter is not only a major source of news about social events but also a place where information spreads.
User relationships on Twitter derive from its “follower-followee” mechanism, the core of which is the follow relationship between two users. You can follow other users at any time, which makes you their “follower” and makes them your “followees”. Two users can also follow each other, so that each is both a “follower” and a “followee” of the other. The unidirectional and bidirectional relationships between two users represent two different dimensions of social connection. It is on the basis of these relations that Twitter realizes its “retweeting” function: a tweet spreads from your followees to you, then on to your followers, and so on, until diffusion is achieved. Therefore, the multi-dimensional relationships and interactive retweeting behaviors are the basic reasons for information diffusion on Twitter.
In this paper, we collect actual Twitter data and mine the characteristics of user relations and tweet retweeting at a coarse-grained level. The data set we adopt is available from [1] and contains tweets extracted during March and April of 2012. After our pre-processing, the dataset includes 78906 users and 32992 tweets. We also reconstruct a following relation network and an interaction network from the actual data. This dataset has been found to be very detailed and complete, and hence is a good test bed for studying the power-law distributions on Twitter.
Related work
There has been a large body of research on the Twitter social network. We roughly divide these contributions into two categories.
The first category concerns the characteristics of the relationship network. Among these contributions, Kwak et al. focus on the topology of the follower-followee network and report several complex properties of this network, such as a degree distribution with power-law behavior, a small mean distance between nodes, and a modular structure [2]. Another contribution is the work of Huberman and his colleagues, who reveal a sparser and simpler network of actual friends hidden within the very dense network of followers and followees. They observe that individuals do not actively interact with all of their declared contacts, but only with a small fraction of actual friends [3].
The second category emphasizes the characteristics of the interaction network. For example, Xia et al. crawl Twitter data and mine the information spreading based on users’ retweeting behavior. They find power-law distributions of the retweet-width and retweet-depth, and further study the correlation between these two features [4]. Lerman et al. measure macroscopic properties of the principal cascades, such as size and diameter, and plot their scale-free distributions during information propagation on the Twitter network. They find that a cascade branches broadly and increases in size when a highly connected hub joins it. However, the hub node does not affect the depth of the cascade as much as its spread [5].
Unlike the above works, in this paper we mine the power laws in both the relationship network and the interaction network, and further study how they lead to the emergence of multiple patterns of tweet retweeting paths, which explores information propagation on social networks in more depth. On the one hand, the power-law distribution is one of the important features of social networks, and studying the power-law characteristics of different entities along different dimensions could provide an important criterion for measuring the effectiveness of dynamic models of human behavior. On the other hand, tracking and analyzing tweet retweeting on Twitter helps us understand the information diffusion process and thus make more efficient use of open social network platforms, for example by promoting advertising and marketing through connections of different dimensions.
Power-law distributions
A social network is a typical kind of scale-free network [6]. The core characteristic of scale-free networks is that most of the nodes connect to only a small number of other nodes, while a few nodes connect to a huge number of nodes. Statistically, this phenomenon is characterized as a power-law distribution.
A social network mainly includes a relationship network and an interaction network. The relationship network has connections of multiple dimensions, and the interaction network consists of diversified entities.
Next, we study how the power-law distribution is reflected in social networks and what the features of these distributions are.
Power-law distributions in relationship network
By processing the actual Twitter data, we construct the information propagation network, where each node represents a Twitter user and each edge denotes a relationship between users. For example, given two users U1 and U2 with corresponding nodes N1 and N2, if user U1 follows user U2, then we draw a directed edge from node N1 to node N2.
In this paper, we define two dimensions of the relationship network on the Twitter platform: the unidirectional follow and the bidirectional follow. In the bidirectional network, two users follow each other, whereas in the unidirectional network the source user follows the end user but not vice versa.
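The split into the two dimensions can be illustrated with a minimal sketch. The input format (a list of `(follower, followee)` pairs) and the function names `split_follow_edges` and `follower_counts` are assumptions made for illustration and do not come from the dataset in [1].

```python
from collections import defaultdict

def split_follow_edges(follow_pairs):
    """Split directed follow edges into unidirectional and bidirectional sets.

    follow_pairs: iterable of (follower, followee) tuples (hypothetical format).
    Self-follows are ignored.
    """
    edges = {(a, b) for (a, b) in follow_pairs if a != b}
    # An edge is bidirectional when the reverse edge is also present.
    bidirectional = {(a, b) for (a, b) in edges if (b, a) in edges}
    unidirectional = edges - bidirectional
    return unidirectional, bidirectional

def follower_counts(edges):
    """Number of followers per user for a given edge set."""
    counts = defaultdict(int)
    for follower, followee in edges:
        counts[followee] += 1
    return dict(counts)

# Toy example: U1 and U2 follow each other; U3 follows U2; U1 follows U3.
pairs = [("U1", "U2"), ("U2", "U1"), ("U3", "U2"), ("U1", "U3")]
uni, bi = split_follow_edges(pairs)
uni_followers = follower_counts(uni)
bi_followers = follower_counts(bi)
```

Counting followees works the same way by incrementing `counts[follower]` instead of `counts[followee]`.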
Below we plot the distributions of the number of users’ followers and followees in two types of relationship network on Twitter.
From the distribution curves we find that the number of all followers (see Fig. 1), the number of all followees (see Fig. 2), the number of bidirectional followers/followees (see Fig. 3), the number of unidirectional followers (see Fig. 4) and the number of unidirectional followees (see Fig. 5) all approximately follow power-law distributions. By nonlinear regression analysis, we write these distribution functions as $Y = 1.62 \times 10^{4} X^{-9.26}$ ($R^2 = 0.8294$), $Y = 1.67 \times 10^{4} X^{-9.35}$ ($R^2 = 0.8377$), $Y = 1.96 \times 10^{4} X^{-9.70}$ ($R^2 = 0.8526$), $Y = 1.73 \times 10^{4} X^{-1.57}$ ($R^2 = 0.9881$) and $Y = 1.12 \times 10^{4} X^{-1.82}$ ($R^2 = 0.9983$), respectively. The power-law phenomenon indicates that most Twitter users have a small number of followers or followees, while a few users have a large number of them. The results also show that power-law distributions exist across multiple dimensions of the relationship network on the Twitter platform.
Further, we find that the distributions of the number of all followers, all followees, and bidirectional followers/followees show similar trends. By contrast, the distributions of the number of unidirectional followers and unidirectional followees differ markedly from the previous three.
To analyze and explain these observations, we further calculate, for each user, the ratio of bidirectional followers to total followers and the ratio of bidirectional followees to total followees. Figs. 6 and 7 present the downward cumulative distributions of users over these ratios.
In Fig. 6, the abscissa berRatioValue (the ratio of bidirectional followers) is an attribute of each user, obtained by dividing the number of his/her bidirectional followers by the number of all his/her followers, and the ordinate P(berRatioValue) is the proportion of all users whose berRatioValue is not lower than the corresponding abscissa value. In Fig. 7, the abscissa beeRatioValue (the ratio of bidirectional followees) is likewise an attribute of each user, obtained by dividing the number of bidirectional followees by the number of all followees, and the ordinate P(beeRatioValue) is the proportion of all users whose beeRatioValue is not lower than the corresponding abscissa value. In both figures, the ordinate begins to decrease significantly only when the abscissa value exceeds 0.9. This indicates that, in the social network constructed from the actual Twitter data, bidirectional followers and bidirectional followees account for over 90% of the total for most users. Therefore, for most users, the number of bidirectional followers or followees is approximately equal to the number of all followers or followees, which makes the distributions in Figs. 1–3 extremely similar.
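As an illustration of how a fit of the form $Y = c X^{-\alpha}$ and its $R^2$ can be obtained, the sketch below uses ordinary least squares on log-transformed values. This is a common simplification and not necessarily the nonlinear regression procedure used for the figures above; the function name `fit_power_law` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = c * x**(-alpha) by least squares on log10(x), log10(y).

    Returns (c, alpha, r2), where r2 is computed in log space.
    Points with non-positive x or y are dropped before the fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (x > 0) & (y > 0)
    lx, ly = np.log10(x[mask]), np.log10(y[mask])
    slope, intercept = np.polyfit(lx, ly, 1)
    pred = slope * lx + intercept
    ss_res = np.sum((ly - pred) ** 2)
    ss_tot = np.sum((ly - ly.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 10 ** intercept, -slope, r2

# Synthetic, noise-free data generated from a known power law (not real Twitter data).
xs = np.arange(1, 101, dtype=float)
ys = 1.73e4 * xs ** -1.57
c_hat, alpha_hat, r2 = fit_power_law(xs, ys)
```

Because the goodness of fit here is measured on logarithmic values, it is not directly comparable with an $R^2$ computed on the raw counts.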
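The ratio and its downward cumulative distribution can be sketched as follows. The per-user counts are toy values, and the helper names `bidirectional_ratio` and `downward_cumulative` are hypothetical; users with no followers are skipped here, which is an assumption, since the text does not say how such users are handled.

```python
def bidirectional_ratio(n_bidirectional, n_total):
    """Ratio of bidirectional followers (or followees) to the total; None if total is 0."""
    return n_bidirectional / n_total if n_total > 0 else None

def downward_cumulative(values, thresholds):
    """For each threshold t, the proportion of values that are not lower than t."""
    n = len(values)
    return [sum(1 for v in values if v >= t) / n for t in thresholds]

# Toy data: user -> (bidirectional followers, all followers).
users = {"a": (9, 10), "b": (19, 20), "c": (1, 4), "d": (5, 5)}
ratios = [
    r for r in (bidirectional_ratio(b, t) for b, t in users.values())
    if r is not None
]
curve = downward_cumulative(ratios, [0.0, 0.5, 0.9, 1.0])
```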
On the basis of the results in Figs. 3–5, we further divide the number of bidirectional followers into 4 value intervals: $[0, 10^1]$, $[10^1, 10^2]$, $[10^2, 10^3]$ and $[10^3, 10^4]$; the number of unidirectional followers into 2 value intervals: $[0, 10^1]$ and $[10^1, 10^2]$; and the number of unidirectional followees into 3 value intervals: $[0, 10^1]$, $[10^1, 10^2]$ and $[10^2, 10^3]$. We therefore obtain 24 kinds of users ($C_4^1 \times C_2^1 \times C_3^1 = 24$), as shown in Table 1. Fig. 8 lists the number of users of each type.
As shown in Fig. 8, users of types 1, 7 and 13 are far more numerous than users of other types. The numbers of unidirectional followers and unidirectional followees of these three types all lie in the interval $[0, 10^1]$, which means that most users have few unidirectional followers and unidirectional followees. Meanwhile, the number of bidirectional followers of most users lies in the intervals $[10^1, 10^2]$ and $[10^2, 10^3]$, corresponding to types 7 to 18. By comparison, we find that the count distributions in the unidirectional follow network and the bidirectional follow network are different, and the former is more uneven.
From the above analysis, we find that bidirectional follow relationships are more numerous than unidirectional ones in the Twitter social network, which accords with the real situation that strong ties outnumber weak ties in offline social networks. In the bidirectional follow network there is interactive feedback and user relationships are close, while in the unidirectional follow network there is only one-way information transmission and relationships are relatively less close. Therefore, we conclude that the bidirectional follow is one manifestation of strong ties and the unidirectional follow is one manifestation of weak ties in Twitter social relationships.
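A minimal sketch of this 4 × 2 × 3 classification is given below. Table 1 is not reproduced here, so the ordering of types inside each group of six, the assignment of boundary values to the lower interval, and the names `decade_bin` and `user_type` are all assumptions; the only constraint taken from the text is that types 1, 7 and 13 have both unidirectional counts in the first interval.

```python
def decade_bin(count, n_bins):
    """Index of the decade interval containing count.

    0 for [0, 10], 1 for (10, 100], 2 for (100, 1000], and so on,
    clipped to n_bins - 1. Boundary values go to the lower interval (assumption).
    """
    idx = 0
    upper = 10
    while count > upper and idx < n_bins - 1:
        idx += 1
        upper *= 10
    return idx

def user_type(bi_followers, uni_followers, uni_followees):
    """Map a user to one of 24 types (1..24) under the assumed ordering."""
    b = decade_bin(bi_followers, 4)   # 4 intervals for bidirectional followers
    f = decade_bin(uni_followers, 2)  # 2 intervals for unidirectional followers
    e = decade_bin(uni_followees, 3)  # 3 intervals for unidirectional followees
    return 1 + b * 6 + f * 3 + e

examples = [user_type(5, 3, 2), user_type(50, 0, 0),
            user_type(500, 1, 1), user_type(5000, 50, 500)]
```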
Power-law distributions in interaction network
In this paper, we also define two types of entities on the Twitter platform: the user and the tweet. To describe the communication activities between entities, we construct the interaction network, where each node represents a user and each directed connection denotes a tweet and its retweeting direction. More concretely, the in-degree of a node is the number of other users who have retweeted that user’s tweets, and the out-degree of a node is the number of users whose tweets that user has retweeted. Each tweet is described by a set of attributes: the number of times it was retweeted, the number of users participating in its spread (including the tweet releaser, with each user counted at most once), and the spreading time from its first release to its last retweet. Below we plot the distributions for users and tweets in the interaction network, based on the actual records of retweeting details.
As shown in the figures, the in-degree and out-degree of nodes (see Figs. 9 and 10) follow power-law distributions, meaning that most users seldom retweet other users’ tweets while a few users retweet a very large number, and most tweets are rarely retweeted while a few tweets are retweeted in large quantities. Meanwhile, the distribution of the retweeted number (see Fig. 11), the distribution of the number of participants (see Fig. 12) and a temporal characteristic, namely the distribution of spread lifetime (see Fig. 13), all follow power-law distributions. These distribution functions are $Y = 1.64 \times 10^{4} X^{-1.14}$ ($R^2 = 0.9586$), $Y = 1.50 \times 10^{4} X^{-1.25}$ ($R^2 = 0.9775$), $Y = 2.16 \times 10^{4} X^{-2.29}$ ($R^2 = 0.9996$), $Y = 7.08 \times 10^{5} X^{-5.00}$ ($R^2 = 0.9848$) and $Y = 1.45 \times 10^{2} X^{-0.50}$ ($R^2 = 0.5849$), respectively. The three tweet-level features are also important indexes for measuring how hot a tweet is. These power-law distributions show that most tweets spread only briefly and among few users, while a few tweets keep diffusing until they become hotspots. Furthermore, they indicate that the power-law distribution not only covers a variety of interactive entities, but is also reflected in both the spatial and the temporal structure of the social network on the Twitter platform. An interesting phenomenon is that, when the lifetime is less than 100, the result follows a power-law distribution approximately represented by $Y = 2.48 \times 10^{2} X^{-0.76}$ ($R^2 = 0.8515$), whereas for lifetimes above 100 it fluctuates randomly ($0 \le Y \le 25$). In addition, the power-law distributions of users’ out-degree, the retweeted number of tweets and the number of participants exhibit exponential truncation. Such phenomena are also captured and explained in [7].
Reference [7] gives a possible reason for the exponential truncation: when a network grows by preferential attachment under information filtering, some of the nodes are filtered out and are not in the set of candidate connection endpoints. On this basis, we deduce that a user’s choice of whose tweet, and which tweet, to retweet is mainly a preferential selection based on the user’s accumulated out-degree and the tweet’s accumulated hot-degree, but that users do not follow all other users. At the same time, user filtering leads to tweet filtering, because a user can only scan the tweets of his/her followees and may ignore some of their earlier-propagated tweets, which produces these power-law distributions with exponential truncation.
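The user and tweet attributes described above can be computed with a sketch like the following. The record layout (`(tweet_id, retweeter, retweeted_user, time)` tuples plus a `releases` mapping) and the function name `interaction_stats` are hypothetical, and the degree convention follows the reading that a user's in-degree counts the distinct users who retweeted him/her while the out-degree counts the distinct users he/she has retweeted.

```python
from collections import defaultdict

def interaction_stats(retweets, releases):
    """Compute user degrees and per-tweet attributes.

    retweets: list of (tweet_id, retweeter, retweeted_user, time) records.
    releases: dict tweet_id -> (releaser, release_time).
    Returns (in_degree, out_degree, tweet_stats), where tweet_stats maps
    tweet_id -> (retweeted number, number of participants, spread lifetime).
    """
    retweeted_by = defaultdict(set)   # user -> users who retweeted him/her
    has_retweeted = defaultdict(set)  # user -> users he/she has retweeted
    count = defaultdict(int)
    participants = {tid: {rel} for tid, (rel, _) in releases.items()}
    last_time = {tid: t0 for tid, (_, t0) in releases.items()}

    for tid, retweeter, source, t in retweets:
        retweeted_by[source].add(retweeter)
        has_retweeted[retweeter].add(source)
        count[tid] += 1
        participants[tid].update((retweeter, source))  # each user counted once
        last_time[tid] = max(last_time[tid], t)

    in_degree = {u: len(s) for u, s in retweeted_by.items()}
    out_degree = {u: len(s) for u, s in has_retweeted.items()}
    tweet_stats = {
        tid: (count[tid], len(participants[tid]), last_time[tid] - t0)
        for tid, (_, t0) in releases.items()
    }
    return in_degree, out_degree, tweet_stats

# Toy example: A releases t1 at time 0; B and C retweet A; D retweets B.
releases = {"t1": ("A", 0)}
retweets = [("t1", "B", "A", 5), ("t1", "C", "A", 8), ("t1", "D", "B", 20)]
in_deg, out_deg, stats = interaction_stats(retweets, releases)
```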
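One way to quantify exponential truncation is to fit the commonly used form $Y = c X^{-\alpha} e^{-\lambda X}$, which becomes linear in $(\ln c, \alpha, \lambda)$ after taking logarithms. This functional form, the function name `fit_truncated_power_law`, and the synthetic data are assumptions for illustration; the text does not state which truncated form, if any, was fitted.

```python
import numpy as np

def fit_truncated_power_law(x, y):
    """Fit y = c * x**(-alpha) * exp(-lam * x) by linear least squares in log space.

    ln y = ln c - alpha * ln x - lam * x is linear in (ln c, alpha, lam).
    Requires strictly positive x and y. Returns (c, alpha, lam).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), -np.log(x), -x])
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    log_c, alpha, lam = coef
    return float(np.exp(log_c)), float(alpha), float(lam)

# Synthetic, noise-free data from a known truncated power law (not real Twitter data).
xs = np.arange(1, 201, dtype=float)
ys = 2.0e4 * xs ** -1.2 * np.exp(-0.01 * xs)
c_hat, alpha_hat, lam_hat = fit_truncated_power_law(xs, ys)
```

A fitted $\lambda$ close to zero would indicate that a pure power law describes the data about as well as the truncated form.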
Retweeting patterns
In the relationship network, users connect to each other via bidirectional or unidirectional follows. In the interaction network, users connect to each other via propagated tweets: each retweet generates a link from the retweeted user to the retweet releaser, which forms a retweeting network for each tweet. Meanwhile, because there is more than one releaser, the interaction network might be a multi-connection graph.
In the following, we draw all the sub-graphs with more than 10 nodes in the retweeting network of the hottest tweet, that is, the tweet with the most participants, and select four representative sub-graphs from them. We find that these sub-graphs show different retweeting paths. The following figures summarize and analyze four distinct patterns of retweeting path in the real Twitter data.
The four main patterns of retweeting path are the Center-retweeting pattern (see Fig. 14), the Key-retweeting pattern (see Fig. 15), the Chain-retweeting pattern (see Fig. 16) and the Dandelion-retweeting pattern (see Fig. 17). Among these patterns, the “Center-retweeting pattern” fully embodies the P-to-MP broadcast. In this case, retweeting behaviors are mainly concentrated within the releaser-centered circle. In the “Key-retweeting pattern”, the number of retweets directly from the releaser is not large, but several key users promote the spreading of information step by step. The “Chain-retweeting pattern” is a kind of niche shape, in which tweets propagate steadily along certain directions; the diffusivity is not very strong, but the paths are often relatively long. Finally, the “Dandelion-retweeting pattern” is a complex propagation pattern that combines the “Center-retweeting pattern” and the “Key-retweeting pattern”. The releaser itself has a certain propagation force, and some key users (whose key role is weaker than in the “Key-retweeting pattern”) help the tweet spread in the second or third round of retweeting, so that the information eventually reaches a large retweeting amount. These four patterns of retweeting path have also been detected as typical information propagation patterns on other major social network platforms, such as SINA weibo [8].
Further, in the above figures, the largest node in each sub-graph is the tweet releaser, blue edges denote retweeting through unidirectional follows and red edges denote retweeting through bidirectional follows. The results in Fig. 14 show that the unidirectional follow plays an important role in diffusion at the current depth (for example, a tweet spreads from user U1 to U2, U3, etc.). In contrast, the bidirectional follow can promote the spread of a tweet from the current depth to the next depth (for example, the tweet spreads from user U1 to U2, and then from U2 to U3, etc.). Additionally, in the propagation of a hot tweet, most of the links are carried by unidirectional follows. The links generated by bidirectional follows are always reciprocal, meaning that the source user retweets the tweet from the end user again. Bidirectional follows connect highly homogeneous users and tend to deliver the same tweet to a user repeatedly, while unidirectional follows connect more heterogeneous users and tend to deliver a new tweet to a user, which is more likely to lead to hot tweet diffusion. This also supports our conclusion that bidirectional follows are strong ties and unidirectional follows are weak ties.
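Selecting the sub-graphs with more than 10 nodes can be sketched as a search for connected components, ignoring edge direction. The edge format and the function name `large_subgraphs` are assumptions for illustration, and the toy edges are not taken from the dataset.

```python
from collections import defaultdict, deque

def large_subgraphs(edges, min_nodes=11):
    """Return the node sets of connected sub-graphs with at least min_nodes nodes.

    edges: iterable of (retweeted_user, retweeter) pairs for one tweet.
    Direction is ignored when grouping nodes into sub-graphs.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects one connected sub-graph.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        if len(component) >= min_nodes:
            components.append(component)
    return components

# Toy example: a 13-node star around releaser "R" plus an isolated pair.
star = [("R", f"u{i}") for i in range(12)]
pair = [("x", "y")]
result = large_subgraphs(star + pair)
```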
Conclusions and future works
In this paper, we investigate the power-law distributions and retweeting patterns on the Twitter platform in order to find the underlying mechanisms of information diffusion on social networks. To achieve this goal, we study the issue from both the relationship network, based on unidirectional and bidirectional follow relationships, and the interaction network, based on user and tweet entities. By mining the actual Twitter data, we find multiplex power-law distributions at different levels of the Twitter social network. In particular, the power-law distributions of users’ out-degree, the retweeted number of tweets and the number of participants all exhibit exponential truncation. We further explore the emergent patterns of tweet retweeting paths and analyze their generative mechanisms. The results show that mining Twitter data from multiple perspectives can yield more interesting discoveries about social networks. Our next step is to build an agent-based model, based on the above results, for simulating users’ retweeting behavior and the tweet propagation process on Twitter.
