Abstract
Social networks helps to build relationships where two or more concepts, objects, or people are connected, or in state of being connected which is multidimensional and dynamic in nature. The interactive aspect of information extraction in online social networks instigates from considerations of different parameters which levied to the invention of new metrics. These metrics normally based on their ability to adapt to existing positioning or ranking indicator approaches with intent on activities and relationships among users in modern online social network which evolves with time. Existing work on network topology analysis is mainly focused on acquiring global properties such as interactions on either synthetic network or real world data provided by some authors without involving actual scenario of social network data. This research mainly focus on supervised learning using localize properties of known influential user in terms of links evolving from online social networks data. In addition, capture top K real time influential users from the evolving social network graph of known influential user. To achieve this, we propose two approaches, first an optimal Weight based Evolving Friends Follower Ranking (WEFFR) influence ranking algorithm to assign weights by capturing adaptive degree of relationship and secondly we combine WEFFR algorithm with Page Rank algorithm (WEEFRPR) to measures influence of nodes using reciprocal influence. The experiment results on Twitter network of known influential users shows that proposed approach performs better as compare to well-known existing approaches.
Introduction
Social Network Data (SND) is a hybrid formation of user demographics, link information and the content generated by various users. Extracting information from SNDto make decisions comes from several heterogeneous sources overtime [1]. The designing system applicable for social networks adhere a large data analytic applications such as social search, content mining, image analysis and visualization. In recent past, one of growing areas of research in social networks is managing, modeling and analysis of social influencebased on network structure and has been studied thoroughly in [2–4].
Social theories describes influential people relates to other influential friends that may have great impact on society [5]. These people tend to build more relationships with other influential person with time which resulted into evolution of their network incrementally. Such evolving social networks of known influential user can be used to extract influence of other individual or groups whose dynamics prevails over time [2]. Moreover, in a real-world social network, the influential users cannot be determine without having prior knowledge of influential domain and even after achieving results, human intervention is needed to justify those results. Similarly, users on social network known as nodes may establish or remove relationship, as network evolves, which eventually affects the network structure. In evolving network structure analysis it is enviable to keep whatever result obtain from various mining approaches at time t. For example, in dynamic community detection methods when new nodes arrives rather than computing entire network it’s better to classify that node based on previous results. In addition, while mining these networks models should compute and recognize patterns continuously changing in the evolving network [6].
The contributions of proposed work are highlighted below.
The research aims to propose the evolving initial ranking approach that able to capture the adaptive degree of relationship in real-world social network of the influential user. Develop strategy to calculate iteratively the evolving aggregate weight and propose an optimal Weight based Evolving Friends Follower Ranking(WEFFR) influence ranking algorithm. Combine WEFFR algorithm with Page Rank algorithm [6] (WEEFRPR) for measuring influence of nodes using reciprocal influence which gives optimal influence ranking. Each node assigned influence as weight and proposed recursive algorithm will distribute influence score to each node. The conducted experiments demonstrate the superiority of the proposed as compare to other approaches.
Related work
The perception of influence positioning metrics becomes popular in the field of social sciences and information retrieval [7]. An investigation into network evolution attracts a numerous established literature in empirical social sciences [8]. The notion of research is dedicated to the learning in what manner persons relationships, opinions, moods, and actions are influenced by the authentic, fictional, or implicit existence of another person or their activities [4]. The conventional data mining techniques are not suitable for the evolving graph of social relationship [7].
The influential node ranking algorithm in social network has attracted a lot of attention in the research literature [1–4]. Existing research for finding top influence nodes has different approaches proposed by various authors which includes diversified ranking [9] and influence maximization [10]. Similarly for ranking nodes for a given network, Liu et al. [11] comes with a linear methodology to calculate independent influence.
The well-known node ranking algorithms are Between ness centrality [12, 13], HITS [14] and Page rank. Brin and Page [15, 16] introduced the Page rank for ranking websites. Page rank uses link analysis techniques based on random walk to rank web pages. The state-of art approaches may not be capable to handle real scenario of social network data effectively. A limitation of the current influence ranking models such as Page Rank [16], independent cascade [17] and linear threshold [18] is that they do not consider some unpredictable characteristics of user influence dissemination process in a real-world social network.
Zeynep Zengin Alp [19] proposed Personalized Page Rank, that assimilates network topology and the contents from Twitter. Their goal is to find top k influencers who are powerful or have authority on a particular topic.
Li Yang [20] introduces novel between ness centrality algorithm, to find top opinion leaders and compared their results with independent cascade model based approaches. They showed the improvement of their approach by performing various experiments on various social network data.
In a real-world social network, the influential users cannot be determine without having prior knowledge of influential domain and even after achieving results, human intervention is needed to justify those results. Similarly, users on social network known as nodes may establish or remove relationship, as network evolves, which eventually affects the network structure. For example, in online social network, such as Face book or Twitter, network structure changes are occurred mostly due to growing quantity of common friends between two nodes.
Weng et al. [21] used Page rank to propose a Twitter rank algorithm to compute the influence of users on Twitter. They concentrate on both the subjective similarity and the topological structure. However Centrality measures identify the solitary position of node for a specified network, which ultimately estimates the prominence of node.
Centrality provides outlook of authority and influence, in logical view, central node normally get more attention from other nodes owing to their crucial positions and ascendant most of information or activities going in the network. However, to discover influential nodes is non-trivial job in real networks accustom with ever-changing information, demographics and heterogeneous entities. Benchmarks of influential nodes are different for diverse application. Hence, to determine a globally accepted bounds that endures various applications is highly unlikely. In addition, present techniques and algorithms assumes two essential properties: 1) the graph is beforehand generated 2) the generated graph has the amenable properties crucial to perform analysis, i.e., the network topology is well structured and can be recuperate when needed. In reality, online social network data is unstructured and non-uniformed. In literature very few approaches are found to transform these unstructured data into a suitable graph depiction.
Preliminary and problem statement
A key notion to find top k influential nodes is to explicitly employ the various centralities measures such as degree, closeness eigenvector and page rank [7]. In many applications [1, 8] these measures are competent and regarded as a standard approach. However, this approaches proves to be ineffective in some cases as the nodes with top influence score could be highly grouped as mentioned in [16], so focusing on entirely such groups is not an efficient solution while finding influence on particular network as suggested in state-of-the-art progresses and pioneer works [5, 14]. This research mainly aims to overcome the limitation of existing approach by combining the individual influence and reciprocal influence which comes from the nodes they attached with. In addition, the evolving graph structure possesses severe issue of intrinsic dynamicity due to 1) frequent change in characteristics of data 2) change in data distribution 3) introduction of new class or relationship type or removal of it 4) varying relationship between nodes. Our objective is to convert unstructured evolving relationship present in online social network into suitable graphical form. The preliminaries require to establish our research is describe as follows:
Given a social network N, which is directed network G (V, E), where V is the set of nodes and E is the set of edges. Every node in a network correspond to a user, and assume that there is a outgoing link from node j to node i, then it signifies that i is followed by j, or we can say that, j is the followee of i, or j is a follower o i and vice versa. Suppose a node R which has many followers in social network graph of node X at time period P and, if most of friends (followees) of X do not have many followers as compared to node R, it is most likely that node R has high Influence in an entire network. we define two measurements, first is to quantify the influence level of each node we propose the WEEFR algorithm and using the reciprocated reliance of influence between all users in a social network, we investigate the second approach that gather the influence score of users in an iterative way with Page Rank algorithm. Using the iteration, all nodes influence scores disseminate through the network, thus have an effect on each other. Therefore, the second approach is capable of analyzing the influence score of all user nodes together by considering the whole network.
Assume that we have a social network N that contains K users (nodes) denoted as (S i ) 1≤i≤k and W elationships in terms of links between users that can be symbolize as (L ij ) 1≤i,j≤K, where i and j are indices. We captured relationships in terms of followers and followeeamong all users within T consecutive time periods P t (1 ≤ t ≤ T).
Figure 1(a), demonstrate a dummy snapshot of social network N1 for timestamp P1 with 7 nodes, 8 links and Fig. 1(b) shows the updated graph for timestamp P2 with 9 nodes and 14 links. For every timestamp Pi, consider to indicate the number of links connecting node i and node k. For single directional the weight of relationship is assigned as 1 and 2 for bidirectional. Similarly,

Social network graph of influential user X at time p and p + 1.
Given a social network N with K nodes (S
i
) 1≤i≤K, M links (L
ij
) 1≤i,j≤K, and additional edges
In given social network N, influence ranking scenario is a connected graph in terms of link or path between any nodes. Such network of influential nodes is dynamic which needs multilayer graph structure and contains node which has their own network. In our approach we have exploited a single level graph of influential node for generating influence ranking which can be extended for multilayer in future.
In this section, we proposed two algorithms for influence ranking of various nodes in a social network of influential node by analyzing their evolving relationship between all nodes in a jointly manner. First we proposed the evolving friends follower ranking algorithm based on weights (WEFFR), which takes an evolutionary method to compute the influence of nodes for a given social network. Second, we combined EFFRW with Page Rank algorithm (WEFFRPR) to compute reciprocal influence score of each node.
The initial approach for ranking based on aggregate weight
Figure 1 shows examples of FF relationships in a social network. The weight on the top of link indicates aggregate number of in and out going link. The aggregate number of in and out links per node can be defined as follows.
In this approach, as discussed we examine two characteristic to measure theuser influence in a social network. First, if the aggregate number of incoming and outgoing links of node
Here we observe that
In the initial ranking algorithm, we actually assigned the weight, when two nodes are linked irrespective of incoming link and outgoing link as shown in Equation (1) However, this approach might not be ideal for real world networks analysis. Since, our focus is on influence of top k nodes on given social network using friends/follower network structure, where, it may be incoming link, outgoing link or Reciprocal links. In-degree of node X can be determining by number of incoming links, whereas, out-degree of nodes X is based on outgoing links and reciprocal links means two nodes follow each other. In-degree can give actual influence of node since it is a way to measure the size of the followers of a particular node [7]. This could be evaluation measure of a nodes influence as compared to out-degree. For instance, in micro blogging sites like Twitter node with large in-degree means probable large viewers for their tweets. Therefore, instead of giving equal weight to two nodes only on connection, it’s better to distribute weight based on their in-degree. For example as shown in Fig. 1, user U at time period p - 1 has followed by three nodes same as node V. But weight assign to node V is higher as compare to node U. We propose the following concept to assign the influence between two nodes.
Accordingly, now we can get an efficient aggregate number of links for every node in a given social network of an individual influence node that must be dissimilar from the previous which we define in Equation (1).
In addition, to above equation for node i, we will find the evolving aggregated score for every node, we also calculates
We can obtain evolving connection as follows. First,
The iterative method finishes if stop criterion is fulfilled like
Start
3: for all nodesi in social network N
4: Calculate the
5: Calculate
6: Replace weight of every node using Equation (4)
7: while a stop criterion is not satisfied do
8:t + 1;
9:
10:
Now, at this point above algorithm calculates influence of each node in a network of influential user Xsing aggregated and weight distribution model.
For better understanding, we can calculate the time complexity with action of individual edge and labeled the number of edges as W i , where i denotes the time span of the data. In first iteration, the primary influence score computation of individual node is linearly associated with the number of edges, since every operation is assigned the weight for each edge. In one iteration, we have to compute the score of two joined node and assigned the weight to those nodes with our algorithm, which point out the time complexity of each move is a steady that can be indicate as c. After assigning all edge weights to nodes, it means that one iteration is complete and the time complexity of one iteration is c * Wi. Suppose there is M the number of iterations to converge all weights, then we can say that the time complexity of our algorithm is c * M * Wi which is linear.
Calculating reciprocal influence spread
Till now, we are getting ratio which symbolize the increase rate of user influence in terms of friends and followers over a time period. In social networking where number of friends and followers is a not only criterion for measuring influence. Individual influence power relies on how active your relationships with other influence person in the network, and the notice you get when you engage with such influence people in your network. The influence calculated in equation 1 and 2 ignores the reciprocal influence of linked nodes. To address this problem we used Page Rank algorithm which used for measuring influence of nodes using reciprocal influence. It gives each node a score of according to importance in our case its influence. This algorithm recursively specify measure whereby a node becomes influential if influenced nodes link to it.
In Page Rank approach every nodeoughtto transmit its influence evenly to the associated nodes to whom it is following. For example, as shown in Fig. 1, node R has 2 outgoing links, so it will disseminate half of its influence to nodes S and U. Consider, if a node has m outgoing links, it will transmits on
Let assume every node have even distribution of influence for starting period as
In addition, to above computation it is possible that some nodes might not have any link, normally known as dangling node, in such case, reciprocal influence will not move in an entire network and is blocked at that particular node. Assume, we start propagating influence with this node, then influence score of all nodes becomes 0. To handle such scenario a positive constant D between 0 and 1 normally have been taken as 0.15 called as damping factor. This damping factor helps to jump out of dangling node and start with some other random node. Now alter earlier transition matrix as
So, above transition matrix construct the random walk as follows: normally, a walk must track links from a nodein case of any outgoing links. Sometimes, the walk will leave the present node and selects randomly other node from social network of X and transfer to it. The likelihood of transfer from one node to other totally depends on damping factor D. Therefore every node has
The algorithm to calculated reciprocal influence based on Page Rank is as follows:
Start
1:
3:
4:
5: end
6:
7: dfi ← 0
8:
9: dfi
10: end for
11: for all nodes i in social network N do
12: for allxi in IL[i] do
13: end for
14: end for
15: oin ← nin//update Page Rank
16: ITN ← ITN - 1
17: end while
18: end
Where
OL ← N total count outgoing link
IL ← N total count outgoing link
M ← N fetch # of nodes from N
Here we proposed algorithm WEFFRPR for ranking of influence based on our proposed Evolving Friends Follower Ranking (WEFFR) and the classical Page Rank algorithm. Preliminaries required for the algorithm is already discussed in previous Sections 4.2 and 4.4.
1:
Time period ← [1 … k]
Score ← [0 … n]
2:
3: Sum1 ← 0
Sum2 ← 0
4:
Sum1 + = P r e v_Score1[EFFR]
5: Sum2 + = P r e v_Score2[PR]
Z = Sum1 + Sum2
6:
7: Score [S] ← Z
8:
9:
The main idea is assigning score to the nodes calculated by WEFFR combined with reciprocal influence score at each time interval k. At each, iteration of proposed algorithm, it assigns the influence score for that particular time. We performed experiments on real world social network data which will be discussed in next section.
Experimental results and performance analysis
Experimental data
In this section, experiments are performed and analyzed to calculate the efficiency and accuracy of EEFRPR algorithm. To evaluate performance of proposed algorithm we compare our results with other well-known technique such as Between ness centrality and Page Rank, which has been used by many authors to calculate influence in network data.
The analysis of WEEFRPR algorithm are carried out using Twitter online data which is one of the most used microblog site having 330 million monthly active users and increasing rapidly every day. We fetch data using Twitter restful API’s [22]. To perform experiments, we need to find some known people from diverse field having great influence on masses. In India, individuals from Bollywood, Politics and Cricket are publicizing as The analysis of WEEFRPR algorithm are carried out using Twitter online data which is one of the most used microblog site having 330 million monthly active users and increasing rapidly every day. To perform experiments, we need to find some known people from diverse field having great influence on masses. In India, individuals from Bollywood, Politics and Cricket are publicizing as most influential people. These fields also become the influencing factors which have great impact on citizens from diverse cultures and customs. To take advantage of this, we fetch the data using Twitter API on three of 100 top most followed users on Twitter in India i.e. Mr. Amitabh Bachachan (Bollywood Actor) having Rank 2 with 27,893,052 followers and 1154 following, Mr.Sachin Tendulkar (Cricketer) Rank 11 with 17240714 followers and 78 following and Mr.Arun Jaitley (Politician) Rank 33 with 9997601 followers and 161 following based on stats provided by Twitter counter website [23].
In real social scenario also a lot of evolution in network of these users (e.g., followers and friends) going on every day within the social networking sites likes Twitter. Each user of individual social network is a node and each relationship either friends or follower creates the links between two users. We have 3 separate social networks for these individual that includes all information of each user and their friends. We access Twitter friends-follower influence in stream format with 20–30 friends at regular intervals. This influence is used to manage social networks dynamically in terms of relationships and influence. For example, in social network of Mr.Amitabh Bachchan if NodeB followsnode A or vice versa, it will create a link between them having weight age of 1 and if they follow each other, then weight age is 2 as per Equation (1).
Experimental platform
All algorithms were implemented with python and all experiments were conducted on a Windows 7 machine with i3-CPU and 8 GB Ram.
Algorithm evaluation
We have set the window size for analysis ased on the size of the data received in terms of number of friends denoted as TFPer. In our simulation, we set TFPer to four values as 0–25, 26–50, 51–75 and 76–100 percent. TFPer = 25% indicates the analysis will take place after receiving each 25% of total friends from Twitter network.
In the experiment, result of proposed algorithm i.e WEFFR and WEFFRPR are compared with different famous approaches like Betweeness centrality and Page Rank.
We firstly converted the network of influencer user network into Geospatial Markup Language (GML). An detailed example of one node in Mr. Amitabh bachchan network is shown below:
node [
id 882
user_id “2222673457”
file “SrBachchan.dat”
label “rsprasad”
image “/home/hadoop/Desktop/
BHARATTWITTER/img/2222673457.jpg”
type “friends”
statuses 10366
friends 411
followers 1886655
listed 1318
ffr 4590.4015
lfr 0.0070
shape “triangle-up”
]
In Fig. 2(a, b, c), we showed the simulation results on the Amitabh Bacchan (AB) having 1158, Sachin Tendulkar(SRT) having 78 and Mr. Arun Jaitley (AJ) having 161 friends network data respectively. From all three datasets, we can see that with the influential nodes in a given network changes with respect to time. The results of different influential calculating algorithms vary distinctly as shown in Fig. 2. For better understanding of evolutionary social network data, we use visual interpretation of each social network, since visualization of network data becomes particularly interesting when it is presented in terms of specific node which makes its network evolution easy to depict.

Comparison of various algorithms for different Influential users.
Specifically, for each algorithm, we can see the trend of changing influential of various nodes. However, for diverse algorithms, trends are different for different degree. As we are dealing with network of influential persons and social theories describes influential people relates to other influential friends, so most of the nodes should have somewhat similar influence score if not the same. In Fig. 2 a, b, c all other existing approaches gave variance score to all nodes which is not justifiable known influential usernetwork. However our proposed WEFFPR gave almost constant scores to all nodes so that no nodes wiil get neglected because of some groups with highers number of friends or the node with higher influence connecting to other nodes but have no connection at all in the network of influential user.
The accuracy of proposed WEFFR and WEFFRPR algorithms are compared with Betweeness Centrality (BC) and Page ank(PR) as depicted in 2 for Mr. Amitabh Bachachan data, in 3 for Mr. Sachin Tendulkar data and Mr. Arun Jaitley data.
The X-axis denotes the percentage of friends arrives in terms of range 0–25, 26–50, 51–75, 76–100 and Y-axis denotes the accuracy of various algorithms in terms of percentage.
In Fig. 3, at first time stamp the Page Rank algorithm have higher accuracy as compare to proposed WEFFRPR algorithm. However accuracy gets increased in third and four timestamp. In Fig. 4. At first time stamp our proposed WEFFR algorithm fails to detect any influential node, hence, the accuracy of WEFFR algorithms is 0 and all other algorithms shares equal accuracy. Proposed WEFFRPR surpasses all other algorithms in next three timestamp. In Fig. 5, accuracy of each algorithms fluctuates at different timestamp and for last proposed WEFFRPR and Pagerank algorithm both have accuracyof 70%.

Comparison of various algorithms for famous Bollywood celebrity.

Comparison of various algorithms for famous cricket player.

Comparison of various algorithms for renowned politician data.
In real large-scale social networks with millions of followers of influential user, calculating their influence is not realistic. It is more practical to calculate friends influence only, which makes the size of the aimed network smaller compare to the entire network and as a result our constraint factor would have rational sense. Twitter provide users with evident mechanism to connect and share their activities between each other which creates the network on Twitter as a directed graph. Existing research on Twitter data includes exploration of structures and topology by means of various techniques such as Centrality measures [7], Link prediction [2], and low reciprocity.
Twitter REST APIs consent to fetch publicly available data to create experimental dataset. This API’s detect Twitter applications and registered users by OAuth [36] and sends data which is in JSON format.
Despite the fact, the relationship between two node contains content information (e.g., tweets, mentions images and hash-tag) coupled with them, but we skips such information, as our motivation is to rank top k influential node from topological perspective. Similarly, the concept of evaluation measure is formulated on some assumption that might be biased. Since no alternative measures are available as per best of our knowledge, which can be used to test the correctness. Also various experiments on different network of users as shown in Fig. 2 depicts that our approach gives constant scores as compared to variance score gave by all existing well known approaches. In addition accuracy achieved using our approach is quiet better as compare to other approaches such as page Rank and Betweeness Centrality. We compare the accuracy results obtained using existing and our approached with the number of nodes who are actually influential person in the list given by Twitter counter.com website as 100 most influential personalities on Twitter.
As we focused on topological network of known influenced user the complexity to find other influence user irrespective of domain or in particular domain reduces drastically. The accuracy that we are achieving can be improve by adding contents of users such as tweets, mentions like etc., while analyzing influence in a given social network. However here are some challenges which need to be cope up with by applying various approaches which explain below:
Inter-process communication: Algorithms which runs in iterative manner requires large inter-process communication. Parallel processing approaches such as Deep learning approaches on GPU’s can be a effective solution to reduce inter-process communication.
Conclusion
Networks are extremely popular for representing many different complex systems including social interaction. Discovering influential nodes in social networks has wide range of applications in real world like Online advertisement, Political mobilization, Viral marketing, etc. In incrementally evolving online social network, influential nodes form relationships with added nodes or vice-versa. Such evolving social networks signify influence of each node or groups whose dynamics alter over time. This research s aimed to capture influential nodes using proposed WEFFRPR algorithm. Experimental results demonstrates the superiority of proposes approach when compared with existing popular techniques such as Betweenness Centrality and Page rank. In future more complex networks, distributed algorithms and better evaluation measures can be studied to facilitate better ranking for influential nodes. In addition, we will focus on Holistic influence mining which is based on hybrid approach that considers both position of user in the network as well as the content generated by him which gives better idea about influential users in the network.
