Context propagation based influence maximization model for dynamic link prediction

Abstract

Influence maximization (IM) in dynamic social networks is an optimization problem to analyze the changes in social networks for different periods. However, the existing IM methods ignore the context propagation of interaction behaviors among users. Hence, context-based IM in multiplex networks is proposed here. Initially, multiplex networks along with their contextual data are taken as input. Community detection is performed for the network using the Wilcoxon Hypothesized K-Means (WH-KMA) algorithm. From the detected communities, the homogeneous network is used for extracting network topological features, and the heterogeneous networks are used for influence path analysis based on which the node connections are weighted. Then, the influence-path-based features along with contextual features are extracted. These extracted features are given for the link prediction model using the Parametric Probability Theory-based Long Short-Term Memory (PPT-LSTM) model. Finally, from the network graph, the most influencing nodes are identified using the Linear Scaling based Clique (LS-Clique) detection algorithm. The experimental outcomes reveal that the proposed model achieves an enhanced performance.

Keywords

IM; social influence analysis; multiplex networks; Wilcoxon Hypothesized community detection; linear scaling based influencing nodes identification; parametric probability theory-based link prediction

1. Introduction

In today’s world, the daily lives of individuals have changed significantly due to the rapid evolution of online social networks. These platforms facilitate a range of activities, including communication, idea-sharing, friendship-building, and news consumption [1, 2]. Friends in social networks show a significant degree of resemblance: similar behaviour, interests, and hobbies indicate how alike people are. Influence is the process of affecting another person in such a way that the affected person comes to resemble the influential person. In other words, influence tends to bring friends closer together. Influencers are users who have a positive or negative impact on another user’s opinion. In a large social network, however, it is a challenging task to determine which nodes are influential. Influence analysis is an emerging field of online social network (OSN) research that evaluates user connections to discover the top-k influential nodes in the social network, that is, to locate and assess influential nodes in social data.

User interactions within OSNs play a crucial role in the swift and widespread dissemination of specific information, making them valuable tools for companies engaged in product marketing [3]. Social media interaction forms a communication network in which every person is a node and every communication is a link between nodes. A communication link is established between nodes when they perform actions such as like, follow, or share. When messages spread across social networks, not every node carries equal influence [4, 5]. Consequently, users who exhibit heightened activity, influence, and significance due to their behavior or social connections become pivotal in facilitating the rapid spread of messages [6]. The challenge of identifying such influential user groups, or seed nodes, to maximize information propagation is defined as the “Influence Maximization problem” in social networks [7]. In social influence analysis, link prediction predicts a user’s social behavior and influence propagation. The links between nodes may vary due to changes in these actions. So, when researchers study social network evolution, the temporal changes in associations and the effect of different associations need to be addressed. Network dynamics and temporal variations are key obstacles in maximizing influence [8]. The problem of analyzing existing network snapshots to predict future links is termed the dynamic link prediction problem. It can be formulated as follows: consider the network as a graph NG (V, E), where V is the set of network nodes and E is the set of network links between nodes. If a network snapshot is taken at time t₀ with a set of existing links E, the dynamic link prediction task is to discover the new links in the network at time t₁. The process of opinion formation is often portrayed as a dynamic interaction, where individuals’ opinions evolve based on their interactions with peers [9].
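As a small illustration of this formulation, the set of links a dynamic link predictor is scored against can be obtained by comparing two snapshots. The function name `new_links` and the toy edge lists are hypothetical, and links are assumed to be undirected.

```python
def new_links(edges_t0, edges_t1):
    """Return the undirected links present at t1 but absent at t0."""
    def normalise(edges):
        # frozenset makes (u, v) and (v, u) the same link
        return {frozenset(e) for e in edges}
    return normalise(edges_t1) - normalise(edges_t0)


# Toy snapshots (hypothetical data)
edges_t0 = [("a", "b"), ("b", "c")]
edges_t1 = [("a", "b"), ("c", "b"), ("a", "c")]
target = new_links(edges_t0, edges_t1)
print(target)
```

Here only the link between `a` and `c` is new at t₁; the reversed pair `("c", "b")` is recognised as an existing link.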

In social influence analysis, the data is analyzed to identify the network nodes that can maximize information spread. Link prediction techniques predict undisclosed links and anticipate future connections based on existing connections. Forecasting new connections relies on various factors such as observable links between network nodes, node attributes, network structure, and topology [10].

In existing influence maximization (IM) methods, static link prediction takes the network snapshot at a specific time for analysis and discovers the new possible links. Network evolution is not considered during link prediction [11, 12]. Consequently, the use of static social networks for studying information propagation in dynamic social networks tends to yield suboptimal seed identification, because the network keeps changing, which can affect the association of links [13]. In the score-based approach, the score of a future link is computed from the similarity scores of node pairs. The distance of each node is computed with respect to the closest centroid. The learning-based approach uses structural and topological features to predict the links. Both the topological features and the similarity scores consider a static homogeneous snapshot of the network, which lacks a temporal aspect. Existing approaches have used a homogeneous network, which consists of a single type of node and a single type of association. However, users may be associated with different types of network nodes through different types of associations.

While some studies have addressed the dynamic nature of networks, none have specifically focused on the contextual features of multiplex networks for effective seed generation [14]. Existing IM methods ignore the context propagation of interaction behaviors among users. This context varies according to the user’s interest, so dynamic link prediction is a recent trend in research.

2. Literature survey

Li et al. selected the key influencers using an agent-based evolutionary approach. The model identified an influencer set using the adaptive solution optimizer. The system outperformed existing algorithms. However, the convergence speed of the influence spread model was lower under a changing environment [15]. Zhang et al. introduced an IM model based on prediction and replacement. The model predicted the upcoming network snapshot and adopted a replacement algorithm to select seed nodes. The experimental results were very promising. However, the model was limited to the issues related to multilayer networks [16]. Liu et al. offered attributed IM by applying the features of the user’s group and their emotions. The seed candidate sets were located using the influencer user discovery algorithm. The results demonstrated the efficiency of the algorithm in influence maximization. However, ignoring context-based influence diffusion degraded the model’s performance for effective seed generation [17].

Wang et al. identified the influence-maximizing node sets by the moth-flame optimization method. Experimental results indicated that the approach was effective, but the removal of weakly connected nodes resulted in suboptimal seed generation [18]. Zhang et al. presented an IM method based on community detection in networks. A greedy algorithm with a sub-modular property-based approach was used to find the set of influential nodes. The experimental results demonstrated the effectiveness of the system in influence maximization. However, the system did not consider user behavior and social tie factors [19].

Wang et al. analyzed the node coverage gains to present an IM approach. The node coverage gain measured on the overlapping communities was used for seed point selection. The experiments demonstrated that the approach achieved competitive influence spread. However, the model required more running time in topological field construction [20]. Olivares et al. developed an IM model through the Linear Threshold (LT) model. The optimization model was used for seed selection. The results suggest that the solution properly solved harder instances. However, the swarm intelligence method was not adapted to the binary domain, which increased the execution time of the model [21].

Noemi Gasko et al. proposed the Shapley Influence Maximization Extremal Optimization approach, in which the Independent Cascade (IC) model is used to select influential nodes from a set of cascades. The effectiveness of the algorithm is shown through a comparative analysis. As an extension, the authors discuss a hybrid approach that combines algorithms with heuristic information [22]. Saeed Nasehi Moghaddam et al. provided a novel approach to identify influential nodes, which can serve as a solution for the CELF algorithm, by incorporating a parameter tuning strategy for diffusion models. The results were compared with 16 genetic algorithms, and the approach provided an optimal solution. However, the dynamic network is not captured by the system [23]. Inder Khatri et al. applied a discretization of the nature-inspired Harris’ Hawks Optimization meta-heuristic algorithm for community detection and identified the set of seed nodes. The work was analysed with different social networks. Since community detection improved the results, the work proposed in this paper includes community detection for identifying seed nodes [24].

Venkatakrishna et al. devised the K++ Shell decomposition algorithm for identifying influential nodes in a multilayer network. The experimental results and comparative analysis show that the algorithm outperforms the state-of-the-art algorithms. Hence, the work proposed in this paper uses a multilayer network for experimental analysis [25]. Peikun Ni et al. worked with an implementation of the Metaverse, adding the advantages of real and virtual networks to explore influence maximization from a different aspect. The experimental work demonstrated the efficiency of the algorithm. The dynamic aspect was discussed as a direction for future study [26]. Farzaneh Kazemzadeh et al. proposed the Influence Maximization Based on Community structure (IMBC) algorithm to address the identification of seed nodes with less computing time. The algorithm used a pruning technique to select the optimal seed set in a community. The work performs well with respect to time and seed node selection [27].

Ziwei Liang et al. proposed the Reverse Reachable set-based Greedy (RRG) algorithm to find influence in competitive online social networks with a pruning approach, which shows significant performance with less time complexity. The scope of the work can be extended to locate influence in multi-entity competitive online social networks [28]. Guoyao Rao et al. implemented the $K$-Grouping Joining Influence Maximization diffusion grouping model to identify possible groups with similar buying behaviour based on coupons. The context of product advertisement has a significant impact on buying [29]. Chen Dong et al. provided a three-stage iterative framework for maximizing influence. The adaptive search strategy is applied to reduce the overlapping of influence in communities, which improves seed selection in the network. Hence, the proposed work focuses on minimizing overlapping communities [30]. Venkatakrishna et al. worked with multilayer networks and implemented a community-based influence maximization (CBIM) model for identifying k seed nodes. The comparison shows an improvement in influence spread. Hence, a multilayer network is used for the experimental work in the proposed approach [31]. Shuxin Yang et al. designed an activation probability-aware (AP) framework for influence maximization in dynamic social networks. The framework captures the temporal changes in links. This work motivated the use of a probabilistic approach for dynamic link prediction [32]. Yuxin Ye et al. discussed IM techniques, diffusion models, and challenges, and provided a summary of context-aware diffusion models. This study was useful in selecting a context-aware approach to maximize influence [33].

2.1 Problem statement

The methods developed in the literature have the following shortcomings:

• Existing work lacks the incorporation of contextual features along with a multiplex network for effective seed generation.
• Existing work does not consider the dynamic nature of the network, so it is difficult to identify the optimum solution in dynamic networks.
• Link prediction between two nodes is complicated by the presence of different types of nodes and links in a multiplex network.
• Discovering people with similar interests according to their structural properties is an important issue in dynamic networks.

The proposed model makes the following contributions:

• For seed generation, contextual features-based IM in a multiplex network is introduced.
• To find the optimal seed set for IM, the dynamic nature of the network is considered.
• To make link prediction easier in a multiplex network, influence path analysis-based link prediction is used.
• To discover people with similar interests, community detection is performed in different ways with the consideration of social media data.

In this paper, Section 1 introduces influence maximization and explains the motivation for the research. Section 2 surveys the literature on research trends and prior work in the area of influence maximization. Section 3 gives the details of the proposed system. Section 4 provides detailed insights into the outcomes of the proposed methodology as applied in the experimental work. Section 5 concludes the study and outlines the future research plan.

3. Proposed IM model

This paper presents a context-based framework to find influence spreader nodes. The context-based IM model in multiplex networks, together with the methods used at each step, is shown in Fig. 1. Initially, multiplex networks along with their contextual data are taken as input. Community detection is performed for the network using the WH-KMA algorithm. From the detected communities, the homogeneous network is used for extracting network topological features, and the heterogeneous networks are used for influence path analysis, based on which the node connections are weighted. Then, the influence-path-based features along with contextual features are extracted. These extracted features are given to the link prediction model, the PPT-LSTM. Finally, from the network graph, the most influencing nodes, i.e., seed nodes, are identified using the LS-Clique detection algorithm. A detailed description of each block in Fig. 1 is given in the following subsections.

Figure 1.

System architecture of the proposed methodology.

3.1 Data model

An influence maximization framework for single-layer networks has limitations, as users are associated with multiple social networks. The proposed data model takes the multiplex social network and the network data as input, where each node is treated as an individual user and each connection between users is represented as an edge, so that the network forms a multigraph. The multiplex network is used to estimate the comprehensive influence ability of influential nodes in different network layers. Interlayer edges in multiplex networks can only link nodes that represent the same actor in different layers. Thus, sets of interactions between the same (or a similar set of) entities (followers, followees, etc.) are generally represented by multiplex networks. The use of multiplex networks is beneficial for incorporating heterogeneous influence propagation, combining structural and behavioral information, adapting different strategies, and modeling different types of relationships. The seed nodes selected by IM in a multiplex network are expected to have good spreading ability in each layer of the network.

3.1.1 Initial network graph

The initial network graph shows how the network is structured. It is the base network architecture used for subsequent analysis or algorithms, whether for influence maximization or any other activity involving networks. The initial network graph for n layers and m network graphs is modelled as $\Im_{M}({\Re,\aleph,U})$ using Eq. (1)

$\displaystyle\Im_{M}(\Re,\aleph,U)=\left\{\begin{array}{ll}\Re_{n}&n=1,2,\ldots,N\\ \Im_{\Re(m)}&m\subset\Re_{n}\\ \aleph_{k},\aleph_{I}&k\subseteq(u1,u2),\ \aleph_{I}\subseteq(U,\Re_{n})\\ U_{l}&l=u1,u2,\ldots,uL\end{array}\right.$ (1)

Where, $({\Re_{n}})$ denotes the layers of the network indexed by $n$, $({\Im_{\Re(m)}})$ is the number of network graphs $m$ in each layer, $({\aleph_{k},\aleph_{I}})\in\aleph$ are the $(k)$ respective edges between the nodes $({u1,u2})\in U$ and the $(I)$ inter-layer edges between the identical users in different layers $({U\in\Re_{n}})$, and $U_{l}$ is the set of $(L)$ nodes.
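The data model of Eq. (1) can be sketched as a small container that stores one edge set per layer over a shared user set and derives the inter-layer couplings for users that appear in more than one layer. The class and method names are hypothetical, and undirected edges without self-loops are assumed.

```python
from collections import defaultdict


class MultiplexNetwork:
    """Layers of undirected edges over a shared set of users (illustrative only)."""

    def __init__(self):
        self.layers = defaultdict(set)  # layer name -> set of frozenset({u, v})
        self.users = set()              # all users seen in any layer

    def add_edge(self, layer, u, v):
        self.layers[layer].add(frozenset((u, v)))
        self.users.update((u, v))

    def users_in(self, layer):
        return {u for edge in self.layers[layer] for u in edge}

    def inter_layer_edges(self):
        """Couplings (user, layer_a, layer_b) for users present in both layers."""
        names = sorted(self.layers)
        couplings = set()
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                for u in self.users_in(a) & self.users_in(b):
                    couplings.add((u, a, b))
        return couplings


net = MultiplexNetwork()
net.add_edge("twitter", "u1", "u2")
net.add_edge("facebook", "u1", "u3")
print(net.inter_layer_edges())
```

In this toy input only `u1` appears in both layers, so it is the only user that receives an inter-layer coupling.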

3.1.2 Network data

In order to achieve context-aware IM, social media network data $nd(i)$ is considered, and the set of cluster centroids is specified by Eq. (2), based on the fact that a user with a given interest has more influence on friends who have similar interests. In the context of network data, influence maximization is the selection of a collection of nodes whose activation or adoption of a behavior, product, or concept results in the greatest spread of influence throughout the network. Network data is critical to influence maximization because it provides the structure for influence propagation models and methods.

$\displaystyle D_{nd(i)}=\{{D_{nd(1)},D_{nd(2)},D_{nd(3)},\ldots,D_{nd(K)}}\}$ (2)

Where, $D_{nd(i)}$ is the set of cluster centroids, consisting of the $K$ items of social media network data $nd(i)$.

3.2 Community detection

Community detection in social network analysis discovers categories, or communities, of closely connected nodes within a network. Community identification techniques reveal the organization of nodes in complex networks and provide insights into node interactions and relationships. After modelling the network structure, community detection is performed to find nodes with common interests using the Wilcoxon Hypothesized K-Means (WH-KMA) algorithm. The K-means algorithm is selected for its fast computation on large data. However, its placement of cluster centroids in a spherical manner leads to inaccurate clustering results: it can merge different underlying clusters into one cluster and give a misleading clustering of the data. Hence, the clustered data are validated by using the WH technique.

Step 1: Initially, the number of communities $(T)$ , set of cluster centroids $({D_{nd(i)}})$ and the number of data points i.e., the nodes $(U)$ in the network $({\Im_{M}})$ are specified.

Step 2: In the next step, the $(T)$ number of cluster centers is selected randomly to group nodes with similar interests.

Step 3: Then, the distance between each node $({U_{l({\textit{loc},\textit{Glo},\textit{core}})}})$ and $({D_{nd(i)}})$ is calculated and the node having the minimum distance is assigned to the closest centre. In the proposed model, community detection is carried out in three ways using Eqs (3)–(5), as local $({S_{\textit{loc}(t)}})$ (nodes within a single layer), global $({S_{\textit{Glo}(t)}})$ (nodes in more than one layer), and core $({S_{\textit{core}(t)}})$ (nodes from all layers).

$\displaystyle U_{\textit{loc}}=\Im_{M}\left\{U_{l}\in\bigcup\limits_{n=1}\Re_{n},\ \aleph_{k(U)}\in\bigcup\limits_{n=1}\Re_{n}\right\}$ (3)

$\displaystyle U_{\textit{Glo}}=\Im_{M}\left\{U_{l}\in\bigcup\limits_{n=1,2}\Re_{n},\ (\aleph_{k(U)},\aleph_{I(U)})\in\bigcup\limits_{n=1,2}\Re_{n}\right\}$ (4)

$\displaystyle U_{\textit{core}}=\Im_{M}\left\{U_{l}\in\bigcup\limits_{n}\Re_{n},\ (\aleph_{k(U)},\aleph_{I(U)})\in\bigcup\limits_{n}\Re_{n}\right\}$ (5)

The Euclidean distance $({R_{ed}})$ is calculated using Eq. (6),

$\displaystyle R_{ed}(D_{nd(i)},U_{l(\textit{loc},\textit{Glo},\textit{core})})=\sqrt{\sum\limits_{i,l=0}^{N}(D_{nd(i)}-U_{l(\textit{loc},\textit{Glo},\textit{core})})^{2}}$ (6)

Where, $({\bigcup\Re})$ signifies the aggregation function for each layer.

Step 4: Then, the new centroid is calculated by taking the average of all data points and the clustered data is evaluated using the WH by Eq. (7)

$\displaystyle S_{\textit{eva}(t)}=\sum\limits_{t=1}^{T}\left[\Phi(D_{nd(i)},U_{l(\textit{loc},\textit{Glo},\textit{core})})\cdot r_{t}\right]$ (7)

Where, $S_{\textit{eva}(t)}$ is evaluated clusters, $\Phi$ is the sign function and $r_{t}$ is the rank of corresponding pairs in each cluster.
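The sign-times-rank sum in Eq. (7) has the same form as the Wilcoxon signed-rank statistic. The following minimal sketch computes that statistic for two paired samples. How the pairs are formed inside a cluster is not specified here, so the inputs `x` and `y` are an assumption, for example node-to-centroid distances before and after a centroid update.

```python
def signed_rank_sum(x, y):
    """Sum of sign(y_i - x_i) * rank(|y_i - x_i|) over non-zero differences.

    Tied absolute differences receive their average rank.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return sum((1 if d > 0 else -1) * r for d, r in zip(diffs, ranks))


print(signed_rank_sum([1, 2, 3, 4], [2, 1, 5, 8]))
```

For this input the differences are 1, -1, 2, and 4 with ranks 1.5, 1.5, 3, and 4, so the statistic is 7.0. A value near zero suggests that the paired samples do not differ systematically.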

The WH-based indexing process compares all the related samples in a single community to rank the equality of each sample. Hence, the final community structures are obtained by Eq. (8),

$\displaystyle S_{(t)}=\{S_{\textit{loc}(t)},S_{\textit{Glo}(t)},S_{\textit{core}(t)}\}$ (8)

Where, $S_{(t)}$ is the set of community structures.
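The local, global, and core node groups that feed these community structures can be illustrated by counting the layers in which each user appears. This is a sketch under the assumption that the three groups are treated as disjoint (a core node is not also listed as global); the function name and the layer data are hypothetical.

```python
def classify_nodes(layer_members):
    """layer_members: dict mapping layer name -> set of users in that layer.

    Returns a dict with 'local' (one layer), 'global' (several layers,
    but not all) and 'core' (every layer) user sets.
    """
    n_layers = len(layer_members)
    counts = {}
    for members in layer_members.values():
        for u in members:
            counts[u] = counts.get(u, 0) + 1
    groups = {"local": set(), "global": set(), "core": set()}
    for u, c in counts.items():
        if n_layers > 1 and c == n_layers:
            groups["core"].add(u)
        elif c > 1:
            groups["global"].add(u)
        else:
            groups["local"].add(u)
    return groups


layers = {
    "twitter": {"a", "b", "c"},
    "facebook": {"a", "b"},
    "instagram": {"a", "d"},
}
groups = classify_nodes(layers)
print(groups)
```

With these three toy layers, `a` is a core node, `b` is global, and `c` and `d` are local.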

Algorithm 1: Community Detection using WH-KMA
Input: Network structure and data $\Im_{M}({\Re,\aleph,U})$ , $D_{nd(i)}$
Output: Detected Communities $S_{(t)}$
Begin
  Initialize Network structure and data $\Im_{M}({\Re,\aleph,U})$ , $D_{nd(i)}$
  Initialize clusters $(T)$
  Initialize centroids $({D_{nd(i)}})$
  For each node in $\Im_{M}({\Re,\aleph,U})$ do
    For each $({D_{nd(i)}})$ do
      Calculate distance from the node to the centre using Eq. (6)
    End for
    Assign node to the cluster in $(T)$ with minimum distance
  End for
  Update each centroid as the average of its assigned nodes
  Evaluate the clusters using WH by Eq. (7)
  Return $S_{(t)}=\{{S_{\textit{loc}(t)},S_{\textit{Glo}(t)},S_{\textit{core}(t)}}\}$
End
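The assignment and centroid update steps of Algorithm 1 can be sketched with plain K-means on numeric node feature vectors. This is a minimal stand-in, not the WH-KMA implementation: the WH evaluation step is omitted, and the function name, the fixed iteration count, and the toy points are assumptions.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centres (Step 2)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance (Step 3)
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members (Step 4)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this toy input the two well-separated groups of three points end up in different clusters. A validation step such as the signed-rank evaluation of Eq. (7) would then be applied to the resulting clusters.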

From the community structures, the local communities are considered as homogeneous networks, and the global and core communities as heterogeneous networks.

3.3 Influence path analysis

Influence path analysis investigates how influence propagates through the network by studying its paths. The goal of path analysis is to understand the sequence of contacts or transmissions that leads the nodes in the network to adopt a behavior. Influence path analysis can shed light on the mechanisms of influence propagation, identify influential nodes or pathways, and suggest tactics for increasing influence spread. Influence path analysis using the meta-path indicates different types of interactions among different kinds of nodes. To ease link prediction, the influence path analysis of a heterogeneous network is carried out, which accounts for intra-community and inter-community meta-path propagation. The inter-community meta-path propagation is computed using Eq. (9), which represents common information shared by the same user $U_{l}$ in two different layers $({\Re_{n=1},\Re_{n=2}})$ .

$\displaystyle U_{l}\in\Re_{n=1}\to D_{nd(i)}\to U_{l}\in\Re_{n=2}$ (9)

The intra-community meta-path propagation is computed using Eq. (10), which refers to common information shared by two different users $({U_{l=u1},U_{l=u2}})$ in the same layer $\Re_{n=1}$ .

$\displaystyle U_{l=u1}\in\Re_{n=1}\to D_{nd(i)}\to U_{l=u2}\in\Re_{n=1}$ (10)

Based on this, the link between two nodes is weighted to find link importance in the network.
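To make the two meta-path types concrete, the sketch below enumerates instances of Eqs (9) and (10) from a list of (user, layer, item) records, where an item stands for a piece of shared contextual data $D_{nd(i)}$. The record format and the function name are assumptions made for illustration.

```python
from itertools import combinations


def meta_paths(records):
    """records: iterable of (user, layer, item) triples.

    Returns (inter, intra):
      inter holds (user, layer_a, item, layer_b) instances of Eq. (9),
      intra holds (user_a, user_b, layer, item) instances of Eq. (10).
    """
    records = sorted(set(records))  # deterministic order, no duplicates
    inter, intra = set(), set()
    for (u1, l1, d1), (u2, l2, d2) in combinations(records, 2):
        if d1 != d2:
            continue  # a meta-path must pass through the same shared item
        if u1 == u2 and l1 != l2:
            inter.add((u1, l1, d1, l2))  # same user, two layers
        elif u1 != u2 and l1 == l2:
            intra.add((u1, u2, l1, d1))  # two users, same layer
    return inter, intra


records = [
    ("u1", "twitter", "sports"),
    ("u1", "facebook", "sports"),
    ("u2", "twitter", "sports"),
    ("u2", "twitter", "music"),
]
inter, intra = meta_paths(records)
print(inter, intra)
```

Here `u1` shares the item `sports` across two layers (one inter-community path), and `u1` and `u2` share it within the `twitter` layer (one intra-community path).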

The link importance $({\aleph_{\textit{imp}({I,k})}})$ is computed by Eq. (11),

$\displaystyle\aleph_{\textit{imp}(I,k)}=\frac{\partial(\aleph_{(u1,u2)})}{n}-\frac{\partial(\aleph_{(u1)})\,\partial(\aleph_{(u2)})}{n^{2}}$ (11)

Where, $\partial({\aleph_{({u1,u2})}})$ is the frequency of occurrence of the link, $\partial({\aleph_{({u1})}}),\partial({\aleph_{({u2})}})$ are the rate of incidence of two nodes, and $n$ is the number of transactions. Using link importance, the edges in the heterogeneous network are assigned with weights to form a weighted network.
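Eq. (11) can be computed directly from a list of interaction records. The sketch below assumes that each transaction is one undirected interaction between two distinct users, and that the rate of incidence of a node is the number of transactions in which it takes part; both readings are assumptions. Under this reading the weight has the same form as the leverage measure used in association rule mining.

```python
from collections import Counter


def link_importance(transactions):
    """transactions: list of (u1, u2) pairs with u1 != u2.

    Returns a dict mapping each undirected link to its Eq. (11) weight:
    freq(u1, u2) / n - freq(u1) * freq(u2) / n**2.
    """
    n = len(transactions)
    link_freq = Counter(frozenset(t) for t in transactions)
    node_freq = Counter(u for t in transactions for u in set(t))
    weights = {}
    for link, freq in link_freq.items():
        u1, u2 = tuple(link)
        weights[link] = freq / n - (node_freq[u1] * node_freq[u2]) / n ** 2
    return weights


transactions = [("a", "b"), ("a", "b"), ("c", "d"), ("a", "c")]
weights = link_importance(transactions)
for link, w in weights.items():
    print(sorted(link), w)
```

For the link between `a` and `b`, the frequency is 2 out of 4 transactions, and the nodes take part in 3 and 2 transactions, giving 0.5 - 6/16 = 0.125. A negative weight, as for the link between `a` and `c` here, means that the pair interacts less often than their individual activity would suggest.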

3.4 Feature extraction

In feature extraction, the topological features $({\chi_{\textit{top}}})$ , such as shortest path, Adamic-Adar index, cosine similarity, common neighbours, and preferential attachment, among others, are extracted from $({S_{\textit{loc}}})$ . From $({S_{\delta({\textit{Glo},\textit{Core}})}})$ , path-based features $({\chi_{pf}})$ , such as the path count, normalized path count, symmetric activity ratio, and weighted activity ratio, are extracted. In addition, contextual features $({\chi_{\textit{con}}})$ , including the user’s interest, number of followers/followees, favorite posts, frequency of posts, and age of the account, are extracted from $D_{({{nd(i)}})}$ .

$\displaystyle\chi(j)=\{\chi_{\textit{top}}(j)\in S_{\textit{loc}},\ \chi_{pf}(j)\in S_{\delta(\textit{Glo},\textit{Core})},\ \chi_{\textit{con}}(j)\in D_{(nd(i))}\}$ (12)

Where, $\chi(j)$ is the final feature vector obtained by the fusion of the $(j)$ number of features using Eq. (12).
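Several of the topological features listed above have standard definitions for a node pair. The sketch below computes four of them from an adjacency dictionary. The function name is hypothetical, and cosine similarity is taken here in its structural form (common neighbours divided by the square root of the degree product), which is an assumption about the variant used.

```python
import math


def topological_features(adj, u, v):
    """adj: dict mapping node -> set of neighbours (undirected graph)."""
    common = adj[u] & adj[v]
    degree_product = len(adj[u]) * len(adj[v])
    denom = math.sqrt(degree_product)
    return {
        "common_neighbours": len(common),
        # Adamic-Adar: common neighbours weighted by inverse log degree
        "adamic_adar": sum(
            1 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1
        ),
        "preferential_attachment": degree_product,
        "cosine_similarity": len(common) / denom if denom else 0.0,
    }


adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
features = topological_features(adj, "a", "b")
print(features)
```

For the unconnected pair `a` and `b`, both neighbours are shared, so the common-neighbour count is 2, preferential attachment is 4, and the structural cosine similarity is 1.0. Such per-pair values would form the $\chi_{\textit{top}}$ part of the fused vector in Eq. (12).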

3.5 Link prediction

In social influence analysis, link prediction predicts users’ social behavior and analyzes the dynamic propagation of influence spread to infer a more accurate influence network based on the current structure and additional node and edge parameters. The feature vector $({\chi(j)})$ obtained from historical snapshots of the network structure is used by the Parametric Probability Theory-based Long Short Term Memory (PPT-LSTM) to forecast the feature values for possible future relationships between two nodes. LSTM is selected for its ability to handle long-term dependencies. Its goal is to describe the probability distribution of future links between nodes in a network by using the sequential character of network data to identify temporal dependencies. PPT-LSTM thus uses both temporal dynamics and probabilistic modelling to forecast future interactions and reveal hidden linkages in dynamic networks. However, the large number of parameters and operations used in a conventional LSTM increases the training time. To solve this, probability theory and the parametric Tanh activation function are used for the parameter setup layer in LSTM. The architecture of PPT-LSTM is shown in Fig. 2.

Figure 2.

Architecture of PPT-LSTM.

The PPT-LSTM has two recurrent states: the hidden state and the cell state. At the current time step $(q)$ , the input feature vector $({\chi_{j(q)}})$ and the output from the previous hidden state $({H_{({q-1})}})$ are fed to the cell. Inside the cell, the inputs are processed by three gates: the forget gate $({G_{f(q)}})$ , the input gate $({G_{in(q)}})$ , and the output gate $({G_{o(q)}})$ , represented by Eqs (13)–(15).

$\displaystyle G_{f(q)}=A_{\textit{sig}}(\hbar_{f(q)}(\chi_{j(q)},H_{(q-1)})+F_{f})$ (13)

$\displaystyle G_{in(q)}=A_{\textit{sig}}(\hbar_{in(q)}(\chi_{j(q)},H_{(q-1)})+F_{in})$ (14)

$\displaystyle G_{o(q)}=A_{\textit{sig}}(\hbar_{o(q)}(\chi_{j(q)},H_{(q-1)})+F_{o})$ (15)

Where, $A_{\textit{sig}}$ is the sigmoid activation function. The parameter setup layer initializes the network parameters $({\hbar_{f,in,o(q)},F_{f,in,o}})$ based on probability theory as in Eq. (16),

$\displaystyle\Theta(\hbar,F)\leftarrow 0\leqslant\Theta(\hbar,F)\leqslant 1,\text{ for each time }q(\hbar,F)\subset\textit{Max}(\textit{Acc})$ (16)

Hence, the new cell state and hidden states are expressed by Eqs (17)–(19),

$\displaystyle C_{(q)}=G_{f(q)}\cdot C_{(q-1)}+G_{in(q)}\cdot\tilde{C}_{(q)}$ (17)

$\displaystyle H_{(q)}=G_{o(q)}\cdot A_{PT}(C_{(q)})$ (18)

$\displaystyle A_{PT}=\left\{\begin{array}{ll}\beta\cdot C_{(q)}&\text{if }(C_{(q)}<0)\\ \tanh(C_{(q)})&\text{if }(C_{(q)}\geqslant 0)\end{array}\right.$ (19)

Where, $\tilde{C}_{(q)}$ is the candidate cell state computed from $\chi_{j(q)}$ and $H_{(q-1)}$ as in a standard LSTM, $A_{PT}$ is the parametric tanh function, $\beta$ is the random parameter, $\Theta({\hbar,F})$ is the probability of parameters, and $\textit{Max}(\textit{Acc})$ signifies the maximum accuracy. The feature vector is provided as input to the recurrent layers of the link prediction network. Each feature vector combines topological features, path-based features, and contextual features in a single vector. These features are used to train the link prediction network, and based on this training, the network outputs the forecasted mutual features.
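The gate and state updates of Eqs (13)–(15) and (17)–(19) can be traced with a scalar sketch of one time step. The function names, the scalar weights, and the default value of `beta` are hypothetical, and the candidate cell state scaled by the input gate is assumed to be the standard tanh candidate of an LSTM.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def parametric_tanh(c, beta):
    """Eq. (19): linear branch beta * c for c < 0, tanh(c) otherwise."""
    return beta * c if c < 0 else math.tanh(c)


def lstm_step(x, h_prev, c_prev, params, beta=0.1):
    """One scalar step; params maps 'f', 'in', 'o', 'c' -> (w_x, w_h, bias)."""
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    g_f = sigmoid(pre_activation("f"))      # forget gate, Eq. (13)
    g_in = sigmoid(pre_activation("in"))    # input gate, Eq. (14)
    g_o = sigmoid(pre_activation("o"))      # output gate, Eq. (15)
    c_cand = math.tanh(pre_activation("c")) # assumed standard candidate state
    c = g_f * c_prev + g_in * c_cand        # cell state, Eq. (17)
    h = g_o * parametric_tanh(c, beta)      # hidden state, Eq. (18)
    return h, c


zero_params = {name: (0.0, 0.0, 0.0) for name in ("f", "in", "o", "c")}
h, c = lstm_step(x=0.0, h_prev=0.0, c_prev=-2.0, params=zero_params)
print(h, c)
```

With all weights at zero every gate equals 0.5, so the cell state halves to -1.0 and the negative branch of the parametric tanh gives a hidden state of 0.5 × 0.1 × (-1.0) = -0.05. Unlike the standard tanh, the negative branch is linear, which avoids evaluating the exponential for negative cell states.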

Finally, using the mutual features forecasted by PPT-LSTM between unconnected nodes, such as mutual connections, user mention count, favourites, followers, and friends, the edges that are expected to be added to the network are predicted. With the prediction result, the network graph for time $(q+1)$ is constructed with the newly added or rejected edges as $\{{\Im_{M(q+1)}({\Re,\aleph,U})}\}$ .

3.6 Influencing nodes identification

Cliques are fully linked subgraphs in which every node is directly connected to every other node. The clique approach is selected for its efficiency in terms of both execution time and accuracy. However, the existing algorithm struggles to find accurate groups because it makes too many estimations when irrelevant cell sizes are set to very high values. Therefore, the values in the high range are normalized using the linear scaling technique. Linear Scaling based Clique (LS-Clique) is an algorithm that uses the functional characteristics of cliques to find influential nodes. From the graph $\{{\Im_{M(q+1)}({\Re,\aleph,U})}\}$ , the most influencing nodes are predicted using the LS-Clique detection algorithm as given below.

Step 1: If $\{{\Im_{M(q+1)}({\Re,\aleph,U})}\}$ is a complete graph, then $Y=\{{Y_{1},Y_{2},Y_{3},\ldots,Y_{N}}\}$ is the set forming a chain of subsets in the graph.

Step 2: Equation (20) is used to find the nodes $({U_{p}})$ whose neighborhood $\phi(p)$ is maximal in the graph and the nodes $({U_{q}})$ whose elements are mutually related $\phi(p)\subset\phi(q)$ .

$\displaystyle(U_{p},U_{q})=\left\{\begin{array}{ll}U_{p}&\max(\phi(p))\\ U_{q}&\phi(p)\subset\phi(q)\end{array}\right.$ (20)

Step 3: The set of cliques $({R^{\prime}})$ is computed using Eq. (21) for the subgraph $({Y(p)-U_{p}})$ generated by the neighborhood of a point $({\phi(p)-\{p\}})$ .

$\displaystyle R^{\prime}=Y(p)-U_{p}\leftarrow\phi(p)-\{p\},\ R^{\prime}\in Y(p)=\{Y\cup\{U_{p}\}\}$ (21)

Where, $\{{Y\cup\{{U_{p}}\}}\}$ indicates the cliques $({R^{\prime}\in Y(p)})$ added to one subset of the chain, $p$ indicates the nodes with maximum neighborhoods and $q$ indicates mutually connected elements.

Step 4: The set of cliques $({Q^{\prime}})$ is computed using Eqs (22) and (23) for the subgraph $({Y(q)-U_{q}})$ generated by the point $\phi(q)-\{q\}$ .

$\displaystyle Q^{\prime}=Y(q)-U_{q}\leftarrow\phi(q)-\{q\},\ B=\{Y\in Q^{\prime}:Y(q)\not\subset\phi(U_{p})\}$ (22)

$\displaystyle B=\frac{\phi-\phi_{\min}}{\phi_{\max}-\phi_{\min}}$ (23)

Where, $Q^{\prime}\in Y(q)=\{{Y(p)\not\subset\phi({U_{p}})}\}$ signifies the elements that cannot be arranged in the subset of $Y(p)$ , and $B$ is the range of points normalized using the LS technique for cliques of any finite size. LS rescales data with a large range so that, as in Eq. (23), the minimum value maps to 0 and the maximum to 1. It reduces the enlarged ranges in the same ratio and therefore remains true to the graph. This results in the estimation of the correct group for IM.

Step 5: The steps are continued until all graph nodes $U=\{{({U_{p},U_{q}})\cup\Im_{M(q+1)}(U)}\}$ are traversed.

In this way, the influencing nodes, i.e., seed nodes, are selected recursively according to the dynamic change in the network and recommended to the network for IM.
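The two ingredients of the procedure above, clique enumeration and the linear scaling of Eq. (23), can be sketched as follows. This is not the LS-Clique algorithm itself: maximal cliques are enumerated with the classic Bron–Kerbosch recursion as a stand-in, and the final ranking rule (largest containing clique first, then scaled neighborhood size) is an illustrative assumption. All function names are hypothetical.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch enumeration; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def linear_scale(values):
    """Eq. (23): min-max scaling of a dict of numeric values to [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def seed_nodes(adj, k):
    """Illustrative ranking: size of largest clique, then scaled degree."""
    scaled = linear_scale({u: len(nbrs) for u, nbrs in adj.items()})
    best = {u: 0 for u in adj}
    for clique in maximal_cliques(adj):
        for u in clique:
            best[u] = max(best[u], len(clique))
    ranked = sorted(adj, key=lambda u: (-best[u], -scaled[u], str(u)))
    return ranked[:k]


adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
print(maximal_cliques(adj))
print(seed_nodes(adj, 2))
```

In this toy graph the maximal cliques are {a, b, c}, {a, d}, and {d, e}. Node `a` has the largest neighborhood and belongs to the largest clique, so it is ranked first, followed by `b`.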

4. Results and discussion

The real-time social network datasets for Twitter, Instagram, and Facebook are used to implement the proposed algorithms.

4.1 Dataset description

The data for the experimental work were collected from different social platform datasets, namely Instagram, Facebook, and Twitter [14, 15, 16, 17, 18, 19]. The Instagram data for the IM task consist of 70,409 nodes (users) and 1,007,107 edges (followee and follower connections). The Twitter data were collected through the streaming API available online. The Twitter dataset includes 75,460 twitterers, 102,426 tweets, and 122,276 retweets, together with information on which twitterers posted, which retweeted, and which engaged in both tweeting and retweeting. The Facebook dataset describes a friendship network of 93 users (nodes) connected by 323 links.

4.2 Performance analysis

Here, the outcomes attained by the proposed context-based IM model are compared with those of the existing methods.

Figure 3 compares the community detection accuracy of the proposed WH-KMA with that of the existing KMA, Fuzzy C-Means (FCM), Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), and Clustering Large Applications (CLARA) methods. The community detection accuracy of the proposed model is 97.51%, which is higher than that of the existing methods. The analysis indicates that the WH technique distinguishes the grouped data well.

Table 1
Analysis of community detection time

Methods	Community detection time (ms)
Proposed WH-KMA	11017
KMA	13838
FCM	18137
BIRCH	20438
CLARA	22514

Figure 3. Performance comparison of proposed WH-KMA.

Figure 4. Performance comparison of proposed PPT-LSTM.

Table 1 compares the community detection time of the proposed WH-KMA with that of the existing community detection algorithms. The proposed WH-KMA takes 11017 ms, which is 2821 ms less than KMA (13838 ms). Therefore, identifying communities while evaluating the goodness of split in parallel using the WH technique reduced the community detection time of the proposed model.

Figure 4 shows a comparative analysis of the proposed PPT-LSTM, LSTM, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Artificial Neural Network (ANN) in terms of accuracy, precision, recall, F-measure, and specificity. For accuracy and specificity, the results attained by the proposed model are 1.39% and 4.05% higher, respectively, than those of the existing methods. Therefore, the reduction of parameters using the PPT technique supported the backpropagation process in learning more relevant data.
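For reference, the five metrics reported in Figure 4 can be computed from the confusion matrix of a binary link predictor. The sketch below uses the standard textbook definitions; the `Confusion` container and the `metrics` function are illustrative names, and the paper does not state how its values were computed beyond naming the metrics.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # existing links predicted as links
    fp: int  # non-links predicted as links
    tn: int  # non-links predicted as non-links
    fn: int  # existing links predicted as non-links


def metrics(c: Confusion) -> Dict[str, float]:
    """Standard definitions; assumes every denominator is nonzero."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "specificity": c.tn / (c.tn + c.fp),
    }
```

Specificity is the only one of the five that depends on true negatives alone in its numerator, which is why it can move independently of precision and recall when non-links dominate the candidate pairs.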

Table 2

Analysis of training time

Methods	Training time (ms)
Proposed PPT-LSTM	10.12
LSTM	14.13
CNN	16.24
RNN	18.04
ANN	21.28

Table 2 shows the model training time of the proposed PPT-LSTM and the existing LSTM, CNN, RNN, and ANN methods. The training time of the proposed model is 4.01 ms lower than that of LSTM and is also lower than that of the other techniques. Hence, reducing the parameters with the PPT technique so as to minimize the loss also reduced the training time of the proposed context-based IM model.
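The time savings quoted for Tables 1 and 2 can be checked with a few lines of arithmetic. The sketch below copies the values from the two tables and also reports the relative reduction, which the text does not state; the helper name `reduction` is an illustrative choice.

```python
def reduction(baseline: float, proposed: float) -> tuple:
    """Return (absolute reduction, relative reduction in percent) vs. the baseline."""
    absolute = baseline - proposed
    return round(absolute, 2), round(100.0 * absolute / baseline, 2)


# Values copied from Table 1 (community detection time, ms)
detection = {"WH-KMA": 11017, "KMA": 13838, "FCM": 18137, "BIRCH": 20438, "CLARA": 22514}
# Values copied from Table 2 (training time, ms)
training = {"PPT-LSTM": 10.12, "LSTM": 14.13, "CNN": 16.24, "RNN": 18.04, "ANN": 21.28}

print(reduction(detection["KMA"], detection["WH-KMA"]))   # (2821, 20.39)
print(reduction(training["LSTM"], training["PPT-LSTM"]))  # (4.01, 28.38)
```

Relative to the closest baseline in each table, this corresponds to roughly a 20% shorter community detection time and a 28% shorter training time.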

Figure 5. Comparison of influence spread.

Figure 6. Comparative analysis of proposed LS-Clique of (a) Twitter (b) Facebook and (c) Instagram.

Figure 7. Analysis of running time.

Figure 8. Comparative analysis of influence spread of proposed LS-Clique IM model with existing IM model.

Figure 5 analyses the influence spread achieved by the proposed LS-Clique and the existing Clique, Genetic Algorithm (GA), Greedy, and Degree Discount Heuristic (DDH) methods for seed sizes increasing in intervals of 10 up to 50. Among these methods, the proposed LS-Clique algorithm achieves the maximum influence spread. This indicates that the separate feature analysis of heterogeneous and homogeneous networks improved the performance of the proposed context-based IM model.
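Influence spread for a given seed set is commonly estimated by Monte Carlo simulation of a diffusion model. This section does not state which diffusion model was used, so the sketch below assumes the independent cascade model with a uniform activation probability `p`; the function names, the default parameters, and the graph representation are likewise assumptions for illustration.

```python
import random
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]  # directed adjacency lists


def simulate_ic(graph: Graph, seeds: Iterable[str], p: float, rng: random.Random) -> int:
    """One independent-cascade run; returns the number of activated nodes."""
    active: Set[str] = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def influence_spread(graph: Graph, seeds: List[str], p: float = 0.1,
                     runs: int = 1000, seed: int = 0) -> float:
    """Average number of activated nodes over `runs` simulations."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs
```

A comparison like the one in Figure 5 would call `influence_spread` once per method and seed size, using the same graph, probability, and number of runs for every method so that the estimates are comparable.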

Figure 6 shows the influence coverage attained by the proposed LS-Clique and the existing IM models for the Twitter, Facebook, and Instagram social media networks. The influence coverage of the proposed LS-Clique algorithm exceeds that of the existing approaches under varying numbers of generated nodes. Thus, the community detection-based network analysis improved the quality of the proposed context-based IM model.

Figure 7 analyses the execution run time of the proposed LS-Clique and the existing models on three network datasets: Twitter, Facebook, and Instagram. The running time attained by the proposed context-based IM model is 4.214 s (Twitter), 4.21 s (Facebook), and 2.91 s (Instagram), which is lower than that of the existing methods. This indicates that periodic estimation of network snapshots, continuous updating of influencing nodes, and reduction of enlarged graphical ranges using the LS technique significantly reduced the execution run time of the proposed context-based IM model.

Figure 8 compares the influence spread of the proposed LS-Clique IM model with that of the existing IM models developed by Li et al. [14], Liu et al. [16], Zhang et al. [15], and Wang et al. [19], which are discussed in Section 2. The analysis is done for a maximum node set of size 50. Compared with the existing methods, the proposed model attains the maximum influence spread. The consideration of contextual features from the network nodes and the effective link prediction through different ways of community detection demonstrate the effectiveness of the proposed context-based IM model.

5. Conclusion

In this paper, an influence maximization model framework based on context propagation in a multiplex network is proposed. The WH-KMA-based community detection, PPT-LSTM-based link prediction, and LS-Clique-based IM are the three significant contributions of the proposed framework. The proposed model is implemented using social media datasets collected from three different networks, namely Twitter, Facebook, and Instagram. Experimental analysis is done to evaluate the performance of the proposed context-based IM model, including the influence spread, coverage, and running time of LS-Clique, the link prediction performance of PPT-LSTM, and the community detection performance of WH-KMA. In the experimental analysis, the proposed PPT-LSTM, WH-KMA, and LS-Clique techniques are compared with the existing methods. Separate feature analysis of heterogeneous and homogeneous networks, periodic estimation of network snapshots, continuous updating of influencing nodes, and reduction of enlarged graphical ranges using the LS technique significantly reduced the execution run time of the proposed context-based IM model. The consideration of contextual features from the network nodes and the effective link prediction through different ways of community detection demonstrate the effectiveness of the proposed model. The experimental outcomes reveal that the proposed context-based IM model achieves enhanced performance and is faster than the state-of-the-art algorithms in a dynamic environment. In the future, the work can be extended to assess scalability while providing an approximation bound on IM.

Footnotes

Declaration of competing interest

The authors certify that there are no conflicts of interest in the subject matter discussed in this manuscript.

References

1. Liu, Wang, Chen, Zhang. Link Prediction Model for Weighted Networks Based on Evidence Theory and the Influence of Common Neighbours. Hindawi Complexity. 2022; 1-16. doi: 10.1155/2022/9151340.

2. Singh, Kailasam. Link prediction-based IM in online social networks. Neurocomputing. 2021; 453: 151-163. doi: 10.1016/j.neucom.2021.04.084.

3. Singh, Srivastva, Verma, Singh. Influence Maximization frameworks, performance, challenges and directions on social network: A theoretical study. Journal of King Saud University – Computer and Information Sciences. 2021; 34(9): 7570-7603. doi: 10.1016/j.jksuci.2021.08.009.

4. Kumar, Mishra, Singh, Biswas. Community Enhanced Link Prediction in Dynamic Networks. ACM Transactions. 2023; 1-32. doi: 10.1145/3580513.

5. Cai, Deng, Wang, Sellis, Xia. Community-diversified IM in social networks. Information Systems. 2020; 92: 1-12. doi: 10.1016/j.is.2020.101522.

6. Cai, Mian, Sellis. Target-aware holistic IM in spatial social networks. IEEE Transactions on Knowledge and Data Engineering. 2022; 34(4): 1993-2007. doi: 10.1109/TKDE.2020.3003047.

7. Dhandapani. Novel Influence Maximization algorithm for social network behavior management. Journal of ISMAC. 2021; 3(1): 60-68. doi: 10.36548/jismac.2021.1.006.

8. Banerjee, Jenamani, Pratihar. A survey on Influence Maximization in a social network. Knowledge and Information Systems. 2020; 62(9): 3417-3455. doi: 10.1007/s10115-020-01461-4.

9. Biswas, Abbasi, Chakrabortty. An MCDM integrated adaptive simulated annealing approach for influence maximization in social networks. Information Sciences. 2021; 556: 27-48. doi: 10.1016/j.ins.2020.12.048.

10. Bhatkar, Gosavi, Shelke, Kenny. Link Prediction using GraphSAGE. International Conference on Advanced Computing Technologies and Applications (ICACTA). 2023. pp. 1-5. doi: 10.1109/ICACTA58201.2023.10393573.

11. Kahr, Leitner, Ruthmair, Sinnl. Benders decomposition for competitive Influence Maximization in networks. Omega. 2021; 100: 1-13. doi: 10.1016/j.omega.2020.102264.

12. Kumar, Singhla, Jindal, Grover, Panda. IM-ELPR: Influence Maximization in social networks using label propagation based community structure. Applied Intelligence. 2021; 51(11): 7647-7665. doi: 10.1007/s10489-021-02266-w.

13. Ali, Babaei, Chakraborty, Mirzasoleiman, Gummadi, Singla. On the fairness of time-critical IM in social networks. IEEE Transactions on Knowledge and Data Engineering. 2023; 35(3): 2875-2886. doi: 10.1109/TKDE.2021.3120561.

14. Zareie, Sakellariou. Minimizing the spread of misinformation in online social networks: A survey. Journal of Network and Computer Applications. 2021; 186: 1-14. doi: 10.1016/j.jnca.2021.103094.

15. Jiang, Bai, Lai. ABEM: An adaptive agent-based evolutionary approach for IM in dynamic social networks. Applied Soft Computing. 2023; 136: 1-14. doi: 10.1016/j.asoc.2023.110062.

16. Zhang, Gao, Zhou. Influence Maximization based on SATS scheme in social networks. Computing. 2023; 105(2): 275-292. doi: 10.1007/s00607-022-01125-x.

17. Liu, Wang. An Influence Maximization method based on crowd emotion under an emotion-based attribute social network. Information Processing and Management. 2022; 59(2): 1-18. doi: 10.1016/j.ipm.2021.102818.

18. Wang, Xie, Koh, Cheong. Identifying influential spreaders in social networks through discrete moth-flame optimization. IEEE Transactions on Evolutionary Computation. 2021; 25(6): 1091-1102. doi: 10.1109/TEVC.2021.3081478.

19. Zhang, Gan. Identifying influential nodes in social networks via community structure and influence distribution difference. Digital Communications and Networks. 2021; 7(1): 131-139. doi: 10.1016/j.dcan.2020.04.011.

20. Wang, Sun. Influence Maximization in social graphs based on community structure and node coverage gain. Future Generation Computer Systems. 2021; 118: 327-338. doi: 10.1016/j.future.2021.01.025.

21. Olivares, Munoz, Riquelme. A multi-objective linear threshold influence spread model solved by swarm intelligence-based methods. Knowledge-Based Systems. 2021; 212: 1-14. doi: 10.1016/j.knosys.2020.106623.

22. Gasko, Kepes, Lung, Suciu. Identification of influential nodes with Shapley Influence Maximization Extremal Optimization algorithm. Applied Soft Computing. 2023; 146: 110653. doi: 10.1016/j.asoc.2023.110653.

23. NasehiMoghaddam, Fathian, Amiri. Alternate solutions for influence maximization: Beyond theoretical approximation by the Genetic Algorithm Framework. Swarm and Evolutionary Computation. 2023; 83: 101424. doi: 10.1016/j.swevo.2023.101424.

24. Khatri, Choudhry, Rao, Tyagi, Vishwakarma, Prasad. Influence Maximization in social networks using discretized Harris’ Hawks Optimization algorithm. Applied Soft Computing Journal. 2023; 149: 111037. doi: 10.1016/j.asoc.2023.111037.

25. Venkatakrishna, Chowdary. K++ Shell: Influence maximization in multilayer networks using community detection. Computer Networks. 2023; 234: 109916. doi: 10.1016/j.comnet.2023.109916.

26. Guidi, Michienzi, Zhu. Equilibrium of individual concern-critical influence maximization in virtual and real blending network. Information Sciences. 2023; 648: 119646. doi: 10.1016/j.ins.2023.119646.

27. Kazemzadeh, Safaei, Mirzarezaee, Afsharian, Kosarirad. Determination of influential nodes based on the Communities’ structure to maximize influence in social networks. Neurocomputing. 2023; 534: 18-28. doi: 10.1016/j.neucom.2023.02.059.

28. Liang. Targeted influence maximization in competitive social networks. Information Sciences. 2023; 619: 390-405. doi: 10.1016/j.ins.2022.11.041.

29. Rao, Wang, Chen, Zhou, Zhu. Maximizing the influence with K-grouping constraint. Information Sciences. 2023; 629: 204-221. doi: 10.1016/j.ins.2023.01.139.

30. Dong, Yang, Meng. TSIFIM: A three-stage iterative framework for influence maximization in complex networks. Expert Systems With Applications. 2023; 212: 118702. doi: 10.1016/j.eswa.2022.118702.

31. Venkatakrishna, Chowdary. CBIM: Community-based influence maximization in multilayer networks. Information Sciences. 2022; 609: 578-594. doi: 10.1016/j.ins.2022.07.103.

32. Yang, Song, Tong, Chen, Zhu, Wua, Liang. Extending influence maximization by optimizing the network topology. Expert Systems With Applications. 2023; 215: 119349. doi: 10.1016/j.eswa.2022.119349.

33. Chen, Han. Influence maximization in social networks: Theories, methods and challenges. Array. 2022; 16: 100264. doi: 10.1016/j.array.2022.100264.

34. Dataset Link Available from: Instagram: https://www.kaggle.com/krpurba/im-instagram-70k-eg.

35. Dataset Link Available from: Facebook: https://www.kaggle.com/code/boldy717/simple-network-analysis-of-facebook-data.

36. Dataset Available from: Twitter: https://www.kaggle.com/datasets/goyaladi/twitter-dataset?select=twitter_dataset.csv.

37. Dataset Available from: http://weibo.com.

38. Dataset Available from: http://snap.stanford.edu/data/ego-Facebook.html.

39. Dataset Available from: http://snap.stanford.edu/data/ego-Twitter.html.
