EC-BED-NETS: A Novel Deep Learning Framework for Recognizing Dominant Nodes in Multifaceted and Social Networks

Abstract

Identification of influential nodes in multifaceted and social networks has become one of the most significant research topics in today's digital world. Many strategies have been proposed to determine the dominance of nodes based on their topological information in the networks. Traditionally, centrality measurements are applied directly to the topological structure of the networks, and these measurements consider different characteristics related to structural and functional importance. However, the relationship between the functional importance of the nodes and their features is nonlinear, which makes it complicated and difficult to detect using traditional centrality measures. Inspired by the strong performance of long short-term memory (LSTM), this article proposes a new hybrid boosted ensemble LSTM framework for solving this problem. The proposed framework adopts enhanced centrality methods to construct feature vectors that reflect the functional and structural location of the nodes in their networks, then categorizes the nodes in accordance with the measurements, and finally uses the proposed boosted deep learning framework to classify and rank the influential nodes. In extensive experiments, the proposed framework achieved a best classification accuracy of 95.5%, outperforming other machine learning and deep learning models as well as traditional centrality measurements.

Introduction

In recent times, social networks have become rooted in everyday life and play a crucial role in transforming it. These networks play an important role in spreading vital information among their users.^1–3 Individuals who participate in social and complex networks have independent characteristics and importance based on their inputs and involvement. Among them, a few individuals play a significant role in the successful spread of information.^4–6

Against this background, research on finding the influential nodes in social and complex networks has attracted growing interest because of its wide range of applications: (1) in network protection models, the key target is to find the crucial nodes that have control over the other nodes in terms of transmitting fake news, worms, and even diseases⁷; (2) in big data analysis, investigators have adopted applicable models to detect important information among the enormous data stored⁸; (3) in the electrical field, researchers use intelligent methods to identify the influential nodes during power transmission⁹; (4) in the advertising domain, marketing investors use novel methods to optimize the advertising strategy¹⁰; and (5) even in stem cell therapy, cognitive methods have been adopted to identify key information in the cells.¹¹

Various centrality measurements, such as betweenness,¹² degree,¹³ closeness,¹⁴ shell types,¹⁵ cluster rank values,¹⁶ and eigenvalue,¹⁷ can be used to identify prominent nodes in networks, where nodes with higher centralities are deemed more influential than those with lower centralities. However, in many real-time scenarios, these centrality measures yield only local and global information and do not capture other relational factors such as neighborhood importance and infection rates (IRs).¹⁷ Hence the traditional centrality measurements fail to capture the complex relationship between the functional significance of the nodes and their features, which results in performance degradation and inflexible behavior in real-time situations.

With the rise of artificial intelligence, machine learning and deep learning models have been used to tackle these problems. Recently, deep learning models have become more popular because of their flexibility and their ability to handle huge volumes of data. Notable network-oriented applications of deep neural learning models appear in link prediction: a gated recurrent unit^18,19 encoder has been used to learn both spatial and temporal information of the network, and encoded long short-term memory (E-LSTM)²⁰ has been used to enhance link prediction performance. However, these algorithms predict the connectivity in the networks and do not solve the problems mentioned above. Moreover, intelligent deep learning models are still needed to uncover the complex relationship between the different characteristics of the nodes, so that the most influential category of nodes can be classified effectively without sacrificing performance.

With this motivation, this research proposes a new deep learning-based framework, EC-BED-NETS (enhanced centralities based on boosted ensembled deep learning networks). The proposed algorithm differs from the existing methodologies in the following ways:

1. This research discusses enhanced centrality measurements suitable for designing an intelligent prediction model for influential nodes using deep learning models.

2. It designs hybrid scalable deep learning models that can efficiently classify the important nodes in the networks.

3. Finding the influential nodes in social media networks remains a real challenge for researchers, and the deep learning model implemented in this research sheds new light on this problem.

The article is organized as follows:

Related studies by different authors are discussed in the Related Studies section. Details of the proposed methodology, enhanced feature extraction, and data labeling, along with the proposed hybrid algorithms, are discussed in the Proposed Methodology section. Experimental results, comparative analysis, and validation mechanisms are presented in the Experimental Results section. The article is concluded, along with future scope, in the Discussion and Conclusion section.

Related Studies

Most commonly, real-time applications such as Twitter, social networks, G-maps, and so on are structured as network graphs (i.e., nodes and links). In such complex networks, identifying influential nodes has become a big challenge. To address this, researchers are focusing on learning models to predict the most important nodes in each network.

Fan et al. in 2019 proposed a graph neural network framework termed “DrBC” to estimate the betweenness centrality (BC) of nodes in a network. The method comprises an encoder and a decoder. The neighborhood encoder captures each node's structural information. The multilayer perceptron (MLP) decoder maps this information to a BC value that determines the relative rank of each node, which is then used to prioritize nodes at runtime. The model is trained on small network graphs and is then applied directly to real-world networks.²¹ Ahmad et al. in 2020 developed a new clustering approach for the identification of influential nodes in social networks. The proposed method used the k-shell decomposition algorithm to segregate the nodes and form clusters. An extended clustering coefficient ranking approach is designed to group local nodes based on their similarity and correlation. A node with a low correlation ratio is identified as the most influential node that can spread messages to different locations inside the social network. The proposed model has a time complexity barrier that is difficult to overcome if the network is too large.²²

Li et al. developed a reinforcement learning algorithm based on a Quasi-Recurrent Neural Network (Q-RNN) with a seed selection approach to detect the most significant nodes in social networks. Their deep learning neural network-based influence maximization framework (DISCO) includes various states to capture the structural information of nodes. Influential nodes are detected based on the rewards received at each node. DISCO can avoid the costly diffusion sampling phase.²³ Devika and Subramaniyaswamy developed a semantic graph with vertices and links. In unsupervised semantic graph-based approaches, the essential step is to build a network of words, after which the nodes are ranked based on centrality measures. The proposed method includes feature extraction from Twitter information and a keyword extraction algorithm based on BC, eigenvector, degree, and other centralities. The extracted features comprise semantic features and node features used to identify the top 10 significant nodes in the Twitter network, and a page ranking algorithm is used to sort the influential nodes at runtime.²⁴

Mao and Xiao created a centrality measure named PARW-Rank, which combines basic centrality measures for assessing node influence with a random walk procedure. A partial graph (partial dependence graph) is built by analyzing the relation between each pair of nodes in a network for each degree of importance.²⁵ Nurek and Michalski utilized supervised learning algorithms such as decision trees, support vector machine (SVM), and Random Forest to predict the influential nodes in social networks. The network features and social information are combined to form structural data. The resulting training data are fed into various machine learning algorithms.²⁶

Rossi et al. devised a system for visualizing complex social networks and identifying significant nodes in their structure. The authors proposed a K-truss decomposition strategy that helps in visualizing and analyzing social networks and in discovering influential nodes.²⁷

Zhao et al. devised a scheme and tested it within the same network. They found that machine learning (ML) classifiers typically perform far better than traditional centrality techniques in identifying central nodes and their capacities. However, the proposed model is effective in small-scale networks and is not suitable for large network models.²⁸

Proposed Methodology

The proposed EC-BED-NETS structure for categorizing the influential nodes is presented in Figure 1. In this article, the terms node and hub are used interchangeably.

FIG. 1.

Block diagram for the proposed EC-BED-NETS frameworks for influential node identification. EC-BED-NETS, Enhanced Centralities Based on Boosted Ensembled Deep Learning Networks.

Overview mechanism

Identification of the influential nodes based on the proposed algorithm is shown in Figure 1. The proposed algorithm uses the enhanced centrality features along with the IRs as the training data sets. This article builds new feature vectors for each node using the enhanced centrality measures and the IRs. Then, based on the training data sets, the article uses the boosted ensembled deep learning models to learn the classification/prediction rules and to categorize the importance of influential nodes in complex and social networks. Relevant discussions are presented as follows.

Data sets

In this study, we analyzed several benchmark network data sets, such as LFR-1000,²⁹ LFR-200,²⁹ Dolphin,³⁰ Copperfield,³¹ Netscience,³² Euroroad,³³ Chicago,³⁴ and PGP,³⁵ as well as social networks like WhatsApp, to gather data and build centrality features for improved categorization. Table 1 presents detailed information on the different network scenarios used for data set collection.

Table 1.

Different network scenario benchmarks used for data set collection

Data set details	\|V\| (no. of nodes)	\|E\| (no. of edges)	Max degree	Average degree	Assortativity
LFR-1000	1000	10,610	98	21.22	−0.0786777
LFR-200	200	1052	16	10.520	0.202020
Dolphins	62	159	12	5.129	−0.0043567
NetScience	379	914	34	4.823	−0.008132
Copperfield	112	425	49	7.589	−0.219000
Chicago	1467	1417	12	1.770	0.216756
Euroroad	1176	1298	10	1.414	0.12456
PGP	10,680	24,316	205	2.230	0.23821
WhatsApp	50	2145	31	2.4375	0.1234

Figure 2 shows the different network benchmark scenarios that are used for testing and validation.

FIG. 2.

Different network scenarios used in the proposed framework (a) LFR-1000; (b) LFR-200; (c) PGP; (d) Dolphins; (e) WhatsApp networks.

Enhanced centrality feature extraction

The main focus of this research is to construct efficient feature databases that can differentiate the normal and influential nodes. In the literature, many centrality measure detection methods have been proposed to represent the importance of the nodes. But this article illustrates the usage of the enhanced number of centrality measurements to make the most accurate classification of the influential nodes.

To represent the structural and functional characteristics of the nodes, the following centralities are measured.

Degree centralities

This centrality counts the links attached to a node. It has two types: indegree centrality and outdegree centrality. The centralities can be determined by the following equations:

Indegree centrality, $D_{in} (P_{i}) = |P_{ji} \in P|, j \neq i,$ (1)

where “P_ji” is an edge going from node P_j to the evaluated node P_i.

Outdegree centrality, $D_{out} (P_{i}) = |P_{ij} \in P|, i \neq j,$ (2)

where “P_ij” is an edge from the evaluated node P_i to another node P_j in the network.
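As a minimal sketch of Equations (1) and (2), the snippet below counts incoming and outgoing links from a directed edge list. The function name `degree_centralities` and the toy edge list are illustrative assumptions and are not taken from the original framework.

```python
from collections import defaultdict


def degree_centralities(edges):
    """Return (indegree, outdegree) dicts for a list of (source, target) edges."""
    indegree = defaultdict(int)
    outdegree = defaultdict(int)
    for source, target in edges:
        if source == target:
            # Equations (1) and (2) exclude self-links (j != i).
            continue
        outdegree[source] += 1
        indegree[target] += 1
    return dict(indegree), dict(outdegree)


# Hypothetical toy network used only for illustration.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
indegree, outdegree = degree_centralities(toy_edges)
print(indegree, outdegree)
```

Nodes without any incoming (or outgoing) link simply do not appear in the corresponding dictionary, so callers may want to read values with `dict.get(node, 0)`.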

Betweenness centralities

This reflects the fraction of all shortest paths passing through the nodes. The mathematical expression for BC is given as follows: $D_{B} (P_{i}) = \sum_{P_{s} \neq P_{i} \neq P_{d}} \frac{μ_{P_{s}, P_{d}} (P_{i})}{μ_{P_{s}, P_{d}}},$ (3)

where $μ_{P_{s}, P_{d}} (P_{i})$ is the number of shortest paths between nodes P_s and P_d passing through node P_i and $μ_{P_{s}, P_{d}}$ is the number of all shortest paths between P_s and P_d.
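Equation (3) can be evaluated without enumerating every path. The sketch below uses Brandes' accumulation scheme on an unweighted, undirected graph stored as an adjacency dictionary. The choice of this algorithm, the function name `betweenness`, and the convention of counting each unordered pair (P_s, P_d) once are assumptions made for illustration.

```python
from collections import deque


def betweenness(adj):
    """Betweenness of every node in an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember the predecessors on those paths.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: value / 2.0 for v, value in bc.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))
```

On the three-node path, only the middle node lies on a shortest path between two other nodes, so it is the only node with a nonzero score.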

Closeness centralities

This reflects how close a node is to all other nodes in the network, based on the inverse of its total distance to them. The mathematical expression is given as $D_{c} (P_{i}) = \frac{N}{\sum_{P_{y}} d (P_{y}, P_{i})},$ (4)

where N is the number of vertices in the network and d(P_y,P_i) is the distance between vertices P_y and P_i.
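A small sketch of Equation (4) follows, assuming an unweighted, connected graph so that d(P_y, P_i) is the hop count found by breadth-first search. The function name `closeness` is hypothetical, and unreachable nodes are simply ignored, which is a simplification rather than part of the original method.

```python
from collections import deque


def closeness(adj, node):
    """Closeness of `node` as N divided by the sum of hop distances to it."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    # An isolated node has no finite distances; report 0 in that case.
    return len(adj) / total if total > 0 else 0.0


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(path, "b"), closeness(path, "a"))
```

The numerator here is N, as written in Equation (4); many references use N − 1 instead, which only rescales the scores.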

Eigen vector centralities

The eigen vector centrality scores a node according to the centralities of the nodes connected to it, and the expression to determine eigen vector centrality is given as follows: $E_{v} (P_{i}) = \frac{1}{α} \sum_{k} γ_{P_{k}, P_{i}} * E_{v} (P_{k}),$ (5)

where $γ_{P_{k}, P_{i}}$ is the $(k, i)$ entry of the adjacency matrix $A$ of the graph and $α$ is a constant.
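Equation (5) defines each score in terms of the neighbors' scores, so it is usually solved iteratively. The following is a minimal power-iteration sketch for an undirected graph. The function name, the iteration count, and the tolerance are assumptions. The update is applied to A + I rather than A, a common adjustment that leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs.

```python
import math


def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Power iteration for eigenvector centrality on an undirected graph."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus neighbors' scores.
        new = {v: score[v] + sum(score[k] for k in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - score[v]) for v in adj) < tol:
            return new
        score = new
    return score


# Triangle a-b-c with a pendant node d attached to c (toy example).
toy = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
scores = eigenvector_centrality(toy)
print(max(scores, key=scores.get))
```

In the toy graph, node c is connected to every other node and therefore receives the highest score, while the pendant node d receives the lowest.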

PageRank centralities

It computes the ranking of the nodes based on the centrality of the nodes in networks and the expression to determine it is given as $R_{p} (P_{i}) = ρ \sum_{k} \frac{A_{P_{k}, P_{i}}}{d_{k}} * R_{p} (P_{k}) + β,$ (6)

where $ρ$ and β are constants and d_k is the outdegree of node P_k if such degree is positive, or d_k = 1 if the outdegree of node P_k is zero. Here, $A_{P_{k}, P_{i}}$ is the $(k, i)$ entry of the adjacency matrix A of the graph.
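Equation (6) can be iterated to a fixed point. The sketch below is one way to do this for a directed graph given as a dictionary of outgoing links. The values ρ = 0.85 and β = (1 − ρ)/N are conventional choices assumed here, since the text only states that both are constants, and the function name `pagerank` is illustrative.

```python
def pagerank(out_links, rho=0.85, iterations=100):
    """Iterate Equation (6) on a directed graph.

    out_links maps every node to the list of nodes it links to;
    every link target must also be a key of out_links.
    """
    nodes = list(out_links)
    n = len(nodes)
    beta = (1.0 - rho) / n  # assumed value of the additive constant
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: beta for v in nodes}
        for k in nodes:
            # d_k is the outdegree, or 1 when the node has no out-links.
            d_k = len(out_links[k]) or 1
            for i in out_links[k]:
                new[i] += rho * rank[k] / d_k
        rank = new
    return rank


toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In the toy graph, node c receives links from both a and b and ends up ranked first, while b, which has no incoming links, keeps only the constant term.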

Position centrality

This is considered to be the most important measurement; it represents the position of a node relative to the important nodes identified by the PageRank mechanism. $H_{c} (P_{i}) = β \sum_{k} γ_{P_{i}, P_{k}} * R_{p} (P_{i}),$ (7)

where $γ_{P_{i}, P_{k}}$ is an entry of the adjacency matrix A of the graph, R_p (P_i) is the PageRank of the node, and β is a constant.

Clustering coefficient

This parameter reflects the fraction of possible triangles through a node that actually exist in the node's neighborhood. The mathematical expression for the clustering coefficient is $C_{c} = \frac{2 M_{P, i}}{K_{i} (K_{i} - 1)},$ (8)

where “M_P,i” is the number of links among the neighbors of the hub P_i. In the equation, it is divided by the number of possible pairs of neighbors of hub P_i, that is, K_i (K_i − 1)/2, where K_i is the degree of hub P_i.
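The following sketch illustrates Equation (8) for a single node of an undirected graph. It assumes the graph is stored as a dictionary of neighbor sets. The function name and the toy graph are hypothetical.

```python
def clustering_coefficient(adj, node):
    """Local clustering coefficient: 2 * M / (K * (K - 1))."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        # Fewer than two neighbors means no pair can form a triangle.
        return 0.0
    links = 0
    for index, u in enumerate(neighbors):
        for w in neighbors[index + 1:]:
            if w in adj[u]:
                links += 1  # u and w are both neighbors and are linked
    return 2.0 * links / (k * (k - 1))


# Triangle a-b-c with a pendant node d attached to c (toy example).
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(toy, "c"))
```

Node c has three neighbors and therefore three possible neighbor pairs, of which only the pair (a, b) is linked, giving a coefficient of 1/3.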

Besides these centralities, this article includes another important parameter, the clique, which plays an important role in increasing the performance of the classifier. A clique is a completely connected subgraph, meaning that each hub has direct links to all other hubs within the clique. Ref.³⁶ gives more detail about this measure, which is included in the proposed network. The method of measuring K-Score and K-Shell is discussed in Ref.³⁷

Other important measures that are used for classification are neighborhood variability and time stamp centrality. The neighborhood variability is defined as the difference between the sets of neighbors with which a specific node communicates in different time stamps. It can be calculated in three ways: transmitted neighborhood variability (TNV), received neighborhood variability (RNV), and general neighborhood variability.

TNV considers a set of neighbors to which the given node was sending messages. RNV looks at a set of neighbors from which the given node had been receiving messages. General neighborhood variability uses a set of neighbors with which the node communicates, without distinguishing between sending and receiving messages. The Jaccard coefficient was used for calculating the difference between sets, so the coefficient takes values between 0 and 1, where 0 means totally different sets and 1 means identical sets. The Jaccard coefficient was calculated for a particular time duration of 10 seconds. Furthermore, a neighborhood variability was calculated as an average Jaccard coefficient for each node based on time intervals (10 seconds). This parameter is also used to determine the activeness of the nodes in the networks.
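The averaging described above can be sketched as follows. The snippet assumes that the neighbor sets of one node have already been collected per 10-second window and that the Jaccard coefficient is taken between consecutive windows; the pairing of consecutive windows and both function names are assumptions, since the text does not spell them out.

```python
def jaccard(first, second):
    """Jaccard coefficient: 0 for disjoint sets, 1 for identical sets."""
    if not first and not second:
        return 1.0  # two empty neighborhoods are treated as identical
    return len(first & second) / len(first | second)


def neighborhood_variability(neighbor_sets):
    """Average Jaccard coefficient over consecutive time windows."""
    pairs = list(zip(neighbor_sets, neighbor_sets[1:]))
    if not pairs:
        return 1.0  # a single window has nothing to compare against
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Neighbors contacted by one node in three consecutive windows (toy data).
windows = [{"b", "c"}, {"b", "c"}, {"d"}]
print(neighborhood_variability(windows))
```

The same function can serve TNV, RNV, or the general variant, depending on whether the sets hold the recipients, the senders, or both.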

To determine the neighborhood variability, time stamp centrality is considered to be the most important parameter. The time stamp is the time duration taken by the nodes to transmit and receive the messages. The mathematical expression to determine the time stamp is given as follows:

where $R_{t} (P_{j, i})$ is the time taken by the nodes to receive the messages, $T_{t} (P_{i, j})$ is the time for the nodes to transmit the messages, and N is the total time interval. In summary, all the feature vectors used for classifications are listed in Table 2.

Table 2.

Summary of features used for the proposed classification

Sl. no.	Centrality features	Significance
01	Indegree centrality	Represents the number of incoming links connected to the node
02	Outdegree centrality	Represents the number of outgoing links connected to the node
03	Betweenness centrality	Reflects the fraction of all shortest paths passing through the node
04	Closeness centrality	Represents the distance of the node to the other nodes in the network
05	Eigen vector centrality	Scores the node based on the centralities of the nodes connected to it
06	PageRank centrality	Computes the ranking of the nodes based on the centrality of the nodes in networks
07	Position centrality	Represents the position of the node relative to the important nodes
08	Clustering coefficient	Denotes the fraction of possible triangles that exist in the node's neighborhood
09	K-shell centrality	Denotes the K number of decomposition of networks
10	K-score centrality	Computes the K number of pruned nodes in networks
11	Time stamp centrality	Calculates the time difference between the transmitted and received messages in the network
12	TNV	Considers the set of neighbors to which the node sends messages
13	RNV	Considers the set of neighbors from which the node receives messages
14	General neighborhood variability	Considers the set of neighbors with which the node communicates, regardless of direction

RNV, received neighborhood variability; TNV, transmitted neighborhood variability.

Data labeling methodology

To make the labeling process effective, since the labels are given as inputs to the deep learning models, this article uses the SIR (susceptible-infected-recovered) model³⁸ to analyze the different influential nodes in the networks. The article uses SIR models related to COVID-19³⁹-based epidemic models. The model splits the nodes of a network into three categories: susceptible, infectious, and recovered. Infectious nodes can infect their neighbors at the IR and can also recover back to the normal state at the recovery rate. It is a dynamic and random process that is used to mark the nodes as infected or noninfected. The influence of a hub is measured by the final outbreak size, that is, the number of infected and recovered hubs, and a hub with a larger outbreak is considered to be more influential. For the different analyses and validation, we have used IRs ranging from 0.1 to 0.8 (and even equal to 1).

The main purpose of the research is to identify the influential nodes in the networks using our proposed deep learning models and hence labeling is required to detect the influential hubs. As already mentioned, infected nodes are considered as label 1 and noninfected nodes are named as label 0. Furthermore, this article uses the IR as the important parameter to decide the number of labels.
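To make the SIR-based influence estimate concrete, here is a minimal discrete-time simulation sketch. The update order (infect, then recover), the default recovery rate of 1.0, the number of runs, and the function names are all assumptions for illustration; the article does not specify these details.

```python
import random


def sir_outbreak(adj, seed_node, infection_rate, recovery_rate=1.0, rng=None):
    """Run one SIR cascade from seed_node and return the final outbreak size."""
    if recovery_rate <= 0:
        raise ValueError("recovery_rate must be positive so the cascade ends")
    rng = rng or random.Random(0)
    infected = {seed_node}
    recovered = set()
    while infected:
        newly_infected = set()
        for v in infected:
            for w in adj[v]:
                susceptible = w not in infected and w not in recovered
                if susceptible and rng.random() < infection_rate:
                    newly_infected.add(w)
        healed = {v for v in infected if rng.random() < recovery_rate}
        recovered |= healed
        infected = (infected - healed) | newly_infected
    return len(recovered)


def mean_outbreak(adj, seed_node, infection_rate, runs=100, seed=0):
    """Average outbreak size over several runs; larger means more influential."""
    rng = random.Random(seed)
    total = sum(sir_outbreak(adj, seed_node, infection_rate, rng=rng)
                for _ in range(runs))
    return total / runs


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(mean_outbreak(path, "b", infection_rate=0.5))
```

Ranking nodes by `mean_outbreak` gives one possible basis for assigning influence labels at a chosen IR.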

BED-NETS predictor

In previous research,²⁸ many machine learning algorithms were used for effective classification of influential nodes. As discussed in the Enhanced Centrality Feature Extraction section, a large number of data sets with an increasing number of labels were extracted, which can lead to overfitting during classification and degrade the performance of the classifiers. Hence this article introduces the new hybrid predictor BED-NETS, which integrates long short-term memory (LSTM) with boosted machine learning algorithms for better classification of influential nodes. This section discusses the step-by-step formulation of the proposed classifier.

ADA-boosted algorithm

Freund³⁹ was the first to propose the Ada-boost algorithm, which strengthens weak classifiers. The Ada-boost algorithm strengthens the weak classifiers by updating their weights until the classification/prediction accuracy reaches its maximum. The final model is a strong model as long as every weak classifier performs better than random guessing, as discussed in Zhang et al.⁴⁰ It does not overfit easily during training, which has been theoretically explained and demonstrated in earlier studies.⁴¹ The pseudo code for the Ada-boost algorithm that is used in the proposed network is given hereunder.

Pseudo Code for Ada-Boost Algorithm
1	Input: training set {x_i, y_i}, i = 1, 2, 3, …, n, where n = number of input samples and y_i ∈ {1, –1} is the label associated with x_i
2	Initialize D_1(i) = 1/n
	where D_k(i) is the weight of training sample i at iteration k
3	For k = 1, 2, 3, …, K
4	Train the weak classifier h_k using the distribution D_k
5	Calculate the error function e_k with respect to the distribution D_k
6	e_k = P_r(h_k(x_i) ≠ y_i), where h_k is the hypothesis function
7	Choose α_k = 0.5 ln((1 – e_k)/e_k), where α_k is the weight of h_k
8	Update the weights: D_k+1(i) = D_k(i) exp(–α_k y_i h_k(x_i))/Z_k, where Z_k normalizes D_k+1 to sum to 1
9	Calculate the error function and repeat step 5
10	If error is less than e_k
11	Then output is calculated by H(x) = sign(∑α_k h_k(x))
12	End
13	End
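The pseudo code can be made concrete with a small, self-contained example. The sketch below implements Ada-boost with decision stumps on one-dimensional inputs as the weak classifier. The stump learner, the function names, and the toy data are assumptions chosen to keep the example short, whereas the proposed framework uses LSTM predictors in place of the stumps.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak classifier: predicts `polarity` at or above the threshold."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=5):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n  # D_1(i) = 1/n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        # Clamp the error so the logarithm below stays finite.
        error = min(max(error, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """H(x) = sign(sum of alpha_k * h_k(x))."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys)
print([predict(model, x) for x in xs])
```

Clamping the error is a practical safeguard for perfectly separable data, where e_k would otherwise be zero and α_k unbounded.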

Recurrent neural networks: an overview

As mentioned in Kermack and McKendrick,³⁷ in an artificial neural network (ANN), the hidden layer of each neural network is combined with the hidden layers of other neural networks between the nodes. In a recurrent neural network (RNN), however, the nodes of the same hidden layer are connected. One of the important characteristics of an RNN is that it can efficiently learn time series data, because it can encode the earlier data into the learning process of the current hidden layer. In the RNN methodology,⁴² the nodes form a directed graph along the sequence, so dynamic behavior can be exhibited over a time sequence. The RNN uses an internal memory (state) to process sequences of inputs and thus utilizes past data to predict future values. If the time interval between the past data and the current data is large in practical applications, the RNN is not able to memorize the past data well because of the vanishing gradient problem,⁴³ and the predicted outcomes are not satisfactory in some real-time scenarios. To overcome this issue, the RNN has been enhanced and the LSTM network has been introduced.

LSTM: an overview

An LSTM network is an RNN built from LSTM units. Each LSTM unit comprises a cell and input, output, and forget gates. The cells act as LSTM memory blocks that remember values over time intervals. Let $O_{t}$ be the input at time t, $e_{t}$ the hidden layer output and $e_{t - 1}$ its former output, $\tilde{T}_{C}$ the cell input state, $T_{C}$ the cell output state and $T_{C}^{t - 1}$ its former state, and $j_{t}, T_{f}$, and $T_{o}$ the states of the three gates. The structure of the LSTM cell indicates that both the cell output state and the hidden layer output are transmitted to the next step of the RNN. LSTM combines the output of the previous unit with the current input state, and the input, output, and forget gates are used to update the memory. The calculations of each block are estimated using the equations as follows.

The input gate is given as $j_{t} = θ (G_{l}^{i} . O_{t} + G_{h}^{i} . e_{t - 1} + s_{i}) .$ (11)

The forget gate is given as $T_{f} = θ (G_{l}^{f} . O_{t} + G_{h}^{f} . e_{t - 1} + s_{f}) .$ (12)

The output gate is calculated as $T_{o} = θ (G_{l}^{o} . O_{t} + G_{h}^{o} . e_{t - 1} + s_{o}) .$ (13)

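Following the standard LSTM formulation, the cell input is given as $\tilde{T}_{C} = \tanh (G_{l}^{C} . O_{t} + G_{h}^{C} . e_{t - 1} + s_{C}),$ (14)

where $G_{l}^{o}, G_{l}^{f}, G_{l}^{i}, G_{l}^{C}$ are the weight matrices applied to the input $O_{t}$, whereas $G_{h}^{i}, G_{h}^{f}, G_{h}^{o}, G_{h}^{C}$ denote the weight matrices applied to the former hidden layer output $e_{t - 1}$. Also $s_{i}, s_{f},$ $s_{o},$ s_C are the bias vectors, θ is the gate activation function, and tanh is the hyperbolic tangent function. Second, the cell output state is calculated, and it is given as follows: $T_{C} = j_{t} * \tilde{T}_{C} + T_{f} * T_{C}^{t - 1},$ (15)

where $T_{C}^{t - 1}$ is the cell output state of the previous time step. Also, the hidden layer output is calculated and is given as $e_{t} = T_{o} * \tanh (T_{C}) .$ (16)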
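The gate equations can be traced numerically with a single-unit sketch. The code below follows the standard LSTM update (sigmoid gates, tanh cell input, gated cell state, gated hidden output) using scalar weights. The function names, the parameter layout, and the use of the logistic sigmoid for θ are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(o_t, e_prev, c_prev, params):
    """One time step of a single LSTM unit with scalar input.

    params maps "i", "f", "o", "C" to (input weight, hidden weight, bias).
    Returns the new hidden output e_t and the new cell state.
    """
    def pre_activation(name):
        g_l, g_h, s = params[name]
        return g_l * o_t + g_h * e_prev + s

    j_t = sigmoid(pre_activation("i"))         # input gate
    t_f = sigmoid(pre_activation("f"))         # forget gate
    t_o = sigmoid(pre_activation("o"))         # output gate
    cell_in = math.tanh(pre_activation("C"))   # candidate cell input
    c_t = j_t * cell_in + t_f * c_prev         # updated cell state
    e_t = t_o * math.tanh(c_t)                 # hidden layer output
    return e_t, c_t


# With all-zero parameters every gate equals 0.5 and the cell input is 0.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("i", "f", "o", "C")}
print(lstm_step(1.0, 0.0, 1.0, zero_params))
```

With zero parameters the forget gate halves the previous cell state, which makes the step easy to check by hand.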

Numerous LSTM units are concatenated to build a single LSTM layer. Each unit computes its operations at one time index and hands its output over to the next LSTM unit. As the data increase,³⁴ the number of LSTM layers also increases, which may degrade the performance of the LSTM network. Hence this article introduces the hybrid boosted LSTM models.

Ensembled boosted LSTM models

Figure 3 shows the ensembled boosted LSTM models used for predicting the influential nodes in the networks. Ada-boost is used to train an LSTM predictor at each of the n iterations, and all n LSTM predictors are combined to form one strong predictor.

FIG. 3.

Architecture for the proposed LSTM model used for influential node prediction. LSTM, long short-term memory.

The proposed boosted LSTM model works in six major steps. First, the LSTM network is trained with the group of input centrality features. The output of the LSTM predictor is calculated by Equation (16). A threshold predictor value is set (in accordance with a rule of thumb) and compared with the output of the LSTM predictor to calculate the error function. The error function can be calculated by the following expression: $e (k) = P (\text{Actual}) - P (\text{Obtained}) .$ (17)

If the error value is large, Ada-boost will boost the input weights of the LSTM predictor. At the next iteration, the predicted output will be obtained from the boosted weights of the LSTM cells. When the error falls to a low value after “n” iterations, all outputs are ensembled to form the new predicted output. The pseudo code that follows shows the complete working of the proposed algorithm.

Pseudo code of the proposed BED-NETS predictor
1	Input training sets: x(i) = (x1, x2, x3, x4, x5, …), where x(i) are the input centrality features
2	Outputs: influential nodes (z)
3	Initialize D_1(i) = 1/n, where D_k are the weights of the LSTM predictors
4	For k = 1, 2, 3, …, N
5	Train the LSTM classifier using the input samples
6	Get the hypothesis with error function with respect to D_k
7	Error function is calculated at each stage, which is then weighted by D_k
8	Choose α_k = 0.5 ln((1 – e_k)/e_k) (network parameter mechanism)
9	Update D_k+1(i)
10	Calculate the error function and repeat step 4
11	If error is less than e(k) \|\| error is zero
12	Then ensemble all the outputs: E(x) = sign(∑α_k * e(k)/α_k)
13	Else
14	Go to step 4
15	End
16	End

Experimental Results

To verify the working procedure of the EC-BED-NETS framework, we carried out experiments to evaluate the accuracy of the proposed models with the training and testing data from the different benchmark networks (mentioned in the Data sets section) along with SIR propagation models. The computer used for conducting the experiments has an i5 CPU (eighth generation) with 16 GB RAM, a 2 TB hard disk, and an 8 GB NVIDIA GPU. The hyperparameters used for tuning the proposed algorithms are listed in Table 3.

Table 3.

List of hyperparameters used for training

Sl. no.	Hyperparameters used	Specifications
01	Training data sets	4367
02	Testing data sets (20%)	225
03	No. of epochs used	400
04	No. of LSTM cells used	10
05	Batch size	40
06	Learning rate	0.001
07	Decay in learning	0
08	Data dimension	14 attributes
09	Optimizer and activation function	Adam and Softmax
10	Classes label	02
11	Processing time	14–16 hours
12	RNN_Dropout	0.3

LSTM, long short-term memory.

The complete model is developed using the Keras library, which runs on TensorFlow as the back end. The performance metrics of the proposed models are calculated by the following mathematical expressions: $\text{Accuracy} = \frac{\text{Total Values Correctly Detected}}{\text{Total Testing Data}} .$ (19) $\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} .$ (20)

$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} .$ (21)
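For reference, the sketch below computes accuracy, precision, and recall from predicted and true labels using the standard confusion-matrix definitions (true/false positives and negatives). The function name and the toy label vectors are hypothetical and do not reproduce any result reported in this article.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels: 1 = influential node, 0 = normal node.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In the toy data there are three true positives, one false positive, one false negative, and three true negatives, so all three metrics equal 0.75.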

As mentioned in Table 3, ∼4567 data samples were collected from various benchmarks, of which 70% were used for training and 30% were used for testing. The results for various IR scenarios are presented in the following section.

Tables 4 to 9 present the performance analysis of EC-BED-NETS for the various benchmarks and the WhatsApp social network. The prediction accuracy of EC-BED-NETS stays between 0.95 and 0.93 for the various benchmarks as the IRs and labels change. The prediction error across the different IRs is as low as 0.02. To demonstrate the efficiency of the proposed algorithm, we carried out experiments with the same network scenarios on various existing deep learning algorithms, such as E-LSTM⁴³ and stacked LSTM,⁴⁴ and on other machine learning algorithms, such as Random Forest, Support Vector Machines, and K-Nearest Neighbors (KNN).

Table 4.

Performance of the proposed Enhanced Centralities Based on Boosted Ensembled Deep Learning Networks for different benchmarks at infection rate: 0.1

Data set details	Performance metrics (@IR: 0.1 with 10 labels)
Data set details	Accuracy	Precision	Recall
LFR-1000	0.954	0.933	0.92
LFR-200	0.945	0.921	0.92
Dolphins	0.955	0.932	0.912
Net Science	0.950	0.92	0.922
Copperfield	0.955	0.93	0.923
Chicago	0.952	0.921	0.92
Euro road	0.953	0.930	0.92
PGP	0.955	0.921	0.912
WhatsApp	0.960	0.912	0.922

IR, infection rate.

Table 5.

Performance of the proposed Enhanced Centralities Based on Boosted Ensembled Deep Learning Networks for different benchmarks at infection rate: 0.1 @ increase in labels

Data set details	Performance metrics (@IR: 0.1 with 20 labels)
Data set details	Accuracy	Precision	Recall
LFR-1000	0.950	0.930	0.920
LFR-200	0.950	0.925	0.920
Dolphins	0.940	0.920	0.915
Net science	0.950	0.920	0.920
Copperfield	0.950	0.92	0.924
Chicago	0.950	0.91	0.921
Euro road	0.945	0.90	0.925
PGP	0.950	0.920	0.920
WhatsApp	0.965	0.920	0.920

Table 6.

Performance of the proposed Enhanced Centralities Based on Boosted Ensembled Deep Learning Networks for different benchmarks at infection rate: 0.15 @10 labels

Data set details	Performance metrics (@IR: 0.15 with 10 labels)
Data set details	Accuracy	Precision	Recall
LFR-1000	0.940	0.915	0.900
LFR-200	0.940	0.910	0.890
Dolphins	0.930	0.910	0.90
Net science	0.945	0.920	0.900
Copperfield	0.940	0.924	0.90
Chicago	0.945	0.920	0.89
Euro road	0.945	0.910	0.88
PGP	0.943	0.900	0.89
WhatsApp	0.960	0.920	0.91

Table 7.

Performance of the proposed Enhanced Centralities Based on Boosted Ensembled Deep Learning Networks on different benchmarks at an IR of 0.15 with 20 labels

Data set	Accuracy	Precision	Recall
LFR-1000	0.940	0.910	0.90
LFR-200	0.935	0.89	0.885
Dolphins	0.920	0.90	0.89
Net science	0.925	0.92	0.89
Copperfield	0.930	0.91	0.90
Chicago	0.935	0.910	0.89
Euro road	0.940	0.90	0.88
PGP	0.940	0.90	0.89
WhatsApp	0.960	0.920	0.91

Table 8.

Performance of the proposed Enhanced Centralities Based on Boosted Ensembled Deep Learning Networks on different benchmarks at an IR of 0.2 with 10 labels

Data set	Accuracy	Precision	Recall
LFR-1000	0.940	0.890	0.89
LFR-200	0.940	0.900	0.890
Dolphins	0.930	0.900	0.90
Net science	0.945	0.910	0.900
Copperfield	0.930	0.900	0.90
Chicago	0.935	0.910	0.89
Euro road	0.945	0.900	0.88
PGP	0.943	0.880	0.89
WhatsApp	0.960	0.915	0.91

Table 9.

Performance of the proposed Enhanced Centralities Based on Boosted Ensembled Deep Learning Networks on different benchmarks at an IR of 0.2 with 20 labels

Data set	Accuracy	Precision	Recall
LFR-1000	0.89	0.885	0.88
LFR-200	0.88	0.88	0.87
Dolphins	0.885	0.875	0.88
Net science	0.89	0.890	0.89
Copperfield	0.90	0.880	0.88
Chicago	0.920	0.890	0.88
Euro road	0.910	0.90	0.87
PGP	0.920	0.89	0.88
WhatsApp	0.950	0.880	0.89
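
To compare settings at a glance, the per-data-set accuracies of a table can be averaged. The short Python sketch below does this for the two extreme settings, Table 4 (IR 0.1, 10 labels) and Table 9 (IR 0.2, 20 labels), using only the accuracy values listed in those tables. The variable names are illustrative.

```python
from statistics import mean

# Accuracy columns copied from Table 4 (IR 0.1, 10 labels) and
# Table 9 (IR 0.2, 20 labels).
table4_accuracy = {
    "LFR-1000": 0.954, "LFR-200": 0.945, "Dolphins": 0.955,
    "Net science": 0.950, "Copperfield": 0.955, "Chicago": 0.952,
    "Euro road": 0.953, "PGP": 0.955, "WhatsApp": 0.960,
}
table9_accuracy = {
    "LFR-1000": 0.89, "LFR-200": 0.88, "Dolphins": 0.885,
    "Net science": 0.89, "Copperfield": 0.90, "Chicago": 0.920,
    "Euro road": 0.910, "PGP": 0.920, "WhatsApp": 0.950,
}

mean_table4 = mean(list(table4_accuracy.values()))
mean_table9 = mean(list(table9_accuracy.values()))
print(f"Table 4 mean accuracy: {mean_table4:.3f}")
print(f"Table 9 mean accuracy: {mean_table9:.3f}")
print(f"Difference: {mean_table4 - mean_table9:.3f}")
```

The same averaging can be applied to the precision and recall columns, or to Tables 5 to 8, by swapping in the corresponding values.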

Figures 4 to 6 compare the performance of the proposed EC-BED-NETS with that of the other learning models. In all three analyses, the proposed algorithm maintains an accuracy of 0.94 to 0.95 as the IR increases, which indicates that the boosted ensemble structure built on the LSTM is advantageous. The accuracy of the existing models, by contrast, decays: E-LSTM from 0.9 to 0.8, stacked LSTM from 0.85 to 0.8, SVM from 0.85 to 0.79, KNN from 0.75 to 0.68, and RF from 0.65 to 0.58. To further validate the proposed technique, we also compared it with the other learning models under larger numbers of labels and higher IRs.

FIG. 4.

Comparative accuracy analysis of different learning models to determine the influential nodes at the IR of 0.1. IR, infection rate.

FIG. 5.

Comparative accuracy analysis of different learning models to determine the influential nodes at the IR of 0.2.

FIG. 6.

Comparative accuracy analysis of different learning models to determine the influential nodes at the IR of 0.5.

Figure 7 shows the validation analysis of the different learning models on different benchmarks at a high IR, greater than 0.5 and at most 1 (the randomly chosen value is set to 0.8). For validation, we used the networks with larger numbers of edges and the social network. For these networks, the accuracy of the proposed framework in predicting the influential nodes stays between 0.94 and 0.95, whereas the other algorithms vary more widely at higher IRs and with random increases in the number of labels. The following observations can be made from the figure: (1) With the smallest number of labels and a high IR (0.8), the proposed algorithm and the other learning models show little variation: the proposed algorithm reaches 0.95, E-LSTM 0.88, stacked LSTM 0.85, and the other machine learning models range from 0.75 to 0.80. As the number of labels increases, the proposed framework deviates only slightly (0.94 to 0.95), whereas the other algorithms deviate widely. (2) Including the enhanced centrality features and the boosted structure in the deep learning model improves the prediction of influential nodes in all testing and validation scenarios.

FIG. 7.

Validation comparison between the different learning models to determine the influential nodes (a) LFR-1000 benchmarks; (b) Euro roads; (c) PGP; (d) Chicago; (e) WhatsApp networks.

Thus, the proposed EC-BED-NETS framework exhibits better performance on small-scale networks, large-scale networks, and social networks alike.
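
The IR that parameterizes the tables and figures above can be read as the per-contact transmission probability of an SIR spreading process. The following minimal Python sketch shows how such a process can score the influence of a node by the average outbreak size it seeds. It is an illustration, not the authors' implementation: the function names, the toy graph, the recovery probability `gamma`, and the number of runs are all assumptions.

```python
import random


def sir_spread_size(adj, seed_node, beta, rng, gamma=1.0):
    """Run one SIR cascade from seed_node; return how many nodes were ever infected.

    beta  : probability that an infected node infects a susceptible neighbor per step
    gamma : probability that an infected node recovers per step (assumed 1.0 here;
            must be > 0, otherwise the cascade never ends)
    """
    infected = {seed_node}
    recovered = set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        still_infected = set()
        for u in infected:
            if rng.random() < gamma:
                recovered.add(u)
            else:
                still_infected.add(u)
        infected = still_infected | newly_infected
    # Every node that was infected has recovered by the time the loop ends.
    return len(recovered)


def influence_scores(adj, beta, runs=200, seed=0):
    """Average outbreak size per seed node over `runs` independent cascades."""
    rng = random.Random(seed)
    return {
        node: sum(sir_spread_size(adj, node, beta, rng) for _ in range(runs)) / runs
        for node in adj
    }


# Hypothetical toy network: a hub (node 0) with four neighbors and a short tail.
toy_adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}

scores = influence_scores(toy_adj, beta=0.1)
ranking = sorted(scores, key=scores.get, reverse=True)
print("Nodes ranked by simulated influence:", ranking)
```

Scores obtained this way can serve as ground-truth influence labels against which a classifier's predictions are checked. Raising `beta` makes outbreaks larger for every seed node, which tends to narrow the gap between influential and noninfluential nodes.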

Discussion and Conclusion

When identifying influential hubs in complex networks, traditional algorithms and conventional ML techniques are limited in performance, scalability, and flexibility across real-time propagation scenarios. To overcome these limitations, this article presents a methodology based on hybrid deep learning models. The proposed EC-BED-NETS framework combines enhanced centrality feature extraction with hybrid ensemble boosted LSTM networks, and uses the SIR propagation model (COVID-19) to differentiate influential from noninfluential nodes. To assess the proposed system, we ran a series of tests on nine real-world scenarios. The experimental results demonstrate that the proposed framework outperforms other machine learning and deep learning models in recognizing the influential nodes in complex and social networks. An average prediction accuracy of 95.5% is maintained in all real-time scenarios, which supports the proposed framework as an effective solution for recognizing the influential nodes in these networks. Predicting the number of influential nodes is a complex problem, and the proposed EC-BED-NETS framework addresses it to a significant degree. In future work, bio-inspired optimization algorithms could be used to further improve the performance of the proposed model, and testing in more real-time propagation scenarios would increase the reliability of the proposed algorithms.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Information

No funding agency is involved in our research.

References

1. Bond, Fariss, Jones, et al. A 61-million-person experiment in social influence and political mobilization. Nature. 2012; 489:295.

2. Sheikhahmadi, Nematbakhsh, Zareie. Identification of influential users by neighbors in online social networks. Physica A. 2017; 486:517–534.

3. Cheung, Luo, Sia, Chen. Credibility of electronic word-of-mouth: Informational and normative determinants of on-line consumer recommendations. Int J Electron Commer. 2009; 13:9–38.

4. Zareie, Sheikhahmadi, Khamforoosh. Influence maximization in social networks based on TOPSIS. Expert Syst Appl. 2018; 108:96–107.

5. Bae, Kim. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Physica A. 2014; 395:549–559.

6. Wang, Du, Fan, Xing. Ranking influential nodes in social networks based on node position and neighborhood. Neurocomputing. 2017; 260:466–477.

7. […] C-H, Abd-El-Haliem, Bozkurt, et al. A complex NLR signalling network mediates immunity to diverse plant pathogens. Proc Nat Acad Sci USA. 2016; 114:8113–8118.

8. Szklarczyk, Morris, Cook, et al. The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017; 45:D362–D368.

9. […], Chen, He D-R. Complex network properties of Chinese power grid. Int J Mod Phys B. 2004; 18:2599–2603.

10. Candia. Advertising and irreversible opinion spreading in complex social networks. Int J Mod Phys C. 2009; 20:799–815.

11. Pal, Biswas, Maharatna, Chakrabarti. Architecture for complex network measures of brain connectivity. In Proceedings of IEEE International Symposium Circuits System (ISCAS), Baltimore, MD, USA, May 2017, pp. 1–4, DOI:10.1109/ISCAS.2017.8050239.

12. Freeman LC. A set of measures of centrality based on betweenness. Sociometry. 1977; 40:35–41.

13. Freeman LC. Centrality in social networks conceptual clarification. Soc Netw. 1978; 1:215–239.

14. Sabidussi. The centrality index of a graph. Psychometrika. 1966; 31:581–603.

15. Kitsak, Gallos, Havlin, et al. Identification of influential spreaders in complex networks. Nat Phys. 2010; 6:888.

16. Bonacich. Factoring and weighting approaches to status scores and clique identification. J Math Sociol. 1972; 2:113–120.

17. Bonacich, Lloyd. Eigenvector-like measures of centrality for asymmetric relations. Soc Netw. 2001; 23:191–201.

18. […], Zhang, Philip, et al. Deep dynamic network embedding for link prediction. IEEE Access, May 25, 2018, PP. 1–1.

19. Chung, Gulcehre, Cho, Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555v1 [cs.NE] 11 Dec 2014.

20. Chen, Zhang, Xu, et al. E-LSTM-D: A deep learning framework for dynamic network link prediction. arXiv:1902.08329v1 [cs.SI] 2019.

21. Fan, Zeng, Ding, et al. Learning to identify high betweenness centrality nodes from scratch: A novel graph neural network approach. CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, Nov. 3–7, 2019, pp. 559–568.

22. Zareie, Sheikhahmadi, Jalili, et al. Finding influential nodes in social networks based on neighborhood correlation coefficient. Knowl Based Expert Syst. 94:105580.

23. […], Xu, Bhowmick, et al. DISCO: Influence Maximization Meets Network Embedding and Deep Learning. Social and Information Networks (cs.SI), arXiv:1906.07378v1 [cs.SI], June 2019.

24. Mao, Xiao. A comprehensive algorithm for evaluating node influences in social networks based on preference analysis and random walk. Hindawi Complex. 2018, Article ID 1528341, pp. 1–16. DOI: 10.1155/2018/1528341.

25. Nurek, Michalski. Combining machine learning and social network analysis to reveal the organizational structures. Appl Sci. 2020; 10:1699.

26. Farooq. Detection of influential nodes using social networks analysis based on network metrics. 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), 2018.

27. Devika, Subramaniyaswamy. A semantic graph-based keyword extraction model using ranking method on big social data. Wireless Netw. 2019:1–13, DOI:10.1007/s11276-019-02128-x.

28. Zhao, Jia, Huang, et al. A machine learning based framework for identifying influential nodes in complex networks. DOI:10.1109/ACCESS.2020.2984286

29. Barabási, Posfai. Network science. Cambridge, United Kingdom: Cambridge University Press 2016.

30. Lancichinetti, Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E. 2009; 80:016118.

31. Lusseau, Schneider, Boisseau, et al. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav Ecol Sociobiol. 2003; 54:396–405.

32. Newman ME. Finding community structure in networks using the eigenvectors of matrices. Phys Rev E. 2006; 74:036104.

33. Duch, Arenas. Community detection in complex networks using extremal optimization. Phys Rev E. 2005; 72:027104.

34. Kunegis. Konect: The Koblenz network collection. In: Proceedings of the 22nd International Conference on World Wide Web, ACM, Rio de Janeiro, Brazil, May 13–17, 2013, pp. 1343–1350.

35. Watts, Strogatz. Collective dynamics of 'small-world' networks. Nature. 1998; 393:440.

36. Katz. A new status index derived from sociometric analysis. Psychometrika. 1953; 18:39–43.

37. Kermack, McKendrick. A contribution to the mathematical theory of epidemics. Proc Roy Soc London A, Containing Papers Math Phys Character. 1927; 115:700–721.

38. Yeghikyan. Modelling the coronavirus epidemic in a city with Python. https://towardsdatascience.com/modelling-the-coronavirus-epidemicspreading-in-a-city-with-python-babd14d82fa2, (Last accessed July 20, 2021).

39. Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J Comp Syst Sci. 1997; 55:119–139.

40. Zhang, Ni, Zhang, et al. Research and application of AdaBoost Algorithm based on SVM. 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 2019, pp. 662–666.

41. Shu, Wang. An Improved Adaboost Algorithm based on uncertain functions: 2015 International Conference on Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII), Dec. 3–4, 2015, Wuhan, China.

42. Ravuri, Stolcke. A comparative study of recurrent neural network models for lexical domain classification. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 20–25, Shanghai, China.

43. Liu, Cai, Wang, et al. An ensemble model based on adaptive noise reducer and over-fitting prevention LSTM for Multivariate Time Series Forecasting. IEEE Access (Vol. 7), IEEE, 2019; 26102–26115.

44. Wei, Pan, Hu, et al. Identifying influential nodes based on network representation learning in complex networks. PLoS ONE. 2018; 13: e0200091.