DMHANT: DropMessage Hypergraph Attention Network for Information Propagation Prediction

Abstract

Predicting propagation cascades is crucial for understanding information propagation in social networks. Existing methods typically focus on the structure or order of infected users within a single cascade sequence and ignore the global dependencies between cascades and users, which is insufficient to characterize their dynamic interaction preferences. Moreover, existing methods are not robust to perturbations of the input data. To address these issues, we propose a prediction model named DropMessage Hypergraph Attention Network (DMHANT), which constructs a hypergraph from the cascade sequences. Specifically, to capture user preferences dynamically, we divide the diffusion hypergraph into multiple subgraphs according to timestamps, develop hypergraph attention networks to explicitly learn complete interactions, and adopt a gated fusion strategy to connect the subgraphs for user cascade prediction. In addition, we adopt DropMessage, a random dropping method, to increase the robustness of the model. Experimental results on three real-world datasets show that the proposed model outperforms state-of-the-art information propagation prediction models on both MAP@k and Hits@k, and experiments under data perturbation show that it also retains better prediction performance than existing models.

Introduction

Information propagation is a fundamental and ubiquitous phenomenon in our daily lives,¹ underlying applications such as viral marketing,² online advertising,³ fake news control,⁴ campaign strategy,⁵ and epidemic prevention.^6,7 Information propagation prediction techniques, which aim to identify the potential users of information dissemination, are therefore urgently needed in current online social media applications.^8,9 The task can be described as follows: given a sequence of information cascades, a diffusion model estimates the likelihood that the information spreads to other potential users and ranks these users, as shown in Figure 1. Owing to the complexity and large scale of the data, however, predicting information propagation in social networks is challenging.^10,11

FIG. 1.

Information propagation prediction.

The problem of diffusion prediction has attracted extensive research attention. Previous studies typically concentrate on the sequence or structure of the cascades themselves and ignore social structures that are not visible in cascades but significantly affect users' behavior,^12–15 which leaves them unable to exploit inactive users. Recurrent neural networks (RNNs) have been used to extract sequential features of diffusion events, while convolutional neural networks (CNNs) have been used to mine meaningful features from the diffusion network.¹⁶ Graph neural networks (GNNs) have also been widely used for information diffusion prediction because they can capture the structural information of the diffusion network.¹⁷ Xue et al.¹⁸ compared the SIR model, logistic regression, and deep neural networks on predicting the spread of COVID-19 topics on Twitter and found that deep neural networks achieved higher prediction accuracy than the traditional methods. Similarly, Li et al.¹⁹ compared various deep learning-based methods, including RNNs, CNNs, and GNNs, on predicting the popularity of tweets on Twitter and found that GNNs outperformed the other models in both robustness and accuracy. Qiu et al.²⁰ proposed a GNN framework that captures users' latent features in social networks to predict diffusion. Yang et al.²¹ proposed reinforced recurrent networks with structural context (FOREST), which combine a graph convolutional network (GCN) and a gated recurrent unit (GRU) to jointly extract representations of cascade contents and social network features. More recently, Yuan et al. proposed the dynamic heterogeneous graph convolutional network (DyHGCN),²² which builds heterogeneous graphs to jointly learn user interactions and social relations. Wang et al.²³ developed a dynamic diffusion variational autoencoder (DyDiff-VAE) that predicts diffusion likelihood by fusing information from cascade sequences and the content of forwarding users.

Although the above methods exploit both cascade sequences and user social relationships to some extent, they still have limitations. First, they ignore the global dependencies between cascades and users, which is insufficient to characterize users' dynamic interaction preferences and limits prediction performance. Second, existing models lack robustness: once the input is perturbed, their performance drops significantly.

To address these issues, we propose a novel information propagation prediction model named DropMessage Hypergraph Attention Network (DMHANT). Specifically, we not only exploit friendship relationships with a GNN but also construct a hypergraph from the cascade sequences. To capture user preferences dynamically, we divide the diffusion hypergraph into multiple subgraphs according to timestamps, develop hypergraph attention networks to explicitly learn complete interactions, and adopt a gated fusion strategy to connect the subgraph embeddings. Furthermore, given the strength of self-attention mechanisms on sequential tasks,^24–26 we use two Multi-Head Self-Attention modules to efficiently capture the friendship and subgraph feature interactions within cascades. Finally, we obtain the predictive representation through a postfusion strategy. In addition, we adopt DropMessage, a random dropping method, to increase the robustness of the model.

The main contributions of our work can be summarized as follows:

• To take full advantage of the global dependencies of users and cascades, we construct a hypergraph from the cascade sequences and analyze it to learn the dynamic correlations between users and cascades.

• While learning the cascade sequences, we introduce the social relationship graph and design a hypergraph attention network that dynamically learns the importance of different parts of the hypergraph and the social relationship graph and adjusts the weights accordingly, improving the model's performance and generalization ability.

• To enhance the robustness of the model, we adopt DropMessage, a random dropping method that performs dropping operations directly on the message matrix.

Related Work

Recently, many technologies have been proposed to tackle information propagation prediction, including independent cascade (IC)-based, embedding-based, and deep-learning-based approaches.

Independent Cascade-based approach

Many cascade propagation models build on the assumptions of the IC model, and several extensions incorporate additional information, such as continuous timestamps,²⁷ user profiles,²⁸ and the influence of nodes. Chen et al.²⁹ proposed the weighted IC model, a modification of the IC model that accounts for the strength of the connections between nodes in the network. Wang et al.³⁰ introduced an emotion-based IC model to capture the dynamics of information spread in online social media, focusing specifically on emotion contagion.

However, the pairwise independence assumption oversimplifies the intricate nature of the diffusion process, which leads to poor performance on real datasets.

Embedding-based approach

Embedding-based approaches make full use of representation learning techniques. Xu et al.³¹ analyzed the main factors that influence users' retweeting behavior on Twitter, trained classification models on publisher, content, and retweeter features, and found that the social network characteristics of publishers and retweeters were the most important factors. Li et al.³² proposed an engagement ranking model based on adoption probability, which estimates each user's probability of adopting different topic tags and assigns higher dynamic weights to actual participants, in order to predict early on which users will be activated and generate propagation behavior. Wang et al.³³ introduced Diffusion-Network Representation Learning (DNRL), which learns user representations jointly from the social network and diffusion sequences.

Nevertheless, embedding-based approaches do not model the sequential information in cascades.

Deep learning-based approach

Given the success of deep learning in various areas, many deep learning-based methods have shown remarkable potential for modeling information diffusion. Wang et al.³⁴ proposed an attentional recurrent neural network that improves propagation prediction accuracy by incorporating social network structure into propagation sequence modeling. Zhou et al.³⁵ introduced a hierarchical cascade framework that combines multiscale modeling with user representation learning.

Several graph networks have recently been proposed to capture the dependencies in information diffusion cascades. Qiu et al.²⁰ applied graph attention networks to propagation behavior prediction in the DeepInf framework, which takes a user's egocentric network as input and fuses propagation features into graph convolutional and attention networks. Feng et al.³⁶ introduced a hypergraph neural network that uses a Chebyshev expansion of the simple graph Laplacian, and Bai et al.³⁷ subsequently introduced an attention mechanism for hypergraphs. Xie et al.³⁸ introduced an independent asymmetric embedding model that embeds each individual into a single latent influence space and multiple latent susceptibility spaces. Wang et al.³⁹ proposed a cascade-enhanced GCN (CE-GCN), which exploits collaborative patterns across cascades to improve the prediction of future propagation.

These approaches typically ignore the global dependencies between users and cascades, which is insufficient to characterize their dynamic interaction preferences and thus limits prediction performance. Moreover, existing models lack robustness.

Preliminaries

Construction of social graph

Because users' propagation behavior is consistently influenced by personal interests, previous behavior, and the external environment, we first construct the friendship graph and the diffusion hypergraphs, which are essential components for predicting diffusion.

The social graph is represented as $G_{F} = (U, E)$, where U denotes the set of users and E denotes the set of edges representing social relationships between users.

Construction of diffusion hypergraphs

To capture the global dependencies of users and to take the time factor into account, we construct a hypergraph from the cascade sequences, as shown in Figure 2. The set of cascades is $C = \{C_{1}, C_{2}, \dots, C_{M}\}$, $|C| = M$, where M is the number of cascades. Each cascade is $C_{m} = \{c_{1}, c_{2}, \dots, c_{L}\}$, where L denotes the length of the cascade and $c_{l} = \{(u_{i}^{l}, t_{i}^{l}) \mid u_{i}^{l} \in U\}$ means that the ith user is activated by information l at time $t_{i}^{l}$. We treat each $C_{m}$ as a hyperedge and each user u as a node to construct the hypergraph $G_{D}$. The cascades are split into T subsets based on timestamps to construct the diffusion hypergraphs $G_{D} = \{G_{D}^{t} \mid t = 1, 2, \dots, T\}$, $G_{D}^{t} = (U^{t}, E^{t})$, where $U^{t}$ denotes the set of users and $E^{t}$ the set of hyperedges of the tth subgraph.

FIG. 2.

Hypergraph construction process.

The node–hyperedge relationships of the subgraphs are independent: a node–hyperedge connection that appears in one hypergraph does not appear in any other. If $u_{i}$ participates in $e_{m}$ during the tth time interval, the connection between node $u_{i}$ and hyperedge $e_{m}$ exists only in hypergraph $G_{D}^{t}$.
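The time-sliced construction can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (a dict from cascade id to (user, timestamp) pairs), the hypothetical helper name `build_time_hypergraphs`, and the use of equal-width time intervals are all assumptions, since the text does not say how interval boundaries are chosen. Each (user, cascade) connection is placed only in the interval that contains its timestamp, so it appears in exactly one subgraph.

```python
from collections import defaultdict


def build_time_hypergraphs(cascades, num_intervals):
    """Split cascades into time-sliced hypergraphs G_D^1 .. G_D^T (hypothetical helper).

    cascades:      dict mapping cascade id -> list of (user, timestamp) pairs
    num_intervals: T, the number of subhypergraphs
    Returns a list of T dicts; dict t maps a hyperedge (cascade id) to the set of
    users who joined that cascade during interval t.
    Assumption: intervals have equal width over the global time range.
    """
    times = [ts for events in cascades.values() for _, ts in events]
    t_min, t_max = min(times), max(times)
    width = (t_max - t_min) / num_intervals or 1.0  # guard against a zero-length range
    hypergraphs = [defaultdict(set) for _ in range(num_intervals)]
    for cid, events in cascades.items():
        for user, ts in events:
            # The latest timestamp would fall into index T, so clamp it to the last slice.
            t = min(int((ts - t_min) / width), num_intervals - 1)
            hypergraphs[t][cid].add(user)
    return [dict(g) for g in hypergraphs]
```

With two intervals over timestamps 0 to 10, a cascade whose users join at times 0, 1, and 9 contributes the connections {0, 1} to the first subgraph and {2} to the second.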

Problem formulation

Based on the above definitions, the prediction task can be described as follows: given a friendship graph $G_{F} = (U, E)$ with user set $U = \{u_{1}, u_{2}, \dots, u_{N}\}$, $|U| = N$, and an observed diffusion sequence $c_{l} = \{(u_{i}^{l}, t_{i}^{l}) \mid u_{i}^{l} \in U\}$, our aim is to estimate the likelihood $\hat{y}_{u_{j}, m}$ that user $u_{j}$ will participate in cascade $c_{m}$ at the next step, and to identify the next infected user by ranking the infection probabilities of all candidates.

The Proposed Model

In this section, we will introduce DMHANT in detail. The overall architecture of our model contains four major components, as shown in Figure 3: (1) GCN module that learns users’ social relationship. (2) Sequential Hypergraph Attention Network (HANT) module which obtains interaction-based user and cascade embeddings. (3) Multi-Head Self-Attention module that captures the context interaction within the cascade. (4) Interaction Prediction module that calculates infection probability of candidates. In the following subsections, we will provide an elaborate introduction to each component of our framework.

FIG. 3.

Four modules of DMHANT: (1) users’ social relation is learned by GCN Module; (2) user dynamic interaction is learned by sequential HANT Module; (3) contextual interaction information within a cascade is learned by Multi-Head Self-Attention Module; (4) the final results are obtained by Interaction Prediction module. DMHANT, DropMessage Hypergraph Attention Network; GCN, graph convolutional network; HANT, Hypergraph Attention Network.

GCN module

Social relationship networks express users' static dependencies. Introducing them helps extract effective information among users and also alleviates the cold-start problem to some extent: a user's preferences can be inferred from the features of their neighbors even if the user has not participated in any cascade sequence.

Because users' social relationships are relatively stable, we use a GCN to learn the structure of the social network. Given a social relationship graph $G_{F} = (U, E)$, the propagation rule of each GCN layer is defined as:

$X_{F}^{l + 1} = \sigma\left(\tilde{D}_{F}^{-\frac{1}{2}} \tilde{A}_{F} \tilde{D}_{F}^{-\frac{1}{2}} X_{F}^{l} W_{F}\right)$ (1)

where the initial user embeddings $X_{F}^{0} \in R^{N \times d}$ are randomly initialized from a normal distribution, N is the number of users, and d is the embedding dimension. $W_{F}$ is a trainable weight matrix and σ denotes the ReLU activation function. $\tilde{A}_{F} = A_{F} + I$, where $A_{F}$ is the adjacency matrix, and $\tilde{D}_{F}$ is the diagonal degree matrix of $\tilde{A}_{F}$, with $(\tilde{D}_{F})_{ii} = \sum_{j} (\tilde{A}_{F})_{ij}$. After L GCN layers, we obtain the static user embeddings $X_{F}$.
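A single layer of Eq. (1) can be written out with dense lists to make the normalization explicit. This is an illustrative sketch only: the paper's implementation uses the Deep Graph Library on PyTorch with sparse operations, and the helper names `matmul` and `gcn_layer` here are hypothetical.

```python
import math


def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def gcn_layer(adj, X, W):
    """One GCN layer, Eq. (1): ReLU(D~^{-1/2} A~ D~^{-1/2} X W) with A~ = A + I."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in A~ (at least 1 because of the self-loop).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    # Symmetric normalization D~^{-1/2} A~ D~^{-1/2}.
    norm = [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, X), W)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU
```

For two connected users with features 1 and 3 and $W_{F} = [1]$, every entry of the normalized matrix is 1/2, so both users end up with feature 2: the layer averages each user with their neighbors.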

HANT module

Because users' friendship graph may not accurately capture their global dependencies, we additionally construct sequential hypergraphs from the diffusion cascades and adopt sequential hypergraph attention networks to learn the dynamic temporal characteristics and user interactions at the cascade level. The process of HANT is shown in Figure 4. Each hypergraph attention layer consists of two aggregations.

FIG. 4.

Feature learning process for hypergraphs: (1) Nodes-to-hyperedge aggregation; (2) hyperedges-to-node aggregation; (3) Sequential HGATs with Fusion.

Nodes-to-hyperedge aggregation

The first aggregation, nodes-to-hyperedge aggregation, learns the representation of each hyperedge. Given the hypergraphs $G_{D} = \{G_{D}^{t} = (U^{t}, E^{t}) \mid t = 1, 2, \dots, T\}$, we obtain the embedding $E_{j, t}$ of hyperedge $e_{j}^{t}$ by aggregating the representations $x_{i, t}$ of all its connected nodes $u_{i}^{t}$:

$E_{j, t}^{l + 1} = \sigma\left(\sum_{u_{i}^{t} \in e_{j}^{t}} \alpha_{ij}^{t} W_{1} x_{i, t}^{l}\right)$ (2)

where σ denotes the ReLU activation function, $W_{1} \in R^{d \times d}$ is a trainable weight matrix, and $\alpha_{ij}^{t}$ is the attention coefficient of $u_{i}^{t}$ in $e_{j}^{t}$.

However, the standard way of computing attention coefficients requires the features of both endpoints to be known in advance, and the feature representations of the hyperedges are not available beforehand. Meanwhile, each subhypergraph contains only the user–cascade interactions of the current time interval; this is intended to capture users' short-term preferences but inevitably causes some information loss. Because the root node partially reflects the content of the cascade, we compute the attention scores of the other nodes using the root features in place of the hyperedge features in each hypergraph:

$\alpha_{ij}^{t} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a\left([W_{1} x_{i, t}^{l} \,\|\, W_{1} r_{j}^{l}]\right)\right)\right)}{\sum_{u_{p}^{t} \in e_{j}^{t}} \exp\left(\mathrm{LeakyReLU}\left(a\left([W_{1} x_{p, t}^{l} \,\|\, W_{1} r_{j}^{l}]\right)\right)\right)}$ (3)

where $r_{j}^{l}$ is the representation of the root user of $e_{j}^{t}$ at layer l, and LeakyReLU is the activation function. $W_{1}$ is the shared linear transformation applied to the vertex features; $[\cdot \,\|\, \cdot]$ concatenates the transformed features of $x_{i, t}^{l}$ and $r_{j}^{l}$; and $a(\cdot)$ is a feedforward network that maps the concatenated features to a real number.
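Eqs. (2) and (3) for a single hyperedge can be sketched as follows. Two details are assumptions not fixed by the text: $a(\cdot)$ is taken to be a single linear layer (a weight vector of length 2d), and the LeakyReLU negative slope is set to 0.2. The function names are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):  # slope value is an assumption
    return z if z > 0 else slope * z


def hyperedge_embedding(member_feats, root_feat, W1, a):
    """Eqs. (2)-(3): embed one hyperedge e_j^t from the users it contains.

    member_feats: list of user vectors x_{i,t}^l for the users in e_j^t
    root_feat:    vector r_j^l of the cascade's root user (stands in for the hyperedge)
    W1:           shared d x d weight matrix
    a:            length-2d weight vector; a(.) is assumed to be a single linear layer
    Returns (E_j, alphas).
    """
    w_root = matvec(W1, root_feat)
    w_members = [matvec(W1, x) for x in member_feats]
    # Score each member against the root: LeakyReLU(a^T [W1 x_i || W1 r_j]).
    scores = [leaky_relu(sum(ai * ci for ai, ci in zip(a, wx + w_root))) for wx in w_members]
    # Softmax over the members of this hyperedge (max subtracted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of transformed member features, then ReLU (Eq. 2).
    d = len(w_root)
    summed = [sum(al * wx[k] for al, wx in zip(alphas, w_members)) for k in range(d)]
    return [max(0.0, v) for v in summed], alphas
```

Because the root vector is the same for every member of $e_{j}^{t}$, it acts as a query that the members are scored against, which is how the root replaces the unknown hyperedge feature.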

Hyperedges-to-node aggregation

Given the hyperedge representations, we train a second aggregator that integrates all hyperedges $E_{i}^{t}$ in which $u_{i}^{t}$ participates to learn the representation $x_{i, t}$ of $u_{i}$ during the tth time interval. Here we use a direct summation for the aggregation, and all hypergraphs share the same weight matrix:

$x_{i, t}^{l + 1} = \sigma\left(\sum_{e_{j}^{t} \in E_{i}^{t}} W_{2} E_{j, t}^{l + 1}\right)$ (4)

where σ denotes an activation function. After $L_{D}$ HANT layers, we obtain the dynamic user embeddings $x_{i, t}^{L_{D}}$ for the tth interval.
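The hyperedge-to-node step of Eq. (4) is a plain sum of transformed hyperedge embeddings followed by ReLU. The sketch below assumes the dict-of-sets hypergraph format (cascade id mapped to the set of member users) and uses the hypothetical name `hyperedge_to_node`; taken literally, Eq. (4) gives a user with no hyperedge in the current interval an empty sum, hence a zero vector.

```python
def hyperedge_to_node(hypergraph, edge_embs, W2, num_users):
    """Eq. (4): x_i^{l+1} = ReLU(sum over hyperedges containing u_i of W2 E_j^{l+1}).

    hypergraph: dict mapping hyperedge (cascade) id -> set of user ids in this interval
    edge_embs:  dict mapping hyperedge id -> embedding E_j (from the previous aggregation)
    W2:         weight matrix shared by all subhypergraphs
    Users with no hyperedge in this interval get an empty sum, i.e. a zero vector.
    """
    d_out = len(W2)
    sums = [[0.0] * d_out for _ in range(num_users)]
    for edge_id, users in hypergraph.items():
        # Transform the hyperedge embedding once, then add it to every member user.
        w_e = [sum(w * e for w, e in zip(row, edge_embs[edge_id])) for row in W2]
        for u in users:
            sums[u] = [s + v for s, v in zip(sums[u], w_e)]
    return [[max(0.0, v) for v in row] for row in sums]
```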

Sequential HGATs with fusion

By construction, each subhypergraph contains only information from a short interval, so learning a single hypergraph cannot describe the dynamic changes in user preferences. We therefore develop a fusion strategy that connects the subhypergraphs in chronological order. The propagation of negative news can be viewed as a Markov process,^40–43 meaning that the state at the next moment depends only on the current state and is independent of earlier states. Therefore, when computing the features of each subgraph, we use an attention-based gate to combine the initial representation of each user with the output of the tth time interval, and use the result as the input to the next interval:

$x_{i, t + 1} = g_{R_{1}} x_{i, t}^{L_{D}} + (1 - g_{R_{1}}) x_{i, t}^{0}$ (5)

$g_{R_{1}} = \frac{\exp(X_{i, t + 1}^{L_{D}'})}{\exp(X_{i, t + 1}^{L_{D}'}) + \exp(X_{i, t + 1}^{'})}$ (6)

$X_{i, t + 1}^{L_{D}'} = W_{X_{1}}^{T} \sigma(W_{R_{1}} x_{i, t}^{L_{D}})$ (7)

$X_{i, t + 1}^{'} = W_{X_{1}}^{T} \sigma(W_{R_{1}} x_{i, t}^{0})$ (8)

where σ is the ReLU activation function, $x_{i, t}^{0}$ is the initial embedding of user $u_{i}^{t}$, $x_{i, t}^{L_{D}}$ is the output embedding of the tth interval, and $W_{R_{1}}$ and $W_{X_{1}}^{T}$ are transformation matrices. After T steps, we obtain the dynamic user embeddings $X_{D}$.
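The gate of Eqs. (5) to (8) can be sketched for one user as follows. The text does not give the shape of $W_{X_{1}}$; this sketch assumes it is a vector, so that $W_{X_{1}}^{T} \sigma(W_{R_{1}} x)$ is a scalar and one gate value is produced per user. The function names are hypothetical.

```python
import math


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gate_score(x, W_R, w_X):
    """W_{X_1}^T ReLU(W_{R_1} x): one scalar score, as in Eqs. (7)-(8)."""
    hidden = [max(0.0, h) for h in matvec(W_R, x)]
    return sum(w * h for w, h in zip(w_X, hidden))


def gated_fusion(x_out, x_init, W_R, w_X):
    """Eqs. (5)-(8): blend interval output x^{L_D} with initial embedding x^0."""
    s_out = gate_score(x_out, W_R, w_X)    # Eq. (7)
    s_init = gate_score(x_init, W_R, w_X)  # Eq. (8)
    # Eq. (6): the two-way softmax exp(s_out) / (exp(s_out) + exp(s_init))
    # equals the logistic function of the score difference.
    g = 1.0 / (1.0 + math.exp(s_init - s_out))
    fused = [g * a + (1.0 - g) * b for a, b in zip(x_out, x_init)]  # Eq. (5)
    return fused, g
```

In the sequential setting, `fused` plays the role of $x_{i, t + 1}$, the input to the HANT layers of the next interval; writing the two-way softmax as a logistic function is only a numerically convenient rewrite of Eq. (6).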

DropMessage

GNNs are effective tools for graph representation learning but still have limitations, such as a lack of robustness. We therefore adopt a random dropping strategy called DropMessage, which performs dropping operations directly on the message matrix, randomly removing a fraction of the messages during GNN computation.

Message Matrix

Most existing GNN models follow the message-passing framework, in which each node receives messages from its neighbors and sends messages to them. When a message-passing GNN is applied to a graph, we gather all propagated messages into a message matrix $M = \{m_{1}, \dots, m_{k}\} \in R^{k \times c}$, where $m_{i}$ is a message propagated between nodes, c is the message dimension, and k is the total number of messages propagated on the graph. The message-passing phase runs for L time steps and is defined by message functions $f_{l}$ and vertex update functions $U_{l}$; during this phase, the hidden state $h_{i}^{l}$ of each node is updated based on the aggregated message $m_{i}^{l + 1}$:

$m_{i}^{l + 1} = \sum_{j \in N(i)} f_{l}(h_{i}^{l}, h_{j}^{l}, e_{ij})$ (9)

$h_{i}^{l + 1} = U_{l}(h_{i}^{l}, m_{i}^{l + 1})$ (10)

where $N(i)$ denotes the set of neighbors of node i in graph G, and the message functions $f_{l}$ and vertex update functions $U_{l}$ are learned differentiable functions. Stacking the messages yields the message matrix $M \in R^{k \times c}$, in which each row corresponds to a message propagated along one directed edge; the number of rows k therefore equals the number of directed edges in the graph.
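To make the message matrix concrete, the sketch below builds M with one row per directed edge and then applies Eqs. (9) and (10). The choices of message function (copy the neighbor state, no edge features $e_{ij}$) and update function (add the aggregated message) are illustrative assumptions, as are the helper names.

```python
def build_message_matrix(edges, H, message_fn):
    """Stack one message per directed edge (j -> i) into the k x c message matrix M."""
    return [message_fn(H[i], H[j]) for (j, i) in edges]


def aggregate_messages(edges, M, num_nodes):
    """Eq. (9): m_i = sum of the rows of M whose edge points into node i."""
    dim = len(M[0]) if M else 0
    agg = [[0.0] * dim for _ in range(num_nodes)]
    for (j, i), msg in zip(edges, M):
        agg[i] = [a + b for a, b in zip(agg[i], msg)]
    return agg


def message_passing_step(edges, H, message_fn, update_fn):
    """One round of Eqs. (9)-(10); also returns M so it can be inspected or masked."""
    M = build_message_matrix(edges, H, message_fn)
    agg = aggregate_messages(edges, M, len(H))
    return [update_fn(h, m) for h, m in zip(H, agg)], M
```

An undirected edge between nodes 0 and 1 appears as the two directed edges (0, 1) and (1, 0), so M has k = 2 rows, matching the statement that k equals the number of directed edges.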

DropMessage Method

The traditional Dropout method drops the elements $X_{drop} = \{X_{i, j} \mid \epsilon_{i, j} = 0\}$ of the feature matrix X, which is equivalent to masking the elements $M_{drop} = \{M_{i, j} \mid \mathrm{source}(M_{i, j}) \in X_{drop}\}$ of the message matrix M, where $\mathrm{source}(M_{i, j})$ denotes the element of the feature matrix from which $M_{i, j}$ originates.

DropMessage instead performs random masking directly on the message matrix M. Dropout and DropMessage can be written, respectively, as:

$\tilde{X}_{i, j} = \epsilon_{i, j} X_{i, j}$ (11)

$\tilde{M}_{i, j} = \epsilon_{i, j} M_{i, j}$ (12)

where each $\epsilon_{i, j} \sim \mathrm{Bernoulli}(1 - \delta)$ is drawn independently and δ is the dropping rate. By this definition, DropMessage performs the finest-grained masking on M, which makes it more flexible, preserves the diversity of messages, and further increases the robustness of the model. Dropout can thus be regarded, to some extent, as a special case of DropMessage. The difference between the two methods is shown in Figure 5.
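The contrast between Eqs. (11) and (12) shows up clearly when one node sends messages to several neighbors. In the sketch below (hypothetical function names; messages are simply copies of the source node's features), Dropout masks X before the messages are built, so every message from the same source carries the same mask, whereas DropMessage masks each row of M independently. Following Eqs. (11) and (12), kept values are not rescaled; many implementations additionally divide kept entries by 1 − δ, as PyTorch's `torch.nn.Dropout` does during training.

```python
import random


def bernoulli_mask(rows, delta, rng):
    """Keep each element with probability 1 - delta (epsilon ~ Bernoulli(1 - delta))."""
    return [[v if rng.random() >= delta else 0.0 for v in row] for row in rows]


def dropout_then_messages(X, edges, delta, rng):
    """Dropout (Eq. 11): mask X first; every message from node j copies the same masked row."""
    X_masked = bernoulli_mask(X, delta, rng)
    return [list(X_masked[j]) for (j, i) in edges]


def dropmessage(X, edges, delta, rng):
    """DropMessage (Eq. 12): build M first (one row per directed edge), then mask M itself."""
    M = [list(X[j]) for (j, i) in edges]
    return bernoulli_mask(M, delta, rng)
```

With DropMessage, neighbors of the same source can receive different partial views of its features, which is the "diversity of messages" the text refers to.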

FIG. 5.

The difference between DropMessage and Dropout: Dropout drops the elements in the feature matrix, while DropMessage strategy directly performs random dropping on the message matrix.

Multi-Head Self-Attention module

Graph representation learning alone cannot capture the contextual interactions within a cascade, so we adopt two Multi-Head Self-Attention modules to extract the static and dynamic feature interactions within cascades. For the static part, given the user embeddings $X_{F} = [x_{i}] \in R^{|c| \times d}$ obtained from the GCN module, each attention head $\mathrm{head}_{i, F}$ is computed as:

$Q = X_{F} W_{i}^{Q}$ (13)

$K = X_{F} W_{i}^{K}$ (14)

$V = X_{F} W_{i}^{V}$ (15)

$\mathrm{head}_{i, F} = \mathrm{Att}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d'}} + M\right) V$ (16)

where $W_{i}^{Q}$, $W_{i}^{K}$, and $W_{i}^{V}$ are learnable transformation matrices, M is the attention mask matrix, $d' = d / H$, d is the embedding dimension, and H is the number of attention heads. The Multi-Head Self-Attention representation is:

$\mathrm{Multihead}_{F} = [\mathrm{head}_{1, F}; \mathrm{head}_{2, F}; \dots; \mathrm{head}_{H, F}] W^{O}$ (17)

where $W^{O}$ is a learnable transformation matrix. We then obtain the final attentive representation $X_{F}^{'}$ through a two-layer fully connected network:

$X_{F}^{'} = \sigma(\mathrm{Multihead}_{F} W_{1} + b_{1}) W_{2} + b_{2}$ (18)

where σ is the ReLU activation function, $W_{1}$ and $W_{2}$ are learnable transformation matrices, and $b_{1}$ and $b_{2}$ are bias parameters.

This yields the attentive social relationship representation $X_{F}^{'}$. Similarly, we apply the same self-attention method to the dynamic hypergraph representation $X_{D}$ to obtain the final embedding $X_{D}^{'}$.
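Eqs. (13) to (18) can be sketched in pure Python for one cascade. The text does not define the mask M in Eq. (16); this sketch assumes a causal mask (−∞ above the diagonal) so that each position attends only to users activated at or before it, which is the usual choice for next-user prediction. All function names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]


def softmax_row(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(Q, K, V, d_head):
    """Eq. (16), with M assumed to be a causal mask (-inf above the diagonal)."""
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)  # Q K^T
    weights = []
    for i, row in enumerate(scores):
        masked = [s / math.sqrt(d_head) if j <= i else -math.inf for j, s in enumerate(row)]
        weights.append(softmax_row(masked))
    return matmul(weights, V)


def multi_head(X, heads, W_O):
    """Eqs. (13)-(17): heads is a list of (W_Q, W_K, W_V) triples, each d x d/H."""
    outs = []
    for W_Q, W_K, W_V in heads:
        d_head = len(W_Q[0])
        outs.append(causal_attention(matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V), d_head))
    concat = [sum((o[i] for o in outs), []) for i in range(len(X))]  # [head_1; ...; head_H]
    return matmul(concat, W_O)


def ffn(Z, W1, b1, W2, b2):
    """Eq. (18): ReLU(Z W1 + b1) W2 + b2."""
    hidden = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in matmul(Z, W1)]
    return [[v + b for v, b in zip(row, b2)] for row in matmul(hidden, W2)]
```

The same functions would be applied to $X_{D}$ with separate weights to obtain $X_{D}^{'}$.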

Interaction Prediction Module

Fusion

To integrate the social relationship representation $X_{F}^{'}$ and the dynamic hypergraph representation $X_{D}^{'}$, we design a fusion strategy that learns a more expressive representation $X_{m}$:

$X_{m} = g_{R_{2}} X_{F}^{'} + (1 - g_{R_{2}}) X_{D}^{'}$ (19)

$g_{R_{2}} = \frac{\exp\left(W_{X_{2}}^{T} \sigma(W_{R_{2}} X_{F}^{'})\right)}{\exp\left(W_{X_{2}}^{T} \sigma(W_{R_{2}} X_{F}^{'})\right) + \exp\left(W_{X_{2}}^{T} \sigma(W_{R_{2}} X_{D}^{'})\right)}$ (20)

where $W_{X_{2}}^{T}$ and $W_{R_{2}}$ are the transformation matrices of the attention gate and $\sigma(\cdot)$ is tanh.

Prediction

Given the final representation $X_{m}$, we compute the propagation probabilities $\hat{y} \in R^{|c| \times N}$, excluding previously activated users from the prediction:

$\hat{y} = \mathrm{softmax}(X_{m} W_{p})$ (21)

where $W_{p}$ is a transformation matrix that maps $X_{m}$ to the user space. We train the model with the cross-entropy loss:

$\mathrm{loss}(\theta) = -\sum_{j = 2}^{|c_{m}|} \sum_{i = 1}^{|U|} y_{ji} \log(\hat{y}_{ji})$ (22)

Where θ represents all learnable parameters in the model. If the user u_i participates in cascade c_m at step j, $y_{j i} = 1$ , otherwise $y_{j i} = 0$ .
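Eqs. (21) and (22) for one cascade can be sketched as follows. The text says previously activated users are excluded from the prediction but does not show how; this sketch assumes they are masked with −∞ before the softmax. Because $y_{ji}$ is one-hot, the inner sum in Eq. (22) reduces to the negative log-probability of the true next user. The function name `next_user_loss` is hypothetical.

```python
import math


def next_user_loss(logits, cascade):
    """Eqs. (21)-(22) for one cascade.

    logits[j - 1]: scores over all N users for the prediction made after seeing
                   the first j users of the cascade
    cascade:       list of user ids in activation order
    Previously activated users are masked out before the softmax (assumption).
    """
    loss = 0.0
    for j in range(1, len(cascade)):  # predict users 2..|c_m|, as in Eq. (22)
        seen = set(cascade[:j])
        row = [-math.inf if u in seen else s for u, s in enumerate(logits[j - 1])]
        top = max(row)
        log_z = top + math.log(sum(math.exp(s - top) for s in row))
        loss -= row[cascade[j]] - log_z  # -log softmax probability of the true next user
    return loss
```

In a three-user example where all scores are equal, the second prediction has only one unseen candidate left, so it contributes zero loss; without the mask it would contribute log 3.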

The whole process of DMHANT is shown in Algorithm 1.

Algorithm 1 Framework of DMHANT
Input: the friendship graph $G_{F} = (U, E)$; the cascade sequences $C = \{C_{1}, C_{2}, \dots, C_{M}\}$
Output: diffusion probabilities $\hat{y} \in R^{|c| \times N}$
1: Construct the diffusion hypergraphs $G_{D} = \{G_{D}^{t} \mid t = 1, 2, \dots, T\}$
2: Use DropMessage in place of the traditional Dropout method
3: Learn the feature representation $X_{F}$ of the social relationship network $G_{F}$ with the GCN module
4: Learn the feature representation $X_{D}$ of the hypergraphs $G_{D} = \{G_{D}^{t} \mid t = 1, 2, \dots, T\}$ with the HANT module
5: Compute the attentive representations $X_{F}^{'}$ and $X_{D}^{'}$ with the Multi-Head Self-Attention modules
6: Compute the final representation $X_{m}$ with the fusion strategy
7: Compute the final results $\hat{y} \in R^{|c| \times N}$ with an FNN
8: return $\hat{y}$

Experiments

Datasets

To investigate the generalization of DMHANT, we use datasets from different platforms: Douban, a social website, and Android and Christianity, two community Q&A datasets. Statistics of the datasets are given in Table 1.

Table 1.

Dataset statistics

Dataset	Douban	Android	Christianity
#Users	12,232	9958	2897
Friendship #Links	396,580	48,573	35,624
Friendship density	30.21	4.87	12.30
Interaction #Cascades	3475	679	589
Interaction avg. length	21.76	33.3	22.9
Interaction density	6.18	2.27	4.66

• Douban:⁴⁴ This dataset comes from a social website where users share their book-reading and movie-watching statuses and follow the statuses of other users. Friendship relations are established from co-occurrence relations among users: users who frequently share similar interests or engage in similar activities are considered friends.

• Android:⁴⁵ This dataset is collected from Stack Exchange, a community Q&A website. Friendship relations are constructed from users' interactions on various channels, such as asking and answering questions.

• Christianity:⁴⁵ This dataset, collected from a community Q&A website, contains the user friendship network as well as cascade interactions related to Christian topics.

Baselines

We compare DMHANT with the following information propagation prediction models:

• DeepDiffuse⁴⁶ uses an RNN and an attention mechanism to predict when the next infection will occur and who will be infected.

• NDM⁴⁷ (neural diffusion model) employs self-attention and CNNs to learn cascades and relaxes the independence assumptions to alleviate the long-term dependency problem.

• FOREST²¹ combines GNN- and RNN-based components to capture users' social relationships and learn cascade context.

• Inf-VAE⁴⁵ (influence variational autoencoder) is a variational autoencoder framework that jointly embeds social homophily, via GNNs, and temporal influence.

• DyHGCN²² constructs heterogeneous graphs that encompass both diffusion relations and users' social relationships, jointly encoding diffusion cascades and social networks through a GCN.

• CE-GCN³⁹ leverages collaborative patterns within cascades to improve the prediction of the next user.

• H-Diffu⁴⁸ encodes the social graph and information diffusion cascades into two latent hyperbolic spaces, each with its own trainable curvature, and introduces a co-attention mechanism with positional embeddings to capture the diffusion cascade process.

Evaluation settings

Following the problem formulation above, we treat the prediction task as a retrieval problem. In line with previous studies,⁴⁷ we use Mean Average Precision at top k (MAP@k) and Hits score at top k (Hits@k) as evaluation metrics; together they account for both whether the ground-truth users appear in the predicted list and where they are ranked. We report MAP@k and Hits@k for k = 10, 50, and 100.
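A minimal sketch of the two metrics, assuming (as is common in this line of work, though not stated here) that each prediction step has exactly one ground-truth next user. Under that assumption, average precision at k reduces to 1/rank when the true user appears in the top k and 0 otherwise. The function name is hypothetical.

```python
def hits_and_map_at_k(ranked_lists, targets, k):
    """Hits@k and MAP@k for next-user prediction with one ground-truth user per step.

    ranked_lists: candidate user ids per step, best first
    targets:      the true next user for each step
    """
    hits, ap = 0.0, 0.0
    for ranking, target in zip(ranked_lists, targets):
        top_k = ranking[:k]
        if target in top_k:
            hits += 1.0
            ap += 1.0 / (top_k.index(target) + 1)  # precision at the single relevant rank
    n = len(targets)
    return hits / n, ap / n
```

Hits@k only checks membership in the top k, while MAP@k also rewards ranking the true user higher, which is why MAP values in Table 3 grow much more slowly with k than Hits values in Table 2.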

Implementation details

Our experiments are run on an Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz with 8.0 GB RAM running Windows 7, together with a single NVIDIA RTX 3090 GPU on a server. The software is implemented in Python 3.7.10 using the Deep Graph Library with a PyTorch backend, and Adam is used as the optimizer.

For each dataset, we randomly select 80% of the cascade sequences for training, 10% for validation, and the remaining 10% for testing.

We set the maximum cascade length to 200. For DMHANT, the batch size is 64, the embedding dimension is 64, and the learning rate is 0.001. To maintain computational efficiency, the number of attention heads H is set to 8, and the number of subhypergraphs is also set to 8. To fully train the model parameters, the number of training epochs is set to 100, with each batch including cascade sequences and the friendship graph.

Performance of model

We compare DMHANT with several baselines on three datasets and the experimental results are shown in Tables 2 and 3. Based on these, we can make the following analysis.

Table 2.

Experimental results on the three datasets (%): Hits@k for k = 10, 50, 100

	Douban			Android			Christianity
Models	@10	@50	@100	@10	@50	@100	@10	@50	@100
DeepDiffuse	9.06	14.96	19.13	4.13	10.53	17.12	10.26	21.86	30.73
NDM	10.07	21.53	30.45	4.83	14.21	18.93	15.42	31.34	45.83
FOREST	19.51	32.05	39.12	9.64	17.72	24.03	24.85	42.01	51.23
Inf-VAE	8.94	22.03	35.72	5.98	14.70	20.93	18.38	38.54	51.15
DyHGCN	18.71	32.33	39.72	9.10	16.34	23.10	26.23	42.80	52.47
CE-GCN	20.28	34.11	41.77	9.23	18.12	26.44	27.02	45.55	53.14
H-Diffu	21.08	34.21	40.45	9.78	19.07	26.12	28.07	44.32	54.07
DMHANT(ours)	21.33	35.47	42.34	10.33	20.82	27.99	30.37	46.72	56.21

In each column, the best result is achieved by DMHANT.

Table 3.

Experimental results on the three datasets (%): MAP@k for k = 10, 50, 100

	Douban			Android			Christianity
Models	@10	@50	@100	@10	@50	@100	@10	@50	@100
DeepDiffuse	6.02	6.93	7.13	2.30	2.52	2.54	7.25	7.82	7.83
NDM	8.23	8.73	9.16	2.01	2.22	2.93	7.43	7.68	7.86
FOREST	11.26	11.83	11.94	5.83	6.17	6.26	14.63	15.46	15.58
Inf-VAE	11.02	11.28	11.98	4.83	4.86	5.27	9.25	11.96	12.45
DyHGCN	10.61	11.26	11.36	6.09	6.40	6.52	15.64	16.32	16.44
CE-GCN	11.70	11.95	12.22	6.10	6.21	6.13	15.87	16.87	17.44
H-Diffu	11.95	12.19	12.35	5.12	6.01	6.31	16.02	17.55	17.54
DMHANT(ours)	12.06	12.65	12.69	6.43	6.81	6.97	17.84	18.62	18.77

In each column, the best result is achieved by DMHANT.

(1) DMHANT consistently outperforms all baseline methods on the three real-world datasets. Compared with H-Diffu, the strongest baseline on most metrics, DMHANT constructs a hypergraph from the cascade sequences to dynamically learn their global features and replaces the traditional dropout method with DropMessage, achieving improvements of up to 2.14 percentage points in Hits@100 and 1.23 percentage points in MAP@100 (both on Christianity). These improvements demonstrate the effectiveness of DMHANT.

(2)

The methods that use users' social relations (FOREST, DyHGCN, CE-GCN, H-Diffu, and DMHANT) generally perform better than the cascade-based approaches (DeepDiffuse and NDM). The cascade-based approaches rely on traditional neural networks rather than on GNNs, which can mine graph structure. In terms of feature mining, they exploit only propagation cascade features and fail to incorporate the relationship graph between users. Their weaker results therefore support the value of social relationships.

(3)

DMHANT, CE-GCN, and H-Diffu consider users' global interactions and achieve strong results. This supports our hypothesis that users' interaction preferences can be learned from both their historical behavior and their social relationships.

(4)

Overall, all models perform worse on the Android dataset than on the other two datasets, and the improvement of DMHANT there is relatively small. The reason may be as follows. In Douban and Christianity, learning users' close relationships can already describe their behavior effectively, in which case introducing the constructed hypergraphs may not help much.
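The improvement figures in observation (1) are absolute differences in percentage points, so they can be checked directly against Tables 2 and 3. The snippet below copies the Hits@100 and MAP@100 columns for DMHANT and H-Diffu and computes the per-dataset gains. It is a plain arithmetic check and not part of the model. The helper name `gains` is hypothetical.

```python
datasets = ["Douban", "Android", "Christianity"]

# Hits@100 and MAP@100 columns copied from Tables 2 and 3
hits100 = {"H-Diffu": [40.45, 26.12, 54.07], "DMHANT": [42.34, 27.99, 56.21]}
map100 = {"H-Diffu": [12.35, 6.31, 17.54], "DMHANT": [12.69, 6.97, 18.77]}


def gains(table, model="DMHANT", baseline="H-Diffu"):
    """Absolute gain of `model` over `baseline` in percentage points, per dataset."""
    return {name: round(m - b, 2)
            for name, m, b in zip(datasets, table[model], table[baseline])}


print(gains(hits100))  # {'Douban': 1.89, 'Android': 1.87, 'Christianity': 2.14}
print(gains(map100))   # {'Douban': 0.34, 'Android': 0.66, 'Christianity': 1.23}
```

Both maxima fall on Christianity, which is where the "up to" figures in observation (1) come from.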

Robustness of our model

To verify the robustness of DMHANT, we also conduct experiments on perturbed graphs. Specifically, we randomly add a certain ratio of edges (0%, 10%, 20%, 30%) to both the relationship graph and the hypergraph. To isolate the contribution of the DropMessage strategy, we also run a set of comparison experiments that use the traditional dropout method instead. The results are displayed in Figure 6. The Hits@100 of all methods decreases as the ratio of added edges increases, but DMHANT consistently outperforms the other methods in these noisy settings. The comparison with traditional dropout indicates that DropMessage strengthens the robustness of GNN models. Together, these results suggest that DMHANT maintains relatively high accuracy even when the input graphs are perturbed. In addition, the poor performance of all models on Android may be due to the relative sparsity of that dataset.
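This section does not spell out how edges were added or where DropMessage sits in the network. The sketch below is one plausible reading, with the following labeled assumptions:

- Perturbation adds uniformly random undirected edges between node pairs that are not already connected. Only the pairwise relationship graph is shown; perturbing the hypergraph would need an analogous rule that is not given here.
- DropMessage, as published, zeroes individual elements of the per-edge message matrix and rescales survivors by 1/(1 - p).
- Conventional dropout masks node features before messages are built, so every message sent by a node shares one mask.

All function names are hypothetical, and each message is simply the source node's feature vector.

```python
import random


def perturb_edges(edges, num_nodes, ratio, seed=0):
    """Return `edges` plus round(ratio * |E|) new random undirected edges (hypothetical helper)."""
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    target = len(existing) + round(ratio * len(existing))
    if target > num_nodes * (num_nodes - 1) // 2:
        raise ValueError("graph cannot hold that many distinct edges")
    while len(existing) < target:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
        existing.add(frozenset((u, v)))         # an already-present pair is simply ignored
    return sorted(tuple(sorted(e)) for e in existing)


def build_messages(edges, features):
    """One message per directed edge (u -> v); here just a copy of the source features."""
    return [list(features[u]) for u, _ in edges]


def dropout_rows(rows, p, rng):
    """Zero each element with probability p and rescale survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [[x / keep if rng.random() >= p else 0.0 for x in row] for row in rows]


def messages_with_dropout(edges, features, p, rng):
    # conventional dropout: mask node features first, then build messages,
    # so all messages sent by the same node carry the same mask
    return build_messages(edges, dropout_rows(features, p, rng))


def messages_with_dropmessage(edges, features, p, rng):
    # DropMessage-style: build the message matrix first, then mask its elements,
    # so two messages from the same node are masked independently
    return dropout_rows(build_messages(edges, features), p, rng)


ring = [(i, (i + 1) % 10) for i in range(10)]
noisy = perturb_edges(ring, num_nodes=10, ratio=0.3)
print(len(ring), len(noisy))  # 10 13

feats = [[1.0, 1.0, 1.0, 1.0] for _ in range(3)]
star = [(0, 1), (0, 2)]  # node 0 sends two messages
print(messages_with_dropout(star, feats, 0.5, random.Random(1)))      # identical rows
print(messages_with_dropmessage(star, feats, 0.5, random.Random(1)))  # rows masked independently
```

With node-level dropout, a feature dropped at node 0 disappears from every neighbor's input at once. Message-level dropping instead spreads the randomness across edges, which is the property usually credited for its stabilizing effect.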

FIG. 6.

Robustness study of DMHANT.

Ablation study

We conduct ablation studies on the three datasets, comparing DMHANT with several variants to evaluate the contribution of each component:

- w/o GCN removes the GCN module from DMHANT while keeping the other parts unchanged.
- w/o HANT removes the HANT module from DMHANT while keeping the other parts unchanged.
- w/o ATT omits the Multi-Head Self-Attention mechanism for $X_{F}$ and $X_{D}$.
- w/o Fusion replaces all gated fusions with equal-weight averaging.

As shown in Tables 4 and 5, DMHANT achieves the best performance compared with all of its variants. Three observations follow.

First, removing either the GCN module or the HANT module leads to a similar drop in performance, indicating that both types of global dependency are useful.

Second, the equal-weight averaging variant (w/o Fusion) performs clearly worse than the full model on all three datasets. This is because the gated fusion strategy uses an attention mechanism that learns how much weight to give each source.

Third, in most cases the single GCN module or the single HANT module alone outperforms the equal-weight average of both. This indicates that the two modules should be combined with learned weights to benefit the model, and it underscores the importance of the attention mechanism introduced by the fusion strategy.
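The sketch below contrasts the two fusion strategies compared in this ablation. It does not reproduce the paper's fusion formula. Instead, it assumes the gate is attention-style: each source embedding is scored with a tanh projection and a scoring vector, and the two scores are normalized per user with a softmax. `W` and `q` are hypothetical parameters (random here, learned in practice). When the scores are equal, the gate reduces to the "w/o Fusion" average, and the last line of the example checks exactly that.

```python
import numpy as np


def gated_fusion(x_a, x_b, W, q):
    """Attention-style fusion of two (n, d) user-embedding matrices.

    W (d, d) and q (d,) are hypothetical gate parameters; in a trained model
    they would be learned jointly with the rest of the network.
    """
    stacked = np.stack([x_a, x_b])                       # (2, n, d)
    scores = np.tanh(stacked @ W) @ q                    # (2, n): one score per source per user
    scores -= scores.max(axis=0, keepdims=True)          # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)        # softmax over the two sources
    fused = (weights[..., None] * stacked).sum(axis=0)   # (n, d)
    return fused, weights


def average_fusion(x_a, x_b):
    """The 'w/o Fusion' variant: both sources always weighted 0.5."""
    return 0.5 * (x_a + x_b)


rng = np.random.default_rng(0)
x_gcn = rng.normal(size=(4, 8))    # stand-in for the GCN branch output (toy values)
x_hant = rng.normal(size=(4, 8))   # stand-in for the HANT branch output (toy values)
W = rng.normal(size=(8, 8)) * 0.1
q = rng.normal(size=8)

fused, weights = gated_fusion(x_gcn, x_hant, W, q)
print(weights.round(3))  # per-user weights for (GCN, HANT); each column sums to 1

# with a zero scoring vector the gate is uninformative and reduces to plain averaging
fused_zero, _ = gated_fusion(x_gcn, x_hant, W, np.zeros(8))
print(np.allclose(fused_zero, average_fusion(x_gcn, x_hant)))  # True
```

The averaging variant is a special case of the gate with the learning switched off. This is one way to read the ablation result: the gain comes from letting the weights differ per user.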

Table 4.

Ablation study of DMHANT (%) (Hits@k for k = 10, 50, 100)

	Douban			Android			Christianity
Models	@10	@50	@100	@10	@50	@100	@10	@50	@100
w/o GCN	20.12	34.21	41.22	10.09	18.76	25.99	27.41	45.56	53.64
w/o HANT	20.41	34.33	40.87	10.15	19.79	27.03	29.19	44.15	54.04
w/o ATT	21.21	32.15	38.24	9.45	19.47	26.26	27.41	45.15	54.83
w/o Fusion	19.44	33.79	40.11	8.79	18.86	26.27	27.78	45.87	54.24
DMHANT	21.33	35.47	42.34	10.33	20.82	27.99	30.37	46.72	56.21

Bold values indicate the best results.

Table 5.

Ablation study of DMHANT (%) (MAP@k for k = 10, 50, 100)

	Douban			Android			Christianity
Models	@10	@50	@100	@10	@50	@100	@10	@50	@100
w/o GCN	11.56	11.96	12.11	6.39	6.77	6.87	17.37	18.16	18.28
w/o HANT	11.24	11.47	12.08	6.38	6.80	6.92	16.47	17.68	17.86
w/o ATT	11.26	11.83	11.94	5.83	6.17	6.26	16.46	16.86	17.45
w/o Fusion	11.02	11.28	11.98	4.83	4.86	5.27	14.63	15.46	15.58
DMHANT	12.06	12.65	12.69	6.43	6.81	6.97	17.84	18.62	18.77

Bold values indicate the best results.

Parameter sensitivity

To investigate the effect of several key parameters, we report Hits@100 results in Figure 7.

FIG. 7.

Parameter sensitivity study of DMHANT.

Figure 7a illustrates the impact of the proportion of data used for training. As the training ratio increases, the model's performance improves. Part of this improvement may also be attributed to the smaller test set, which leaves less data for evaluation and consequently raises the Hits@100 score. Notably, performance changes little as the training ratio increases from 70% to 90%, which also suggests that the model is robust.

Figure 7b plots the effect of the dimension size. Overall, performance improves as the dimension increases, because higher dimensions can better represent the influence of interrelationships. The highest scores on all three datasets are obtained when the dimension is 64. Therefore, we empirically set the default dimension to 64.

In Figure 7c, we plot the influence of the number of heads in the Multi-Head Self-Attention module. Model performance improves as the number of heads increases. This suggests that the multi-head attention mechanism acts as an ensemble that mitigates overfitting.
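For reference, the sketch below shows standard scaled dot-product multi-head self-attention over one cascade, written in NumPy. Several choices are assumptions rather than details taken from this section:

- A causal mask lets each position attend only to itself and to users infected earlier.
- The embedding dimension (64, the default from Figure 7b) is split evenly across heads.
- The head count of 8 and the random weight matrices are hypothetical stand-ins for learned parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads, causal=True):
    """X: (L, d) embeddings of the L users in one cascade, in infection order."""
    L, d = X.shape
    if d % num_heads != 0:
        raise ValueError("d must be divisible by num_heads")
    dh = d // num_heads

    def split(M):
        # (L, d) -> (num_heads, L, dh): each head works in its own dh-dim subspace
        return M.reshape(L, num_heads, dh).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)   # (num_heads, L, L)
    if causal:
        # block attention to users who appear later in the cascade
        mask = np.triu(np.ones((L, L), dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    attn = softmax(scores, axis=-1)
    heads = attn @ V                                   # (num_heads, L, dh)
    concat = heads.transpose(1, 0, 2).reshape(L, d)    # concatenate heads back to (L, d)
    return concat @ Wo, attn


rng = np.random.default_rng(0)
L, d = 5, 64
X = rng.normal(size=(L, d))  # toy embeddings of 5 users in one cascade
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))
out, attn = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads=8)
print(out.shape, attn.shape)  # (5, 64) (8, 5, 5)
```

Each head computes its own attention pattern over a d/h-dimensional subspace, and the outputs are concatenated. This is the sense in which the heads behave like a small ensemble. Because d must be divisible by the head count, adding heads also shrinks each head's subspace.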

In Figure 7d, we examine the relationship between model performance and learning rate. Performance shows small peaks when the learning rate is around 0.001 and 0.01. This indicates that choosing a suitable learning rate has a significant impact on the model.

Conclusion

This article investigates GNN-based information propagation prediction. We aimed to capture more global dependencies and dynamic interaction preference features from the observed network, and to address poor robustness. To this end, we presented a novel information propagation prediction model named DMHANT, which jointly learns users' social relationships and diffusion cascade representations. Through the GCN module, HANT module, Multi-Head Self-Attention module, and Interaction Prediction Module, our model captures dynamic interactions. In addition, the DropMessage dropping method is incorporated to increase the robustness of the model. Experiments on three real-world datasets demonstrate the effectiveness and robustness of the proposed model.

In future work, we will consider incorporating the content of the propagated information and users' preferences for that content. This would let us further exploit these features and improve model performance.


Acknowledgments

The authors thank the editors and the anonymous reviewers for their efforts.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This research project was supported by Major Science and Technology Projects in Henan Province under 221100210100.

References

1. Shi, Xu, Fan, et al. Cost effective approach to identify multiple influential spreaders based on the cycle structure in networks. Sci China Inf Sci, 2023; 66(9):1–10; doi: 10.1007/s11432-022-3715-4

2. Dinh, Zhang, Nguyen, et al. Cost-effective viral marketing for time-critical campaigns in large-scale social networks. IEEE/ACM Trans Networking, 2014; 22(6):2001–2011; doi: 10.1109/TNET.2013.2290714

3. Safar, Sawwan, Taha, et al. Virtual social networks online and mobile systems. Mobile Information Systems, 2009; 5(3):233–253; doi: 10.1155/2009/473571

4. Shu, Sliva, Wang, et al. Fake news detection on social media: A data mining perspective. SIGKDD Explor Newsl, 2017; 19(1):22–36; doi: 10.1007/978-3-031-01915-9_4

5. Nian, Yang, Shi, et al. A behavioral propagation and competition model based on pressure. Mod Phys Lett B, 2021; 35(19):2150328; doi: 10.1142/s0217984921503280

6. Zhao, Li, Wang, et al. Mathematical modeling and epidemic prediction of COVID-19 and its significance to epidemic prevention and control measures. JCSR, 2020; 1(1):19–36; doi: 10.14302/issn.2766-8681.jcsr-21-371919

7. , Qian, Zhang, et al. Graph neural network-based diagnosis prediction. Big Data, 2020; 8(5):379–390; doi: 10.1089/big.2020.0070

8. Majbouri Yazdi, Majbouri Yazdi, Khodayi, et al. Prediction optimization of diffusion paths in social networks using integration of ant colony and densest subgraph algorithms. JHS, 2020; 26(2):141–153; doi: 10.3233/jhs-200635

9. Nian, Ren, Yu. Online spreading of topic tags and social behavior. IEEE Trans Comput Soc Syst, 2024; 11(1):1277–1288; doi: 10.1109/tcss.2023.3235011

10. Floria, Leon, Logofătu. A model of information diffusion in dynamic social networks based on evidence theory. IFS, 2019; 37(6):7369–7381; doi: 10.3233/jifs-179346

11. Daud, Ab Hamid, Saadoon, et al. Applications of link prediction in social networks: A review. J Network Com Applications, 2020; 166:102716; doi: 10.1007/978-3-319-28922-9_5

12. Lawrence, Blackett, Cradock-Henry. Cascading climate change impacts and implications. Climate Risk Management, 2020; 29:100234; doi: 10.26686/wgtn.14502882

13. Borge-Holthoefer, Banos, González-Bailón, et al. Cascading behaviour in complex socio-technical networks. J Complex Networks, 2013; 1(1):3–24; doi: 10.1093/comnet/cnt006

14. , Zhou, Zhang, et al. Casflow: Exploring hierarchical structures and propagation uncertainty for cascade prediction. IEEE Trans Knowl Data Eng, 2023; 35(4):3484–3499; doi: 10.1109/tkde.2021.3126475

15. Morse, Gonzalez, Markuzon. Persistent cascades: Measuring fundamental communication structure in social networks. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016, pp. 969–75; doi: 10.1109/bigdata.2016.7840695

16. Kimura, Saito, Ohara, et al. Learning information diffusion model in a social network for predicting influence of nodes. IDA, 2011; 15(4):633–652; doi: 10.3233/ida-2011-0486

17. Liu, Ji, Liu, et al. Extended resource allocation index for link prediction of complex network. Physica A-Statis Mechanics Appli, 2017; 479:174–183; doi: 10.1016/j.physa.2017.02.078

18. Xue, Chen, Hu, et al. Twitter discussions and emotions about the COVID-19 pandemic: Machine learning approach. J Med Internet Res, 2020; 22(11):e20550; doi: 10.2196/20550

19. , Cranmer, Zheng, et al. Infectivity enhances prediction of viral cascades in Twitter. PLoS One, 2019; 14(4):e0214453; doi: 10.1371/journal.pone.0214453

20. Qiu, Tang, Ma, et al. Deepinf: Social influence prediction with deep learning. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 2018, pp. 2110–2119; doi: 10.1109/bigdata47090.2019.9005969

21. Yang, Tang, Sun, et al. Multi-scale information diffusion prediction with reinforced recurrent networks. In: IJCAI, 2019, pp. 4033–4039; doi: 10.24963/ijcai.2019/560

22. Yuan, Li, Zhou, et al. DyHGCN: A dynamic heterogeneous graph convolutional network to learn users’ dynamic preferences for information diffusion prediction. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part III. Springer, 2021, pp. 347–363; doi: 10.1007/978-3-030-67664-3_21

23. Wang, Huang, Liu, et al. Dydiff-vae: A dynamic variational framework for information diffusion prediction. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 163–172; doi: 10.1145/3404835.3462934

24. Fan, Liu, Wang, et al. Sequential recommendation with auxiliary item relationships via multi-relational transformer. In: 2022 IEEE International Conference on Big Data (Big Data). IEEE, 2022, pp. 525–534; doi: 10.1109/bigdata55660.2022.10020655

25. , Wang, McAuley. Time interval aware self-attention for sequential recommendation. In: Proceedings of the 13th International Conference on Web Search and Data Mining, 2020, pp. 322–330; doi: 10.1145/3336191.3371786

26. Lei, Ji, Li. Tissa: A time slice self-attention approach for modeling sequential user behaviors. In: The World Wide Web Conference, 2019, pp. 2964–2970; doi: 10.1145/3308558.3313495

27. Kawamae. Trend analysis model: Trend consists of temporal words, topics, and timestamps. In: Proceedings of the fourth ACM international conference on Web search and data mining, 2011, pp. 317–326; doi: 10.1145/1935826.1935880

28. Lagnier, Denoyer, Gaussier, et al. Predicting information diffusion in social networks using content and user’s profiles. In: Advances in Information Retrieval: 35th European Conference on IR Research, ECIR 2013, Moscow, Russia, March 24–27, 2013. Proceedings 35. Springer, 2013, pp. 74–85; doi: 10.1007/978-3-642-36973-5_7

29. Cheng, Yang, Li, et al. In vivo tracing of superparamagnetic iron oxide-labeled bone marrow mesenchymal stem cells transplanted for traumatic brain injury by susceptibility weighted imaging in a rat model. Chie J Traumatol, 2010; 13(03):173–177; doi: 10.3760/cma.j.issn.1008-1275.2010.03.008

30. Saito, Nakano, Kimura. Prediction of information diffusion probabilities for independent cascade model. In: Knowledge-Based Intelligent Information and Engineering Systems: 12th International Conference, KES 2008, Zagreb, Croatia, September 3-5, 2008, Proceedings, Part III 12. Springer, 2008, pp. 67–75; doi: 10.1007/978-3-540-85567-5_9

31. Yang Q. Analyzing user retweet behavior on twitter. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE, 2012, pp. 46–50; doi: 10.1109/asonam.2012.18

32. , Lin, Yeh. Forecasting participants of information diffusion on social networks with its applications. Infor Sci, 2018; 422:432–446; doi: 10.1016/j.ins.2017.09.034

33. Wang, Chen, Li. Joint learning of user representation with diffusion sequence and network structure. IEEE Trans Knowl Data Eng, 2022; 34(3):1275–1287; doi: 10.1109/tkde.2020.2995075

34. Wang, Shen, Liu, et al. Cascade dynamics modeling with attention-based recurrent neural network. In: IJCAI. vol. 17, 2017, pp. 2985–2991; doi: 10.24963/ijcai.2017/416

35. Zhou, Xu, Fu, et al. HID: Hierarchical multiscale representation learning for information diffusion. In: International Joint Conference on Artificial Intelligence, 2020, pp. 3385–3391; doi: 10.24963/ijcai.2020/468

36. Feng, You, Zhang, et al. Hypergraph neural networks. In: Proceedings of the AAAI conference on artificial intelligence. vol. 33, 2019, pp. 3558–3565; doi: 10.5373/jardcs/v12sp4/20201622

37. Bai, Zhang, Torr. Hypergraph convolution and hypergraph attention. Pattern Recognition, 2021; 110:107637; doi: 10.1109/ikt51791.2020.9345609

38. Xie, Wang, Jia. Independent asymmetric embedding for information diffusion prediction on social networks. In: 2022 IEEE 25th International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE, 2022, pp. 190–195; doi: 10.1109/cscwd54268.2022.9776071

39. Wang, Wei, Yuan, et al. Cascade-enhanced graph convolutional network for information diffusion prediction. In: Database Systems for Advanced Applications: 27th International Conference, DASFAA 2022, Virtual Event, April 11–14, 2022, Proceedings, Part I. Springer, 2022, pp. 615–31; doi: 10.1016/j.ins.2024.120938

40. , Liu. Novel competitive information propagation macro mathematical model in online social network. J Com Sci, 2020; 41:101089; doi: 10.1016/j.jocs.2020.101089

41. Liu, Tang, He. Double-layer network negative public opinion information propagation modeling based on continuous-time Markov chain. Compu J, 2021; 64(9):1315–1325; doi: 10.1093/comjnl/bxaa038

42. Luo, Liu, Zhang. A dynamic model of reposting information propagation based on empirical analysis and markov process. J Univers Comput Sci, 2016; 22(3):360–374; doi: 10.3217/jucs-022-03-0360

43. Liu, Zhou, Zhan, et al. Markov-based solution for information diffusion on adaptive social networks. Appl Mathematics Computation, 2020; 380:125286; doi: 10.1016/j.amc.2020.125286

44. Zhong, Fan, Wang, et al. Comsoc: Adaptive transfer of user behaviors over composite social network. In: Proceedings of The 18th ACM SIGKDD International Conference On Knowledge Discovery And Data Mining, 2012, pp. 696–704; doi: 10.1145/2339530.2339641

45. Sankar, Zhang, Krishnan, et al. Inf-VAE: A variational autoencoder framework to integrate homophily and influence in diffusion prediction. In: Proceedings of the 13th International Conference on Web Search and Data Mining, 2020, pp. 510–518; doi: 10.1145/3336191.3371811

46. Islam, Muthiah, Adhikari, et al. DeepDiffuse: Predicting the 'who' and 'when' in cascades. In: 2018 IEEE International Conference On Data Mining (ICDM). IEEE, 2018, pp. 1055–1060; doi: 10.1109/icdm.2018.00134

47. Yang, Sun, Liu, et al. Neural diffusion model for microscopic cascade study. IEEE Trans Knowl Data Eng, 2019; 33(3):1–1; doi: 10.1109/tkde.2019.2939796

48. Feng, Zhao, Fang, et al. H-Diffu: Hyperbolic representations for information diffusion prediction. IEEE Trans Knowl Data Eng, 2023; 35(9):8784–8798; doi: 10.1109/tkde.2022.3209067