Feature difference-aware graph neural network for telecommunication fraud detection

Abstract

As telecommunication fraud schemes continue to evolve, fraud is becoming increasingly concealed and disguised. Existing fraud detection methods based on Graph Neural Networks (GNNs) directly aggregate the neighbor features of target nodes as their updated features, which preserves the commonality of neighbor features but ignores their differences from the target nodes. This makes it difficult to effectively distinguish fraudulent users from normal users. To address this issue, a new model named Feature Difference-aware Graph Neural Network (FDAGNN) is proposed for detecting telecommunication fraud. FDAGNN first calculates the feature differences between target nodes and their neighbors, then adopts the GAT method to aggregate these feature differences, and finally uses a GRU to fuse the original features of target nodes with the aggregated feature differences as the updated features of target nodes. Extensive experiments on two real-world telecom datasets demonstrate that FDAGNN outperforms seven baseline methods in the majority of metrics, with a maximum improvement of about 5%.

Keywords

Fraud detection; Graph neural networks; Telecommunication networks; Feature fusion

1 Introduction

With the rapid development of the telecommunication industry, telecom fraud is becoming more and more prevalent, causing significant economic losses worldwide [1]. Meanwhile, it is increasingly difficult to distinguish fraudsters from normal users by user features alone, as fraud gradually shifts from indiscriminate, wide-net schemes to precisely targeted ones. Therefore, traditional machine learning methods based on feature engineering reach a performance bottleneck when dealing with such problems. As a result, researchers have started to investigate graph-based approaches [2–4], which additionally model user interactions while preserving user features.

Owing to the strength of GNNs such as GCN [5], GraphSAGE [6], and GAT [7] in modeling irregular data, many researchers have introduced GNN techniques to the fields of communication networks [8–10] and fraud detection in recent years. [11–15] employ GNNs to detect financial fraud on the Alipay, Tmall, and Taobao platforms, which are owned by China’s largest e-commerce company, the Alibaba Group. In [16–22], GNNs are utilized to detect fraud on social media platforms and to combat the underground black-market industry. Moreover, [23–25] use GNNs for detecting telecom fraud. These works focus on constructing interaction graphs according to actual business characteristics, but they still essentially update the features of target nodes by aggregating the features of their neighbors. That is, these studies still focus on the commonality of neighbor features.

GraphConsis [26] and CARE-GNN [27] calculate the similarity between target nodes and their neighbors based on node features, and aggregate only the features of the neighbors that are highly similar to the target nodes. These efforts reinforce the commonality of neighbor features. [28–30] start to pay attention to the differences between target nodes and their neighbors. However, there is still room for improvement. In summary, the majority of existing efforts ignore the feature differences between target nodes and their neighbors. Taking Fig. 1 as an example, if all the neighbor features of User 6 are aggregated as its updated features, then User 6, a fraudster, will be difficult to distinguish from normal users. Even if only the most similar neighbors are aggregated, User 6 is still very likely to escape detection.

Fig. 1

Local View of the Telecom Call Network. Note: The black lines with double arrows represent the communication relations between users.

To address the shortcomings mentioned above, the Feature Difference-aware Graph Neural Network¹ (FDAGNN for short) is proposed. FDAGNN adopts the GAT method to aggregate the feature differences between target nodes and their neighbors, and then employs a Gated Recurrent Unit (GRU) [31] to fuse the original features of target nodes with the aggregated differences. Furthermore, FDAGNN replaces the standard cross-entropy loss function with an improved focal loss function [32] in order to further enhance model performance.

The specific contributions of this paper are summarized as follows:

A new GNN-based fraud detection model called FDAGNN is proposed. FDAGNN aggregates the differences between target nodes and their neighbors based on an attention mechanism, instead of aggregating the raw features of neighbor nodes as existing GNNs do. The proposed model can be generalized to multiple fraud detection domains, such as telecommunications and finance.

GRU approach is adopted to fuse the raw features of target nodes and aggregated differences. Compared with the other common methods of feature fusion, the GRU method shows excellent efficiency and effectiveness.

The focal loss function is improved to deal with the problem of class imbalance. Experiments show that the improved loss function can effectively enhance the model performance.

Experiments indirectly demonstrate that it is very necessary to consider the actual interaction between users when constructing graphs to detect fraud, especially telecommunication fraud with sparse data.

2 Related work

The work of this paper focuses on telecommunication fraud detection based on graph neural networks, and telecommunication fraud detection is a specific implementation of fraud detection technology in the field of telecommunication. Therefore, a brief overview of related work is given in three areas: (1) graph neural networks, (2) graph-based fraud detection, and (3) graph-based telecom fraud detection.

2.1 Graph neural networks

GNNs have received extensive attention in recent years as an efficient method of dealing with irregular data. A number of GNNs employ a mechanism based on message passing and aggregation [33]. When updating the feature of a node in the graph, the features of its neighbors are transferred to the node firstly, according to the topological association of the node in the graph. Then, all neighbor features received by the node are aggregated as its own updated features by a specific method. Different graph neural networks adopt different aggregation methods. In GCN [5], the target node is regarded as one of its neighbors, and the neighborhood features are then averaged. GraphSAGE [6] proposes four aggregation modes. The first is similar to GCN. Second, the target node concatenates the average of its neighbors with itself. Third, it adopts an elementwise max-pooling operation on the features of neighbors transformed by a fully-connected neural network, and the last is Long Short-Term Memory (LSTM) [34] mode. GAT [7] assigns different weights to neighbor nodes. As of today, GNNs are being used in a wide range of fields, such as recommendation systems [35–37], chemistry [38, 39], and fraud detection.
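To make the message-passing idea concrete, the following minimal sketch implements the averaging scheme as described above for GCN: the target node is treated as one of its own neighbors and the features are averaged. It is a simplification for illustration (the original GCN uses a degree-normalized sum together with a learned linear transform), and the dictionaries `features` and `neighbors` are toy values, not data from this paper.

```python
def mean_aggregate(features, neighbors, i):
    """Average the features of node i and its neighbors (simplified GCN-style)."""
    group = [i] + list(neighbors[i])  # the target node counts as its own neighbor
    dim = len(features[i])
    return [sum(features[j][k] for j in group) / len(group) for k in range(dim)]


# Toy graph: node 0 is connected to nodes 1 and 2 (hypothetical values).
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h0 = mean_aggregate(features, neighbors, 0)
print(h0)
```

After averaging, node 0 no longer carries its own distinctive feature pattern, which is the loss of difference information that motivates the proposed model.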

2.2 Graph-based fraud detection

Graph-based fraud detection has witnessed a growing interest. GeniePath [40] takes LSTM [34] to adaptively select neighbor nodes in different hops for target nodes. At the same time, an attention mechanism is employed to assign different weights to the neighbors in the same hop. ASA [16] constructs multiple homogeneous graphs based on different types of nodes, then uses GCN to update node features in each homogeneous graph. In Player2Vec [22], multiple homogeneous graphs are constructed based on different meta-paths. Then, Player2Vec updates node features for each homogeneous graph using GCN and fuses the features of same nodes among different homogeneous graphs via an attention mechanism.

CARE-GNN [27] builds a multi-relationship heterogeneous graph. Firstly, it measures the similarity between the target node and each of its neighbors under different relationships with l₁-distance, then it samples those neighbors that are more similar to the target node by using reinforcement learning technology, and finally it applies aggregation operations within each relation and among relations. It is important to note that CARE-GNN satisfies the homogeneity assumption of GNNs, that is, nodes are more inclined to establish edges with nodes of the same type. FRAUDRE [28] constructs multiple homogeneous graphs based on different relationships, then utilizes a fraud-aware graph convolution module to combine two distinct parts of the message: one is based on the average aggregation of neighbor features, while the other is based on the average aggregation of feature differences between the target node and its neighbors. Lastly, it aggregates embedding features among relations with an attention mechanism.

2.3 Graph-based telecom fraud detection

[23] detects international telecommunication fraud. It models the call network as a directed attributed bipartite graph and provides a dynamic anomaly scoring approach based on both time information and network attributes. AGRM [24] extracts a k-order subgraph of target nodes based on the sampling idea of GraphSAGE, employs an attention mechanism to aggregate the messages of neighbors (excluding the target nodes themselves), and finally adds the aggregated neighbor features to the target nodes. MRG-GNN [25] models the call network as an attributed graph with edge features. Short walks are introduced to sample high-order neighbors. MRG-GNN accumulates the neighbor features as well as the edge features in a single convolution layer. Furthermore, it integrates the outputs of different convolution layers in a GRU-like manner.

3 Problem definition

Definition 1. [Graph] $G = \{V, X, E, A, Y\}$ represents a graph, where $V = \{v_{i}\}_{i = 1}^{N}$ denotes the node set of the graph $G$. $x_{i} \in X$ represents the feature vector of node $v_{i}$, with $x_{i} \in ℝ^{d}$ and $X \in ℝ^{N \times d}$. An edge between nodes $v_{i}$ and $v_{j}$ is denoted by $e_{ij} \in E$. The adjacency matrix $A$ represents the topology of $G$: $A_{ij} = 1$ if $e_{ij} \in E$, and 0 otherwise. $y_{i} \in Y$ denotes the label of node $v_{i}$.
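As a small illustration of Definition 1, the sketch below builds the adjacency matrix $A$ from an edge list. The helper name `adjacency_matrix`, the 0-based node indices, and the toy edge list are assumptions made for this example only.

```python
def adjacency_matrix(num_nodes, edges):
    """Return A with A[i][j] = 1 if edge (i, j) is in `edges`, else 0."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
    return A


# Toy graph with N = 4 nodes; each undirected edge is listed in both directions.
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
A = adjacency_matrix(4, toy_edges)
for row in A:
    print(row)
```

A dense matrix is used here only for readability; for a graph of realistic size a sparse representation would be the natural choice.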

Definition 2. [Graph-based Telecom Fraud Detection] From Call Detail Records (CDRs), we construct a telecom graph $G = \{V, X, E, A, Y\}$, where $V = \{v_{i}\}_{i = 1}^{N}$ stands for phone numbers (accounts), and $x_{i} \in ℝ^{d}$ represents a d-dimensional feature vector extracted from the call behaviors of $v_{i}$. $e_{ij} \in E$ is an edge that denotes the call relationship between $v_{i}$ and $v_{j}$. $y_{i} \in Y$ denotes the label of node $v_{i}$, i.e., normal user, fraudster, or other. FDAGNN aims to learn a low-dimensional representation for each node, and then converts telecom fraud detection into a binary or multi-class classification problem.

4 The proposed method

In this section, the proposed model, FDAGNN, is introduced. First, Section 4.1 gives an overview of the FDAGNN framework and Section 4.2 describes graph construction. Then the core modules are detailed, i.e., the difference-aware graph convolution module in Section 4.3 and the classification module in Section 4.4.

4.1 Overview

The proposed FDAGNN framework consists of three different modules, and its pipeline is shown in Fig. 2. The first module is the input component, whose main function is to construct and preprocess graphs. The difference-aware graph convolution module and the classification module are the core of FDAGNN. The difference-aware graph convolution module mainly performs hierarchical feature aggregation, which includes the aggregation of feature differences between target nodes and their neighbors, and the fusion of the target nodes' own features with the aggregated differences. The classification module introduces the focal loss function, which originated in the field of computer vision, and further improves it.

Fig. 2

The Overall Framework of FDAGNN. Note: Red circles represent fraudulent users and other coloured circles represent benign users in the Telecom Graph. Taking the feature update of node v as an example, $h_{v}^{(0)}$ denotes the original feature of node v and { $h_{v}^{(0)} - h_{v_{1}}^{(0)}$ } represents the feature differences between node v and v₁ (v₁ is a neighbor node of v). After the first graph convolution layer, i.e., Layer-1, the feature of node v is updated as $h_{v}^{(1)}$ . $h_{v}^{(1)}$ denotes the original feature in Layer-2. In this way, the final feature of node v is obtained after n graph convolution layers, then is fed into the classification module to determine if it is a fraudster.

4.2 Graph construction module

The procedures for constructing or preprocessing graphs on the two datasets (see Table 1) are different. The first dataset has already been processed into a directed graph by [24]; here, it is further converted into an undirected graph. For the second dataset, since the open data is the raw data of an actual running system, the method provided in [42] is utilized to construct an undirected graph from the raw data.
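The conversion of a directed call graph into an undirected one can be sketched as follows. The paper does not spell out this step, so the details here (dropping self-loops, merging reciprocal calls into a single edge, the helper name `to_undirected`) are assumptions for illustration.

```python
def to_undirected(directed_edges):
    """Collapse directed (caller, callee) pairs into unique undirected edges."""
    undirected = set()
    for u, v in directed_edges:
        if u == v:
            continue  # drop self-loops (an assumption of this sketch)
        undirected.add((min(u, v), max(u, v)))  # (1, 2) and (2, 1) become one edge
    return sorted(undirected)


# Hypothetical call records: user 1 and user 2 call each other, user 3 calls user 1.
calls = [(1, 2), (2, 1), (3, 1), (3, 3)]
edges = to_undirected(calls)
print(edges)
```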

Table 1
Statistics Information of datasets

Dataset	Nodes	Edges	Features	User Classes	User Numbers	Average Degree
BUPT	116383	350751	39	Benign	99861	3.013
				Fraud	8448	
				Courier	8074	
SC	6106	838528	55	Benign	4144	137.329
				Fraud	1962	

Note: The edges represent real interactions between users in BUPT. In SC, the edges are constructed manually based on feature similarity between users.

4.3 Difference-aware graph convolution module

AM-GCN [41] demonstrates that GCN cannot adequately fuse network topology and node features to extract the most correlated information. In particular, for networks with rich features, GCN can even damage the original feature information to some extent. Therefore, the original features are preserved in this module:

${hf}_{i}^{(l)} = σ (W_{(hf, l - 1)} h_{i}^{(l - 1)}),$ (1)

where $h_{i}^{(l - 1)} \in ℝ^{d}$ is the representation of node i in the (l - 1)-th layer and ${hf}_{i}^{(l)} \in ℝ^{\tilde{d}}$ represents the feature of node i in the embedding space of the l-th layer. $W_{(hf, l - 1)} \in ℝ^{\tilde{d} \times d}$ is a learnable weight matrix of a fully-connected layer in the (l - 1)-th layer and σ is a nonlinear activation function.

Difference Aggregation. Existing GNNs are not specially designed for the fraud detection problem, and they cannot describe node anomalies effectively. In practical scenarios, directly aggregating neighbor features can easily make fraudsters blend in with normal users, which makes it difficult to distinguish the two classes. This disadvantage can be mitigated by aggregating the feature differences between target nodes and their neighbors. To further enhance the ability to discriminate between normal users and fraudsters, an attention mechanism is employed in the aggregation of feature differences. In brief, the process can be formulated as follows:

${hd}_{i}^{(l)} = σ (\sum_{j \in N_{i}} α_{ij}^{(l - 1)} W_{(hd, l - 1)} (h_{i}^{(l - 1)} - h_{j}^{(l - 1)})),$ (2)

where $h_{i}^{(l - 1)} \in ℝ^{d}$ and $h_{j}^{(l - 1)} \in ℝ^{d}$ denote the input representations of nodes i and j, $N_{i}$ denotes the set of neighbors of node i, and ${hd}_{i}^{(l)}$ is the aggregation of feature differences between node i and its neighbors in the (l - 1)-th layer. $W_{(hd, l - 1)} \in ℝ^{\tilde{d} \times d}$ is a trainable weight matrix and σ is a nonlinear activation function. Here, $α_{ij}^{(l - 1)}$ is the attention coefficient between node i and node j in the (l - 1)-th layer, which can be formulated as:

$α_{ij}^{(l - 1)} = \frac{\exp (σ ({\vec{a}}^{T} W_{(hd, l - 1)} (h_{i}^{(l - 1)} - h_{j}^{(l - 1)})))}{\sum_{k \in N_{i}} \exp (σ ({\vec{a}}^{T} W_{(hd, l - 1)} (h_{i}^{(l - 1)} - h_{k}^{(l - 1)})))},$ (3)

where $\vec{a} \in ℝ^{\tilde{d}}$ is the attention vector that assigns different importance to the feature differences between node i and its different neighbors. Furthermore, the tanh function is adopted as the activation function, because the feature differences contain many negative values.
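The difference aggregation of Eqs. (2) and (3) can be sketched in plain Python as below. This is a minimal single-node illustration, not the authors' implementation: it assumes that tanh is used as σ in both equations, applies no dropout, and uses toy values for `h`, `W`, and `a`; the helper names `matvec` and `difference_aggregate` are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def difference_aggregate(h, neighbors, i, W, a):
    """Attention-weighted aggregation of feature differences for node i."""
    # Project every difference h_i - h_j with the shared weight matrix W.
    projected = []
    for j in neighbors[i]:
        diff = [hi - hj for hi, hj in zip(h[i], h[j])]
        projected.append(matvec(W, diff))
    # Eq. (3): score = tanh(a^T W (h_i - h_j)), then softmax over the neighbors.
    scores = [math.tanh(sum(ak * pk for ak, pk in zip(a, p))) for p in projected]
    exp_scores = [math.exp(s) for s in scores]
    total = sum(exp_scores)
    alpha = [e / total for e in exp_scores]
    # Eq. (2): weighted sum of projected differences, followed by tanh.
    agg = [sum(al * p[k] for al, p in zip(alpha, projected)) for k in range(len(W))]
    return [math.tanh(v) for v in agg], alpha


# Toy example: neighbor 1 is identical to node 0, neighbor 2 is different.
h = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbors = {0: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, chosen for readability
a = [1.0, -1.0]
hd_0, alpha_0 = difference_aggregate(h, neighbors, 0, W, a)
print(hd_0, alpha_0)
```

In this toy setting the identical neighbor contributes a zero difference, so the aggregated vector is driven entirely by the neighbor whose features deviate from those of the target node.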

Feature Fusion. Inspired by LSTM aggregator in GraphSAGE, a similar strategy is applied to fuse ${hf}_{i}^{(l)}$ and ${hd}_{i}^{(l)}$ , then get the final representation of node i in the l_th layer. Compared with LSTM, GRU is much simpler to compute and implement. As a result, GRU is used to fuse ${hf}_{i}^{(l)}$ and ${hd}_{i}^{(l)}$ . Fig. 3 illustrates the procedure.

Fig. 3

The Procedure of Feature Fusion in the l_th layer. Note: ${hf}_{i}^{(l)}$ and ${hd}_{i}^{(l)}$ are the inputs to GRU-1 and GRU-2 respectively. $h_{0}^{(l)}$ denotes initial hidden state. GRU-1 implements the learning for ${hf}_{i}^{(l)}$ and obtains the output as ${ht}_{i}^{(l)}$ . At this point, ${ht}_{i}^{(l)}$ contains the information of ${hf}_{i}^{(l)}$ . After GRU-2, the information from ${hf}_{i}^{(l)}$ and ${hd}_{i}^{(l)}$ is effectively fused.

Taking the learning of ${hf}_{i}^{(l)}$ as an example, the procedure is described by Eq. (4):

$\begin{matrix} r_{t} & = σ (W_{r} \cdot [h_{0}^{(l)}, {hf}_{i}^{(l)}] + b_{r}) \\ z_{t} & = σ (W_{z} \cdot [h_{0}^{(l)}, {hf}_{i}^{(l)}] + b_{z}) \\ \tilde{h}_{t} & = \tanh (W_{h} \cdot [r_{t} \cdot h_{0}^{(l)}, {hf}_{i}^{(l)}] + b_{h}) \\ {ht}_{i}^{(l)} & = (1 - z_{t}) \cdot h_{0}^{(l)} + z_{t} \cdot \tilde{h}_{t} \end{matrix}$ (4)

where $h_{0}^{(l)}$ denotes the initial hidden state, usually zeros, in the l-th convolution layer. ${ht}_{i}^{(l)}$ then serves as the hidden state in the computation of the second GRU component. In GRU-2, ${hd}_{i}^{(l)}$ is the input, and the final result, $h_{i}^{(l)}$, is the output of the l-th convolution layer.
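The two-step fusion can be sketched as a GRU cell applied first to ${hf}_{i}^{(l)}$ and then to ${hd}_{i}^{(l)}$, following Eq. (4) term by term. The sketch makes several assumptions that the text does not settle: the gate activation is the sigmoid function (as in a standard GRU), GRU-1 and GRU-2 share one set of parameters, and the weights are arbitrary toy values. The helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, x):
    """Compute W x + b for a matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def gru_step(h_prev, x, params):
    """One GRU step following Eq. (4); [h, x] is list concatenation."""
    W_r, b_r, W_z, b_z, W_h, b_h = params
    r = [sigmoid(v) for v in affine(W_r, b_r, h_prev + x)]  # reset gate
    z = [sigmoid(v) for v in affine(W_z, b_z, h_prev + x)]  # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_tilde = [math.tanh(v) for v in affine(W_h, b_h, gated + x)]
    return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h_prev, h_tilde)]


def fuse(hf, hd, params):
    """GRU-1 consumes hf from a zero state; GRU-2 consumes hd from GRU-1's output."""
    h0 = [0.0] * len(hf)
    ht = gru_step(h0, hf, params)
    return gru_step(ht, hd, params)


# Toy parameters for a 2-dimensional embedding (each W has shape 2 x 4).
W = [[0.1] * 4 for _ in range(2)]
b = [0.0, 0.0]
params = (W, b, W, b, W, b)
h_fused = fuse([0.5, -0.5], [1.0, 0.0], params)
print(h_fused)
```

Because the update gate is computed from the inputs, the mixing weight between the original features and the aggregated differences can vary from node to node, unlike a fixed-weight sum.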

4.4 Classification module

After the Difference-aware Graph Convolution Module, the final representations of the target nodes are obtained, denoted as $h_{i}^{(final)}$ in the embedding space. Then, $h_{i}^{(final)}$ is fed into a fully-connected layer in order to get the representation, z_i, which is used to predict the eventual category. Then a multi-class cross-entropy loss function is employed as the objective function.

$z_{i} = W_{fc} h_{i}^{(final)} \in ℝ^{C}$ (5)

$p_{i} = softmax (z_{i}) \in ℝ^{C}$ (6)

$ℒ = - \frac{1}{| V |} \sum_{i \in V} \log (p_{i, c}), c \in \mathcal{C}$ (7)

where $W_{fc}$ represents the weight matrix of the fully-connected layer. $p_{i}$ denotes the predicted probability vector of node i, i.e., the predicted probabilities that node i belongs to each class. Accordingly, $p_{i, c}$ represents the predicted probability that node i belongs to class c, where c is the ground-truth label of node i. $\mathcal{C}$ is the set of ground-truth labels, while $C$ denotes the number of label classes.

This objective function treats all types of misclassification errors equally. In real-world scenarios, however, this is not appropriate. When detecting telecommunication fraud, the real cost of misclassifying a fraudster as a normal user may be greater than that of misclassifying a normal user as a fraudster. Moreover, if the aforementioned objective function is applied directly to a class-imbalanced dataset, it is easy to overfit the majority class while underfitting the minority class. Consequently, the objective function is further improved to a focal loss function by drawing on the experience of handling class imbalance in dense object detection.

$ℒ = - \frac{1}{| V |} \sum_{i \in V} {(1 - p_{i, c})}^{γ} \log (p_{i, c})$ (8)

where γ ≥ 0 is a hyperparameter. In this way, the weight of hard-to-classify instances in the total loss increases. Since minority classes are generally harder to classify correctly, this increases the contribution of minority-class samples to the objective function, thus alleviating the limitations of the standard multi-class cross-entropy loss function on imbalanced classification problems.
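A minimal sketch of Eq. (8) as written, using only the Python standard library, is given below. Setting γ = 0 recovers the cross-entropy loss of Eq. (7). The logits and labels are toy values, and the function names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def focal_loss(logits, labels, gamma=1.0):
    """Mean of -(1 - p_c)^gamma * log(p_c), where c is the true class of each node."""
    total = 0.0
    for z, c in zip(logits, labels):
        p_c = softmax(z)[c]
        total += -((1.0 - p_c) ** gamma) * math.log(p_c)
    return total / len(logits)


# Toy batch: the first node is classified confidently, the second one is not.
logits = [[2.0, 0.0], [0.0, 0.5]]
labels = [0, 1]
ce = focal_loss(logits, labels, gamma=0.0)  # plain cross-entropy, Eq. (7)
fl = focal_loss(logits, labels, gamma=1.0)  # focal loss, Eq. (8)
print(ce, fl)
```

The factor $(1 - p_{i, c})^{γ}$ shrinks the loss of the confidently classified node much more than that of the uncertain one, so hard examples account for a larger share of the total.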

The training procedure is presented in Algorithm 1.

Algorithm 1 FDAGNN

Input: $G$: a telecom graph; $V_{train}$: the set of training nodes; $C_{train}$: the set of labels of training nodes; E and B: the numbers of training epochs and batches; $V_{train, b}$: the set of training nodes in the b-th batch; L: the number of graph convolution layers.

Output: Model parameters θ

1: // Initialization and data preparation
2: for e = 1, 2, …, E do
3:     for b = 1, 2, …, B do
4:         for l = 1, 2, …, L do
5:             ${hf}_{i}^{(l)}$ ← Eq. (1), $i \in V_{train, b}$
6:             ${hd}_{i}^{(l)}$ ← Eqs. (2) and (3), $i \in V_{train, b}$
7:             $h_{i}^{(l)}$ ← Eq. (4), $i \in V_{train, b}$
8:         $h_{i}^{(final)} = h_{i}^{(L)}$
9:         $z_{i}$, $p_{i}$ ← Eqs. (5) and (6), $i \in V_{train, b}$
10:        $ℒ$ ← Eq. (8)
11:        // Back-propagation and weight update
12:    // Validate and decide whether to stop early
13: Return model parameters θ
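The outer loop of Algorithm 1, including the early-stopping decision, can be sketched as follows. The callbacks `run_epoch` and `validate`, the `patience` parameter, and the assumption that validation happens once per epoch are all hypothetical; the paper states only that early stopping is controlled by validation accuracy.

```python
def train_with_early_stopping(run_epoch, validate, max_epochs, patience=10):
    """Run epochs until validation accuracy stops improving for `patience` epochs."""
    best_acc, best_epoch, waited = -1.0, -1, 0
    for epoch in range(max_epochs):
        run_epoch(epoch)       # Algorithm 1, lines 3-11: batches, forward, loss, update
        acc = validate(epoch)  # Algorithm 1, line 12: accuracy on the validation set
        if acc > best_acc:
            best_acc, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_acc


# Stand-in callbacks: training is a no-op, validation replays a fixed accuracy curve.
fake_accuracies = [0.50, 0.60, 0.55, 0.58, 0.59, 0.70]
result = train_with_early_stopping(
    run_epoch=lambda epoch: None,
    validate=lambda epoch: fake_accuracies[epoch],
    max_epochs=len(fake_accuracies),
    patience=2,
)
print(result)
```

In practice the model parameters at `best_epoch` would be saved and restored before testing; that bookkeeping is omitted here.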

5 Experiments and discussions

In this section, a series of experiments were conducted on two real-world telecom network fraud detection datasets to primarily investigate the following themes:

How does FDAGNN perform compared with existing baseline methods?

How do different components affect the performance of FDAGNN?

Is FDAGNN sensitive to the hyperparameters?

How efficiently does FDAGNN run?

5.1 Experimental setup

Datasets. The performance of FDAGNN is evaluated on two real-world datasets for fraud detection in telecom networks. The first dataset, abbreviated as BUPT, was released by [24] in 2019². The authors extracted 39-dimensional features from the original CDRs data, which include call ID, frequency of the calls, the rate of successful connection, average duration of the calls, average duration of the ringing, call types, the distribution of call timestamps, the distribution of call duration, the rate of being hung up, and the dispersion of callee numbers. Based on mass user feedback and manual testing, four types of users were identified: normal users, salesmen, couriers, and fraudsters. The edges represent the actual call relationships between these users. Note that the dataset released on GitHub did not contain any features for salesmen, only labels and edges. Therefore, the labels and edges associated with salesmen were removed. Furthermore, the original directed graph was converted into an undirected graph. The second dataset, briefly denoted as SC, was released by Sichuan Mobile in the 2020 Digital Sichuan Innovation Competition³. Due to privacy policies, the dataset had been desensitized to remove sensitive information, such as personal identifiers. This dataset is CDRs data of subscribers, consisting of call records (call object, call duration, call type, call timestamp, call location), SMS records (SMS object, SMS type, SMS timestamp), Internet access records (APP name and traffic consumption of each APP), and expense records within eight months. 55-dimensional features were extracted from the raw CDRs data. Considering that the initial call edges between the users to be detected were too sparse to form a connected graph, edges were constructed based on the similarity of features between users. The procedure of feature extraction and edge construction is the same as in [42]. There are two types of users in the SC dataset, benign users and fraudsters. The detailed statistics of the two datasets can be found in Table 1.

Baselines. To demonstrate the effectiveness of FDAGNN in fraud detection, several recent GNN algorithms were chosen as baselines for comparison with the proposed algorithm.

GCN [5] aggregates the features of neighbor nodes by averaging.

GAT [7] assigns different weights to different neighbor nodes when aggregating features.

GeniePath [40] adopts LSTM approach to adaptively select neighbor nodes of different hops for target nodes, then assigns different weights to them.

GEM [12] also learns the user behavior matrix when using GCN to update node features.

CARE-GNN [27] aggregates only the features of the neighbor nodes that are more similar to the target nodes.

FRAUDRE [28] aggregates both the original features of neighbor nodes and the differences with target nodes.

H²-FDetector [29] identifies the homophilic and heterophilic connections with the supervision of labeled nodes, then the homophilic connections propagate similar information and the heterophilic connections propagate difference information.

Experimental Setting. In FDAGNN, Adam was chosen as the optimizer, and the hidden dimensions were set to 32, the number of graph convolution layers to 1 and the gamma value of focal loss function to 1. The other hyperparameter settings are shown in Table 2.

Table 2
Hyperparameter settings for FDAGNN

Dataset	lr	weight-decay	feat-drop	attn-drop
BUPT	0.004	0.0006	0	0
SC	0.001	0.0006	0.3	0.7

Note: lr is the learning rate of Adam optimizer, and weight-decay is the decay rate of lr. feat-drop is the dropout rate of node features. attn-drop is the dropout rate of attention in the phase of aggregating feature differences.

GCN, GAT, GeniePath, and CARE-GNN were implemented with the open-source framework DGL v0.8.0post2⁴. GEM was run using an open-source toolbox for fraud detection⁵, and FRAUDRE and H²-FDetector were run using the source code released by their respective authors. All experiments were conducted with Python 3.7.10 on a Linux server with one GeForce RTX 3090 GPU, 48 GB RAM, and a 12-core Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz.

Experimental Metric. In fraud detection, managers usually pay more attention to fraudulent entities. Therefore, the Precision, Recall, and F1 of the fraud class and the Accuracy over all samples are chosen as evaluation metrics. Precision is the proportion of predicted fraudsters that are actual fraudsters. Recall is the proportion of actual fraudsters that are detected. F1 is the harmonic mean of precision and recall and thus reflects the trade-off between them. Accuracy is the proportion of correctly classified individuals among all samples. The experimental results are analyzed mainly in terms of F1 and Accuracy.
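The four metrics can be computed from predictions as in the following sketch. The label encoding (1 for fraud, 0 for everything else) and the toy prediction vectors are assumptions for illustration.

```python
def fraud_metrics(y_true, y_pred, fraud_label=1):
    """Precision, recall, and F1 for the fraud class, plus accuracy over all samples."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == fraud_label and p == fraud_label)
    fp = sum(1 for t, p in pairs if t != fraud_label and p == fraud_label)
    fn = sum(1 for t, p in pairs if t == fraud_label and p != fraud_label)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs)
    return precision, recall, f1, accuracy


# Toy example: 3 fraudsters among 8 users; one is missed and one normal user is flagged.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1, accuracy = fraud_metrics(y_true, y_pred)
print(precision, recall, f1, accuracy)
```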

5.2 Performance comparison

Table 3 reports the experimental results of FDAGNN and the baselines. The percentage of training entities varied from 5% to 40%. The percentage of validation entities remained at 20%, while the remaining entities were used as test cases. In all experiments, an early-stopping mechanism was utilized, which was controlled by the accuracy on the validation set.
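The splitting protocol can be sketched as below. The paper does not say whether the split is random, stratified, or seeded, so the uniform shuffle, the fixed `seed`, and the function name `split_nodes` are assumptions of this example.

```python
import random


def split_nodes(num_nodes, train_ratio, val_ratio=0.2, seed=0):
    """Shuffle node indices and cut them into train / validation / test parts."""
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    n_train = round(num_nodes * train_ratio)
    n_val = round(num_nodes * val_ratio)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# With a 5% training ratio and 20% validation, the remaining 75% is used for testing.
train, val, test = split_nodes(1000, train_ratio=0.05)
print(len(train), len(val), len(test))
```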

Table 3
Performance Comparison of FDAGNN and Baselines (under different training ratios)


Dataset	Metrics	Train(%)	GCN	GAT	GeniePath	GEM	CARE-GNN	FRAUDRE	H²-FDetector	FDAGNN
BUPT	Precision	5%	81.55	82.98	85.05	28.70	71.15	70.98	71.34	86.36
		10%	83.10	82.84	85.33	83.33	68.10	68.91	70.23	89.35
		20%	85.85	86.26	87.23	76.47	69.96	72.13	73.46	90.93
		40%	86.61	86.43	89.62	71.43	70.64	73.06	75.64	90.31
	Recall	5%	74.79	73.31	83.58	1.96	88.66	86.39	89.31	83.32
		10%	74.28	75.47	85.83	0.42	91.41	91.89	92.05	84.34
		20%	75.46	74.10	87.38	0.77	93.55	93.16	93.33	86.18
		40%	76.43	74.08	89.27	0.89	94.17	93.60	93.86	87.94
	F1	5%	78.02	77.85	84.31	3.66	78.95	77.93	79.32	84.81
		10%	78.44	78.99	85.58	0.84	78.05	78.76	79.67	86.77
		20%	80.32	79.72	87.31	1.52	80.05	81.31	82.21	88.49
		40%	81.20	79.78	89.44	1.75	80.72	82.06	83.77	89.11
	Accuracy	5%	91.49	91.77	93.46	85.84	86.64	88.93	89.54	93.46
		10%	91.83	92.00	92.87	85.83	84.94	87.45	88.56	93.95
		20%	92.40	92.34	93.98	85.85	85.41	88.36	89.08	94.60
		40%	92.79	92.50	94.79	85.84	85.90	87.30	88.47	94.92
SC	Precision	5%	90.11	92.53	89.17	80.08	84.23	77.79	78.36	85.46
		10%	89.96	91.52	88.95	88.51	81.86	78.64	78.92	88.86
		20%	89.19	92.73	85.96	86.34	84.59	82.21	83.25	91.73
		40%	91.08	92.50	88.66	87.64	87.06	83.54	84.34	91.82
	Recall	5%	71.95	69.53	75.83	68.82	82.83	79.29	79.44	76.94
		10%	75.62	72.10	80.40	62.81	82.38	81.64	81.78	79.07
		20%	76.06	72.34	82.54	66.61	84.66	81.91	82.33	79.61
		40%	76.24	72.14	81.72	68.66	86.19	83.96	84.46	79.60
	F1	5%	80.02	79.40	81.96	74.02	83.52	78.53	78.90	80.98
		10%	82.17	80.66	84.46	73.48	82.66	80.12	80.32	83.68
		20%	82.11	81.27	84.22	75.2	84.63	82.06	82.79	85.24
		40%	83.01	81.06	85.05	77.00	86.63	83.75	84.40	85.28
	Accuracy	5%	88.67	88.62	89.48	84.48	89.69	86.33	86.48	88.60
		10%	89.54	88.98	90.57	85.43	88.84	87.09	87.76	90.18
		20%	89.79	89.74	90.47	85.89	90.53	88.97	89.23	91.51
		40%	89.73	88.91	90.54	86.82	91.24	89.28	89.79	90.95

1. Note: Precision, Recall and F1 are related to fraud. Accuracy is about all samples.

2. Note: The edges in BUPT represent real interactions between users. In SC, the edges are constructed manually based on feature similarity between users. For specific performance analysis, see Section 5.2.

Before performing an in-depth analysis, it is worth noting the characteristics of the two telecom graphs constructed from the real-world datasets, BUPT and SC. Firstly, the edges of BUPT are true call relationships, while those of SC are artificially constructed based on feature similarity between nodes. Secondly, BUPT is imbalanced: the proportion of positive samples, i.e., fraudsters, is about 7.26%, and the graph constructed on BUPT is relatively sparse. In contrast, SC is relatively balanced, with about 32.13% positive samples, and the graph constructed on SC is dense.
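The proportions quoted above can be re-derived from the counts in Table 1, as in the short check below. The ratio of edges to nodes appears to correspond to the "Average Degree" column of Table 1 up to rounding; that reading of the column is an inference, not something the paper states.

```python
# Node, edge, and fraudster counts copied from Table 1.
table1 = {
    "BUPT": {"nodes": 116383, "edges": 350751, "fraud": 8448},
    "SC": {"nodes": 6106, "edges": 838528, "fraud": 1962},
}

summary = {}
for name, s in table1.items():
    summary[name] = {
        "fraud_pct": 100.0 * s["fraud"] / s["nodes"],  # share of positive samples
        "edges_per_node": s["edges"] / s["nodes"],     # sparsity indicator
    }
    print(name, summary[name])
```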

On BUPT, FDAGNN performs best in terms of precision and F1 for fraudsters and accuracy for all users under all training ratios (except F1 at the 40% ratio, where GeniePath is slightly higher; accuracy at the 5% ratio is tied with GeniePath). The maximum improvement is about 5%. Though CARE-GNN, FRAUDRE, and H²-FDetector achieve higher recall for fraudsters than FDAGNN, their precision for fraudsters is quite low, and their accuracy is also poor. That is, CARE-GNN, FRAUDRE, and H²-FDetector incorrectly identify many normal users as fraudsters. FDAGNN does not exhibit this problem to the same extent.

On SC, in terms of F1 for fraudsters and accuracy for all users, FDAGNN is competitive and performs best under some training ratios. In particular, GAT obtains the highest precision for fraudsters, but its recall is almost the worst. CARE-GNN is best in recall; however, its precision is lower than that of FDAGNN. FDAGNN obtains a high precision while maintaining a relatively good recall. In summary, FDAGNN does not perform as well on SC as it does on BUPT. The most likely reason is that the edges in the SC graph are artificially constructed based on feature similarity between nodes. That is, the edges in SC do not represent real call relationships. This indirectly indicates that it is necessary to exploit real interactions between users when adopting GNN methods.

5.3 Ablation experiments

To verify the effectiveness of each module or component in FDAGNN, three sets of ablation experiments were conducted: (i) FDAGNN-Fusion, which comprised three variants of the feature fusion component. These variants, FDAGNN_add, FDAGNN_att, and FDAGNN_cat, replaced the GRU component of FDAGNN with addition, attention, and concatenation, respectively. (ii) FDAGNN-Loss, which applied the standard cross-entropy loss function instead of the focal loss function in FDAGNN. It is denoted as FDAGNNce. (iii) FDAGNN-no-differences, which did not use the feature differences but only the original features. It is denoted as FDAGNN_/diff and is essentially equivalent to an MLP (multilayer perceptron).

As shown in Fig. 4 and Fig. 5, FDAGNN was superior to the other three variants with respect to almost all evaluation metrics under a variety of training ratios on the BUPT dataset. On SC, FDAGNN performed better than the others in terms of F1 and Accuracy. This demonstrates that the GRU fusion method was more effective than the alternatives. FDAGNN_att performed the worst of all fusion patterns on both BUPT and SC. A possible cause is that the proposed model already adopts the attention mechanism when aggregating the feature differences, so FDAGNN_att further increased the complexity of the model, which resulted in overfitting. FDAGNN_add was equivalent to summing the original features and the feature differences with a fixed weight of 0.5 each, so it could not adapt to different nodes. FDAGNN_cat did not perform the fusion explicitly in the Difference-aware Graph Convolution Module; instead, the fusion was completed implicitly when transforming from the hidden embedding space to the target classification space. Compared with FDAGNN_add, FDAGNN_cat can adapt to different nodes to some extent. Therefore, it performed better than FDAGNN_add but worse than FDAGNN.
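The difference between a fixed-weight fusion and a gated fusion can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it uses a standard bias-free GRU cell, and it assumes that the aggregated feature difference `d` plays the role of the GRU input while the transformed original feature `h` plays the role of the previous state. All function names and the random parameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def fuse_add(h, d):
    """Addition-style fusion: a fixed weight of 0.5 for each input."""
    return [0.5 * a + 0.5 * b for a, b in zip(h, d)]


def fuse_cat(h, d):
    """Concatenation-style fusion: mixing is left to the classifier."""
    return list(h) + list(d)


def fuse_gru(h, d, p):
    """GRU-cell fusion (biases omitted); the input/state roles are assumed."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], d), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], d), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], d), matvec(p["Uh"], rh))]
    # The update gate z is computed per node and per dimension, so the
    # mixing weight is data-dependent rather than fixed at 0.5.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


def random_params(dim, seed=0):
    """Hypothetical untrained weights, used only to make the sketch runnable."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    return {name: mat() for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}


h = [0.2, -1.5, 0.7]   # transformed original feature of one target node
d = [0.1, 0.4, -0.3]   # aggregated feature difference for the same node
params = random_params(dim=3)
fused = fuse_gru(h, d, params)
```

In this sketch `fuse_add` applies the same weights to every node, `fuse_cat` defers the mixing to later layers, and `fuse_gru` computes a separate gate for each node and dimension, which matches the adaptivity argument made above.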

Fig. 4

Performance Comparison of FDAGNN Using Different Feature Fusion Methods on the Dataset BUPT.

Fig. 5

Performance Comparison of FDAGNN Using Different Feature Fusion Methods on the Dataset SC.

In the FDAGNN-Loss experiment (results are shown in Fig. 6 and Fig. 7), all the metrics of FDAGNN improved over FDAGNNce on SC under every training ratio except 40%, with the largest gain reaching one percentage point. At the 40% training ratio, however, FDAGNN performed worse than FDAGNNce. A possible explanation is as follows. The topology of the SC graph was constructed artificially using a similarity measure: an edge was assigned between two nodes when the feature similarity between them exceeded a certain threshold. As a result, the feature similarity between a target node and its neighbors is very high, and it is not easy to distinguish them with the cross-entropy loss function. We refer to samples whose features are very similar and not easily distinguishable as hard entities. The focal loss function improves the ability to discriminate hard entities. Therefore, when the training ratio was low, FDAGNNce could not cope with the hard samples, but FDAGNN could. Once the training ratio increased to a certain level, FDAGNNce became capable of handling the hard samples effectively, and introducing focal loss at that point may harm the model. On BUPT, F1 rose only slightly, by no more than 0.4 percentage points, across all training ratios, and Accuracy was nearly unchanged. Because the topology of BUPT is based on real communication, FDAGNNce was already able to separate the different classes of samples based on the feature differences. In other words, BUPT contains fewer hard entities, and as a result focal loss has only a small impact on performance.
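To make the role of hard samples concrete, the sketch below implements focal loss in its commonly used binary form, where setting `gamma = 0` recovers the standard cross-entropy loss. Whether FDAGNN uses a class-balancing weight (`alpha` here) is not stated in this section, so that parameter is an assumption and is disabled by default. The probabilities in the example are hypothetical.

```python
import math


def focal_loss(p_fraud, label, gamma=1.0, alpha=None):
    """Binary focal loss for a single sample.

    p_fraud: predicted probability of the fraud class, strictly in (0, 1).
    label:   1 for fraud, 0 for benign.
    gamma:   focusing parameter; gamma = 0 gives plain cross-entropy.
    alpha:   optional weight for the fraud class (assumption; None disables it).
    """
    p_t = p_fraud if label == 1 else 1.0 - p_fraud
    weight = 1.0
    if alpha is not None:
        weight = alpha if label == 1 else 1.0 - alpha
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy fraud sample (confidently correct) and a hard one (mostly wrong).
easy_ce, hard_ce = focal_loss(0.95, 1, gamma=0), focal_loss(0.30, 1, gamma=0)
easy_fl, hard_fl = focal_loss(0.95, 1, gamma=1), focal_loss(0.30, 1, gamma=1)

# Fraction of the cross-entropy loss that survives the focal modulation.
easy_kept = easy_fl / easy_ce
hard_kept = hard_fl / hard_ce
```

With `gamma = 1`, the easy sample keeps about 5% of its cross-entropy loss while the hard sample keeps about 70%, so the gradient signal shifts toward the hard samples. When few hard samples exist, as argued for BUPT, the two losses behave similarly.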

Fig. 6

Performance Comparison of FDAGNN Using Different Loss Functions on the Dataset BUPT.

Fig. 7

Performance Comparison of FDAGNN Using Different Loss Functions on the Dataset SC.

Fig. 8 and Fig. 9 show the results of the FDAGNN-no-differences experiment. On BUPT, FDAGNN exceeded FDAGNN_/diff on all evaluation metrics under all training ratios. F1 rose by more than 5 percentage points at most, and even at the minimum training ratio of 5%, F1 improved by almost two percentage points. This indicates that it is necessary to take advantage of feature differences. However, the results were the opposite on SC: apart from Recall under the 10% and 20% ratios, FDAGNN was worse than FDAGNN_/diff. A possible reason is that FDAGNN exploits the feature differences between target nodes and their neighbors, but the neighbors of the target nodes in the SC graph are not real neighbors in actual communication relationships. Therefore, the feature differences acted as noise on the original features. This result also confirms the importance of call interaction behavior in telecom fraud detection, which was mentioned in the introduction of this paper.
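The component that this ablation removes can be sketched as an attention-weighted sum of feature differences. The code below is a simplified illustration, not the authors' implementation: it assumes the difference is taken as neighbor minus target, uses a single attention head with a GAT-style LeakyReLU score over the concatenated target feature and difference, and omits the learned linear transformations. The attention vector `a` and all function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def aggregate_differences(h_target, h_neighbors, a):
    """Attention-weighted sum of (neighbor - target) feature differences.

    a is an attention vector of length 2 * dim, scored against the
    concatenation [h_target, difference] (assumed form).
    """
    dim = len(h_target)
    if not h_neighbors:
        return [0.0] * dim
    diffs = [[n - t for n, t in zip(h_n, h_target)] for h_n in h_neighbors]
    logits = [leaky_relu(sum(w * x for w, x in zip(a, list(h_target) + diff)))
              for diff in diffs]
    # Numerically stable softmax over the neighbors.
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(al * diff[k] for al, diff in zip(alphas, diffs))
            for k in range(dim)]


a = [0.3, -0.2, 0.5, 0.1]          # hypothetical attention vector for dim = 2
target = [1.0, 2.0]
neighbors = [[3.0, 5.0], [0.5, 2.5], [1.0, 2.0]]
agg = aggregate_differences(target, neighbors, a)
```

The aggregated vector `agg` is what would be fused with the original feature of the target node. Dropping this step and keeping only the transformed original feature corresponds to the FDAGNN_/diff variant.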

Fig. 8

Performance Comparison of FDAGNN Using or not Using the Feature Differences on the Dataset BUPT.

Fig. 9

Performance Comparison of FDAGNN Using or not Using the Feature Differences on the Dataset SC.

5.4 Hyperparameter sensitivity

In this section, the sensitivity of FDAGNN to its hyperparameters is investigated with regard to the hidden dimension, the gamma value of the loss function, and the number of graph convolution layers (Fig. 10; the training ratio is 0.2). Accuracy remains relatively stable on both BUPT and SC for all three hyperparameters. Precision, Recall, and F1, which are computed on the fraud category only, vary more with the values of these hyperparameters. Based on the performance under the different values (as shown in Fig. 10), the hidden dimension, gamma, and number of layers were finally set to 32, 1, and 1, respectively.

Fig. 10

Note: Hidden-Dimension represents the dimension of feature embedding. Gamma denotes the gamma value in the loss function of FDAGNN. Layer represents the number of graph convolution layers in FDAGNN.
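A sensitivity study of this kind is often organized as a one-at-a-time sweep around the chosen configuration. The sketch below only generates the configurations; it assumes a one-at-a-time design, and the candidate values are hypothetical because the grid behind Fig. 10 is not listed in the text. Only the default values (32, 1, 1) come from this section.

```python
# Final settings reported in this section.
DEFAULTS = {"hidden_dim": 32, "gamma": 1, "num_layers": 1}

# Hypothetical candidate values; the actual grid is not given in the text.
CANDIDATES = {
    "hidden_dim": [16, 32, 64, 128],
    "gamma": [0, 1, 2, 3],
    "num_layers": [1, 2, 3],
}


def one_at_a_time(defaults, candidates):
    """Yield (name, value, config), varying one hyperparameter per config."""
    for name, values in candidates.items():
        for value in values:
            config = dict(defaults)
            config[name] = value
            yield name, value, config


configs = list(one_at_a_time(DEFAULTS, CANDIDATES))
for name, value, config in configs:
    print(f"{name}={value}: {config}")
```

Each configuration would then be trained and evaluated at the fixed training ratio of 0.2, with Precision, Recall, F1, and Accuracy recorded per setting.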

5.5 Run efficiency

It is essential to consider the runtime efficiency of algorithms in real-world applications. The number of graph convolution layers for all algorithms was set to 2, and the other parameters were set as described in the corresponding papers. All algorithms were trained in a transductive manner. The training time of each algorithm was recorded per 100 epochs, and the mean time per epoch was calculated from these data. As shown in Table 4, FDAGNN trains considerably faster than CARE-GNN and GEM on BUPT, and faster than CARE-GNN, FRAUDRE, and H²-FDetector on SC.

Table 4
Comparison of training time (seconds) per epoch between FDAGNN and baselines

Dataset	GCN	GAT	GeniePath	GEM	CARE-GNN	FRAUDRE	H²-FDetector	FDAGNN
BUPT	0.012	0.024	0.154	3.115	91.041	0.554	0.786	1.390
SC	0.010	0.016	0.052	0.739	5.061	2.288	3.125	2.017
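The measurement procedure described above, together with the relative speeds implied by Table 4, can be sketched as follows. The timing helper is a generic illustration in which `train_one_epoch` is a hypothetical callable standing in for one training pass of any model; the per-epoch times in `TABLE4` are copied from Table 4.

```python
import time


def mean_epoch_seconds(train_one_epoch, epochs=100):
    """Time `epochs` consecutive epochs and return the mean seconds per epoch."""
    start = time.perf_counter()
    for _ in range(epochs):
        train_one_epoch()
    return (time.perf_counter() - start) / epochs


# Seconds per epoch, copied from Table 4.
TABLE4 = {
    "BUPT": {"GCN": 0.012, "GAT": 0.024, "GeniePath": 0.154, "GEM": 3.115,
             "CARE-GNN": 91.041, "FRAUDRE": 0.554, "H2-FDetector": 0.786,
             "FDAGNN": 1.390},
    "SC": {"GCN": 0.010, "GAT": 0.016, "GeniePath": 0.052, "GEM": 0.739,
           "CARE-GNN": 5.061, "FRAUDRE": 2.288, "H2-FDetector": 3.125,
           "FDAGNN": 2.017},
}


def ratio_to_fdagnn(dataset):
    """Epoch time of each baseline divided by FDAGNN's (above 1 means slower)."""
    base = TABLE4[dataset]["FDAGNN"]
    return {method: round(seconds / base, 2)
            for method, seconds in TABLE4[dataset].items() if method != "FDAGNN"}


for dataset in TABLE4:
    print(dataset, ratio_to_fdagnn(dataset))
```

Ratios above 1 mark baselines that are slower than FDAGNN, and ratios below 1 mark faster ones. On both datasets the lightweight models GCN, GAT, and GeniePath fall below 1, so the advantage of FDAGNN holds relative to the fraud-specific baselines named in the text rather than to every baseline.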

6 Conclusion

In this paper, a graph convolution algorithm named FDAGNN is proposed for detecting telecommunication fraudsters. FDAGNN consists of two main modules: a difference-aware graph convolution module and a classification module that takes sample imbalance into consideration. The difference-aware graph convolution module is made up of three components: an original feature transformation component based on a fully-connected layer, a feature difference aggregation component based on the graph attention mechanism, and a GRU-based component that fuses the original features and the feature differences in the embedding space. Extensive experiments on two real-world telecom datasets demonstrate that FDAGNN outperforms seven baseline methods on the majority of metrics among Precision, Recall, and F1 of the fraud class and Accuracy over all samples. In future work, the class imbalance problem will be investigated further so as to improve the detection of telecom fraudsters. Future research will also address privacy and security problems.

Acknowledgments

This work was supported by the Major Science and Technology Special Projects of Henan Province in China (No.221100210100) and the Central Plains Talent Foundation of China (No.212101510002). The authors gratefully thank the anonymous reviewers for their valuable comments.

Table 1
Statistical information of the datasets

Dataset	Nodes	Edges	Features	User Classes	User Numbers	Average Degree
BUPT	116383	350751	39	Benign	99861	3.013
				Fraud	8448	
				Courier	8074	
SC	6106	838528	55	Benign	4144	137.329
				Fraud	1962	

Table 2
Hyperparameter settings for FDAGNN

Dataset	lr	weight-decay	feat-drop	attn-drop
BUPT	0.004	0.0006	0	0
SC	0.001	0.0006	0.3	0.7
