Collusive anomalies detection based on collaborative Markov random field

Abstract

Abnormal collusive behavior is concealed and synergistic, and it occurs in many fields. It is particularly harmful in user-generated online reviews and is hard to detect with traditional methods. With the development of network science, this problem can be addressed by analyzing structural features. As a graph-based anomaly detection approach, the Markov random field (MRF)-based model has been widely used to identify collusive anomalies and has shown its effectiveness. However, most existing methods fail to highlight the primary synergy relationships among nodes and take much irrelevant information into account, which results in poor detectability. This paper therefore proposes a novel MRF-based method, ACEagle, that considers both node-level and community-level behavior features. The method has several advantages: (1) building on an analysis of each node's local structure, community-level behavioral features are incorporated when calculating the node's prior probability, bringing it closer to the ground truth; (2) it measures the collaborative intensity of behavior between nodes by time and weight and constructs the MRF only from synergy relationships that exceed a threshold, which filters out irrelevant structural information; (3) it operates in a completely unsupervised fashion and requires no labeled data, while still being able to incorporate side information if available. Experiments on user-review datasets, where abnormal collusive behavior is most typical, show that ACEagle significantly outperforms state-of-the-art baselines in collusive anomaly detection.

Keywords

Collusive behavior, structure features, graph-based anomaly detection, Markov random field, user-generated online reviews

1. Introduction

With the development of anomaly detection, anomalies have adopted various camouflage methods [1] to avoid identification and have become increasingly diverse. The mainstream trend is a shift from isolated to collusive anomalies, such as opinion spam [2, 3], network intrusion [4], and telecommunications fraud [5, 6]. How to effectively detect hidden and variable abnormal behaviors remains an urgent problem, especially in user-generated online reviews, where abundant spammers severely affect users' decisions. To tackle the anomaly detection problem, many features have been considered in the past decades, including text content features [7], user attribute features [8], traffic features [9], and structure features [10, 11]. Among them, structure features, which are acquired by abstracting behavior as a graph and are difficult to forge and easy to interpret [11, 12, 13], have become the main focus and have given rise to a series of methods [14, 15, 16, 17].

Graph-based anomaly detection methods abstract the behavior between entities as a graph, transforming the anomaly detection problem into that of identifying abnormal structures in a complex network. According to the definition of anomalous structure, graph anomaly detection methods are divided into four categories: structure-based, probability-based, community-based, and decomposition-based [18]. Detection through isolated anomalous structures has been studied in depth. Ref. [19] builds a feature space from the egonets of nodes and finds abnormal nodes by analyzing the relationships in that space. Refs. [20, 21] further consider directional characteristics. However, these methods are not ideal for detecting collusive behavior.

Compared with supervised methods, unsupervised methods have received widespread attention in recent years. Their advantages lie in low dependence on historical anomalous features, high precision, and robustness to new anomalies. Recently, Markov random field (MRF)-based methods such as FraudEagle [22] and SpEagle [23], which are classic unsupervised models that utilize the original network structure to distinguish anomalous behaviors, have been shown to be superior to other ranking methods. ColluEagle [24] extends the model by reconstructing the MRF to account for both network effects and time effects. Nevertheless, the prior probability calculation of these techniques ignores the characteristics of cooperative anomalous behavior, making it difficult to distinguish spammer groups from benign communities. In this paper, considering the features of both individual and group cooperation, we propose a novel MRF-based model named ACEagle that can detect collusive spammers while avoiding the involvement of ordinary communities. We summarize the contributions of this work as follows.

• We propose ACEagle, a novel pairwise-MRF model that embeds the collusive phenomenon by formulating it as a reorganized network $G_{C}$. The potential function assigned to each edge in $G_{C}$ considers the joint conditional probability of the prior probability and the synergistic strength. Loopy belief propagation (LBP) is then used to infer the approximate marginal probability of each node on the pairwise MRF.
• Considering the diversity of the local structures of nodes abstracted from entities' behavior, we design node-level features that extract the characteristics of each entity's varying behavior to capture this discrepancy. We then select the review system as the specific scenario, analyze the unusual properties of spammer reviews, and design corresponding structural features. To quantify this abnormality, we adopt two normalizations, namely the cumulative distribution function (CDF) and the logarithmic function (Log), to highlight numerical differences or deviations from the overall distribution for different behavioral characteristics.
• To accurately calculate the anomaly prior probability of nodes, we design community-level features that combine network effects and node-level behavior features to measure the suspicion of collusive behaviors among nodes. The anomaly prior probability of nodes is adjusted by analyzing the structural characteristics of the network $G_{S}$ composed of suspicious behaviors.

The remainder of this paper is organized as follows. Section 2 discusses related work on collusive anomaly detection and introduces the MRF model and its inference method. Section 3 describes ACEagle in three parts: feature extraction and aggregation, construction of the collaborative network $G_{C}$, and inference of the marginal probability on the MRF. Section 4 presents the experimental results on YelpZip and YelpNYC, two datasets containing real anomalies, and compares ACEagle with the baselines. We conclude our work in Section 5.

2. Related work

Akoglu and Tong first systematically summarized graph-based anomaly detection methods and studied the application scenarios of collusive behavior [14]. Pourhabibi further surveyed each specific scenario in detail and outlined the relevant techniques of recent years [18]. The problem has aroused extensive interest, and many works in this research field have attempted to solve it [25, 26, 27, 28, 29].

Xu et al. [30] first proposed the SCAN algorithm, which is based on community clustering on undirected graphs, and Bryan further considered the directionality of behavior [31]. Both detect anomalous behaviors by expressing them as communities. However, detectability is limited by the difficulty of distinguishing abnormal communities from normal communities that resemble them, whose members share similar behavior habits, preferences, and so on [32]. Ye et al. [33] quantify the potential of nodes to be targets of aberrant collusive behavior from local-structure information and analyze the overlapping neighbors of nodes with higher target outlier scores to find the anomalous groups. Jiang et al. [34] argue that rarity and synchronization exist in abnormal collaborative behavior. They quantify a node's behavior together with its neighbors' behavior and map it into a feature space to find deviant collaborative groups. However, this approach still fails to accurately distinguish normal cooperative behaviors that are similar to abnormal ones.

Assuming that abnormal individuals have a solid cooperative relationship while normal individuals have a close but relatively irregular one, Ref. [22] proposed FraudEagle, which takes the original network as the MRF and applies LBP to detect anomalies by inferring the anomaly probability under random initial labeling of nodes. Nonetheless, the accuracy of FraudEagle is inadequate without prior knowledge. SpEagle [23] designed a series of anomaly features for specific scenarios to calculate the initial prior probability of nodes, which improves detection capability significantly and shows that LBP is effective and extensible. ColluEagle [24] reconstructs a collaborative network as the MRF, composed of synergy relationships that consider the effects of time and network, further improving detection efficiency. In the above MRF-based methods, the edge potential function of the closer cooperative relationships among abnormal nodes drives their inferred anomaly probabilities toward a higher interval, while those of benign individuals converge to a lower one. Their strong generality and low computational complexity have led MRF-based methods to be applied to a variety of anomaly scenarios and to be widely used in anomaly detection in recent years [35, 36, 37]. The specific concepts are introduced as follows.

2.1 Markov random field

As one of the classical probabilistic graphical models, the Markov random field can be regarded as an undirected Markov network composed of multiple Markov chains. It therefore also satisfies the Markov property, namely that the hidden state of any node is determined jointly by those of its neighbor nodes, and it is used to solve inference problems on uncertain data by modeling the hidden states of nodes and their relations [38, 39]. As shown in Fig. 1, each node in the MRF can take one of several possible states and is drawn as a hollow node, called a hidden node. The hidden nodes constitute an undirected network structure. For a hidden node $i$ with $n$ possible states, each state is denoted as $l_{i}$ and is assumed to appear with probability $x_{i}$, represented by solid nodes.

Figure 1.

Schematic diagram of Markov random field model.

2.2 Belief propagation model

Based on the above concepts, consider a network $G=(V,E)$ that consists of $N$ nodes $\mathcal{V}=(v_{1},v_{2},\ldots,v_{N})$, where the state set of each node $v_{i}$ is $\mathcal{L}=\{l_{1},l_{2},\ldots,l_{k}\}$ and $x_{i}$ represents the probability that a node is in state $l_{i}$. Nodes can therefore be regarded as discrete random variables: $\mathcal{X}=(X_{1},X_{2},\ldots,X_{N})$ represents the state vector formed by the $N$ nodes of the network in a certain configuration, and $p(\mathcal{X})$ is the probability of network $G$ being in state $\mathcal{X}$. As the direct communication relation between nodes, the links are assigned the joint probability potential function $\psi(v_{i},v_{j})$ to accurately reflect the influence of inter-node behavior on probability propagation. Any linked node pair can be regarded as a clique, and these cliques constitute a pairwise MRF. Then $p(\mathcal{X})$ is calculated as follows, where $Z$ is the partition function.

$\displaystyle Z=\sum\limits_{\mathcal{L}}\prod\limits_{v_{i}\in V}\psi(v_{i})\prod\limits_{(v_{i},v_{j})\in E}\psi(v_{i},v_{j})$ (1)

$\displaystyle p(\mathcal{X})=\frac{1}{Z}\prod\limits_{v_{i}\in V}\psi(v_{i})\prod\limits_{(v_{i},v_{j})\in E}\psi(v_{i},v_{j})$ (2)
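As a minimal sketch of Eqs. (1) and (2), the joint distribution of a very small pairwise MRF can be computed by enumerating every state vector. The three-node chain, the integer labels 0 and 1, the potential values, and the function name `joint_distribution` are all hypothetical and chosen only for illustration. A single edge potential table shared by all edges is also an assumption.

```python
from itertools import product

def joint_distribution(nodes, edges, node_pot, edge_pot, labels):
    """Enumerate every state vector of a tiny pairwise MRF and normalise.

    node_pot[v][l]       -> psi(v) for label l
    edge_pot[(l_u, l_v)] -> psi(u, v), shared by all edges in this toy setup
    """
    scores = {}
    for states in product(labels, repeat=len(nodes)):  # k^N state vectors
        assign = dict(zip(nodes, states))
        score = 1.0
        for v in nodes:
            score *= node_pot[v][assign[v]]
        for u, v in edges:
            score *= edge_pot[(assign[u], assign[v])]
        scores[states] = score
    Z = sum(scores.values())                           # Eq. (1)
    return {s: val / Z for s, val in scores.items()}   # Eq. (2)

# Hypothetical three-node chain a - b - c with two labels.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
labels = (0, 1)
node_pot = {"a": {0: 0.9, 1: 0.1},
            "b": {0: 0.5, 1: 0.5},
            "c": {0: 0.5, 1: 0.5}}
edge_pot = {(0, 0): 0.8, (0, 1): 0.2,
            (1, 0): 0.2, (1, 1): 0.8}
p = joint_distribution(nodes, edges, node_pot, edge_pot, labels)
```

The loop over `product(labels, repeat=len(nodes))` visits one entry per state vector, so this direct approach is only practical for toy graphs.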

For a network of $N$ nodes in which each node has $k$ possible states, the state vector has $k^{N}$ possible configurations. As the network grows, the cost of computing the marginal probabilities of the network by exact inference therefore increases exponentially, which is unacceptable. LBP instead performs approximate inference on the MRF and works well on network structures with loops, which significantly reduces the computational complexity while achieving high accuracy. As shown in Eqs. (3) and (4), each node iteratively sends messages to its neighbors according to its prior probability and the edge potential function, where $N(v_{i})$ is the neighbor set of $v_{i}$. In the same way, each node updates its own probability based on the messages passed from its neighbors. The process continues until the probabilities of all nodes in the network converge, yielding the marginal probability $b(v_{i})$ of each node inferred from the prior probabilities of the nodes in the network.

$\displaystyle m_{v_{i}\rightarrow v_{j}}(v_{j})=\frac{1}{Z}\sum\limits_{v_{i}\in\mathcal{L}}\psi(v_{i})\psi(v_{i},v_{j})\prod\limits_{v_{k}\in N(v_{i})\backslash v_{j}}m_{v_{k}\rightarrow v_{i}}(v_{i})$ (3)

$\displaystyle b(v_{i})=\frac{1}{Z}\psi(v_{i})\prod\limits_{v_{k}\in N(v_{i})}m_{v_{k}\rightarrow v_{i}}(v_{i})$ (4)
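The message and belief updates of Eqs. (3) and (4) can be sketched in plain Python as follows. This is an illustrative sum-product implementation rather than the exact procedure used in the experiments. The synchronous update schedule, the uniform message initialisation, the stopping tolerance `tol`, and the data layout of `node_pot` and `edge_pot` are all assumptions.

```python
def loopy_bp(nodes, edges, node_pot, edge_pot, labels=(0, 1),
             max_iter=50, tol=1e-6):
    """Sum-product LBP on a pairwise MRF; returns normalised beliefs b(v_i).

    node_pot[v][l]                -> psi(v) for label l
    edge_pot[(u, v)][(l_u, l_v)]  -> psi(u, v), stored once per undirected edge
    """
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # msgs[(i, j)][l]: message from i to j about label l of j (uniform start)
    msgs = {(i, j): {l: 1.0 / len(labels) for l in labels}
            for i in nodes for j in nbrs[i]}

    def pot(i, j, li, lj):
        if (i, j) in edge_pot:
            return edge_pot[(i, j)][(li, lj)]
        return edge_pot[(j, i)][(lj, li)]

    for _ in range(max_iter):
        new = {}
        for (i, j) in msgs:
            out = {}
            for lj in labels:
                total = 0.0
                for li in labels:                      # sum over states of v_i
                    term = node_pot[i][li] * pot(i, j, li, lj)
                    for k in nbrs[i] - {j}:            # incoming messages except from j
                        term *= msgs[(k, i)][li]
                    total += term
                out[lj] = total
            z = sum(out.values())                      # normalisation (1/Z in Eq. (3))
            new[(i, j)] = {l: out[l] / z for l in labels}
        delta = max((abs(new[e][l] - msgs[e][l]) for e in msgs for l in labels),
                    default=0.0)
        msgs = new
        if delta < tol:                                # messages stopped changing
            break

    beliefs = {}
    for i in nodes:
        b = {l: node_pot[i][l] for l in labels}
        for k in nbrs[i]:
            for l in labels:
                b[l] *= msgs[(k, i)][l]
        z = sum(b.values())                            # normalisation (1/Z in Eq. (4))
        beliefs[i] = {l: b[l] / z for l in labels}     # Eq. (4)
    return beliefs
```

On a graph without loops, such as a chain, the beliefs returned by this sketch coincide with the exact marginals. On graphs with loops they are approximations.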

3. Methodology

Belief propagation infers from the prior probability evaluated from the anomaly features, and it depends mainly on how the cooperative relationships among nodes are depicted in the MRF construction and the potential function design. At present, most MRF-based methods pay little attention to the construction of the MRF: they either take the original network as the MRF or reconstruct the MRF by simply considering the spatio-temporal synchronization relationship [24]. The inaccurate information received from neighbors may then fail to highlight the principal correlations between individuals, which leads to low accuracy of anomaly detection through inference. Hence, we propose ACEagle, which works in an entirely unsupervised fashion, can easily accommodate labels when available, and analyzes node-level and community-level behavior features to detect collusive anomalies.

In Section 3.1, we first introduce the node-level and community-level behavior features and aggregate them to evaluate the nodes' prior probability. The reconstruction of the MRF and the design of the potential function are described in Section 3.2. The process of ACEagle is shown in Fig. 2.

Figure 2.

The anomaly detection process of ACEagle. Network $G$ consists of 8 normal nodes and 5 anomalous nodes. Anomalous behaviors between nodes are shown as dotted links, and the differences in weight and time between the behaviors of node pairs with a common neighbor are shown as $\triangle w$ and $\triangle t$. $P_{v_{i}}$ represents the prior probability of node $v_{i}$, and the CDF is denoted as $F(\ast)$. Network $G_{S}$ consists of suspicious collusive behaviors, and network $G_{C}$ consists of cooperative behaviors.

3.1 Feature extraction

Abnormal collusive behavior resembles normal behavior but exhibits apparent spatio-temporal synchronization. This trait is particularly evident in the spam reviews of user-generated online review systems. The first step of our model is to extract behavior features from which the nodes' prior probability is calculated. We therefore select two real commodity review datasets containing spurious reviews, YelpZip and YelpNYC, for feature extraction. Both were collected from a user-generated online review system and describe the scenario in which spammers launch collusive review behavior to distort products' ratings. These datasets can be abstracted into corresponding graphs whose nodes fall into two categories: products and users. Edges represent users' reviews, where the timestamp and the rating of a review give the appearance time and the weight of the edge, respectively.

3.1.1 Node-level behavior feature

To preliminarily outline the anomaly degree of nodes in $G$, the time and weight of individual behaviors are used to characterize user behaviors, as shown in Table 1. The features are designed based on local connections and use time, weight, and structure to portray behavior. Then, on the complex network $G$ formed by individual communication relations, the calculation of the nodes' prior probability from node-level behavior features can be divided into the following two steps.

Table 1
Description of the abnormal features of node-level behavior

Feature	Normalization ({H, L} $|$ {L, C})	Types (A, M, S)	Description
MNR	(H, L)	A	Maximum number of reviews written in a day [40, 41, 42].
WRD	(H, C)	S	The weight of the rank order among the reviews of the product [43].
ERD	(L, C)	M	Entropy of the rating distribution of the user's reviews [43].
TRD	(L, C)	S	The proportion of the user's review rating among the reviews that the corresponding product received.
SRD	(H, C)	M	The difference between the ratios of positive and negative reviews of the user.
WRF	(H, C)	M	The joint weight of user $v_{i}$'s review frequency and the deviation of the rating of $v_{i}$ from the average rating of product $v_{j}$, where $t_{v_{i}}^{\textit{first}}-t_{v_{i}}^{\textit{last}}$ is the lifetime between the last and first reviews of user $v_{i}$, and $D(v_{i})$ is the number of reviews in that period of time.
			$\textit{WRF}(v_{i})=\frac{D(v_{i})}{t_{v_{i}}^{\textit{first}}-t_{v_{i}}^{\textit{last}}}*\textit{avg}_{v_{j}\in N(v_{i})}(|w_{v_{i}v_{j}}-\textit{avg}_{v_{k}\in N(v_{j})}w_{v_{k}v_{j}}|)$
MRD	(H, L)	A	The maximum deviation between the user's rating and the average rating of the corresponding product.
			$\textit{MRD}(v_{i})=\max_{v_{j}\in N(v_{i})}(w_{v_{i}v_{j}}-\textit{avg}_{v_{k}\in N(v_{j})}w_{v_{k}v_{j}})$

• Step 1: Normalize. To remove the differing scales of the features and bring them onto a comparable scale, we apply log normalization to features that are sensitive to numerical differences, such as MNR and MRD, to highlight the degree of node-level anomaly. CDF (cumulative distribution function) normalization is applied to features that are more sensitive to deviation from the overall distribution, such as WRD, ERD, TRD, SRD, and WRF, to highlight how anomalous a node is relative to the whole distribution. For each feature $l$, $1\leqslant l\leqslant F$, where $F$ is the total number of features for that kind of node, the corresponding value of node $v_{i}$ is denoted by $x_{li}$. Whether the feature value is positively or negatively correlated with the intensity of anomaly is denoted by $\{H,L\}$, where $H$ represents a positive correlation between the feature value and abnormality and $L$ represents a negative correlation. The set of normalization methods is denoted by $\{L,C\}$, where $C$ represents CDF and $L$ represents log normalization. Therefore, we compute

$\displaystyle f(x_{li})=\left\{\begin{array}{lcl}1-P(X_{l}\leqslant x_{li})&&(H,C)\\ P(X_{l}\leqslant x_{li})&&(L,C)\\ 1-\log_{\max(x_{i}+1)}(x_{li}+1)&&(L,L)\\ \log_{\max(x_{i}+1)}(x_{li}+1)&&(H,L)\end{array}\right.$ (5)

Meanwhile, some features are applicable only to nodes with the corresponding local structure. For example, SRD, ERD, and WRF are not suitable for nodes with a single edge, and failing to make this distinction interferes with the measurement of anomaly probability. Therefore, in Table 1 the applicable node types are divided into $\{A,M,S\}$, where $M$ means the feature applies only to multi-link nodes, $S$ means it applies only to nodes with a single link, and $A$ means it applies to all nodes.

• Step 2: Combine. Given the $F$ features of node $v_{i}$, the prior probability of $v_{i}$ measured from node-level behaviors is computed by combining all of its normalized feature values $f(x_{li})$, i.e.,

$\displaystyle I(v_{i})=1-\sqrt{\frac{\sum_{l=1}^{F}f(x_{li})^{2}}{F}}$ (6)
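The two steps can be sketched as follows. The sketch makes several assumptions that are not stated in the text: $P(X_{l}\leqslant x_{li})$ is taken as the empirical CDF over all nodes, the log base is the largest value of $x+1$ observed for that feature, feature values are non-negative, and the $\{A,M,S\}$ applicability filter is omitted. The case assignment follows Eq. (5) as printed, and the function names are hypothetical.

```python
import math
from bisect import bisect_right

def normalize_feature(values, direction, method):
    """Eq. (5) for one feature column (one value per node).

    direction: 'H' or 'L' (correlation with anomaly)
    method:    'C' (empirical CDF) or 'L' (log normalisation)
    """
    if method == "C":
        ordered = sorted(values)
        cdf = [bisect_right(ordered, x) / len(values) for x in values]  # P(X <= x)
        return [1.0 - c for c in cdf] if direction == "H" else cdf
    base = math.log(max(values) + 1.0)      # log of the base max(x + 1)
    logs = [math.log(x + 1.0) / base if base > 0 else 0.0 for x in values]
    # Cases as printed in Eq. (5): (H, L) -> log, (L, L) -> 1 - log
    return logs if direction == "H" else [1.0 - v for v in logs]

def combine(normalized_columns):
    """Eq. (6): 1 - sqrt(mean of squared normalised features), per node."""
    n_features = len(normalized_columns)
    n_nodes = len(normalized_columns[0])
    return [1.0 - math.sqrt(sum(col[i] ** 2 for col in normalized_columns) / n_features)
            for i in range(n_nodes)]
```

For instance, `normalize_feature([1, 2, 3, 4], "H", "C")` maps the largest value to 0 and the smallest to 0.75, so under Eq. (6) a node with large values of an $(H,C)$ feature receives a higher score.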

3.1.2 Community-level behavior feature

Since the anomaly probability obtained merely from the individual local structure is bound to deviate to some degree, and the magnitude of this deviation directly affects the final result of LBP, we narrow the deviation as far as possible to improve accuracy. To this end, the network structure is used from a global perspective to measure how suspicious the collaborative relationships among nodes are, so as to raise the anomaly probability of anomalous nodes and suppress that of ordinary nodes.

Edge-edge suspicious collusion score

Given a node pair $(v_{i},v_{j})$ with a common neighbor $v_{k}$, we define the time interval between the edges from $(v_{i},v_{j})$ to $v_{k}$ as $\triangle t_{v_{k}}=|t_{v_{i}v_{k}}-t_{v_{j}v_{k}}|$ and the weight interval as $\triangle w_{v_{k}}=|w_{v_{i}v_{k}}-w_{v_{j}v_{k}}|$. We measure these differences using the CDF, denoted as $F(\ast)$, which normalizes the collusion degree between edges into $[0,1]$. The synergistic relationship intensities are defined as $F(\triangle w_{v_{k}})$ and $F(\triangle t_{v_{k}})$. Assuming they are independent, the joint probability satisfies $F(\triangle w_{v_{k}},\triangle t_{v_{k}})=F(\triangle w_{v_{k}})F(\triangle t_{v_{k}})$. We then combine this with $p_{v_{i}}$, the anomaly probability of node $v_{i}$ based on node-level behavior features. Adopting the activation function $\textit{tanh}(x)$, we quantify the degree of suspicious relationship $w(v_{i},v_{j},v_{k})$ of the edges from $(v_{i},v_{j})$ to the common neighbor $v_{k}$ as:

$\displaystyle w(v_{i},v_{j},v_{k})=\frac{2F(p_{v_{i}})F(p_{v_{j}})}{1+e^{-2F(\triangle w_{v_{k}})F(\triangle t_{v_{k}})}}-1$ (7)

As shown in Eq. (7), $w(v_{i},v_{j},v_{k})\in[-1,1]$. A node pair $(v_{i},v_{j})$ in which both nodes have a high prior probability, and whose edges to the common neighbor $v_{k}$ are close in time and similar in weight, obtains a higher value of $w(v_{i},v_{j},v_{k})$ and is more likely to be a pair of anomalous collaborators.
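A minimal sketch of Eq. (7) is given below. It assumes that the four CDF-normalised quantities have already been computed and lie in $[0,1]$. How $F(\ast)$ is estimated is left to the caller, and the function and argument names are hypothetical.

```python
import math

def edge_score(F_pi, F_pj, F_dw, F_dt):
    """Eq. (7): suspicious score of the two edges pointing at a common neighbour.

    F_pi, F_pj: CDF-normalised prior probabilities of v_i and v_j
    F_dw, F_dt: CDF-normalised weight and time differences
    """
    return 2.0 * F_pi * F_pj / (1.0 + math.exp(-2.0 * F_dw * F_dt)) - 1.0

# With both prior terms equal to 1 the expression reduces to tanh(F_dw * F_dt).
example = edge_score(1.0, 1.0, 1.0, 1.0)
```

Because $2/(1+e^{-2x})-1=\tanh(x)$, the score equals $-1$ whenever either prior term is 0 and is at most $\tanh(1)\approx 0.762$ when all four inputs equal 1.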

Node-node suspicious collusion score

There may be multiple common neighbors between $v_{i}$ and $v_{j}$, so $w_{\textit{avg}}(v_{i},v_{j})$ represents the average of $w(v_{i},v_{j},v_{k})$ over the common neighbors of $(v_{i},v_{j})$, as shown in Eq. (8). Given nodes $v_{i}$ and $v_{j}$, let $N(v_{i})$ and $N(v_{j})$ represent their respective neighbor sets:

$\displaystyle w_{\textit{avg}}(v_{i},v_{j})=\frac{\sum\limits_{v_{k}\in N(v_{i})\bigcap N(v_{j})}w(v_{i},v_{j},v_{k})}{|N(v_{i})\bigcap N(v_{j})|}$ (8)

Through the above analysis, most behaviors can be preliminarily distinguished. However, a cooperative relation between a benign node pair $(v_{i},v_{j})$ with similar behavior habits, whose nodes were wrongly assigned a high prior probability, may still be mistakenly given a high $w_{\textit{avg}}(v_{i},v_{j})$. To overcome this, we assume that nodes of different roles differ in network structure: the proportion of common neighbors between abnormal nodes should be higher than that between normal nodes. We therefore measure the structural similarity of nodes $v_{i}$ and $v_{j}$ using Jaccard similarity, and define the suspicious collusion between $v_{i}$ and $v_{j}$ as:

$\displaystyle\textit{Jaccard }(v_{i},v_{j})=\frac{|N(v_{i})\bigcap N(v_{j})|}{|N(v_{i})\bigcup N(v_{j})|}$ (9)

$\displaystyle W(v_{i},v_{j})=w_{\textit{avg}}(v_{i},v_{j})*\textit{Jaccard }(v_{i},v_{j})=\frac{\sum\limits_{v_{k}\in N(v_{i})\bigcap N(v_{j})}w(v_{i},v_{j},v_{k})}{|N(v_{i})\bigcup N(v_{j})|}$ (10)
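Equations (8) to (10) can be illustrated with Python sets. In this sketch the neighbor sets and the per-neighbor scores $w(v_{i},v_{j},v_{k})$ are supplied by the caller, the function names are hypothetical, and returning 0 for pairs without common neighbors is an assumption made only to keep the function total.

```python
def w_avg(edge_scores, common):
    """Eq. (8): mean of w(v_i, v_j, v_k) over the common neighbours."""
    return sum(edge_scores[k] for k in common) / len(common)

def jaccard(nbrs_i, nbrs_j):
    """Eq. (9): Jaccard similarity of the two neighbour sets."""
    return len(nbrs_i & nbrs_j) / len(nbrs_i | nbrs_j)

def suspicious_collusion(nbrs_i, nbrs_j, edge_scores):
    """Eq. (10): W(v_i, v_j) = w_avg * Jaccard.

    edge_scores maps each common neighbour v_k to w(v_i, v_j, v_k).
    """
    common = nbrs_i & nbrs_j
    if not common:
        return 0.0  # assumption: pairs with no common neighbour get score 0
    return w_avg(edge_scores, common) * jaccard(nbrs_i, nbrs_j)

# Hypothetical users u1 and u2 who share two reviewed products.
W = suspicious_collusion({"p1", "p2", "p3"}, {"p2", "p3", "p4"},
                         {"p2": 0.6, "p3": 0.2})
```

In the example, the average score is 0.4 and the Jaccard similarity is 2/4, so `W` is 0.2, which matches the sum of scores divided by the size of the union, as in the right-hand side of Eq. (10).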

To include as many potentially suspicious relations as possible, we set the threshold $\epsilon=0$ and keep all links that satisfy $W(v_{i},v_{j})>\epsilon$ to build the suspicious collaboration network $G_{S}$. Cooperative behavior between $v_{i}$ and $v_{j}$ with a large value of $W(v_{i},v_{j})$ represents suspicious collusive behavior that satisfies the following conditions: (i) the neighborhoods are highly consistent; (ii) the behaviors are similar in time and weight; (iii) both $v_{i}$ and $v_{j}$ have a high anomaly probability. Conversely, a smaller value indicates normal cooperative behavior that fails one or more of these conditions. Therefore, by analyzing the network characteristics of $G_{S}$, the anomaly prior probability of nodes with normal cooperative behaviors can be penalized, and the anomaly prior probability of truly abnormal nodes can be raised. The characteristics are shown in Table 2.

Table 2

Description of the abnormal features of community-level behavior

Feature	Normalization ({H, L} $|$ {L, C})	Description
CLU	(H, C)	The clustering coefficient of node $v_{i}$'s egonet in $G_{S}$.
MSS	(H, L)	The maximum weight of the edges in $G_{S}$ linked to node $v_{i}$.
MES	(H, L)	The average weight of the edges in $G_{S}$ linked to node $v_{i}$.
STS	(L, L)	The standard deviation of the weights of the edges in $G_{S}$ linked to node $v_{i}$.
LSS	(L, L)	The principal eigenvalue of the adjacency matrix of node $v_{i}$'s egonet in $G_{S}$.

By extracting the egonet of node $v_{i}$ on $G_{S}$, we obtain all suspicious relationships with $W(v_{i},v_{j})>0$ between any node pairs in the set consisting of $v_{i}$ and its neighbors. A benign node may simultaneously have dubious links with multiple neighbor nodes, but the probability of suspicious cooperative behavior among its neighbors is lower than in the egonet of an abnormal node. Similarly, the weights of the edges in the egonet of a benign node on $G_{S}$ are small and irregularly distributed. In terms of the statistical characteristics of the network structure, compared with the egonet of a typical node, the maximum and average edge weights of an anomalous egonet are larger, the variance is smaller, and the clustering coefficient is higher. In addition, the principal eigenvalue of collusive behavior is lower than that of normal behavior because of its similar characteristics, uniform weights, and relatively consistent target set. Equation (11) is designed to measure the anomaly of each node on $G_{S}$ under the above cooperation characteristics, expressed in the form of a prior probability. The features are normalized as described in Section 3.1.1, where $K$ represents the number of community-level behavior features.

$\displaystyle C(v_{i})=1-\sqrt{\frac{\sum_{n=1}^{K}f(x_{ni})^{2}}{K}}$ (11)
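Four of the raw community-level features of Table 2 can be extracted from an adjacency dictionary as sketched below. Several choices here are assumptions: CLU is computed as the local clustering coefficient of $v_{i}$ (the fraction of its neighbor pairs that are themselves linked in $G_{S}$), STS uses the population standard deviation, and LSS is omitted because it requires an eigenvalue routine. The function name and data layout are hypothetical. The raw values would still need to be normalized before Eq. (11) is applied.

```python
import math
from itertools import combinations

def community_features(v, adj):
    """Raw CLU, MSS, MES and STS of node v in G_S.

    adj: {node: {neighbour: W}} for the undirected weighted graph G_S;
    every node of G_S is assumed to have at least one neighbour.
    """
    nbrs = adj[v]
    weights = list(nbrs.values())
    mean_w = sum(weights) / len(weights)
    std_w = math.sqrt(sum((w - mean_w) ** 2 for w in weights) / len(weights))
    pairs = list(combinations(nbrs, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])   # neighbour pairs that are connected
    clu = linked / len(pairs) if pairs else 0.0
    return {"CLU": clu, "MSS": max(weights), "MES": mean_w, "STS": std_w}

# Hypothetical G_S: a triangle a-b-c plus a pendant node d attached to a.
adj = {"a": {"b": 0.5, "c": 0.7, "d": 0.3},
       "b": {"a": 0.5, "c": 0.6},
       "c": {"a": 0.7, "b": 0.6},
       "d": {"a": 0.3}}
features_a = community_features("a", adj)
```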

To explain the statistical process of abnormal cooperative behavior characteristics clearly, we use the specific example given in Fig. 3. First, on the original network $G$, the initial anomaly probability $I(v_{i})$ is pre-allocated to each node on the basis of individual behavior features. Then, for each node pair $(v_{i},v_{j})$ with common neighbors, we traverse all their common neighbors and calculate $w(v_{i},v_{j},v_{k})$ for each common neighbor $v_{k}$ using Eq. (7). According to Eq. (10), the values of $w(v_{i},v_{j},v_{k})$ over all common neighbors and the structural similarity between $(v_{i},v_{j})$ are combined to measure the suspicious collusiveness between $(v_{i},v_{j})$, denoted as $W(v_{i},v_{j})\in[-1,1]$. By retaining the edges with $W(v_{i},v_{j})>0$, we reconstruct the suspicious cooperative network $G_{S}$ shown in Fig. 3, where black nodes represent normal nodes and red nodes represent abnormal nodes. Under the strict collaborative constraints that consider the prior probability and the weight and time of behavior, the $W(v_{i},v_{j})$ of most normal synergistic behaviors between nodes is negative and is filtered out. $G_{S}$ still contains a few remaining normal nodes and edges, but the egonets they compose lack the consistent purpose of abnormal ones, so the synergy they exhibit is relatively unstable, and the $W(v_{i},v_{j})$ of their edges is generally low and fluctuates widely.

Therefore, we consider the features of both individual and cooperative behavior when measuring the anomaly prior probability, aggregating them with the following formula, which aims to raise the anomaly likelihood of truly abnormal nodes while suppressing that of normal nodes.

$\displaystyle S(v_{i})=I(v_{i})C(v_{i})$ (12)

Figure 3.

The process of constructing $G_{S}$ and extracting cooperative behavior features. (a) The original network structure $G$, including anomalous nodes. (b) The reconstructed network consisting of the collusive behaviors between node pairs with common neighbors in $G$. (c) The network $G_{S}$, which retains the links of (b) whose value is positive. (d) The feature extraction process using the egonet of each node.

3.2 Collaborative Markov random field

Because the construction of the suspicious synergy graph $G_{S}$ above considers only suspicious synergies, it covers only a part of the nodes of the original network $G$. To perform inference over the synergistic relationships of all nodes, regardless of whether they are suspicious, we extract the collaborative relationships among all nodes in the original network $G$ and use this collaborative relationship network as the MRF for belief propagation. In an MRF composed of collaborative relationships, nodes reason as far as possible from the adequate joint information of their neighbors, which avoids interference from invalid input during propagation when the marginal probability of each node is calculated.

Table 3
The potential function design of ACEagle

$v_{i}$ (rows) / $v_{j}$ (columns)	H	F
H	$P(v_{i}=H,v_{j}=H|\textit{collu}(v_{i},v_{j}))$	$P(v_{i}=H,v_{j}=F|\textit{collu}(v_{i},v_{j}))$
F	$P(v_{i}=F,v_{j}=H|\textit{collu}(v_{i},v_{j}))$	$P(v_{i}=F,v_{j}=F|\textit{collu}(v_{i},v_{j}))$

Algorithm: ACEagle
Input: $G$: network abstracted from individual behavior with weight and time; $I$: prior probability calculated from individual behavior features; $\epsilon$: a threshold for generating the suspicious collaboration network $G_{S}$; $\delta$: a threshold for generating the collusive network $G_{C}$
Output: Ranked individuals based on the belief

Calculate suspicious collusiveness $W(v_{i},v_{j})$ of $(v_{i},v_{j})$ in $G(V,E)$;
for each node pair $(v_{i},v_{j})\in G$ do
    if $W(v_{i},v_{j})>\epsilon$ then
        $V_{S}\longleftarrow v_{i},v_{j}$; $E_{S}\longleftarrow(v_{i},v_{j})$;
    end if
end for
Construct the suspicious cooperative network $G_{S}(V_{S},E_{S})$;
Calculate community-level prior probability $C(v_{i})$ for node $v_{i}\in G_{S}$;
Calculate final prior probability $S(v_{i})$ based on $I(v_{i})$ and $C(v_{i})$ using Eq. (12);
Calculate collusiveness $\textit{collu}(v_{i},v_{j})$ of $(v_{i},v_{j})$ in $G(V,E)$;
for each node pair $(v_{i},v_{j})\in G$ do
    if $\textit{collu}(v_{i},v_{j})>\delta$ then
        $V_{C}\longleftarrow v_{i},v_{j}$; $E_{C}\longleftarrow(v_{i},v_{j})$;
    end if
end for
Construct the collaborative network $G_{C}(V_{C},E_{C})$;
for each node $v_{i}\in V$ do
    if $v_{i}\in V_{S}$ then
        $\psi(v_{i})=(1-S(v_{i}),S(v_{i}))$
    else
        $\psi(v_{i})=(1-I(v_{i}),I(v_{i}))$
    end if
end for
for each node pair $(v_{i},v_{j})\in G_{C}$ do
    …
end for
while messages have not stopped changing do
    for each $v_{i}\in V$ do
        update $m_{v_{j}\longrightarrow v_{i}}(v_{i})$ using Eq. (3);
    end for
end while
Compute belief $b(v_{i})$ using LBP;
return Ranked individuals based on the belief using LBP;

Collusiveness of node pair

Suppose the node pair $(v_{i},v_{j})$ has a common neighbor $v_{k}$. We extract the time and weight differences of the links between $(v_{i},v_{j})$ and $v_{k}$ and normalize them using the CDF, denoted as $F(\triangle t)$ and $F(\triangle w)$. $\sigma(v_{i},v_{j},v_{k})$ represents the collusiveness of the node pair $(v_{i},v_{j})$ with respect to $v_{k}$, as shown in Eq. (13). Since there may be multiple common neighbors between $v_{i}$ and $v_{j}$, we take the maximum, denoted as $\textit{collu}(v_{i},v_{j})$, to highlight the dominant synergy of $(v_{i},v_{j})$, as shown in Eq. (14).

$\displaystyle\sigma(v_{i},v_{j},v_{k})=\log_{2}\left(1+F(\triangle w)F(\triangle t)\right)$ (13)

$\displaystyle\textit{collu}(v_{i},v_{j})=\max\limits_{v_{k}\in N(v_{i})\bigcap N(v_{j})}\sigma(v_{i},v_{j},v_{k})$ (14)
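Equations (13) and (14) translate directly into a few lines of Python. As before, the CDF-normalised differences are assumed to be precomputed and to lie in $[0,1]$, and the function names and the dictionary layout are hypothetical.

```python
import math

def sigma(F_dw, F_dt):
    """Eq. (13): collusiveness of a node pair with respect to one common neighbour."""
    return math.log2(1.0 + F_dw * F_dt)

def collu(per_neighbour):
    """Eq. (14): maximum of sigma over all common neighbours.

    per_neighbour: {v_k: (F_dw, F_dt)} for every common neighbour v_k.
    """
    return max(sigma(f_dw, f_dt) for f_dw, f_dt in per_neighbour.values())

# Hypothetical pair with two common neighbours.
c = collu({"p2": (0.5, 0.5), "p3": (1.0, 1.0)})
```

Since the product $F(\triangle w)F(\triangle t)$ lies in $[0,1]$, both $\sigma$ and $\textit{collu}$ lie in $[0,1]$ as well.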

Based on the above analysis, we can construct a collaborative graph $G_{C}(V_{C},E_{C})$, where $V_{C}=\{v_{i}|v_{i}\in G(V,E)\}$, $E_{C}=\{(v_{i},v_{j})|\textit{collu}(v_{i},v_{j})\geqslant\delta,v_{i},v_{j}\in V_{C}\}$ and $\delta$ is a threshold controlling the density of graph $G_{C}$. We further design the corresponding potential function for each link in $G_{C}$ to construct the MRF. Let $\mathcal{L}=\{H,F\}$ represent the node labels, where $P(v_{i}=F)$ is the probability that $v_{i}$ is an abnormal node while $P(v_{i}=H)$ is the probability that it is normal. Assuming that the probabilities of nodes are independent, the joint probability of $(v_{i},v_{j})$ is $P(v_{i},v_{j})=P(v_{i})P(v_{j})$ and the conditional probability is $P((v_{i},v_{j})|\textit{collu}(v_{i},v_{j}))$. Assuming that the joint probability $P(v_{i},v_{j},\textit{collu}(v_{i},v_{j}))$ obeys an exponential distribution, the calculation is as follows.

$\displaystyle P(v_{i},v_{j}|\textit{collu}(v_{i},v_{j}))=\frac{P(v_{i},v_{j},\textit{collu}(v_{i},v_{j}))}{P(\textit{collu}(v_{i},v_{j}))}=\frac{e^{-(1-P(v_{i})P(v_{j})\textit{collu}(v_{i},v_{j}))}}{\textit{collu}(v_{i},v_{j})}$ (15)

According to Eq. (15), the potential function of the node pair $(v_{i},v_{j})$ in the MRF is given in detail in Table 3.
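One way to read Eq. (15) together with Table 3 is to evaluate the right-hand side once per label combination, substituting the label-specific marginals for $P(v_{i})$ and $P(v_{j})$. The sketch below follows that reading; the function name `edge_potential` and the dictionary layout are assumptions made for illustration.

```python
import math
from typing import Dict, Tuple


def edge_potential(p_i_f: float, p_j_f: float, c: float) -> Dict[Tuple[str, str], float]:
    """2x2 potential table for one link of G_C, following Eq. (15).

    p_i_f, p_j_f: probabilities that v_i and v_j are abnormal (label F).
    c: collu(v_i, v_j); positive for every link kept in G_C.
    """
    if c <= 0.0:
        raise ValueError("collu must be positive for a link in G_C")
    marg_i = {"H": 1.0 - p_i_f, "F": p_i_f}
    marg_j = {"H": 1.0 - p_j_f, "F": p_j_f}
    return {
        (li, lj): math.exp(-(1.0 - marg_i[li] * marg_j[lj] * c)) / c
        for li in ("H", "F")
        for lj in ("H", "F")
    }
```

For two nodes that both have a high abnormal probability, the (F, F) entry is the largest of the four, so a strongly collusive link pushes both endpoints toward the abnormal label during propagation.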

Assuming the network $G(V,E)$ is abstracted from individual behaviors, which include anomalous collusive behaviors, the process of ACEagle proposed in our work is as follows.

4. Experimental study

4.1 Evaluation and comparison

Compared with classification-based detection methods, ACEagle orders nodes in a ranked list by their anomaly marginal probability, quantifying the degree of abnormality at a fine granularity. The advantage of this approach is that the important anomalous nodes within an anomalous community can be identified, whereas the detection capability of classification methods is limited because they treat all nodes of an anomalous group equally regardless of their importance. Therefore, we use Precision@k and NDCG@k [22, 23] to evaluate the top-ranked detectability of ACEagle against other state-of-the-art ranking-based approaches, and we use AUC and AP to measure its overall recognition ability.
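For readers unfamiliar with the two top-k metrics, the following sketch gives their usual definitions for binary ground-truth labels (1 for an abnormal node, 0 otherwise). It assumes the standard logarithmic discount for NDCG; the exact variant used in the experiments is not stated in the text.

```python
import math
from typing import Sequence


def precision_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked nodes that are truly abnormal."""
    return sum(ranked_labels[:k]) / k


def ndcg_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """DCG of the top-k list divided by the DCG of an ideal ordering."""
    def dcg(labels: Sequence[int]) -> float:
        # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(labels))

    ideal = sorted(ranked_labels, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(ranked_labels[:k]) / best if best > 0 else 0.0
```

Here `ranked_labels` is the ground-truth label sequence after sorting the nodes by decreasing anomaly probability. Precision@k only counts hits, while NDCG@k also rewards placing them near the top of the list.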

As comparison methods, we choose the works most closely related to ACEagle, namely FraudEagle [22], SpEagle [23], and ColluEagle [24], which also rank nodes by the abnormal probability obtained through MRF-based inference. In addition, we select the classic classification-style community-based method SCAN to compare against the ranking methods and to show the effect of ranking on the accuracy of identifying abnormal nodes. To highlight the effectiveness of each step in ACEagle, we first measure the anomaly detection capability of the prior probability calculated from the individual behavior features introduced in Section 3.1.1, denoted PRIOR $SF$. We then use PRIOR $SF+NF$ to denote the capability obtained by additionally combining the cooperative behavior features introduced in Section 3.1.2, which reflects the necessity of constructing $G_{S}$. Finally, to verify the rationality of the potential function designed for $G_{C}$, we infer the beliefs of all nodes from the prior probability measured by the individual and cooperative behavior features, using the potential function designed in our work.

4.2 Validation of real dataset

To reveal the actual detection ability of the proposed method, we select YelpZip and YelpNYC, two real review datasets containing fake collusive reviews that are widely used to verify abnormal group identification methods [22, 23, 35, 36]; they are described in Table 4. We abstract the reviews in these datasets as links and the users and products as nodes, so that a bipartite network represents the connections between products and users. The collusive anomalous behavior of spammers toward multiple targets can then be regarded as a collusive anomalous subgraph structure.

Table 4
Description of the datasets with ground-truth anomalies

Dataset	Reviews (Fakes%)	Reviewers (Spammers%)	Products
YelpNYC	359052 (10.27%)	160225 (17.79%)	923
YelpZip	608598 (13.22%)	260277 (23.91%)	5044
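The abstraction of reviews into a user-product bipartite network can be sketched as follows. The record layout `(user_id, product_id, timestamp, rating)` is an assumption made for illustration and is not the documented schema of the YelpZip or YelpNYC files.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Tuple[str, Hashable]  # ("u", user_id) or ("p", product_id)


def build_review_graph(reviews: Iterable[Tuple[Hashable, Hashable, float, float]]):
    """Turn review records into adjacency sets plus per-link (timestamp, rating)."""
    neighbors: Dict[Node, Set[Node]] = defaultdict(set)
    links: Dict[Tuple[Node, Node], Tuple[float, float]] = {}
    for user_id, product_id, timestamp, rating in reviews:
        # Tagging each side keeps user and product identifiers from colliding.
        u, p = ("u", user_id), ("p", product_id)
        neighbors[u].add(p)
        neighbors[p].add(u)
        # If a user reviews the same product twice, the later record wins here.
        links[(u, p)] = (timestamp, rating)
    return dict(neighbors), links
```

Each review becomes one link carrying a time and a weight, which are the two link properties that the collusiveness measure relies on.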

Figure 4.

Curves of NDCG@k and Precision@k for each algorithm, with k in the range 0–2000. In FraudEagle, the prior probabilities of nodes are set to (0.5, 0.5). In SpEagle and ColluEagle, the prior probabilities are calculated using the features designed for those models, and the threshold $\sigma$ in ColluEagle is set to 0.6. In ACEagle, the threshold $\delta$ for constructing $G_{C}$ is set to 0.8.

As shown in Fig. 4, we compare the performance of Prior $SF$ and Prior $SF+NF$ under NDCG@k and Precision@k, and the results show that considering cooperative behavior characteristics significantly improves detection accuracy. We then compare ACEagle with the FraudEagle, SpEagle, and ColluEagle models to highlight the structural rationality and effectiveness of its MRF. In the experiment, the initial probabilities of nodes in FraudEagle were set to $(0.5,0.5)$ by default. SpEagle and ColluEagle calculated the prior probability from the features designed for the respective models, with the threshold $\sigma=0.6$ in ColluEagle. In our work, the threshold $\delta$ for constructing $G_{C}$ is set to $0.8$.

The above ranking-based approaches take the anomaly probability of nodes as the sorting score and measure detection precision by the actual proportion of anomalies in the top-k ranked list. However, this alone does not evaluate the detectability of a method comprehensively. To further assess the overall classification ability, the AP and AUC metrics are computed over the entire ranked list. In addition, the experiment includes the classic graph clustering-based method SCAN to show that ACEagle is superior to classification-based methods, where $k$ is set to the number of abnormal samples obtained by SCAN. As shown in Table 5, under AP and AUC, ACEagle has a clear advantage over FraudEagle; in relative terms, it is on average about 14% higher in AP and 3% higher in AUC than SpEagle, and about 20% higher in AP and 3% higher in AUC than ColluEagle. Moreover, ACEagle is distinctly better than FraudEagle and SCAN under Precision@k, and it improves on SpEagle and ColluEagle by about 20%.

Table 5

Results of ACEagle and baselines measured by AP, AUC, Precision@k

	AP		AUC		Precision@k
	YelpZip	YelpNYC	YelpZip	YelpNYC	YelpZip	YelpNYC
SCAN	–	–	–	–	0.3047	0.2195
FRAUDEAGLE	0.3091	0.2233	0.6175	0.6062	0.1544	0.0836
SPEAGLE	0.3616	0.2680	0.6710	0.6575	0.4850	0.4664
COLLUEAGLE	0.3178	0.2662	0.6681	0.6534	0.4643	0.4689
PRIOR SF	0.3060	0.2671	0.6107	0.5926	0.4378	0.4485
PRIOR SF+NF	0.3607	0.3075	0.6574	0.6446	0.5662	0.5376
ACEAGLE	0.3932	0.3187	0.7007	0.6633	0.5732	0.5453
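The AP and AUC columns of Table 5 are whole-list ranking metrics. A minimal sketch of their usual definitions is given below, assuming binary labels and the pairwise formulation of AUC; the quadratic pair loop is for clarity only and would be replaced by a rank-based computation on datasets of this size.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of the precision values taken at the rank of each true anomaly."""
    hits, total = 0, 0.0
    for pos, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / pos
    return total / hits if hits else 0.0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random anomaly outscores a random normal node (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one node of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike Precision@k, both metrics take every position of the ranked list into account, which is why they are used here to complement the top-k evaluation.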

The value of the threshold $\delta$ directly determines the final structural density of $G_{C}$, which serves as the MRF, and thus influences the inference result of LBP. Assuming that a synergistic relationship with $\delta>0.5$ is relatively relevant, we vary $\delta$ in the range 0.6–0.9 to study its influence on the detection capability of ACEagle. As shown in Fig. 5, when belief propagation is carried out on the MRFs built with different $\delta$ values, the detection ability of ACEagle improves stably, to varying degrees, over Prior $SF+NF$. This indicates that the effectiveness of the proposed method is not limited to a specific parameter setting and that the method is robust to the choice of $\delta$ within this range.
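The link between $\delta$ and the density of $G_{C}$ can be made concrete with a small sketch that counts how many node pairs survive each threshold. The dictionary of precomputed collusiveness scores and the helper names are hypothetical; the sketch uses the $\geqslant\delta$ rule from the definition of $E_{C}$.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def edges_above(scores: Dict[Pair, float], delta: float) -> Set[Pair]:
    """Node pairs whose collusiveness reaches the threshold delta."""
    return {pair for pair, c in scores.items() if c >= delta}


def density_sweep(scores: Dict[Pair, float], n_nodes: int,
                  deltas: Iterable[float]) -> Dict[float, float]:
    """Density of G_C (kept edges over all possible node pairs) for each delta."""
    max_edges = n_nodes * (n_nodes - 1) / 2
    return {d: len(edges_above(scores, d)) / max_edges for d in deltas}
```

A larger $\delta$ can only remove links, so the density is non-increasing in $\delta$ and belief propagation runs on a sparser MRF.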

ACEagle is an MRF-based abnormal subgraph identification method: the prior probability of nodes is calculated from the features of cooperative and individual behavior, and the probability is then propagated along the synergistic relationships until convergence. The inference result depends on how consistent the nodes' prior probabilities are with the actual situation. Therefore, ACEagle can also be used as a semi-supervised approach when some actual labels in the network are known. Next, we investigate how much the detection performance improves under semi-supervision with varying amounts of labeled data. As shown in Table 6, we measure the detection capability by AUC, AP, and Precision@k.

Table 6

Results of ACEagle with partially labeled data, measured by AP, AUC, Precision@k

	AP		AUC		Precision@k
	YelpZip	YelpNYC	YelpZip	YelpNYC	YelpZip	YelpNYC
ACEAGLE	0.3932	0.3187	0.7007	0.6633	0.5732	0.5453
ACEAGLE (0.5%)	0.4233	0.3312	0.7074	0.6768	0.7182	0.5909
ACEAGLE (1.5%)	0.4851	0.3762	0.7275	0.6808	0.9491	0.7518
ACEAGLE (3%)	0.5599	0.4238	0.7555	0.6971	0.9855	0.9336
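The text does not spell out how the known labels enter the model. One plausible scheme, sketched below purely as an assumption, is to clamp the prior of each labeled abnormal node close to 1 before propagation while leaving all other priors untouched. The function name `inject_labels` and the margin `eps` are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple


def inject_labels(priors: Dict[Hashable, float], known_anomalies: Set[Hashable],
                  eps: float = 1e-3) -> Dict[Hashable, Tuple[float, float]]:
    """Build node potentials (P(H), P(F)), clamping labeled anomalies near 1.

    priors: unsupervised abnormal probability of every node.
    known_anomalies: nodes whose abnormal label is given as side information.
    eps: margin that keeps the clamped potential strictly inside (0, 1).
    """
    potentials = {}
    for node, p_f in priors.items():
        if node in known_anomalies:
            p_f = 1.0 - eps  # assumption: trust the label almost fully
        potentials[node] = (1.0 - p_f, p_f)
    return potentials
```

The pair ordering $(1-p,\,p)$ mirrors the node potential $\psi(v_{i})$ used in the ACEagle procedure, so the clamped values can be passed to the same inference step.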

Figure 5.

A sensitivity test of the effect of changes in $\delta$ on the anomaly detection capability of ACEagle. NDCG@k and Precision@k are used to measure the influence, in comparison with Prior SF $+$ NF.

Different proportions of labeled abnormal nodes, {0.5%, 1.5%, 3%} for both YelpNYC and YelpZip, are set to measure the enhancement of the detection capability of ACEagle using NDCG@k and Precision@k. To reflect the change in top-k ranking ability, we use Precision@k to show how many actual abnormal nodes are included in the top-k list and NDCG@k to show whether the actual abnormal nodes are ranked near the top of that list. As shown in Fig. 6, even when only a few nodes are labeled, the accuracy of ACEagle improves considerably. The experimental results show that the model can serve as an unsupervised anomaly detection method when no prior knowledge is available and as a semi-supervised method for identifying anomalous collusive subgraphs when a few abnormal node labels are known. Moreover, as more abnormal labels become known, the detection accuracy increases non-linearly.

Figure 6.

The enhancement of the detection capability of ACEagle when different proportions of abnormal nodes in the network are labeled. NDCG@k and Precision@k are used to measure the influence, in comparison with ACEagle without labels.

5. Conclusion

ACEagle, proposed in this work, abstracts the communication relationships between entities as a complex network and analyzes node-level and community-level behavior features to identify camouflage and represent synergistic relationships. During detection, the anomalous prior probability of each node is measured by node-level features extracted from the time and weight properties of behaviors. The prior probability is then adjusted toward the ground truth by considering community-level behavior features extracted from the network constructed from suspicious cooperative behaviors. Experiments on YelpZip and YelpNYC show that ACEagle is superior to the baseline methods under various evaluation metrics. For instance, under unsupervised conditions, the AUC of ACEagle is 15% higher than that of FraudEagle and about 5% higher than those of SpEagle and ColluEagle. Under semi-supervised conditions, the AUC improves by 40% over the unsupervised setting. Future work will study the discrepancy in linkage motivation between benign and anomalous groups and the characteristics of anomalous communities evolving on complex networks.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (No. 61803384).

References

1. Meng, Cui and Faloutsos, Suspicious behavior detection: Current trends and future directions, IEEE Intelligent Systems 31(1) (2016).

2. Wang, Xie S.H. and Liu, Identify online store review spammers via social review graph, ACM Transactions on Intelligent Systems and Technology 3(4) (2012). doi: 10.1145/2337542.2337546.

3. Byun, Jeong and Kim C.K., SC-com: Spotting collusive community in opinion spam detection, Information Processing & Management 58(4) (2021). doi: 10.1016/j.ipm.2021.102593.

4. Wang, Shang Y.Y. and He Y.Z., BotMark: Automated botnet detection with hybrid analysis of flow-based and graph-based traffic behaviors, Information Sciences 511 (2020), 284–296. doi: 10.1016/j.ins.2019.09.024.

5. Liu, Liao J.X. and Wang J.Y., AGRM: Attention-based Graph Representation Model for telecom fraud detection, in: IEEE International Conference on Communications 2019, Shanghai, China, 2020.

6. Gao Y.S. and Li J.L., Telecom fraud detection method based on markov random field, Radio Engineering 51(3) (2021), 237–242.

7. Yang R.P., Research on log anomaly detection and diagnosis, Ph.D. Dissertation, PLA Strategic Support Force Information Engineering University, 2020.

8. Yuan D.Y., Zhang Y.F. and Gao, Abnormal user detection method of sina weibo based on user feature extraction, Computer Science 47(51) (2020), 364-368+385.

9. Zhou Y.J., Behavior Analysis based Traffic Anomaly Detection and Correlation Analysis for Communication Networks, Ph.D. Dissertation, University of Electronic Science and Technology of China, 2013.

10. H.T. and Huang R.Y., Research progress of abnormal user detection technology in social networks, Chinese Journal of Network and Information Security 4(3) (2018).

11. J.S., Peng J.H. and Liu S.X., Toward link prediction in directed social networks based on common interest and local community, International Journal of Modern Physics C 31(11) (2020). doi: 10.1142/S0129183120501600.

12. Liu S.X., X.S. and Liu C.X., Similarity indices based on link weight assignment for link prediction of unweighted complex networks, International Journal of Modern Physics B 31(2) (2017). doi: 10.1142/S0217979216502544.

13. J.S., Peng J.H. and Liu S.X., Predicting missing links in directed networks based on local network structure and investment theory, International Journal of Modern Physics C 31(7) (2020). doi: 10.1142/S0129183120500965.

14. Akoglu, Tong H.H. and Koutra, Graph based anomaly detection and description: A survey, Data Mining and Knowledge Discovery 29(3) (2015), 626–688. doi: 10.1007/s10618-014-0365-y.

15. Javed M.A., Younis M.S. and Latif, Community detection in networks: A multidisciplinary review, Journal of Network and Computer Applications 108 (2018), 87–111. doi: 10.1016/j.jnca.2018.02.011.

16. Kaur and Singh, Egyptian Informatics Journal 17(2) (2016).

17. Habeeb R.A.A., Nasaruddin and Gani, Real-time big data processing for anomaly detection: A survey, International Journal of Information Management 45 (2019), 289–307. doi: 10.1016/j.ijinfomgt.2018.08.006.

18. Pourhabibi, Ong K.L. and Kam B.H., Fraud detection: A systematic literature review of graph-based anomaly detection approaches, Decision Support Systems 133 (2020), 1–15. doi: 10.1016/j.dss.2020.113303.

19. Akoglu, Mcglohon and Faloutsos, OddBall: Spotting anomalies in weighted graphs, Lecture Notes in Computer Science 6119(3) (2010).

20. Z.M., Xiong and Liu, Detecting blackhole and volcano patterns in directed networks, Data Mining & Knowledge Discovery 25(3) (2012).

21. Z.M., Xiong and Liu, Mining blackhole and volcano patterns in directed graphs: a general approach, ICDM 2010, Sydney, Australia, 2010, 577–602. doi: 10.1007/s10618-012-0255-0.

22. Akoglu and Chandy, Opinion Fraud Detection in Online Reviews by Network Effects, ICWSM 2013, Boston, USA, 2013.

23. Rayana and Akoglu, Collective Opinion Spam Detection, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Singapore, 2015.

24. Wang and Chen, ColluEagle: Collusive review spammer detection using Markov random fields, Data Mining and Knowledge Discovery 34(6) (2020), 1621–1641. doi: 10.1007/s10618-020-00693-w.

25. Anand and Kumar, Anomaly Detection in Online Social Network: A Survey, ICICCT 2017, New Delhi, India, 2017, 456–459.

26. Campos G.O., Moreira and Meira, Outlier detection in graphs: A study on the impact of multiple graph models, Computer Science and Information Systems 16(2) (2019), 565–595. doi: 10.2298/CSIS181001010C.

27. Jiang, Cui and Faloutsos, Suspicious behavior detection: Current trends and future directions, IEEE Intelligent Systems 31(1) (2016), 31–39.

28. Zamini and Hasheminejad S.M.H., A comprehensive survey of anomaly detection in banking, wireless sensor networks, social networks, and healthcare, Intelligent Decision Technologies 13(2) (2019), 229–270. doi: 10.3233/IDT-170155.

29. Weller-Fahy D.J., Borghetti B.J. and Sodemann A.A., A survey of distance and similarity measures used within network intrusion anomaly detection, IEEE Communications Surveys and Tutorials 17(1) (2015), 70–91. doi: 10.1109/COMST.2014.2336610.

30. Yuruk and Feng, SCAN: A Structural Clustering Algorithm for Network, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA, 2007, pp. 824-+. doi: 10.1145/1281192.1281280.

31. Hooi, Shin and Song H.A., Graph-based fraud detection in the face of camouflage, ACM Transactions on Knowledge Discovery from Data 11(4) (2017). doi: 10.1145/3056563.

32. Wang, S.M. and Zhao X.N., Graph-based review spammer group detection, Knowledge and Information Systems 55(3) (2018), 571–597. doi: 10.1007/s10115-017-1068-7.

33. J.T. and Akoglu, Discovering opinion spammer groups by network footprints, Machine Learning and Knowledge Discovery in Databases 9284 (2015), 267–282.

34. Jiang, Cui and Beutel, Catching synchronized behaviors in large networks: A graph mining approach, ACM Transactions on Knowledge Discovery from Data 10(4) (2016). doi: 10.1145/2746403.

35. Wang B.H., Gong N.Z.Q. and Fu, GANG: Detecting fraudulent users in online social networks via guilt-by-association on directed graphs, in: IEEE International Conference on Data Mining, New Orleans, USA, 2017, pp. 465–474.

36. Fan X.X., D.Y. and Bi J.P., Trustworthiness and untrustworthiness inference with group assignment, Lecture Notes in Computer Science 10966 (2018), 389–404.

37. Majadi, Trevathan and Bergmann, Collusive shill bidding detection in online auctions using Markov Random Field, Electronic Commerce Research and Applications 34 (2019).

38. Yedidia J.S., Freeman W.T. and Weiss, Constructing free-energy approximations and generalized belief propagation algorithms, IEEE Transactions on Information Theory 51(7) (2005), 2282–2312.

39. Kschischang F.R., Frey B.J. and Loeliger H.A., Understanding belief propagation and its generalizations, IEEE Transactions on Information Theory 47(2) (2001), 239–269.

40. Mukherjee, Kumar and Liu, Spotting opinion spammers using behavioral footprints, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA, 2013, pp. 632–640.

41. Mukherjee, Liu and Glance, Spotting fake reviewer groups in consumer reviews, in: WWW’12-Proceedings of the 21st Annual Conference on World Wide Web, New York, USA, 2012, pp. 191–200.

42. Mukherjee, Venkataraman and Liu, What yelp fake review filter might be doing? ICWSM 2013, Michigan, USA, 2013, 409–418.

43. Lim E.P., Nguyen V.A. and Jindal, Detecting product review spammers using rating behaviors, in: International Conference on Information and Knowledge Management, St.Marten, Netherlands, 2010, pp. 939–948.

Table 3
The potential function design of ACEagle

$v_{i}$ \ $v_{j}$	H	F
H	$P(v_{i}=H,v_{j}=H|\textit{collu}(v_{i},v_{j}))$	$P(v_{i}=H,v_{j}=F|\textit{collu}(v_{i},v_{j}))$
F	$P(v_{i}=F,v_{j}=H|\textit{collu}(v_{i},v_{j}))$	$P(v_{i}=F,v_{j}=F|\textit{collu}(v_{i},v_{j}))$