A comprehensive survey of anomaly detection in banking, wireless sensor networks, social networks, and healthcare

Abstract

Anomaly detection is an important problem that has been investigated in various research fields and application domains. Many anomaly detection techniques have been developed exclusively for certain application domains, while others are more general. This survey aims to provide a structured and comprehensive overview of the research on anomaly detection. First, we introduce the concept of anomalies and the types of anomaly detection. We then classify anomaly detection according to its applications and categorize the techniques used in each. For each application and technique, we describe the key assumptions that the techniques use to distinguish between normal and abnormal behavior. For each application, a basic anomaly detection technique is presented, and the differences among existing techniques in each specific category are then discussed. Furthermore, we describe the advantages and disadvantages of each technique in its field. In addition, we list some data sets used in the surveyed papers so that readers can test their own methods on them. We hope that this survey provides a better understanding of the various directions in which research on this topic has been conducted.

Keywords

Anomaly detection, intrusion detection, fraud detection, credit card, banking industry

1. Introduction

The goal of anomaly detection is to recognize instances, or groups of instances, that are infrequent within the data. A widely cited definition of an anomaly is given by Hawkins [1]: “an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”.

Although anomaly detection has received significant attention, the automatic classification of anomalies remains an unsolved problem [2].

Anomaly detection refers to the problem of finding patterns that do not conform to common behavior. These unusual patterns are referred to as anomalies, outliers, aberrations, surprises, exceptions, contaminants or peculiarities in various application domains; anomalies and outliers are the two most common terms. Anomaly detection has been used extensively in a wide range of applications, such as intrusion detection for cyber security, credit card fraud detection [3, 4], insurance and health care, military surveillance for enemy activities, and fault detection in safety-critical systems. Its significance lies in the fact that anomalies in data can provide significant (and often critical) actionable information. For example, an anomalous traffic pattern in a computer network could mean that a hacker is accessing sensitive data. An anomaly in a medical image, such as an MRI scan, may indicate the presence of a malignant tumor. In credit card transaction data, an anomaly could indicate credit card or identity theft. Over time, a wide range of anomaly detection techniques has been implemented and applied in several research communities. Many of these approaches have been developed for specific application domains, while others are more general. Fig. 1 shows a timeline of the research in the literature; the growing interest in this domain in recent years is evident.

The rest of this paper is organized as follows. In Section 2, the types of anomalies in various areas are introduced. In Section 3, applications of anomaly detection and their methods are discussed separately. For each application and technique, this survey describes the key assumptions that the techniques use to distinguish between normal and abnormal behavior. For each application, a basic anomaly detection technique is presented, and the differences among existing techniques in each specific category are then discussed. Furthermore, we describe the advantages and disadvantages of each technique in its field. Moreover, to meet the needs of researchers, some data sets that were used in the surveyed papers to test the methods are introduced. Section 4 presents online streaming anomaly detection; as data volumes grow enormously, researchers are seeking solutions that differ from traditional methods. Finally, conclusions are presented in Section 5.

Figure 1.

Classification of papers based on published year.

2. Types of anomalies

Anomalies, or abnormal activities, can be classified into different categories based on a number of parameters. The types of anomaly are discussed below.

2.1 Based on nature of anomalies

We divide anomalies into three main categories based on their nature and scope, followed by a further category that has emerged in social networks:

2.1.1 Point anomalies

If a data instance (i.e., a point) behaves differently from the rest of the data instances, it is referred to as a point anomaly or global anomaly. Point anomalies are the simplest type of anomaly in data sets; the main difficulty lies in finding a suitable measure of how far an instance deviates from the others. For example, if we assume that every node of a network must have at least two neighbors connected to it, the nodes that have fewer than two links are said to behave anomalously. In Fig. 2, the points in group V1 are predicted to be anomalous. A subset of point anomalies is called local anomalies. A local anomaly occurs when a point behaves anomalously relative to its local neighborhood. As an example, suppose we group a set of individuals by their friendship links in the network and examine their income (or some other parameter). A particular individual, say A, might have a fairly low income compared with his friends, which suggests a local anomaly, while in the global context his income might be unremarkable because many people have a similar income, which represents normal behavior. This behavior is depicted in Fig. 3a and b.
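The degree rule in the network example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `low_degree_nodes`, the toy node and edge lists, and the representation of the network as an undirected edge list are hypothetical and not taken from the surveyed work.

```python
def low_degree_nodes(nodes, edges, min_degree=2):
    """Return the nodes whose number of links is below min_degree.

    nodes: iterable of node identifiers (hypothetical toy data below).
    edges: iterable of undirected (u, v) pairs.
    """
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # A node is flagged as a point anomaly when it violates the degree rule.
    return sorted(n for n, d in degree.items() if d < min_degree)


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
anomalous = low_degree_nodes(nodes, edges)
print(anomalous)
```

Here nodes "d" (one link) and "e" (no links) violate the assumed rule, while "a", "b" and "c" do not.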

Figure 2.

Point anomalies.

Figure 3.

(a) Groups on the basis of friendship links; (b) groups according to income.

2.1.2 Contextual anomalies

This type of anomaly is also referred to as a conditional anomaly. It occurs when a data instance deviates significantly from the others in a specific context. Temperature is a typical example. Suppose today’s temperature is 28 °C. Whether this is anomalous or not depends on the time and location. Detecting contextual anomalies requires defining two kinds of attributes for each data object:

Behavior attributes: These attributes capture the characteristics of an object that are evaluated to detect anomalous behavior. In the example above, the temperature and humidity can be used as behavior attributes. According to [5], proximity-based methods are usually used for contextual anomaly detection.

Contextual attributes: These attributes define the context of the object. In our example, the contextual attributes for temperature are date and location.
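The split between contextual and behavior attributes can be illustrated with a short Python sketch. It assumes a simple per-context z-score rather than the proximity-based methods mentioned in [5]; the function name, the threshold of 2.0, and the toy readings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def contextual_anomalies(records, threshold=2.0):
    """Flag values that deviate strongly from others sharing the same context.

    records: list of (context, value) pairs, where context holds the
    contextual attributes (here month and city) and value is the
    behavior attribute (here temperature in degrees Celsius).
    """
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        values = by_context[context]
        if len(values) < 3:
            continue  # too few observations to judge this context
        mu, sigma = mean(values), stdev(values)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged


# Hypothetical readings: 28 is ordinary in one context and unusual in the other.
records = [(("July", "Cairo"), t) for t in (27, 28, 29, 28, 27, 29, 28)]
records += [(("January", "Oslo"), t) for t in (-3, -4, -2, -3, -5, -4, 28)]
flagged = contextual_anomalies(records)
print(flagged)
```

The same reading of 28 is flagged only in the January/Oslo context, which is the point of treating date and location as contextual attributes.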

2.1.3 Collective anomalies

When a collection of data objects as a whole behaves differently from the rest of the data, the collection is called a collective anomaly, even though the data objects individually may not be anomalous. Figure 4 shows a collective anomaly: the density of the marked area is much higher than that of the others, although no single point in that area is anomalous on its own. In the real world, it is normal for a single car of a given model to develop a fault, but when hundreds of cars of that model develop the same fault, this appears to be a collective anomaly.

Figure 4.

Collective anomalies.

Several methods have been used to detect collective anomalies; one of the best known is to consider the behavior of the group of objects together with background information about the relationships among those data objects.

2.1.4 Horizontal anomalies

Horizontal anomalies are another type of anomaly that has emerged in social networks [6]; they are anomalies that become visible across the different sources of data available. For example, the same user may be present in different communities on different social networks. Similarly, a user may have similar kinds of friends on a number of social networks (e.g., Facebook, Google+), but completely different kinds of friends on another social network (e.g., Twitter). Such a discrepancy is an unusual activity, which can be considered anomalous.

2.2 Based on static/dynamic nature of network/graph structure

According to [7], anomalies are classified as static or dynamic based on the network structure used to distinguish them. Bibliographic networks are an example of static networks, in which changes happen slowly over time. Dynamic networks, such as those of mobile applications, on the other hand, allow continuous changes and faster communication.

Dynamic anomalies: These anomalies concern changes in the network that occur with the passage of time, for example changes in the way interactions take place in the network.

Static anomalies: These anomalies are defined relative to the rest of the network while ignoring the time factor; only the current behavior of a node is analyzed.

2.3 Based on information available in network/graph structure

According to [7], anomalies can be classified as labeled or unlabeled based on the type of information available at a node or an edge.

2.3.1 Labeled anomalies

For labeled anomalies, both the structure of the network and the information gathered from vertex or edge attributes are important. For example, labels on nodes may specify the attributes of the individuals involved in a communication activity, while labels on edges represent their interaction behavior.

2.3.2 Unlabeled anomalies

Unlabeled anomalies are related only to the network structure. Combining this classification with the static/dynamic one gives four kinds of anomalies that detection techniques target: static unlabeled, static labeled, dynamic unlabeled and dynamic labeled anomalies.

Static unlabeled anomalies occur when the behavior of an individual remains static and attributes such as the age of the individuals involved, the type of interactions, and their duration are ignored, because labels on nodes and edges are not considered in an unlabeled network.

In static labeled anomalies, labels on the vertices and edges are also considered, and the anomalous substructures found are referred to as static labeled anomalies. In a product review system, for example, a bipartite graph is built with users as one subset of vertices and products as the other, and the edges between the subsets represent product reviews. Hidden labels are assigned to both users and products. A user can be labeled honest or fraudulent, and a product can be labeled good or bad. An honest user gives accurate reviews, i.e., positive reviews for good products and negative reviews for bad ones, whereas fraudulent users are understood to do the reverse.

Dynamic unlabeled anomalies arise in dynamic networks that change with time. For example, regarding the pattern of interactions, according to [8] there are at most six ways in which a maximal clique can evolve: shrinking, growing, splitting, merging, appearing or vanishing. When normal behavior is not expected to change the network, any change in a neighborhood is predicted to be anomalous behavior.

When anomalous behavior in a dynamic network is observed by considering the labels of the vertices and edges, the resulting anomalies are classified as dynamic labeled anomalies.

2.4 Based on behavior

Another classification, presented by [8], distinguishes “white crow anomalies” and “in-disguise anomalies” (see Fig. 5).

2.4.1 White crow anomaly

When a data object deviates remarkably from other observations, it is a white crow anomaly. For example, if the probability of an event is recorded as more than 1, which is impossible, the record is taken as a white crow anomaly.

Figure 5.

In-disguise and white crow anomalies.

2.4.2 In-disguise anomaly

A small deviation from the normal pattern is called an in-disguise anomaly [9]. For example, someone who gains access to another person’s social network account and does not want to be caught will try to behave in the same manner as a normal user. Such anomalies are recognized through unusual patterns, which include uncommon nodes or entity alterations. They are difficult to detect because they are hidden inside the network.

2.5 Based on structural operations on network/graph structure

According to Eberle and Holder [9], anomalies in social networks can also be classified by the structural operation involved. This classification consists of three cases:

• Insertion: The existence of an unexpected edge or vertex in the graph.
• Modification: The presence of an unexpected label on an edge or a vertex.
• Deletion: The absence of an expected vertex or edge.
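The three structural operations can be made concrete by comparing an observed graph against an expected one. The sketch below is a deliberate simplification: it assumes the expected (normative) structure is already known, whereas discovering that structure is the hard part in practice. The dictionary representation, the function name `structural_anomalies`, and the toy labels are hypothetical.

```python
def structural_anomalies(expected, observed):
    """Compare two labeled graphs and report the three structural anomaly types.

    Each graph is a dict that maps an element (a vertex id, or an edge given
    as a (u, v) tuple) to its label.
    """
    report = {"insertion": [], "modification": [], "deletion": []}
    for element, label in observed.items():
        if element not in expected:
            report["insertion"].append(element)      # unexpected vertex or edge
        elif expected[element] != label:
            report["modification"].append(element)   # unexpected label
    for element in expected:
        if element not in observed:
            report["deletion"].append(element)       # expected element missing
    return report


expected = {"u1": "user", "u2": "user", ("u1", "u2"): "friend"}
observed = {"u1": "user", "u2": "admin", "u3": "user", ("u1", "u3"): "friend"}
report = structural_anomalies(expected, observed)
print(report)
```

In this toy case the vertex "u3" and the edge ("u1", "u3") are insertions, the relabeled vertex "u2" is a modification, and the missing edge ("u1", "u2") is a deletion.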

2.6 Based on interaction pattern in network/graph structure

In another work [10], anomalies in networks and social networks are categorized by interaction pattern as follows:

• Near stars/cliques: A node whose neighbors are completely disconnected from one another (near star) or all connected to one another (near clique) is considered anomalous, because both situations are very rare.
• Heavy locality: Suspiciously heavy edge weight concentrated around a particular area or group indicates the presence of an anomaly.
• Particular dominant links: An unexpectedly heavy load at a particular node or link, compared with other nodes, indicates unusual activity.

2.7 Types of anomalies in dynamic networks

2.7.1 Anomalous vertices

The goal of anomalous vertex detection is to discover a subset of the vertices such that every vertex in the subset has an ‘irregular’ evolution compared to the other vertices in the graph.

Identifying vertices that contribute the most to a discovered event (also known as attribution), such as in communication networks [11], and observing the shifts in community involvement [12] are some typical applications of this type of anomaly detection.

2.7.2 Anomalous edges

Anomalous edge detection aims to discover a subset of the edges such that every edge in the subset has an ‘irregular’ evolution, optionally identifying the time points at which the edges are abnormal. In a static graph, a distribution of the edge weights can be found, and each edge can be assigned a score according to the probability of its weight. Owing to the temporal nature of dynamic graphs, however, two new main types of irregular edge evolution can be discovered: (1) abnormal edge weight evolution [13], where the weight of a single edge fluctuates over time and has inconsistent spikes in value, and (2) the appearance of unlikely edges between two vertices that are not typically connected or part of the same community [14].
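For the static case, scoring an edge by the probability of its weight can be sketched as follows. The sketch assumes an empirical tail probability and a negative log score; both choices, along with the function name and the toy weights, are illustrative assumptions rather than the method of any cited work.

```python
import math


def edge_scores(weights):
    """Score each edge by how improbable its weight is in this graph.

    weights: dict mapping an edge (u, v) to its numeric weight.
    The score is -log of the fraction of edges whose weight is at least
    as large, so rare heavy edges receive the highest scores.
    """
    values = list(weights.values())
    n = len(values)
    scores = {}
    for edge, w in weights.items():
        tail = sum(1 for x in values if x >= w) / n
        scores[edge] = -math.log(tail)
    return scores


# Hypothetical static graph with one unusually heavy edge.
weights = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 1, ("d", "e"): 2, ("a", "e"): 50}
scores = edge_scores(weights)
most_anomalous = max(scores, key=scores.get)
print(most_anomalous, round(scores[most_anomalous], 3))
```

A dynamic variant would apply the same idea per edge over time, scoring each new weight against that edge's own history.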

2.7.3 Anomalous sub-graphs

The sub-graphs that are identified or tracked are typically constrained, for example to those discovered by community detection methods. Matching algorithms are needed to track the sub-graphs across time steps, such as the community matching technique used in [15]. Anomalies of this type are exclusive to dynamic networks and consist of communities that merge, split, disappear and reappear frequently, or show a number of other behaviors.

2.7.4 Event and change detection

Event detection has a much broader scope than the previous three types of anomalies: it aims to identify time points that are significantly different from the rest. Events are isolated points in time at which the graph is not similar to the graphs at the previous and following time points. To measure the similarity of two graphs, one approach is to compare signature vectors of summary values extracted from each graph, such as the average clustering coefficient [10]. Change detection is complementary to event detection, and it is essential to notice the distinction between the two. A change point marks a point in time at which the entire behavior of the graph changes and the difference is maintained until the next change point, whereas events are isolated incidents.
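The distinction between events and change points can be illustrated on a sequence of signature vectors, one per graph snapshot. The following sketch assumes Euclidean distance between consecutive signatures and a fixed tolerance; the two-component signature (average clustering coefficient, mean degree), the tolerance value, and the function name are hypothetical.

```python
import math


def classify_time_points(signatures, tol):
    """Label interior time points as 'event' or 'change'.

    signatures: list of equal-length tuples of summary values, one per snapshot.
    An event differs from both neighbors while the neighbors resemble each
    other; a change point differs from its predecessor but not its successor.
    """
    labels = {}
    for t in range(1, len(signatures) - 1):
        if labels.get(t - 1) == "event":
            continue  # returning to normal after an event is not a change
        jump_in = math.dist(signatures[t], signatures[t - 1]) > tol
        jump_out = math.dist(signatures[t + 1], signatures[t]) > tol
        neighbors_alike = math.dist(signatures[t + 1], signatures[t - 1]) <= tol
        if jump_in and jump_out and neighbors_alike:
            labels[t] = "event"
        elif jump_in and not jump_out:
            labels[t] = "change"
    return labels


# Hypothetical signatures: (average clustering coefficient, mean degree).
signatures = [(0.30, 4.0), (0.31, 4.1), (0.80, 9.0), (0.30, 4.0),
              (0.31, 4.1), (0.60, 7.0), (0.61, 7.1), (0.60, 7.0)]
labels = classify_time_points(signatures, tol=1.0)
print(labels)
```

Snapshot 2 is an isolated spike and is labeled an event, while snapshot 5 starts a new regime that persists and is labeled a change point.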

2.8 Anomalies in wireless sensor networks

According to [16], the types of anomalies in Wireless Sensor Networks (WSNs) are node anomalies, network anomalies and data anomalies, which are described below.

2.8.1 Node anomaly

These anomalies can be discovered when a WSN node fails or has power problems. Failure of a solar panel or fluctuations in the power of various components can result in this type of anomaly. Node anomalies can occur because of hardware or software problems in the WSN nodes.

2.8.2 Network anomaly

Unexpected fluctuations in signal strength and connection problems can be used to identify network anomalies. Complete loss of connectivity or intermittent connectivity can be used to detect intrusions in the network.

2.8.3 Data anomaly

An intrusion attempt can be detected from irregular data communication.

Sensor data quality is crucial for sound decision-making. Cryptographic and key management techniques are not enough to ensure the integrity of data, as they cannot protect sensor nodes from insider attacks such as data fabrication. Hence, anomaly detection models have been proposed to identify abnormal behavior in sensor data streams.

3. Applications of anomaly detection

In this section, we present a comprehensive review of anomaly detection applications in various fields.

3.1 Intrusion detection in cloud systems

The aim of cloud computing is to provide on-demand, convenient network access to a shared pool of configurable computing resources (e.g., applications, storage, services, servers and networks) [17]. Security is one of the most important problems in cloud services. The cloud suffers from different types of attacks, such as Denial of Service (DoS), Distributed Denial of Service (DDoS), IP spoofing, Address Resolution Protocol (ARP) spoofing, Routing Information Protocol (RIP) attacks, DNS poisoning, flooding, etc.

Availability, confidentiality and integrity of cloud resources and services are three important factors in cloud computing. According to [18], attackers aim to compromise these factors through the following attacks:

• Insider attack: An insider attack happens when an authorized cloud user tries to gain unauthorized privileges in order to compromise the confidentiality of services. The purpose is to reveal important information to others or to carry out fraudulent activities.
• Flooding attack: In a flooding attack, the attacker sends a huge number of packets from an innocent host in the network in order to disrupt service availability for an authorized user (the flood victim). The packets can be TCP, ICMP, UDP or a mixture of them. A flooding attack can also significantly raise usage bills.
• User to root attacks: Attackers sniff passwords in order to gain access to an authorized user’s account. They scan and record passwords that are broadcast or used on a cloud. Having gained access to the user’s account, the attacker can exploit vulnerabilities to gain root-level access to the system.
• Port scanning: Port scanning tools probe a server or host for filtered, open and closed ports. Port scanners are often used by administrators to verify the security policies of their networks, and by attackers to identify the network services running on a host and exploit their vulnerabilities. TCP scanning, SYN scanning, FIN scanning, UDP scanning, ACK scanning and Window scanning are some of the port scanning techniques.
• Attacks on virtual machine or hypervisor: One disadvantage of a hypervisor is that it can be attacked either from the network layer or from a host running on it. Much like other devices, desktop machines and infrastructure, a hypervisor on a network responds by default to connections through standard TCP/IP. This makes the hypervisor locatable on the network and consequently susceptible to traditional network enumeration with tools such as Nessus and Nmap. For companies that have invested in cloud computing or host servers in large datacenters, attacking the hypervisor from the guest or virtual machine is a much more dangerous and less familiar concept. By compromising the lower-layer hypervisor, the attacker can gain control over the installed VMs. SubVirt, Blue Pill and DKSM are some known examples of attacks on the virtual layer. Through these attacks, hackers can compromise the installed hypervisor to gain control over the host.
• Backdoor channel attacks: These attacks allow the intruder to gain remote access to the infected node in order to compromise user confidentiality. For example, a hacker can use backdoor channels to take control of the victim’s resources in a DDoS attack.

3.1.1 Types of intrusion detection system

In order to detect intrusions, different systems have been implemented as follows.

3.1.1.1. Host based intrusion detection systems (HIDS)

An HIDS detects intrusions on a machine by collecting information such as file system usage, network events and system calls. It does so by monitoring and analysing the information from a specific host machine. These systems observe program behavior and modifications to the host kernel and the host file system. If behavior deviates from what is expected, the system reports the existence of an attack. The efficiency of an HIDS depends on the system characteristics chosen for monitoring.

[19] proposed an HIDS-based architecture for cloud environments. In the proposed architecture, each node of the Cloud/Grid contains an IDS that provides interaction among the service offered (e.g., PaaS), the storage service and the IDS service. The results of [19] show very low false-positive and false-negative rates when a large number of training samples is used for the behavior analysis method.

[20] proposed a lightweight, self-similarity-based intrusion detection method for cloud computing that uses the number of events extracted from the Windows security event log. The feature selection process creates groups by combining the EventID and security ID (SID) in a Windows system. The self-similarity of each VM is then measured. Two techniques are proposed in [20] for calculating self-similarity: cosine and hybrid. A deviation of the calculated similarity from normal behavior raises an alert. The self-similarity-based approach is cost-effective and efficient for detecting anomalies in the cloud environment. Its drawback, however, is that it works only on Windows systems.

In [21], an abstract model for intrusion detection and severity analysis was proposed to provide overall security for the cloud. It consists of six components: a system call handler, a detection module, a security analysis module, profile engines, global components and an intrusion response system [21].
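The cosine variant of the self-similarity measure can be sketched as below. This is not the implementation from [20]: the exact windowing, the alert threshold of 0.9, the function name, and the (EventID, SID) keys and counts are hypothetical, chosen only to show how a drop in similarity between a VM's current event profile and its baseline could raise an alert.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical event counts per (EventID, SID) group for one VM.
baseline = {("4624", "sid-A"): 40, ("4634", "sid-A"): 38, ("4625", "sid-B"): 1}
normal_window = {("4624", "sid-A"): 42, ("4634", "sid-A"): 37, ("4625", "sid-B"): 2}
suspect_window = {("4624", "sid-A"): 5, ("4634", "sid-A"): 4, ("4625", "sid-B"): 300}

ALERT_THRESHOLD = 0.9  # assumed value; would need tuning on real logs
sim_normal = cosine_similarity(baseline, normal_window)
sim_suspect = cosine_similarity(baseline, suspect_window)
alerts = [name for name, sim in (("normal", sim_normal), ("suspect", sim_suspect))
          if sim < ALERT_THRESHOLD]
print(alerts)
```

Because cosine similarity compares the shape of the profile rather than its volume, a window that is merely busier than usual stays similar to the baseline, while one dominated by a different event group does not.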

3.1.1.2. Distributed intrusion detection system (DIDS)

Over a large network, a Distributed IDS (DIDS) consists of several IDSs (e.g., NIDS, HIDS) that communicate with each other or with a central server that enables network monitoring. The intrusion detection components collect system information and convert it into a standardized form, which is passed to the central analyzer, a machine that aggregates and analyzes information from multiple IDSs. A DIDS combines the advantages of HIDS and NIDS and can be used for detecting both known and unknown attacks.

[22] proposed a cooperative agent-based approach with an individual NIDS module in each cloud region. If any cloud region detects an intrusion, it notifies the other regions. When a new attack is detected, a new blocking rule is added to the block list. This type of detection and prevention therefore helps to resist attacks in the cloud.

[23] proposed a mutual agent-based approach to detect DDoS attacks in cloud computing. When any region detects an intrusion, the mutual agent at that region notifies the other regions. The intensity of the alerts received from other regions is calculated for each region. Then, if a new attack is detected, a new blocking rule is added to the block table at each region. In this way, a DDoS attack is detected across the whole cloud through mutual cooperation among cloud regions.

3.1.1.3. Hypervisor-based intrusion detection system

A hypervisor is a platform for running VMs, and a hypervisor-based intrusion detection system runs at the hypervisor layer. This type of IDS lets the user monitor and analyze communications between different VMs, between a hypervisor and a VM, and within the hypervisor-based virtual network. According to [24], VM introspection-based IDS is one example of a hypervisor-based intrusion detection system. Hypervisor-based IDS is one of the most important techniques for detecting intrusions in the virtual environments of cloud computing.

3.1.1.4. Intrusion prevention system (IPS)

IPSs are mainly categorized into two classes: Network-based IPS (NIPS) and Host-based IPS (HIPS). The network intrusion detection and prevention approach proposed by [25] does not require installing an IDS on each node. The benefit of this approach is that it resolves the issues of alert message transfer and trust. It also reduces overhead and the false alarm rate. Lee et al. [26] presented the Cumulative-Sum based Intrusion Prevention System (CSIPS) for preventing DoS or DDoS attacks. They used three detection algorithms (called outbound, inbound and forwarded) and a packet classification algorithm, which together detect DDoS attacks efficiently; logs are sent cooperatively to a remote IPS machine. In another work [27], an IPS model based on dynamically distributed cloud firewall linkage was presented. The authors introduced the structure and function of the Cloud Firewall.
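The cumulative-sum idea behind such systems can be illustrated with a generic one-sided CUSUM detector on per-interval packet counts. This sketch is not the CSIPS algorithm of [26]: the baseline, slack and threshold values, the function name, and the traffic series are assumptions made for the example.

```python
def cusum_alarms(counts, baseline, slack, threshold):
    """One-sided CUSUM: accumulate excess traffic and alarm when it builds up.

    counts: packets observed in each time interval.
    baseline: expected packets per interval under normal traffic.
    slack: allowance subtracted each step so small fluctuations decay to zero.
    threshold: accumulated excess that triggers an alarm.
    Returns the indices of the intervals at which an alarm is raised.
    """
    s = 0.0
    alarms = []
    for t, x in enumerate(counts):
        s = max(0.0, s + (x - baseline - slack))
        if s > threshold:
            alarms.append(t)
            s = 0.0  # restart accumulation after raising the alarm
    return alarms


# Hypothetical packet counts with a burst in intervals 5 to 7.
counts = [100, 104, 98, 101, 97, 180, 220, 250, 99, 102]
alarms = cusum_alarms(counts, baseline=100, slack=10, threshold=150)
print(alarms)
```

With these assumed parameters the alarm is raised in interval 6, one interval into the burst. Lowering the threshold detects sooner at the cost of more false alarms, which is the usual trade-off when tuning such a detector.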

3.1.1.5. Intrusion detection and prevention system (IDPS)

It is very useful to combine IDS and IPS; the combination is named IDPS. An IDPS identifies, stops and reports intrusions to security administrators. Proper configuration and management of IDS and IPS can reduce security issues. [28] reported how intrusion detection and prevention can be used together to improve security, and also discussed different approaches to designing, configuring and managing an IDPS. According to [28], IDPS detection methodologies fall into three broad categories: stateful protocol analysis, signature-based, and anomaly-based. Based on the type of events they monitor and the ways in which they are deployed, IDPSs can be divided into four groups [28]: (a) Network-Based, (b) Host-Based, (c) Network Behaviour Analysis (NBA) and (d) Wireless.

3.1.1.6. Network based intrusion detection systems (NIDS)

According to [29], many anomaly detection techniques have focused on static graphs, which do not change and can represent only a single snapshot of data. Dynamic networks, unlike static networks, constantly undergo changes to their structure or attributes. Examples include global financial systems connecting banks all over the world, electric power grids connecting geographically distributed areas, and social networks that connect users, businesses or customers through relationships such as friendship, collaboration or transactional interactions [29]. Sample applications include detection of ecological disturbances, such as wildfires [30] and cyclones [31]; intrusion detection for individual systems [32] and network systems [33]; identification of abnormal users and events in communication networks [34]; and detection of civil unrest using Twitter feeds [35].

3.1.2 Solutions to attacks

3.1.2.1. Firewalls

A firewall protects the front access points of the system and is treated as the first line of defense. Firewalls are used to deny or allow protocols, ports or IP addresses. A firewall diverts incoming traffic according to a predefined policy. Several types of firewalls are discussed in [36].
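The allow/deny behavior described above can be sketched as a first-match rule table. The rule format, the default-deny policy, the function name and the addresses (taken from documentation-reserved ranges) are assumptions for this illustration; real firewalls offer far richer matching.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str           # "allow" or "deny"
    network: str          # source network in CIDR notation
    protocol: str         # "tcp", "udp", "icmp" or "any"
    port: Optional[int]   # destination port, or None for any port


def decide(rules, src_ip, protocol, port, default="deny"):
    """Return the action of the first rule that matches the packet."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        in_network = addr in ipaddress.ip_network(rule.network)
        protocol_ok = rule.protocol in (protocol, "any")
        port_ok = rule.port in (port, None)
        if in_network and protocol_ok and port_ok:
            return rule.action
    return default  # assumed policy: anything not explicitly allowed is denied


# Hypothetical policy: block one network, allow HTTPS from anywhere,
# allow SSH only from an internal range.
rules = [
    Rule("deny", "203.0.113.0/24", "any", None),
    Rule("allow", "0.0.0.0/0", "tcp", 443),
    Rule("allow", "10.0.0.0/8", "tcp", 22),
]
decisions = [
    decide(rules, "203.0.113.7", "tcp", 443),
    decide(rules, "198.51.100.4", "tcp", 443),
    decide(rules, "198.51.100.4", "tcp", 22),
    decide(rules, "10.1.2.3", "tcp", 22),
]
print(decisions)
```

Rule order matters in a first-match table: the deny rule for the blocked network precedes the general HTTPS allow rule, so traffic from that network is rejected even on port 443.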

3.1.2.2. IDS and IPS techniques

Signature-based intrusion detection attempts to define a set of rules (or signatures) that can be used to decide whether a given pattern is that of an intruder. As a result, signature-based systems can attain high accuracy and a minimal number of false positives when identifying known intrusions [37]. In the cloud, signature-based intrusion detection can be used for detecting known attacks. The approaches presented by [38, 39, 22, 40] used signature-based intrusion detection systems for detecting intrusions on VMs (or at the front end of the cloud environment).
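A minimal sketch of signature matching is given below, assuming that signatures are regular expressions applied to request payloads. The two signatures, their names and the sample payloads are hypothetical and far simpler than the rule sets used by production systems.

```python
import re

# Hypothetical signatures: each name maps to a pattern describing a known attack.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select"),
    "path_traversal": re.compile(r"\.\./\.\./"),
}


def match_signatures(payload):
    """Return the names of all signatures found in the payload."""
    return sorted(name for name, pattern in SIGNATURES.items()
                  if pattern.search(payload))


hits_sql = match_signatures("GET /index.php?id=1 UNION SELECT password FROM users")
hits_path = match_signatures("GET /../../etc/passwd")
hits_clean = match_signatures("GET /index.html")
print(hits_sql, hits_path, hits_clean)
```

The sketch also shows the limitation of the technique: a payload that matches no stored pattern is reported as clean, so attacks without a signature go undetected.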

3.1.2.3. Anomaly detection

The key to using this approach efficiently is to generate rules in such a way that a lower false alarm rate is achieved for unknown as well as known attacks.

[41] presented a lightweight intrusion detection system to detect intrusions in real time, efficiently and effectively. In this work, behavior profiles are maintained automatically and data mining techniques are used to detect cooperative attacks. In the cloud, a large number of events (network level or system level) occur, which makes it difficult to monitor or control intrusions using anomaly detection techniques. [24, 19, 42, 43] proposed anomaly detection techniques to detect intrusions at different layers of the cloud.

As shown in Fig. 6, IDS techniques and their uses can be categorized as follows:

Figure 6.

IDS techniques.

3.1.2.4. Artificial neural network (ANN) based IDS

The goal of using ANNs for intrusion detection is to be able to generalize from incomplete data and to classify data as normal or intrusive [44]. The types of ANN used in IDS are [44]: Multi-Layer Feed-Forward (MLFF) neural nets, Multi-Layer Perceptron (MLP) and Back Propagation (BP). [45] proposed a three-layer neural network for misuse detection in the network. The feature vector used in [45] was composed of nine network features (Protocol ID, Source Port, Destination Port, Source IP Address, Destination IP Address, ICMP Type, ICMP Code, Raw Data Length, Raw Data). However, its intrusion detection accuracy is very low. [46] presented an MLP-based IDS and showed that including more hidden layers increased the detection accuracy of the IDS. This approach improves on the detection accuracy of the approach proposed in [45]. [47] compared the rate of successfully finding intrusions with MLP and Self-Organization Map (SOM) and showed that SOM has higher detection accuracy than MLP. It is claimed that the Distributed Time Delay Neural Network (DTDNN) [44] has higher detection accuracy for most network attacks. DTDNN is a simple and efficient solution for classifying data with high speed and fast convergence rates.

3.1.2.5. Fuzzy logic based IDS

Fuzzy logic [48] can be used to deal with inexact descriptions of intrusions. [49] used relative fuzzy entropy as a heuristic in Ant Colony Optimization (ACO) to search for a globally best, smallest set of network traffic features for a real-time intrusion detection data set. They used a UCI benchmark data set, and the algorithm was best suited to real-valued data sets. [50] proposed a Fuzzy IDS (FIDS) for network intrusions such as SYN and UDP floods, Ping of Death, E-mail Bomb, FTP/Telnet password guessing and port scanning. The approach proposed by [50] cannot be used in real time for detecting network intrusions, as its training time is significantly longer. The features used for comparison are taken from the network packet header. This approach is used for large-scale DoS/DDoS attacks. In [51], an anomaly detection system at the hypervisor layer, named hypervisor detector, was developed and evaluated for detecting malicious activities in the cloud environment. The proposed hypervisor detector is implemented with an Adaptive Neuro-Fuzzy Inference System (ANFIS). The ANFIS model integrates an artificial neural network (ANN) and a fuzzy inference system into a composite that combines the pertinent features of both. To reduce the training time of ANNs, fuzzy logic can be used together with ANNs for fast detection of unknown attacks in the cloud.

3.1.2.6. Association rule based IDS

Some intrusion attacks are formed from known attacks or variants of known attacks. [48] proposed network-based intrusion detection using data mining techniques. In this approach, a signature-based algorithm generates signatures for misuse detection. However, a drawback of the proposed algorithm is the time it takes to generate signatures. [52] addressed the database scanning time problem examined in [48]. They proposed a scanning reduction algorithm that reduces the number of database scans needed to generate signatures from previously known attacks. However, it has a very high false positive alarm rate, since unwanted patterns are produced. [53] proposed a length-decreasing support based Apriori algorithm to detect intrusions, which reduces the production of short patterns as derived by [48, 52]. It is faster than other Apriori-based approaches.
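To make the rule-mining idea concrete, the following minimal sketch computes support and confidence and performs a plain level-wise (Apriori-style) search for frequent itemsets. The audit records, item names and the 0.6 support threshold are hypothetical illustrations and are not taken from [48, 52, 53]; the sketch shows basic Apriori counting only, not the scanning-reduction or length-decreasing support variants.

```python
from itertools import combinations

# Hypothetical, discretized audit records (illustrative only).
records = [
    {"proto=tcp", "flag=SYN", "port=80", "label=attack"},
    {"proto=tcp", "flag=SYN", "port=80", "label=attack"},
    {"proto=tcp", "flag=ACK", "port=80", "label=normal"},
    {"proto=udp", "flag=NONE", "port=53", "label=normal"},
    {"proto=tcp", "flag=SYN", "port=22", "label=attack"},
]


def support(itemset, data):
    """Fraction of records that contain every item of `itemset`."""
    return sum(itemset <= record for record in data) / len(data)


def confidence(antecedent, consequent, data):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, data) / support(antecedent, data)


def frequent_itemsets(data, min_support, max_size=3):
    """Level-wise search: only frequent itemsets are extended to the next size."""
    items = sorted({item for record in data for item in record})
    current = [frozenset([item]) for item in items]
    frequent = {}
    for size in range(1, max_size + 1):
        level = {c: support(c, data) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Candidates of the next size are unions of two frequent itemsets.
        current = list({a | b for a, b in combinations(level, 2)
                        if len(a | b) == size + 1})
    return frequent


freq = frequent_itemsets(records, min_support=0.6)
rule_conf = confidence({"proto=tcp", "flag=SYN"}, {"label=attack"}, records)
print(sorted(map(sorted, freq)), rule_conf)
```

Each pass over `records` inside `support` corresponds to one database scan, which is why reducing the number of candidate itemsets directly reduces scanning time.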

3.1.2.7. Support vector machine (SVM) based IDS

SVM [54, 55] is used to detect intrusions based on limited sample data, where the dimensionality of the data does not affect accuracy. [56] proposed an IDS using PSO-SVMs for parameter optimization and feature optimization, leading to higher accuracy. The integration of an IDS into Mobile Ad hoc Networks (MANETs) is used in [57] as a dependable and powerful solution. This IDS is capable of detecting DoS-type attacks at a high detection rate with fast computing speed. It is shown that the proposed IDS notably improves the reliability of the network by detecting and eliminating malicious nodes in the system. In another work by [58], a monitoring technique for detecting DoS attacks in Wireless Mesh Networks (WMN) has been designed. Packet delivery ratio, average packet drops and delay are the evaluation metrics. It is shown that the proposed IDS successfully eliminates malicious nodes and improves the packet delivery ratio while reducing packet drops by integrating a priority mechanism into the system. However, the approach has only been tested on static mesh networks, and its efficiency under mobile networks has not yet been analyzed.

3.1.2.8. Genetic algorithm (GA) based IDS

GAs were originally introduced by John Holland in [59] and were inspired by natural evolution. A GA searches for an optimum solution within a population of candidate solutions that are traditionally represented as binary strings called chromosomes. Genetic algorithms (GAs) [60] are used to choose network features (to define optimal parameters), which can be used for optimizing results and improving the accuracy of IDS in other techniques. In [61], seven features of the captured packet were used (namely Duration, Protocol, Source_port, Destination_port, Source_IP, Destination_IP, Attack_name). The proposed method used a simple and flexible support-confidence based framework for the fitness function. The generated rules were used for detecting network intrusions. In another work presented in [62], a Genetic Programming (GP) based approach for generating rules from network features was designed. Note that GP [63] is an extension of genetic algorithms that represents each individual as a tree rather than a bit string. Because of the hierarchical nature of the tree, GP can provide various types of models such as mathematical functions, logical and arithmetic expressions, computer programs, network structures, etc. For deriving rules, they used a support-confidence based fitness function, and it classifies network intrusions effectively. The disadvantage of this method is the training time required for the fitness function. [64] proposed an information theory and GA-based approach for detecting abnormal behavior. Based on the mutual information between network features and the type of intrusion, a small number of network features closely related to network attacks was identified. However, only discrete features are considered in this approach. Another method presented in [60] detects anomaly and misuse by a combination of fuzzy logic and genetic algorithms. Fuzzy logic is used to set quantitative parameters in intrusion detection, whilst the genetic algorithm is used to discover the best-fit parameters of the introduced numerical fuzzy function. The best-fit problem has been solved as reported by [62]. Clustering genetic algorithms were used to solve the computer network intrusion detection problem in [65], yielding an efficient intelligent intrusion detection system. The process combines two stages: a clustering stage and a genetic optimization stage. The proposed algorithm is able not only to cluster the cases automatically but also to detect unknown intrusions.
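The use of a GA for feature selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any of the cited works: the per-feature relevance scores, the per-feature cost and all GA settings (population size, tournament size, mutation rate) are hypothetical, and the fitness function is a stand-in for the detection accuracy that a real IDS would measure.

```python
import random

# Hypothetical relevance score of each of nine packet features (illustrative only).
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.6, 0.1, 0.3]
COST = 0.25  # assumed penalty per selected feature


def fitness(chromosome):
    """Total relevance of the selected features minus a per-feature cost."""
    return sum(r - COST for r, bit in zip(RELEVANCE, chromosome) if bit)


def evolve(pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(RELEVANCE)
    # Each chromosome is a binary string: bit i == 1 means feature i is selected.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_population = [population[0][:]]  # elitism: keep the best unchanged
        while len(next_population) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(best, history[-1])
```

Because the best chromosome is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.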

3.1.2.9. Artificial immune systems (AIS)

Artificial immune systems are spreading in intrusion detection for two reasons. Firstly, the human immune system provides a high level of protection from invading pathogens in a robust, self-organized and distributed manner. Secondly, the techniques previously used in computer security were not able to deal with the dynamic and increasingly complex nature of computer systems and their security.

Several AISs have been designed for a wide range of applications including document classification, fraud detection, and network- and host-based intrusion detection [66]. AISs can be broadly split into two categories based on the mechanism they use: network-based models and negative selection models. Network-based models refer to systems that are largely based on Jerne’s network theory [67], which holds that interactions occur between antibodies and antibodies as well as between antibodies and antigens. Negative selection models, in turn, use negative selection to generate a population of detectors.

Table 1

Summary of IDS/IPS techniques

IDS/IPS technique	Characteristics/advantages	Limitations/ challenges
Signature based detection	• Detecting intrusion by matching captured patterns with preconfigured knowledge base. • High detection accuracy for previously known attacks. • Low computational cost.	• Cannot detect new or variant of known attacks. • High false alarm rate for unknown attacks.
Anomaly detection	• Using statistical test on collected behavior to identify intrusion. • Can lower the false alarm rate for unknown attacks.	• More time is required to identify attacks. • Detection accuracy is based on amount of collected behavior or features.
ANN based IDS	• Classifying unstructured network packet efficiently. • Multiple hidden layers in ANN increase efficiency of classification.	• Requiring more time and more samples in the training phase. • Has less flexibility.
Fuzzy logic based IDS	• Used for quantitative features. • Providing better flexibility to some uncertain problems.	• Detection accuracy is lower than ANN.
Association rules based IDS	• Used to detect known attack signature or relevant attacks in misuse detection.	• Cannot detect totally unknown attacks. • Requiring a larger number of database scans to generate rules. • Used only for misuse detection.
SVM based IDS	• Can correctly classify intrusions, if limited sample data are given. • Can handle massive number of features.	• Can classify only discrete features.
GA based IDS	• Used to select best features for detection. • Has better efficiency.	• Complex method. • Used in specific manner rather than general.
Artificial immune system based IDS	• Robust, self-organized and distributed manner.	• Complex method.
Hybrid techniques	• Efficient approach to classify rules accurately.	• Computational cost is high.

Dasgupta and Attoh-Okine [68] compared network and negative selection-based approaches to AIS design. Forrest et al. [69] proposed a probabilistic self/non-self model in which individual antibodies do not interact. Several applications of AISs were considered, including anomaly detection, fault diagnosis, pattern recognition and computer security. In the field of computer security, they discussed virus detection and process anomaly detection, with several different approaches. Changes in the behavior of UNIX processes can be detected through short-range correlation of process system calls. Viruses can be detected through changes to files. Another way is to use a signature-based approach: monitor decoy programs, observe how they are changed, and build signatures from these changes for the main system. In another approach, Aickelin et al. [70] discussed the application of danger theory to intrusion detection. They aim to design a computational model of danger theory, which they consider critical in order to define, explore, and discover danger signals. They sought to build novel algorithms and use them to design an intrusion detection system with a low false positive rate. The correlation of signals to alerts and of alerts to scenarios is considered particularly important. Their proposed system, built on previous work in immunology by Matzinger [71] and on work on attack correlation, collects signals from hosts, the network and elsewhere, and correlates these signals with alerts. Alerts are classified as good or bad, in parallel to biological cell death by apoptosis and necrosis. Apoptosis is the death of cells as a natural course of events, whereas necrosis is the opposite, where cells die pathologically. They hope that alerts can also be correlated to attack scenarios. The origin of these signals is not yet clear; they will probably come from a combination of host and network sources. Examples include traffic normalizers (devices that sit in the traffic stream and correct potential ambiguities in it) and packet sniffers. The danger algorithm that is to serve as the correlation algorithm has, however, not yet been specified. Whether the system actively responds to attacks is also not yet clear. Finally, Aickelin et al. [70] concluded that if this approach works, the scaling problems of negative selection should be overcome, but a large amount of research still remains to be done.

In [72], an adaptive immune system mechanism based on unsupervised machine learning is proposed to classify network traffic into either self (“normal”) or non-self (“suspicious”) profiles. Their approach distributes the NIDS among all connected network segments, allowing the NIDS in each segment to identify potential threats individually and enabling identified threat vectors to be shared between the communicating distributed NIDSs [72].

• Negative selection algorithm: The Negative Selection Algorithm is based on the way the mammalian immune system selects and eliminates harmful pathogens from the body. The human immune system produces gene libraries that develop antibodies through the gene expression process; these antibodies attach to pathogens and neutralize their impact on the body. The signature of the antigen is recognized, and the population of antibodies is consulted to discover the antibody that is genetically designed to combat that antigen.

• Clonal selection algorithm: The Clonal Selection Algorithm derives from the clonal selection theory of immunity. Because the human body is successful at protecting itself from foreign pathogens that cause illness, this mechanism can be simulated in a security context to detect and prevent attacks on computers. In network intrusion detection, the artificial immune model consists of three evolutionary stages: negative selection, clonal selection, and gene library evolution. When an antigen is encountered, the bone marrow cells produce antibodies, which divide and secrete plasma cells through mitosis. After these cells respond to the antigen, memory cells are stored, which identify the type of antigen after the initial exposure. By reacting to this pathogen faster and skipping the detection phase of the response, these memory cells improve the speed of the response process. An artificial immune system creates artificial cells that behave like real body cells to detect the pathogen, and clones artificial antibodies to overwhelm the pathogen and perform the artificial response that eliminates the virus. The system is able to perform self-detection and implements an automated response to return to a normal state [5].

3.1.2.10. Hybrid techniques

Hybrid techniques combine two or more of the techniques mentioned above. NeGPAIM [73] is based on a combination of two low-level components, fuzzy logic for misuse detection and neural networks for anomaly detection, and one high-level component, a central engine that analyzes the outcome of the two low-level components. One advantage of this technique is that it does not need dynamic updates of rules. [74] presented an approach that uses a combination of Naive Bayes, ANN and Decision Tree (DT) classifiers on three separate sets of input data in order to improve the performance of IDS. It is beneficial to use soft computing techniques on traditional IDS for the Cloud environment. However, each technique has some pros and cons, which affect the performance of IDS. For example, the longer time needed to train an ANN and its lower flexibility are the major disadvantages of ANN. Combining fuzzy logic with data mining techniques promotes flexibility. GA with fuzzy logic improves the performance of IDS, since GA selects best-fit rules for IDS. GA performs better when it is used to match patterns in a specific rather than a general manner. In [75], integrated fuzzy GNP rule mining with distance-based classification was proposed for detecting network intrusions with a high detection rate. A density-based fuzzy imperialist competitive clustering algorithm was proposed by [76] as a hybrid clustering method to improve the accuracy of intrusion detection.

3.1.3 Summary of IDS/IPS techniques

As discussed above, each technique has been widely used and has its pros and cons. Table 1 gives a brief description of the characteristics of each technique.

3.2 Fraud detection

Fraud is an intentional deception with the aim of obtaining financial gain or causing loss by implicit or explicit trickery [77]. Fraud is a violation of public law in which the fraudster tries to gain an unlawful advantage or causes irreparable damage. Estimates of the losses caused by fraud show that it costs a very considerable amount of money. Statistics from the Internet Crime Complaint Center show a significant rise in reported fraud over the last decade [7]. Financial fraud losses across payment cards, remote banking and cheques totaled £755 million in 2015, an increase of 26 percent compared to 2014. Prevented fraud totaled £1.76 billion in 2015. This represents incidents that were detected and prevented by the banks and card companies and is equivalent to £7 in every £10 of attempted fraud being stopped [78]. Fraud detection involves detecting infrequent fraudulent activities among numerous legitimate transactions as fast as possible. Fraud detection methods are developing rapidly in order to adapt to new fraudulent strategies across the world. However, developing new fraud detection techniques is made more difficult by the severe limits on the exchange of ideas in fraud detection. Fraudulent transactions usually make up a very small share of total transactions. Therefore, detecting fraudulent transactions accurately and efficiently is difficult and challenging. Hence, implementing an efficient method that can distinguish rare fraudulent activities from billions of legitimate transactions is essential.
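As a quick consistency check on the 2015 figures quoted from [78], the short calculation below derives the share of attempted fraud that was stopped and the 2014 loss level implied by the 26 percent increase. It assumes that attempted fraud equals losses plus prevented fraud, which the quoted figures suggest but do not state explicitly; the variable names are ours.

```python
losses_2015 = 755e6       # GBP, reported fraud losses in 2015
prevented_2015 = 1.76e9   # GBP, reported prevented fraud in 2015
growth = 0.26             # reported increase in losses relative to 2014

# Assumption: attempted fraud = losses + prevented fraud.
attempted_2015 = losses_2015 + prevented_2015
prevented_share = prevented_2015 / attempted_2015

# Loss level in 2014 implied by the reported 26 percent increase.
implied_losses_2014 = losses_2015 / (1 + growth)

print(f"share of attempted fraud stopped: {prevented_share:.3f}")
print(f"implied 2014 losses: GBP {implied_losses_2014 / 1e6:.0f} million")
```

Under this assumption the prevented share is close to 0.70, which agrees with the statement that £7 in every £10 of attempted fraud was stopped.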

3.2.1 Types of fraud techniques

Frauds can be broadly categorized into traditional card-related frauds, merchant-related frauds and Internet frauds.

Merchant related frauds: The methods used for committing credit card frauds are described below:

• Merchant collusion: This type of fraud occurs when merchant owners or their employees conspire to commit fraud using cardholder accounts or by misusing personal information. They pass information about cardholders on to fraudsters.
• Triangulation: This type of fraud occurs when goods are offered on a website at heavily discounted rates and are shipped before payment. A buyer who likes the offer browses the site and enters information such as name, address and valid credit card details. Once the fraudsters have received these details, they order goods from a legitimate site using the stolen credit card details. The fraudsters then use the credit card information to make purchases for themselves.
• Internet related frauds: The Internet gives fraudsters a base for committing fraud in the simplest and easiest way. With the expansion of trans-border economic and political spaces, a new world market has become available on the Internet, capturing consumers from most countries around the world. The most commonly used techniques in Internet fraud are as follows [79]:

– Site cloning: Fraudsters clone an entire site or just the pages from which the customer makes purchases. They receive the details entered by the customer and send the customer a fake receipt of the transaction by e-mail, just as the real company would do.
– False merchant sites: These sites require the customer to complete details such as name and address in order to access the webpage where the products are offered. Many of these sites claim to be free but require a valid credit card number to confirm the individual’s age.
– Credit card generators: Some computer programs are capable of generating valid credit card numbers and expiry dates. These generators work by creating lists of credit card account numbers from a single account number. Such generators are used by card issuers to generate other valid card number combinations.
– Lost/Stolen cards: This is the easiest way for fraudsters, who obtain the cardholder’s information by finding or stealing a credit card without needing any modern technology.
– Account takeover: This occurs when a valid customer’s personal information is taken by fraudsters.
– Cardholder-not-present (CNP): CNP transactions are carried out remotely and do not need any physical presence of the card or cardholder at the point of sale. This covers many kinds of transactions, such as orders taken over the phone or Internet, by mail order or by fax. In such transactions, retailers are unable to physically check the card or the identity of the cardholder, which makes the user anonymous and able to hide their real identity.
– Fake and counterfeit cards: Counterfeit cards, together with lost or stolen cards, rank highest in credit card fraud. Fraudsters keep finding new and more creative ways to create counterfeit cards. Some of the techniques used for creating false and counterfeit cards are as follows:
– Erasing the magnetic strip: The fraudster erases the magnetic stripe using a powerful electromagnet. The fraudster then alters the details on the card so that they match the details of a valid card, which they may have obtained. When the fraudster starts using the card, the cashier will swipe the card through the terminal several times before realizing that the magnetic stripe does not work properly.
– Creating a fake card: Sophisticated machines are available for creating a fake card from scratch. Although fake cards require a lot of effort and skill to produce, this is a common fraud. Modern cards have many security features designed to make it difficult for fraudsters to produce good-quality forgeries. Holograms are one of the best prevention methods, making credit cards very difficult to forge effectively.
– Skimming: One of the most popular and fastest-emerging types of credit card fraud is skimming. Skimming is involved in most counterfeit fraud cases. Skimming is the electronic copying of the actual data on a card’s magnetic stripe onto another card. Fraudsters carry pocket skimming devices, battery-operated electronic magnetic stripe readers, with which they swipe customers’ cards to obtain the card details.
– Phishing: This type of fraud is used to steal a person’s identity. It is commonly performed via spam e-mail or pop-up windows. Phishing happens when a malicious person sends out large numbers of false e-mails. The e-mails appear to come from a website or company you trust, for example your bank. The message asks you to provide the company with your personal details, including your payment card details. The sender may claim that the reason for this is a database crash or something similar.

According to the Fraud Facts report [80], the pattern of card fraud losses changed from 2006 to 2015. Figure 7 shows the percentage of losses by type. There are different types of fraud, which we categorize according to their applications as follows:

Credit card fraud can be defined as “Unauthorized account activity by a person in which the account was not pre designated” [79].

The payment card industry has grown exponentially during the last few years. Fraud tends to follow certain patterns, and it is possible to detect such patterns and, as a result, the fraud itself.

Figure 7.
(a) Credit card losses by type at 2015; (b) Credit card losses by type at 2006.

When a particular consumer uses a credit card, a fixed usage pattern emerges from the way that consumer uses the card. Data from the last one or two years can be used to train a neural network on the particular usage pattern of a particular consumer. The neural network can be trained on information in various categories about the cardholder: occupation and income may fall in one category, while another category holds information about large purchases, including the number of large purchases, their frequency, and the locations where they take place within a fixed time period [79].

3.2.2 Credit card fraud detection techniques

Fraud detection techniques can be classified into supervised based, unsupervised based and statistical based techniques as shown in Fig. 8.

In supervised learning, models are built from samples of both fraudulent and non-fraudulent records together with their labels. These techniques are mostly used in the fraud analysis approach. The Back Propagation Network (BPN) is one of the most popular supervised neural networks. It uses a multi-stage dynamic optimization method, a generalization of the delta rule, to minimize the objective function [81, 82, 83].

Figure 8.

Credit card fraud detection techniques.

It has been observed that credit card fraud detection has two highly specific characteristics. The first is the very limited time span in which the acceptance or rejection decision has to be made. The second is the large number of credit card operations that must be processed at a given time [279].

A three-layer BPN in combination with genetic algorithms (GA) for credit card fraud detection was proposed by [84]. In this work, genetic algorithms were used to decide on the network architecture, including the network topology, the number of hidden layers and the number of nodes in each layer.

In another work presented by [85], a parallel granular neural network (GNN) is used to accelerate the data mining and knowledge discovery process for credit card fraud detection. GNN is a type of Fuzzy Neural Network based on Knowledge Discovery (FNNKD). The average training error was lower when a larger training data set was used.

In [86], a CNN (Convolutional Neural Network)-based fraud detection framework was used to capture the inherent patterns of fraud behavior learned from labeled data. Abundant transaction data is represented by a feature matrix, to which a convolutional neural network is applied to identify a set of latent patterns for each sample. To apply a CNN to credit card fraud detection, the features need to be transformed into a feature matrix that fits the CNN model. The authors stated that experiments on real-world massive transaction data from a major commercial bank show superior performance in comparison with some state-of-the-art methods. The data set includes over 260 million credit card transactions in one year. About four thousand transactions are labeled as fraudulent and the rest are legitimate. Compared with SVM and some neural networks, the CNN model provides better performance.

The other type is unsupervised techniques. In unsupervised learning, prior knowledge of fraudulent and normal records is not essential. These methods raise an alarm for those transactions that differ most from normal ones. These techniques are mostly used in the user behavior approach. ANNs can provide good results if a sufficiently large transaction data set exists. They require a large training data set. The self-organizing map (SOM), which was proposed by [87], is one of the most popular unsupervised neural network learning methods. It operates in two phases: training and mapping. In the first phase, the map is built and the weights of the neurons are updated iteratively, based on input samples. In the second phase, test data is classified automatically into normal and fraudulent classes through the procedure of mapping [88]. After the SOM has been trained, new unseen transactions are fed to the network, which assigns them to normal or fraud clusters: a transaction that is similar to the normal records is classified as normal. The same procedure applies to fraudulent transactions.
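The two SOM phases can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the procedure of [87, 88]: the map is a one-dimensional grid of four units, the eight two-dimensional "transactions" are invented and pre-scaled to [0, 1], and the mapping phase flags a transaction by thresholding its distance to the best-matching unit, which is one common scoring choice. The threshold of 0.4 and all function names are hypothetical.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: dist(weights[i], x))


def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Training phase: build a 1-D map and update unit weights iteratively."""
    rng = random.Random(seed)
    # Initialize each unit from a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        lr = lr0 * decay
        radius = radius0 * decay + 1e-9
        for x in data:
            b = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood on the 1-D grid around the winner.
                h = math.exp(-((i - b) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def quantization_error(weights, x):
    """Mapping phase: distance from x to its best-matching unit."""
    return dist(weights[best_matching_unit(weights, x)], x)


# Hypothetical normal transactions: (scaled amount, scaled hour of day).
normal = [(0.10, 0.50), (0.12, 0.55), (0.20, 0.45), (0.25, 0.60),
          (0.15, 0.40), (0.30, 0.50), (0.22, 0.58), (0.18, 0.47)]
som = train_som(normal)
THRESHOLD = 0.4  # assumed alarm threshold


def is_anomalous(x):
    return quantization_error(som, x) > THRESHOLD


print(is_anomalous((0.20, 0.50)), is_anomalous((0.95, 0.05)))
```

Because every update moves a unit part of the way toward a training sample, the units stay inside the region covered by the normal data, so a transaction far from that region has a large distance to every unit.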

3.2.2.1. Artificial immune system (AIS)

The immune system has two major defense responses: the innate immune response and the acquired immune response. The body’s first line of defense consists of the outer, unbroken skin and the mucous membranes lining internal channels, such as the respiratory and digestive tracts. Acquired immunity takes over whenever harmful cells pass through the innate immune defense. The adaptive immune response relies on antigen-specific recognition of an almost unlimited variety of infectious substances, even ones that are previously unseen or mutated.

The acquired immune response is capable of “remembering” every infection; therefore, a second exposure to the same pathogen is dealt with more efficiently.

AIS is a recent sub-field based on the biological metaphor of the immune system [89]. The immune system can distinguish between self and non-self cells or, more specifically, between harmful cells (named pathogens) and other cells. Its ability to recognize differences in patterns and to detect and eliminate infections precisely has attracted the attention of engineers in all fields. The main concepts that have been used are the negative selection algorithm, immune network algorithms, the clonal selection algorithm, and the dendritic cell algorithm.

Table 2

Comparison of NN, GA and AIS methods, which are mostly used in credit card fraud detection

	Genetic algorithm	Neural network	Artificial immune system
Components	Chromosome strings	Artificial neurons	Attribute strings
Location of components structure	Dynamic	Pre-defined	Dynamic
Structure	Discrete components	Networked components	Discrete components/networked components
Interaction with environment through	Fitness function	External stimuli	Recognition/objective
Threshold activity	Crowding/sharing	Neuron activation	Function component affinity

3.2.2.2. Negative selection

The NSA (Negative Selection Algorithm), which was proposed by [90], is a change detection algorithm based on the T-cell generation process of the biological immune system. Since it was first designed, it has attracted the attention of many AIS researchers and has evolved considerably. NSA has two stages: generation and detection. In the generation stage, candidate detectors are produced by a random process and censored by attempting to match them against self samples. Candidates that match (with an affinity higher than the affinity threshold) are removed, and the rest are kept as detectors. In the detection stage, the set of detectors is used to check whether an incoming data instance is self or non-self. If it matches any detector (with an affinity higher than the affinity threshold), it is declared non-self, i.e., an anomaly.
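The two stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [90]: patterns are 8-bit tuples, affinity is the fraction of matching bit positions, and the affinity threshold of 0.75, the self set and the number of detectors are all hypothetical.

```python
import random

LENGTH = 8         # assumed number of bits per pattern
THRESHOLD = 0.75   # assumed affinity threshold for a match


def affinity(a, b):
    """Fraction of bit positions in which the two patterns agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def generate_detectors(self_set, n_detectors, rng, max_tries=10_000):
    """Generation stage: keep random candidates that match no self sample."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = tuple(rng.randint(0, 1) for _ in range(LENGTH))
        # Censoring: discard candidates whose affinity to any self sample
        # reaches the threshold.
        if all(affinity(candidate, s) < THRESHOLD for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(x, detectors):
    """Detection stage: x is anomalous if it matches any detector."""
    return any(affinity(x, d) >= THRESHOLD for d in detectors)


# Hypothetical encoded self (normal) patterns.
SELF = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 1, 1, 1),
    (1, 1, 0, 0, 0, 0, 0, 0),
]
detectors = generate_detectors(SELF, n_detectors=10, rng=random.Random(0))
print(len(detectors), [is_nonself(s, detectors) for s in SELF])
```

Since the affinity measure is symmetric, censoring in the generation stage guarantees that no self sample is ever flagged by the resulting detectors; coverage of the non-self space depends on how many detectors are generated.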

3.2.2.3. Clonal selection

The clonal selection theory was proposed to describe the basic features of an immune response to an antigenic stimulus. The selection mechanism ensures that only those clones (antibodies) with higher affinity for the encountered antigen are selected. On the basis of the clonal selection principle, the clonal selection algorithm was initially designed in [91] and formally described in [92]. The general algorithm was called CLONALG.

Gadi et al. [91] used AIRS in the field of transactional credit card fraud detection. AIRS is a classification algorithm based on AIS that uses clonal selection to create detectors. AIRS creates detectors for all of the classes in the database and uses the k-Nearest Neighbor (KNN) algorithm in the detection stage to classify each record.

Soltani et al. applied AIRS to credit card fraud detection in [93]. Because of the long training time of AIRS, the authors implemented the model in a cloud computing environment to shorten it. They used the MapReduce API, which works on top of the Hadoop distributed file system, to run the algorithm in parallel.

RamaKalyani and UmaDevi proposed a model for credit card fraud detection based on the principles of GA in [94]. The aim of the approach was to develop an algorithm for synthesizing test data and detecting fraudulent transactions.

Table 2 briefly compares the structure of the three methods that are most commonly used in credit card fraud detection.

3.2.2.4. Hidden markov model (HMM)

An HMM is a doubly embedded stochastic process, which can model much more complicated stochastic processes than a traditional Markov model. The underlying system is assumed to be a Markov process with unobserved states. In simpler Markov models such as Markov chains, the states are directly visible, and the transition probabilities are the only unknown parameters. In an HMM, on the other hand, the states are hidden, but the state-dependent outputs are visible. In credit card fraud detection, an HMM is trained to model the normal behavior encoded in user profiles. In this model, a new incoming transaction is classified as fraud if it is not accepted by the model with high enough probability. Each user profile contains information about the 10 most recent transactions of that user, such as the time, category and amount of each transaction. HMM produces a high false positive rate [95]. Bhusari et al. [56] utilized HMM for detecting frauds with a low false alarm rate.
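The acceptance test can be sketched with the forward algorithm, which computes the probability of an observation sequence under a given HMM. This is a minimal sketch under stated assumptions: the two hidden states, the three observation symbols (low, medium and high spending), all probabilities and the 0.5 threshold are hypothetical, and the rule of comparing the window probability before and after the new transaction is one possible formulation rather than the exact procedure of the cited works.

```python
def forward_probability(obs, start, trans, emit):
    """P(obs | model) computed with the forward algorithm."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


# Hypothetical model of one cardholder's normal behavior.
# Observation symbols: 0 = low, 1 = medium, 2 = high spending.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3],
         [0.4, 0.6]]
EMIT = [[0.70, 0.25, 0.05],
        [0.50, 0.40, 0.10]]
THRESHOLD = 0.5  # assumed maximum tolerated relative drop in probability


def is_suspicious(window, new_obs):
    """Flag new_obs if sliding it into the window makes the sequence much less likely."""
    old = forward_probability(window, START, TRANS, EMIT)
    new = forward_probability(window[1:] + [new_obs], START, TRANS, EMIT)
    return (old - new) / old > THRESHOLD


recent = [0] * 10  # the 10 most recent transactions, all low spending
print(is_suspicious(recent, 0), is_suspicious(recent, 2))
```

For a cardholder whose recent transactions are all low spending, another low-spending transaction leaves the sequence probability unchanged, whereas a high-spending transaction reduces it sharply and is flagged.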

3.2.2.5. Support vector machine (SVM)

SVM [57] is a supervised learning approach, with associated learning algorithms, that analyzes and recognizes patterns for classification and regression tasks. SVM is a binary classifier. The underlying idea of SVM is to find an optimal hyperplane that linearly separates the instances of two given classes. This hyperplane is assumed to lie in the gap between some marginal instances called support vectors [77].

SVM has been applied successfully in a wide range of applications. In credit card fraud detection, Ghosh and Reilly [96] proposed a detection model in which a three-layer feed-forward neural network was used to detect fraudulent credit card transactions, requiring only two passes to produce a fraud score every two hours.

Chen et al. [97] implemented a Binary Support Vector System (BSVS), in which the support vectors were chosen by GA. In this model, SOM was first applied to obtain a high true negative rate, and BSVS was then used to improve training on the data according to their distribution.

In [98], a classification model based on decision trees and SVM was designed for detecting credit card fraud. This work presented a comparative study of SVM and decision tree approaches to credit card fraud detection on a real data set. The results revealed that decision tree classifiers such as CART (Classification and Regression Tree) performed much better than SVM on this problem.

3.2.2.6. Bayesian network

A Bayesian network is a graphical model that shows conditional dependencies between random variables. The basic graphical model is a directed acyclic graph. Bayesian networks are effective for identifying unknown probabilities given known probabilities in the presence of uncertainty. Bayesian networks can play an important role in modeling conditions where some basic information is already known but incoming data is uncertain or partially unavailable [99]. Bayes' rule is often used to predict the class label for a given vector of features or attributes. Bayesian networks have been used successfully in different fields of interest, for instance churn prevention [100] in business, pattern recognition in vision [101], diagnosis in medicine [102], and fault diagnosis. In addition, these networks have been used to identify anomalies and fraud in credit card transactions and telecommunication networks.
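As a hedged illustration of how such a network yields an unknown probability from known ones, the sketch below uses a hypothetical network in which a Fraud node is the parent of two binary evidence nodes. The node names and all probabilities are invented for the example. Evidence that is unavailable is simply left out, which marginalizes it in this particular structure.

```python
def posterior_fraud(prior, cpts, evidence):
    """P(fraud | evidence) for a network in which Fraud is the parent of each evidence node.

    cpts[name] = (P(name is True | fraud), P(name is True | normal)).
    evidence maps a node name to True/False; nodes that are missing are unobserved.
    """
    weight_fraud, weight_normal = prior, 1.0 - prior
    for name, observed in evidence.items():
        p_fraud, p_normal = cpts[name]
        weight_fraud *= p_fraud if observed else 1.0 - p_fraud
        weight_normal *= p_normal if observed else 1.0 - p_normal
    return weight_fraud / (weight_fraud + weight_normal)


# Hypothetical conditional probability tables.
CPTS = {"foreign_country": (0.40, 0.05), "large_amount": (0.50, 0.10)}

print(posterior_fraud(0.01, CPTS, {}))                          # no evidence: the prior
print(posterior_fraud(0.01, CPTS, {"foreign_country": True}))   # one observed node
print(posterior_fraud(0.01, CPTS, {"foreign_country": True, "large_amount": True}))
```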

3.2.2.7. Expert systems

Rules can be created from knowledge obtained from a human expert and stored in a rule-based system as IF-THEN rules. A knowledge-based system, or expert system, is built on the information accumulated in its knowledge base. The rules in the expert system are applied to the data in order to reach an appropriate inference. Financial analysis and fraud detection are examples of areas in which it can be used. With an expert system, a doubtful activity or transaction can be detected from deviations from “normal” spending patterns [103].
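A rule base of this kind can be sketched as a list of named IF-conditions. The two rules, the field names, and the factor of 5 below are hypothetical and stand in for knowledge that would come from a human expert.

```python
RULES = [
    # (rule name, IF-condition on the transaction and the customer's profile)
    ("amount_far_above_average",
     lambda tx, profile: tx["amount"] > 5 * profile["average_amount"]),
    ("unfamiliar_country",
     lambda tx, profile: tx["country"] not in profile["usual_countries"]),
]


def fired_rules(tx, profile, rules=RULES):
    """Return the names of all rules whose IF-part holds for this transaction."""
    return [name for name, condition in rules if condition(tx, profile)]


# Made-up customer profile describing "normal" spending.
profile = {"average_amount": 40.0, "usual_countries": {"US", "CA"}}
print(fired_rules({"amount": 35.0, "country": "US"}, profile))
print(fired_rules({"amount": 900.0, "country": "BR"}, profile))
```

A transaction for which at least one rule fires would then be passed on for review (the THEN-part of the rule).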

3.2.2.8. Fuzzy Darwinian system

One type of evolutionary-fuzzy system, named Fuzzy Darwinian Detection [104], uses GP to evolve fuzzy rules. The system combines a GP component with a fuzzy expert system. It achieved very high accuracy and a low false positive rate in comparison with other methods, but it is extremely expensive.

3.2.2.9. Statistical distribution based methods

Statistical distribution based techniques are built on the following key assumption [5]: “Normal data instances happen in high likelihood regions of a model, while anomalies happen in the low likelihood regions of the model.” According to [5], there are two types of statistical methods: parametric and nonparametric.

Parametric techniques assume that the normal data are generated by a parametric distribution with parameters $\theta$ and probability density function $f(x,\theta)$, where $x$ is an observation. The anomaly score of an observation $x$ is the inverse of the probability density function $f(x,\theta)$. The parameters $\theta$ are estimated from the given data.
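A minimal sketch of the parametric case is given below, assuming a univariate Gaussian for $f(x,\theta)$ with $\theta=(\mu,\sigma)$ estimated by maximum likelihood. The choice of distribution, the function names, and the sample values are assumptions of the example.

```python
import math
from statistics import fmean, pstdev


def fit_gaussian(data):
    """Estimate theta = (mu, sigma) from data assumed to be normal behavior."""
    return fmean(data), pstdev(data)


def density(x, mu, sigma):
    """Gaussian probability density f(x, theta)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def anomaly_score(x, mu, sigma):
    """Inverse of the density; infinite if the density underflows to zero."""
    f = density(x, mu, sigma)
    return math.inf if f == 0.0 else 1.0 / f


amounts = [48.0, 52.0, 50.0, 47.0, 53.0]  # made-up normal observations
mu, sigma = fit_gaussian(amounts)
print(anomaly_score(50.0, mu, sigma), anomaly_score(60.0, mu, sigma))
```

Observations far from the estimated mean have a low density and therefore a high score.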

In non-parametric statistical techniques, the structure of the model is not defined in advance but is determined from the given data. Such techniques usually make fewer assumptions about the data, such as smoothness of the density, than parametric techniques. The simplest non-parametric statistical technique uses histograms to maintain a profile of the normal data.
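The histogram profile can be sketched as follows. The fixed bin width and the scoring rule (one minus the relative frequency of the bin) are assumptions made for this illustration.

```python
import math
from collections import Counter


def build_histogram(data, bin_width):
    """Relative frequency of each bin, learned from data assumed to be normal."""
    counts = Counter(math.floor(x / bin_width) for x in data)
    return {b: c / len(data) for b, c in counts.items()}


def histogram_score(x, histogram, bin_width):
    """1 minus the relative frequency of the bin that x falls into."""
    return 1.0 - histogram.get(math.floor(x / bin_width), 0.0)


normal_amounts = [12, 15, 18, 22, 25, 27, 14, 19, 21, 95]  # made-up values
hist = build_histogram(normal_amounts, bin_width=10)
print(histogram_score(17, hist, 10), histogram_score(250, hist, 10))
```

The result depends strongly on the bin width: bins that are too narrow leave normal values in empty bins, while bins that are too wide hide anomalies.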

3.2.2.10. Markov chain model

The Markov chain model is one of the statistical models proposed for detecting anomalies. The work in [105] used a discrete-time stochastic process to describe how a random variable changes at discrete points in time. A Markov chain model is used to represent a temporal profile of normal behavior in a computer and network system. The model is learned from historical data of normal behavior. Let $X_{t}$ denote a random variable representing the state of a system at time $t$, where $t=0,1,2,\ldots$. A stationary Markov chain is a special type of discrete-time stochastic process with the following assumptions [106]:

The probability distribution of the state at time $t+1$ depends on the state at time $t$, and does not depend on the states before time $t$.

A state transition from time $t$ to time $t+1$ does not depend on the time itself. Let $p_{ij}$ denote the probability that the system is in state $j$ at time $t+1$ given that it is in state $i$ at time $t$. If the system has a finite number of states, $1, 2, \ldots, s$, the stationary Markov chain can be defined by a transition probability matrix [106]:

$p=\begin{bmatrix}p_{11}&p_{12}&p_{13}&\ldots&p_{1s}\\ p_{21}&p_{22}&p_{23}&\ldots&p_{2s}\\ \ldots&\ldots&\ldots&\ldots&\ldots\\ p_{s1}&p_{s2}&p_{s3}&\ldots&p_{ss}\\ \end{bmatrix}$

and an initial probability distribution [106]:

$Q=[q_{1}\ q_{2}\ \ldots\ q_{s}]$

where $q_{i}$ is the probability that the system is in state $i$ at time 0, and:

$\sum_{j=1}^{s}p_{ij}=1$

The probability that a sequence of states $X_{1},\ldots,X_{T}$ at times $1,\ldots,T$ occurs in the context of the stationary Markov chain is computed as follows:

$P(X_{1},\ldots,X_{T})=q_{X_{1}}\prod_{t=2}^{T}p_{X_{t-1}X_{t}}.$
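The quantities above can be sketched in code as follows. This is a minimal illustration that assumes states are numbered from 0 to $s-1$ (rather than 1 to $s$) and that estimates the initial distribution and the transition matrix by simple counting; the function names and the uniform fallback for states never seen in training are assumptions of the example.

```python
def estimate_chain(sequences, s):
    """Estimate the initial distribution q and transition matrix p by counting."""
    q_counts = [0] * s
    p_counts = [[0] * s for _ in range(s)]
    for seq in sequences:
        q_counts[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            p_counts[prev][cur] += 1
    q = [c / len(sequences) for c in q_counts]
    p = []
    for row in p_counts:
        total = sum(row)
        # Assumed fallback: a uniform row for a state that never occurs as a source.
        p.append([c / total for c in row] if total else [1.0 / s] * s)
    return q, p


def sequence_probability(states, q, p):
    """q[x_1] multiplied by the transition probabilities along the sequence."""
    prob = q[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= p[prev][cur]
    return prob


normal_sequences = [[0, 0, 1], [0, 1, 0]]  # made-up training data
q, p = estimate_chain(normal_sequences, s=2)
print(sequence_probability([0, 1, 0], q, p))
print(sequence_probability([1, 1, 1], q, p))  # never seen in training
```

A sequence with a low probability under the learned chain is a candidate anomaly.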

A disadvantage of statistical methods is that they do not guarantee that all anomalies will be found in cases for which no specific test was developed.

Some of the main approaches mentioned above (e.g., Bayesian networks, neural networks, and SVM) can be categorized into three groups [107]: proximity based, cluster based, and classification based.

3.2.3 Proximity based (or nearest neighbor based) anomaly detection

In [107], the terms proximity and distance, which represent similarity and difference, are the key notions applied to detect anomalies in any network. Proximity based anomaly detection techniques analyze each object with respect to its neighbors. Normal data objects lie close to their neighbors, i.e., they follow a dense neighborhood pattern, whereas anomalous objects lie far away from their nearest neighbors. Proximity based techniques can be mainly categorized into distance based and density based [107, 108].

Distance based techniques compute the anomaly score from the distance of a data object to its $k$ nearest neighbors. Distance based anomalies are also described as “global anomalies”. A grid based approach to the multi-dimensional space is proposed in [109] to handle high dimensional data more effectively. Instead of applying the techniques to the full dimensional space, a high dimensional space can be reduced to a low dimensional space using a dimensionality reduction technique. Principal Component Analysis (PCA) is one of the most important feature extraction techniques used for extracting the lower dimensional space. The application of PCA is a type of correlation based clustering method. One of the easiest approaches toward it is to calculate the sparsity coefficient. The major problem with distance based techniques is their failure to detect local anomalies, which can be overcome by density based methods.
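A distance based score can be sketched as the mean distance of each object to its $k$ nearest neighbors. Euclidean distance, the averaging rule, and the sample points are assumptions of this example; other variants use the distance to the $k$-th neighbor only.

```python
import math


def knn_distance_scores(points, k):
    """Anomaly score of each point: mean distance to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (6.0, 6.0)]  # made-up data
scores = knn_distance_scores(points, k=2)
# Index of the point with the highest score, i.e., the most anomalous one.
print(max(range(len(points)), key=scores.__getitem__))
```

This brute-force version compares every pair of points, so its cost grows quadratically with the number of objects.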

Density based techniques calculate the anomaly score of each data object from its relative density. These techniques compare the density around an object with the density around its neighbors. For a normal object, the densities are expected to be the same, whereas for anomalous objects they differ. The notion of relative density is usually used to measure the degree of anomalous behavior of an object. The authors of [110] proposed the Outlier Detection using In-degree Number (ODIN) score of an object. The in-degree number score of an object is the number of objects that have this particular object among their $k$-nearest neighbors. An object with a low in-degree score is considered anomalous. Local Outlier Factor (LOF), proposed in [111, 112], is one of the most popular density based anomaly detection methods. The LOF score is the ratio of the average local reachability density of the $k$-nearest neighbors of the object $o$ being evaluated to the local reachability density of $o$ itself. It depends on both the $k$-nearest neighbors of $o$ and the reachability distance measure. The LOF score of an anomalous object is higher because its relative density is lower than that of its neighbors. For normal data objects, however, both densities are nearly the same. Influential Outlier (INFLO) [113] uses the reverse $k$-nearest neighbors set (RNNk) to capture all those points that have object $o$ in their neighborhood set. The anomaly score in these techniques is calculated similarly to LOF, with certain added terms.
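The LOF computation described above can be sketched in a few lines. This simplified version assumes Euclidean distance, takes exactly $k$ neighbors per object with ties broken by index order, and assumes that the data contain no duplicate points (which would cause a division by zero). It is an illustration, not the reference implementation of [111, 112].

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of every point (simplified brute-force sketch)."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        """Reachability distance of point i with respect to its neighbor j."""
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]
    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]  # made-up data
for point, score in zip(points, lof_scores(points, k=2)):
    print(point, round(score, 2))
```

Scores close to 1 indicate that an object is about as dense as its neighbors, while scores well above 1 indicate a lower relative density.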

Table 3
Advantages and disadvantages of anomaly detection approaches [107]

Approach	Advantages	Disadvantages
Proximity based	• Simplest and easiest data mining approach. • Applicable to a number of domains.	• Handling and detection of anomalies become difficult when several regions with widely differing densities exist. • Difficult to detect a group of anomalies if they lie close to each other. • Highly dependent on the proximity measures used, which might not be available in certain situations.
Cluster based	• Unsupervised nature, where no predefined set of labeled classes of data objects is required. • Quick comparison process once the clusters are constructed.	• High computational cost, since the clusters must be found before anomalies can be detected. • A data object not belonging to any cluster may be considered as noise rather than an anomaly. • Computational complexity. • Costly procedure for large data sets. • Anomalies are assumed to belong either to no cluster or to a small cluster; hence, objects in the above encountered clusters might be considered as normal.
Classification based	• Improved efficiency, especially when ensemble techniques integrate a number of classifiers.	• Heavy dependency and reliance on training data. • Subject to the imbalanced class issue, in which just a few objects represent the main class.

3.2.4 Cluster-based anomaly detection

In the cluster-based approach, anomalies often belong to a small sparse cluster or do not belong to any cluster, while normal objects are part of large and dense clusters. Clusters of data objects can be obtained with numerous approaches, such as the Two-Step Anomaly Detection Approach Using Clustering Algorithm [114], Online clustering for evolving data streams with online anomaly detection [115], Clustering and Unsupervised Anomaly Detection with l2 Normalized Deep Auto-Encoder Representations [116], K-Means and K-Medoids for small data sets, CLARANS [117] and CLARA [118] for large data sets, and Chameleon [119] and BIRCH [120] for macro clustering on micro clusters. In cluster based anomaly detection, if an object does not belong to any cluster, density based clustering methods such as DBSCAN can be applied. DBSCAN, proposed in [121], investigates the density around each object, and an object that is isolated or lies in a region of lower density than the others is considered an anomaly. This approach can detect clusters of arbitrary shape. A number of modified variants of DBSCAN, such as FDBSCAN [44], L-DBSCAN [122], C-DBSCAN [123], P-DBSCAN [124], TI-DBSCAN [125], NG-DBSCAN [126], and DSET-SCAN [127], have also been used to detect anomalies effectively. If the distance between an object and its closest cluster is large, the methods mentioned above can provide a better way to detect the anomalies. However, they concentrate on discovering the clusters and treat any point that does not belong to a cluster as noise, which is in effect assumed to be anomalous. To solve such problems, more advanced methods such as the Cluster based Local Outlier Factor (CBLOF) and the corresponding algorithm FindCBLOF [128] were designed to find such anomalies. Other methods, such as SOM, an unsupervised method designed by Kohonen [129], $k$-means clustering [130], and $k$-means++ [131], have also been used.

If an object is part of a small or sparse cluster, then not only that object but also all the other objects belonging to that cluster are considered anomalous. This situation is handled by defining a threshold value for the clusters, and the objects belonging to clusters below the threshold are considered anomalous [107]. The FindCBLOF algorithm proposed in [128] detects both individual objects and points belonging to small clusters as anomalous by computing the similarity between the objects of the small cluster and the closest large cluster.
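The threshold rule described above can be sketched as follows, assuming that a clustering algorithm has already assigned a label to each object. The label -1 for noise and the size threshold are assumptions of this example, not values from [107] or [128], and the sketch does not implement FindCBLOF.

```python
from collections import Counter


def small_cluster_anomalies(labels, min_fraction=0.2, noise_label=-1):
    """Flag objects that are noise or that belong to a cluster below the size threshold."""
    sizes = Counter(labels)
    n = len(labels)
    return [
        label == noise_label or sizes[label] / n < min_fraction
        for label in labels
    ]


# Made-up cluster labels: one large cluster, one tiny cluster, one noise point.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, -1]
print(small_cluster_anomalies(labels))
```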

3.2.5 Classification based approaches

Classification is defined in [54] as a supervised method with two steps, i.e., a learning step and a classification step. In the learning step, a training set of labeled data instances is used to construct a classification model, and in the classification step, the constructed model is used to predict the class labels of the data. These two steps are also referred to as the training and test stages. For anomaly detection, the training data objects are labeled as ‘normal’ and ‘anomalous’.

Table 4
Data sets used by researchers

Collection form	Amount	Used in	Methods
Large Brazilian bank, with registers within time window between Jul/14/2004 through Sep/12/2004 (real data set)	• 41,647 transactions. • 3.14% fraudulent transactions.	2008 [125]	AIS
Financial institute in Ireland (WebBiz) (real data set)	• 4 million transactions from 462,279 unique customers. • 5,417 fraudulent transactions.	2010 [126]	AIS
Hong Kong bank, with registers within time window between January 2006 to January 2007 (13 months) (real data set)	50 million credit card transactions on about one million (1,167,757) credit cards from a single country	2010 [127]	ANN tuned by GA
Chase Bank and First Union Bank (real data set)	• Each bank supplied 500,000 records spanning one year. • 20% fraud and 80% non-fraud distribution for Chase Bank. • 15% versus 85% for First Union Bank.	1999 [128]	AdaCost which is a variant of AdaBoost
Major US bank (real data set)	• 6,000 credit card data with 64 predictor variables plus 1 class variable. • 84% of the data are normal accounts and 16% are fraudulent accounts.	2005 [129]	Multiple criteria linear programming
Large Australian bank (real data set)	640,361 total transactions, with 21,746 credit cards	2012 [134]	AIS
Vesta Corporation (Vesta corporation is an innovator and worldwide leader in virtual commerce with headquarter in Portland, Oregon, USA) (real data set)	• 206,541 transactions. • 204,078 transactions are normal and 2,463 ones are fraudulent.	2012 [130]	ANN
Mellon bank (real data set)	• 1,100,000 transactions. • authorized in two months period.	1994 [96]	ANN
German bank	• 280,000 transactions authorized in four months. • There are about 500 fraudulent transactions.	2015 [135]	ANN
Synthetically generated data	• 320,000,000 transactions. • 1,050 credit card. • 42 features.	2010 [131]	GA
Synthetically generated data	• 1,000,000 transactions. • 20 features.	2012 [89]	GA
Synthetically generated data	The data are extracted into a flat file from SQL server database containing sample Visa Card transactions and then preprocessed.	2002 [78]	ANN

In a one-class model, only a single labeled class is defined, i.e., the classifier is designed to describe only the normal class. All data objects that belong to that class are treated as normal, whereas those that do not fit the defined class are treated as anomalies. Some examples of one-class models applied to anomaly detection are one-class SVM [132], Gaussian model description (GAUSSD) [133], and Principal Component Analysis Description (PCAD).

If the data objects do not belong to a single class, they belong to multiple classes. A number of classifiers are available for such classification problems, and some of those discussed above are capable of anomaly detection.

The pros and cons of each of the approaches mentioned above are summarized in Table 3.

3.2.6 Challenges of credit card fraud detection

An effective fraud detection technique should be capable of addressing the following problems in order to achieve the best performance [77].

• Imbalanced data: Credit card fraud detection data are imbalanced by nature, which means that only a small portion of all credit card transactions are fraudulent. As a result, detecting fraudulent transactions is difficult and imprecise.
• Different misclassification importance: Different misclassification errors have different importance. Misclassifying a normal transaction as fraud is not as harmful as misclassifying a fraudulent transaction as normal.
• Overlapping data: Many transactions may be considered fraudulent while they are actually normal (false positive); conversely, a fraudulent transaction may appear to be normal (false negative). Hence, achieving low rates of false positives and false negatives is a key challenge for fraud detection systems.
• Lack of adaptability: Classification algorithms commonly face the problem of detecting new types of normal or fraudulent patterns. Both supervised and unsupervised fraud detection systems are inefficient at detecting new patterns of normal and fraudulent behavior.
• Fraud detection cost: The system should take into account both the cost of the fraudulent behavior that is detected and the cost of preventing it.

3.2.7 Data sets

The methods mentioned above, in any field, need a credible data set for testing and for comparing their efficiency with related work. The lack of publicly available data sets has been a limiting factor for publications on financial fraud detection, particularly for credit card transactions.

Table 4 lists some credit card fraud detection data sets, both real and synthetically generated, that have been used in the literature.

3.2.8 Evaluation

A variety of measures have been used to evaluate the various algorithms. False Positive (FP), False Negative (FN), True Positive (TP), and True Negative (TN), and the relations between them, are four well-known quantities that are usually adopted by credit card fraud detection researchers to compare the accuracy of different approaches. These quantities are defined as follows:

• FP or false positive rate: The portion of the non-fraudulent transactions that are wrongly classified as fraudulent.
• FN or false negative rate: The portion of the fraudulent transactions that are wrongly classified as normal.
• TP or true positive rate: The portion of the fraudulent transactions that are correctly classified as fraudulent.
• TN or true negative rate: The portion of the normal transactions that are correctly classified as normal.

Table 5 lists the formulas that are most commonly used to evaluate the methods.

Table 5
Credit card fraud detection evaluation measures

Measure	Formula
Accuracy (detection rate)	(TN $+$ TP)/(TP $+$ FP $+$ FN $+$ TN)
Precision (hit rate)	TP/(TP $+$ FP)
Sensitivity (true positive rate)	TP/(TP $+$ FN)
False positive rate	FP/(FP $+$ TN)
Specificity (true negative rate)	TN/(TN $+$ FP)
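As a small worked example of the measures in Table 5, the sketch below computes them from the four counts. The function name and the counts are made up for illustration, and the code assumes that no denominator is zero.

```python
def evaluation_measures(tp, fp, fn, tn):
    """Compute the measures of Table 5 from the four confusion-matrix counts."""
    return {
        "accuracy": (tn + tp) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "specificity": tn / (tn + fp),
    }


# Made-up counts for 1,000 transactions, of which 100 are fraudulent.
for name, value in evaluation_measures(tp=80, fp=20, fn=20, tn=880).items():
    print(f"{name}: {value:.3f}")
```

With imbalanced data, accuracy alone can look high even when a sizeable share of the fraudulent transactions is missed, which is why precision and sensitivity are usually reported alongside it.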

3.3 Insider trading detection

Insider trading is one of the many white-collar crimes that can contribute to the instability of the economy. Traditionally, finding illegal insider trades has been a human-driven procedure. Formally, insider trading is not always illegal. The insiders of a company are its officers (CEO, CFO), large shareholders ($>$10%), and members of the Board of Directors. For these people, a large part of their compensation comes from stock and other awards. Trading becomes illegal when an insider leverages information that only he or she can obtain in order to trade stock for an illegal benefit. Statistics alone are not enough to measure ethical behavior.

Engelen [136] used a case-based method and examined the attempted prosecution by Belgian authorities of insider trading around a revenue declaration at Bekaert NV. The case was examined because it was one of the first attempts to prosecute insider trading under a new law. The Appeals Court ruled that the information used by the accused parties was not deemed relevant and hence was not privileged, because it had already been revealed that the firm would pay a revenue, and revenues have been found to be irrelevant to the markets.

The available data is gathered from several heterogeneous sources such as stock trading data, option-trading data, and news. The data has temporal associations since the data is collected continuously. The temporal and streaming nature has also been exploited in certain techniques [137].

In [138], the collected data were used to construct networks that capture the relationships between the trading behaviors of insiders. The authors compared the reported price of each transaction (purchase or sale) with the market closing price of the company’s stock on the day of the transaction. The resulting anomaly ranking can be used by investigators to prioritize cases for further analysis [138].

3.4 Stock market fraud detection

Stock fraud commonly occurs when brokers try to manipulate their customers into trading stocks without regard for the customers’ real interests. Stock fraud can happen at a company level, or by a single stockbroker. Corporate insiders, brokers, underwriters, large shareholders and market makers are likely to be manipulators [139].

The authors of [139] used an unsupervised technique named Peer Group Analysis (PGA) for fraud detection. The objective of PGA is to characterize the expected pattern of behavior around the target sequence from the behavior of similar objects, and then to detect any differences in evolution between the expected pattern and the target.

The most important feature of PGA is its focus on local patterns rather than global models; a sequence may not be unusual in comparison with the whole population of sequences but may show unusual properties when compared with its peer group. That is, it may start to deviate in behavior from objects to which it has previously been similar.

According to [140], assume that we have observations of $N$ objects, where each observation is a sequence of $d$ values represented by a vector $\bm{x}_{i}$ of length $d$. The $j^{\rm th}$ value of the $i^{\rm th}$ observation, $x_{ij}$, occurs at a fixed time point $t_{j}$. Let $\mathrm{PG}_{i}(t_{j})=$ [some subset of observations ($\neq\bm{x}_{i}$) that show behavior similar to that of $\bm{x}_{i}$ at time $t_{j}$]; then $\mathrm{PG}_{i}(t_{j})$ is the peer group of object $i$ at time $t_{j}$. The issue of finding a proper number of peers is similar to finding the correct number of neighbors in a nearest-neighbor analysis.
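The peer group idea can be sketched as follows. This is a simplified illustration rather than the procedure of [139] or [140]: it assumes that similarity is the Euclidean distance between the histories before the time index $j$, that the peer group consists of the $k$ closest objects, and that the deviation is the absolute difference from the peer mean. The series values are made up.

```python
import math


def peer_group(series, i, j, k):
    """Indices of the k objects whose history before time index j is closest to object i's."""
    target = series[i][:j]
    others = [m for m in range(len(series)) if m != i]
    others.sort(key=lambda m: math.dist(series[m][:j], target))
    return others[:k]


def peer_deviation(series, i, j, k):
    """Absolute difference between object i and the mean of its peer group at time index j."""
    peers = peer_group(series, i, j, k)
    peer_mean = sum(series[m][j] for m in peers) / k
    return abs(series[i][j] - peer_mean)


# Made-up weekly values for four accounts; account 0 breaks away in the last week.
series = [
    [10, 11, 12, 40],
    [10, 11, 12, 13],
    [11, 12, 13, 14],
    [50, 52, 54, 56],
]
print(peer_group(series, i=0, j=3, k=2), peer_deviation(series, i=0, j=3, k=2))
```

Account 0 would not stand out against account 3, whose values are higher throughout, but it deviates clearly from the accounts that it previously resembled.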

3.5 Medical and public health anomaly detection

The rapid growth in the use of electronic health records and computerized systems has created new opportunities for better detection of fraud and abuse. Emerging techniques in machine learning and artificial intelligence have drawn attention to automated methods of fraud detection. The combination of automated methods and statistical knowledge has led to an emerging interdisciplinary branch of science named Knowledge Discovery from Databases (KDD), with data mining at its core.

With the expansion of IT technologies, increased attention has been paid to smart health service platforms that identify emergency situations related to chronic disease, telemedicine, silver care, and wellness. Moreover, there is a high demand for technologies that can properly judge a situation and provide suitable countermeasures or health information if an emergency occurs [141]. With the emergence of IT convergence technologies, there has been a growing demand for different real-time smart health services. A smart health service detects and monitors a patient’s daily health status regardless of location and transmits it to the medical information center. It is capable of providing disease information and guidelines so that doctors can quickly and precisely diagnose patients with lifestyle diseases. Detection [142, 143] and tracing with a CCD (Charge-Coupled Device) camera are required in the convergence supervision sector. There are some issues in detecting and tracing real-time bio-images, for example when the camera moves, when part of the traced object is covered by another object, when tracing fails, or when noise removal damages accuracy [144]. Computer hardware and software have improved in terms of both quality and quantity. The user interface has developed to incorporate three-dimensional graphics and real-time voice output. Hence, an innovative bio-interface for users must be created. A mobile virtual interface, which can be shown through biometrics described in human motions, would play an important role in the interaction between computers and humans [141].

Supervised methods need confidence in the correct classification of the records. Moreover, they are useful for detecting previously known patterns of fraud and normal behavior. Therefore, the models should be frequently updated to reflect new types of fraudulent behavior and changes in regulations and settings [145]. Examples of supervised methods that have been applied to health care fraud and abuse detection include decision trees, as in [146], which used six statistical techniques, including correlation analysis, logistic regression, and classification trees, for detecting provider fraud; classification trees for finding provider fraud [146, 147]; genetic algorithms [148]; and Support Vector Machines (SVM) [149, 150].

Using a decision tree, decision makers can choose the best alternative; a traversal from root to leaf indicates a unique class separation based on maximum information gain. Decision trees are widely used by many researchers in the healthcare field.
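The information gain criterion mentioned above can be sketched as follows, assuming categorical attributes and Shannon entropy measured in bits. The attribute values and labels are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in entropy obtained by splitting the records on one attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


labels = ["fraud", "fraud", "normal", "normal"]
print(information_gain(["high", "high", "low", "low"], labels))  # attribute separates the classes
print(information_gain(["a", "b", "a", "b"], labels))            # attribute carries no information
```

A tree-building algorithm would split each node on the attribute with the largest gain.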

Another technique used in health anomaly detection is SVM. The support vector machine classifier creates a hyperplane or multiple hyperplanes in a high dimensional space, which are used for classification, regression, and other tasks. Various kernel functions, such as polynomial, Gaussian, and sigmoid, are used in this area.
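The three kernel functions named above can be written down directly. The parameter names and default values below (degree, coef0, gamma, alpha) are assumptions of this sketch; in practice they are tuned for each data set.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (dot(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, alpha=0.1, coef0=0.0):
    return math.tanh(alpha * dot(x, y) + coef0)


x, y = [1.0, 2.0], [3.0, 4.0]  # made-up feature vectors
print(polynomial_kernel(x, y), gaussian_kernel(x, y), sigmoid_kernel(x, y))
```

Each kernel returns a similarity between two feature vectors, which lets the SVM separate classes that are not linearly separable in the original space.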

The Naïve Bayesian classifier has shown great performance in terms of accuracy, so it can be used in the medical field if the attributes are independent of each other. Bayes' theorem concentrates on prior, posterior and discrete probability distributions of data items. Bayesian Belief Networks are widely used by many researchers in the healthcare field.

Table 6
Different techniques used for health anomaly detection

Author	Techniques	Disease
Shouman et al. [156]	K-NN classifier	Heart disease
Liu et al. [157]	Fuzzy k-NN	Thyroid disease
Zuo et al. [158]	Fuzzy k-NN	Parkinson disease
Khan et al. [159]	Decision tree	Breast cancer
Chien and Pottie [160]	Hybrid decision tree classifier	Chronic disease
Moon et al. [161]	Decision tree	Patterns of smoking in adults
Chang and Chen [162]	Decision tree	Skin diseases
Soliman et al. [163]	SVM classification	Classification of various diseases
Fei [164]	Hybrid PSO-SVM	Arrhythmia cordis
Huang et al. [165]	Hybrid SVM	Breast cancer
Avci [166]	Genetic SVM classifier	Heart valve disease
Abdi and Giveki [167]	Hybrid PSO-SVM	Erythemato-squamous diseases
Er et al. [168]	ANN	Chest diseases
Das et al. [169]	Ensemble neural network	Heart disease
Gunasundari and Baskar [170]	ANN	Lung diseases
Liu and West [171]	Bayesian belief network	Analyzing risks that are associated with health
Curiac et al. [172]	Bayesian belief network	Psychiatric patient data
Agrawal [173]	Weighted support vector regression (WSVR)	Monitoring the daily activities of patient
Lenert et al. [174]	K-means clustering	Health services of public domain
Belciug et al. [175]	Clustering techniques	Recurrence of breast cancer
Balasubramanian and Umarani [176]	Clustering techniques	Impact of ground water on human health
Escudero et al. [177]	K-means clustering	Alzheimer’s disease (AD)

Regression is used to find functions that explain the correlation among different variables. A mathematical model is constructed using a training data set. In statistical modeling, two kinds of variables are used: the dependent variable and the independent variable, usually denoted by ‘Y’ and ‘X’, respectively. There is always one dependent variable, while there may be one or more independent variables. Regression is a statistical method that investigates relationships between variables. Logistic regression does not assume a linear relationship between the variables [151]. Regression is widely used in the medical field for predicting diseases or the survivability of a patient.
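As a minimal sketch of the simplest case, the code below fits a line $Y = aX + b$ to a training set with one independent variable by ordinary least squares. The function name and the data are made up, and the code assumes that the X values are not all identical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: returns (slope, intercept)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


xs = [1.0, 2.0, 3.0, 4.0]  # independent variable X (made-up values)
ys = [3.1, 4.9, 7.2, 8.8]  # dependent variable Y (made-up values)
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(slope * 5.0 + intercept)  # prediction for a new observation X = 5
```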

When fraudsters become aware of a particular detection method, they change their strategies to avoid detection. As mentioned above, supervised methods are useful for detecting previously known patterns of fraud and abuse. In theory, unsupervised approaches can be used to detect new types of fraud or abuse. Unsupervised methods usually evaluate the attributes of one claim relative to other claims and determine how they are related to or differ from each other. Hence, they can reveal sequence and association rules among records, identify anomalous records, or group similar records [145]. Examples of unsupervised methods that have been used for health care fraud are clustering [152, 153, 154] and outlier detection [154, 155].

Detection systems can be categorized based on data, structure, and the analysis method. They can also be classified into host-based and network-based detection systems. Host-based detection analyzes objects, time, and abnormal type based on life-logs.

Detection systems based on analysis method can be categorized into either misuse detection or anomaly detection. Misuse detection is applied to detect known attacks that target weaknesses of the system. Since researchers must consider various known-attack patterns for their detection system, a great amount of maintenance is required. Moreover, the system cannot find attacks whose patterns have not been entered into it, which leaves the system open to emulated attacks or cheats. Anomaly detection builds a profile of normal behaviors, such as CPU usage, file attributes, and changed information. When an abnormal pattern is detected, it is compared with the previously saved profile. If the detected abnormal pattern reaches a threshold value, it is considered an anomaly. As profiles are composed from previous behaviors, the error rate is high, since a proper decision cannot always be made on current behavior patterns. Table 6 shows different techniques used for health anomaly detection.
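
The profile-and-threshold idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the profile is the mean and standard deviation of one monitored quantity and that the threshold is expressed in standard deviations; the function names, the CPU usage values, and the threshold of 3 are hypothetical and are not taken from the surveyed systems.

```python
from statistics import mean, stdev

def build_profile(normal_values):
    """Summarize normal behavior (e.g., CPU usage samples) as mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose deviation from the profile reaches the threshold
    (measured in standard deviations; the default of 3 is an assumption)."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma >= threshold

# Hypothetical CPU usage history (%) observed during normal operation.
cpu_history = [21.0, 19.5, 22.3, 20.1, 18.9, 21.7, 20.4]
profile = build_profile(cpu_history)

# One typical reading and one reading far outside the saved profile.
flags = [is_anomalous(v, profile) for v in (20.8, 75.0)]
```

Because the profile is built only from past behavior, a legitimate shift in current behavior would also be flagged, which is the source of the high error rate noted above.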

3.6 Social network anomaly detection

With the increasing use of online social networks in different domains, social network analysis has recently grown rapidly. Online Social Networks (OSNs) have attracted the interest of researchers for the analysis of their usage as well as the detection of abnormal activities.

Anomalous activities in social networks are unusual and illegal activities whose behavior differs from that of others present in the same structure. Sometimes it becomes hard to analyze social networks because of their large size and complex nature, and it becomes necessary to prune the networks so that they consist of just the most relevant and significant relationships. In some applications, the extent to which an anomaly is present is expressed by assigning each object in the data set a degree of being an outlier. For example, [111] called this degree the Local Outlier Factor (LOF).

OSNs are often represented as graphs in which users are nodes and interactions between users are edges, which may or may not be labeled. Often only binary and static social links are considered, so that the mere presence of a link is treated as sufficient while users’ actual communication activity is given no importance [178].

The graph built from the user interaction activities in such networks is called an activity graph [179, 180], which can be either a basic activity graph or a weighted one. In a basic activity graph, every pair of nodes has the same type of edge irrespective of whether the tie between them is strong or weak, whereas a weighted activity graph also takes the strength of the active link into account.
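
The difference between the two kinds of activity graph can be sketched as follows. The sketch assumes that the interactions are available as a list of user pairs and that link strength is simply the number of interactions; both the input format and the choice of weight are illustrative assumptions.

```python
from collections import Counter

def build_activity_graphs(interactions):
    """Return (basic, weighted) activity graphs from (user, user) interaction pairs.

    basic:    set of undirected edges, one per interacting pair of users.
    weighted: Counter mapping each undirected edge to its interaction count.
    """
    weighted = Counter(frozenset(pair) for pair in interactions if pair[0] != pair[1])
    basic = set(weighted)
    return basic, weighted

# Hypothetical interaction log: ann and bob interact three times, bob and cat once.
log = [("ann", "bob"), ("bob", "ann"), ("ann", "bob"), ("bob", "cat")]
basic, weighted = build_activity_graphs(log)
```

In the basic graph both ties look identical, whereas the weighted graph records that the tie between ann and bob is three times as active as the tie between bob and cat.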

It has been observed that, for any type of social network, analysis targets one or more of three influence factors: a subject node (the node influencing others), a tie or social link (the communication link between nodes), or an object node (the node being influenced) [181].

The graph representation of social networks makes different anomaly detection methods applicable. Proximity or similarity measures defined in data mining methods do not seem very appropriate for social networks. In social networks, similarity may instead be defined on the following bases [107]:

• Structure context-based similarity: This is a local cluster or neighborhood based similarity, in which nodes having the same neighborhood are considered similar. Users with a number of mutual friends usually make similar decisions, which helps to determine the relation between them.
• Similarity based on random walks: This type of similarity is best explained by an example. Suppose a message needs to be forwarded to multiple users. At the initial stage, however, it is sent to just two users, A and B, who forward it to others. The closeness or similarity can then be captured by the simultaneous receipt of the message from both A and B at the nodes. Similarity is therefore expressed as a random walk measure over the network.
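
As an illustration of structure context-based similarity, the following minimal sketch scores two users by the overlap of their neighborhoods. The Jaccard coefficient is used here as one common choice of neighborhood overlap; it is an assumption of this sketch, not a measure prescribed by [107], and the friendship data are hypothetical.

```python
def neighbors(edges):
    """Build an adjacency map {node: set of neighbors} from undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard_similarity(adj, u, v):
    """Share of mutual friends among all friends of u and v (u and v themselves excluded)."""
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

# Hypothetical friendships: a and b share the friends c and d; e is a friend of b only.
friendships = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "e")]
adj = neighbors(friendships)
sim_ab = jaccard_similarity(adj, "a", "b")   # two mutual friends out of three
sim_ae = jaccard_similarity(adj, "a", "e")   # no mutual friends
```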

Anomaly detection techniques in social networks can be categorized as follows [107].
3.6.1 Behavior based techniques

Behavior based techniques deal with the behavioral characteristics of the users. For example, the number and content of messages, the content of the items shared, the number of likes or comments on a post and duration of a conversation are some of the behavioral properties.

Content based filtering is one of the most prominent behavior based approaches, in which anomalous behavior is detected by examining the internal content of the sent and received messages. A classification model, trained on the content of the messages, is applied in the analysis phase. For example, [182] proposed a Filtered Wall system in which users applied a set of filtering rules to keep unwanted and irrelevant posts off their walls. However, some malicious users are intelligent enough to deceive others by behaving similarly to legitimate users. In the social network scenario, Sybil attacks and cloning attacks are two quite popular attacks nowadays [183, 174, 172].

Even sophisticated techniques like honeypots, implemented to detect spammers, fail to attract anomalous users in most situations. PCA is an unsupervised statistical anomaly detection technique that was used by [184] for detecting anomalous behavior in individuals. The evaluation was carried out on a Facebook data set, and a number of fake and compromised users were identified. Normal and anomalous distributions were judged by observing the ‘like’ activities of the users, e.g., the pages ‘liked’ by a user and the number of posts or pages liked in a particular period of time. The motivation for using this technique was the growth of purchased fake Facebook likes, fake reviews on review websites, fake followers on Twitter, etc. Apart from these, an important contribution of this work was the detection of click spam, which is highly prevalent nowadays in ads.

Moreover, Xiao et al. [185] used the profile information of a user to detect fake accounts in online social networks, using certain supervised machine learning techniques for feature extraction and cluster building. The proposed technique is a faster and more efficient way to detect fake accounts, as it only uses the attributes entered by a user during registration, i.e., profile creation.

3.6.2 Structure based techniques

Structure based techniques rest on the basic axiom that structural properties can be used to distinguish normal from anomalous users. A specific graph metric is computed for different nodes or structures, and a node showing values different from those of other users is considered anomalous. Any deviation from the known pattern indicates anomalous behavior. Most researchers working in the social network domain have used structural properties to define new approaches for detecting anomalies in online social networks. For example, [186] studies the structural characteristics of networks to predict various behaviors of individuals in link mining. A normal trend is that consumers whose friends spend a lot also spend a lot themselves. The concept of link analysis is appropriate for both heterogeneous and homogeneous networks; however, in that work, the graph structure of heterogeneous networks with different types of nodes or edges is given more attention. The authors of [186] covered eight link mining tasks with their respective algorithms and grouped the tasks under three categories, namely object-related, link-related and graph-related. Most structure based link prediction methods show poor performance because they involve predicting future relationships that are likely to occur [187]. Tasks such as Anomalous Link Discovery (ALD) were studied, which consist of predicting only the anomalous relationships rather than all relationships. The results revealed that almost any prediction model does quite well for ALD.

In social networks, link prediction is highly useful for identifying friendship links among users, as such techniques are a good way to test connected, missing and corrupted links. Hence, they help to analyze the dynamics of the network and to predict future link behaviors.

3.6.3 Spectral anomaly detection techniques

Spectral anomaly detection techniques detect anomalies using spectral characteristics in the spectral space of a graph. These methods focus on measures applicable to the adjacency matrix, such as Eigenvalues or Eigenvectors [188], or on the hypergraph algorithms used for graph Laplacians [189]. In most of the approaches, a social network graph is partitioned into different groups or communities. Partitioning is done either by deleting links among nodes or by applying certain clustering or classification algorithms and measures. Some of the advanced techniques draw on the structural concept of centrality. For example, community structures were studied by Girvan and Newman in [190]. As shown in Fig. 9, communities in the form of various friendship groups were generated in which the links among the nodes within a community or friendship group are dense, whereas links between groups are sparse.

Figure 9.

Friendship links depicting centrality.

The concept of betweenness, mainly shaped by Freeman [191], is modified to work for edges instead of vertices by counting the number of shortest paths among a set of vertices that go through the edge under consideration. The consequence is that the edges with a high betweenness centrality mark the points where a network is expected to break, and these edges are therefore removed. Generally, in online social networks, high betweenness centrality is found at the junctions of densely connected network groups. As a result, a number of distinct groups can be identified by eliminating this set of links from the graph.
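
Edge betweenness can be computed with a breadth-first accumulation of shortest paths. The sketch below follows the well-known Brandes-style accumulation for unweighted, undirected graphs; it is an illustrative implementation written for this survey, not the code of [190], and the example graph (two friendship triangles joined by one link) is hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Build an adjacency map {node: set of neighbors} from undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_betweenness(adj):
    """For every undirected edge, count the shortest paths that pass through it."""
    score = {}
    for s in adj:
        # Breadth-first search from s, recording shortest-path counts and predecessors.
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares from the farthest vertices back towards s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    # Every undirected path was counted once from each of its two endpoints.
    return {edge: total / 2.0 for edge, total in score.items()}

# Two triangles (a, b, c) and (d, e, f) joined by the single link c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
scores = edge_betweenness(build_adjacency(edges))
bridge = max(scores, key=scores.get)
```

In this example the link c-d carries all nine shortest paths between the two triangles, so removing the highest-scoring edge separates the graph into the two friendship groups.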

Ying et al. [188] detected malicious nodes by computing the spectral coordinates, or the spectra, i.e., the Eigenvalues or Eigenvectors, for normal and anomalous users, with particular reference to Random Link Attacks (RLAs). RLAs were emphasized because prior knowledge of which node is the attacker and which is the victim was absent. If fake links or nodes are present, they affect the values of the graph spectra. The spectral coordinates of victim nodes are used, together with the spectral coordinates computed for attacking nodes, to analyze the interdependency between victim and attacker nodes. The results revealed that malicious users govern the attack set and that each attacking node is linked to a number of victim nodes, as shown in Fig. 10.

3.7 Dynamic network anomaly detection

A network, i.e., a collection of objects and the relationships among them, is a robust way to depict connections. Some examples are global financial systems, which connect banks across the world; electric power grids, which connect geographically distributed areas; and social networks, which connect users, businesses, or customers through relationships such as friendship, collaboration, or transactional interactions.

Figure 10.

Describing relationship between attacking and victim nodes.

In a dynamic network, possible changes consist of insertion and deletion of vertices (objects), insertion and deletion of edges (relationships between objects), and modification of attributes (e.g., vertex or edge labels). One major issue in dynamic networks is anomaly detection: finding objects, relationships, or points in time that are unlike the rest. There are many high-impact and practical applications of anomaly detection, such as detection of ecological disturbances like wildfires and cyclones; intrusion detection for individual systems and network systems; identifying abnormal users and events in communication networks; and detecting civil unrest using Twitter feeds [107].

Methods of anomaly detection in dynamic networks can be categorized as follows (as shown in Fig. 11).

Figure 11.

Overarching approach classification and types of anomalies they detect.

3.7.1 Community detection

Community-based methods track the evolution of communities and their related vertices in the graphs over time. The approaches vary in two main aspects: (1) the community structure they analyze and (2) the definitions of communities they use.

Vertex detection logic in community detection can be used for detecting anomalies. A group of vertices that belongs to the same community is expected to behave similarly. This means that if, at consecutive time steps, one vertex in the community has a remarkable number of new edges added, the other vertices in the community would also be expected to have a considerable number of new edges. If the rest of the vertices in the community did not have new edges added, the vertex that did is anomalous.
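
A minimal sketch of this vertex detection logic is given below. It assumes that the community membership and the number of new edges per vertex between two time steps are already known; comparing each vertex with the community median, and the values of `ratio` and `min_new`, are illustrative assumptions rather than part of any surveyed method.

```python
from statistics import median

def anomalous_vertices(community, new_edges, ratio=3.0, min_new=5):
    """Flag vertices that gained many new edges while the rest of the community did not.

    community: list of vertex ids belonging to one community.
    new_edges: {vertex: number of edges added since the previous time step}.
    ratio, min_new: hypothetical tuning parameters of this sketch.
    """
    counts = [new_edges.get(v, 0) for v in community]
    typical = median(counts)
    return [v for v in community
            if new_edges.get(v, 0) >= min_new
            and new_edges.get(v, 0) > ratio * max(typical, 1)]

community = ["u1", "u2", "u3", "u4"]

# Only u3 gains many edges: it behaves unlike its community.
lone_burst = anomalous_vertices(community, {"u1": 1, "u2": 0, "u3": 12, "u4": 1})

# Every vertex gains many edges: the community moves together, so nothing is flagged.
joint_growth = anomalous_vertices(community, {"u1": 10, "u2": 12, "u3": 11, "u4": 9})
```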

In sub-graph detection logic, instead of looking at individual vertices and their community membership, the behavior of communities is observed over time, so that entire sub-graphs that behave abnormally can be detected.

In change detection logic, changes are detected by partitioning the streaming graphs into coherent segments based on the similarity of their partitioning (communities). The start of each segment marks a detected change.

3.7.2 Compression

In this section, we discuss methods that are all based on the minimum description length (MDL) principle. The MDL principle, and compression techniques based on it, exploit patterns and regularity in the data to achieve a compact graph representation. The principle is applied to graphs by treating the adjacency matrix of a graph as a single binary string, flattened in row- or column-major order. The data-specific features are all derived from the encoding cost of the graph or its specific substructures; anomalies are therefore defined as graphs or substructures that inhibit compressibility.

In edge detection logic, an edge is considered anomalous if the compression of a sub-graph has a higher encoding cost when the edge is included than when it is omitted.

In change detection logic, the main idea is that consecutive time steps that are very similar can be grouped together, leading to a low compression cost. An increase in the compression cost means that the new time step differs considerably from the previous ones, and thus indicates a change.
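
The link between regularity and encoding cost can be illustrated with a small sketch. It flattens the adjacency matrix in row-major order and, as a crude stand-in for a proper MDL encoding, uses the size of the zlib-compressed string as the encoding cost. Using a general-purpose compressor is an assumption of this sketch; the surveyed methods define their own encoding schemes.

```python
import random
import zlib

def flatten_row_major(adj_matrix):
    """Flatten a 0/1 adjacency matrix into one binary string, row by row."""
    return "".join(str(bit) for row in adj_matrix for bit in row)

def encoding_cost(adj_matrix):
    """Proxy for the description length: size in bytes of the zlib-compressed string."""
    return len(zlib.compress(flatten_row_major(adj_matrix).encode("ascii"), 9))

n = 32

# Regular structure: every even-numbered vertex is linked to every odd-numbered one.
regular = [[1 if (i + j) % 2 == 1 else 0 for j in range(n)] for i in range(n)]

# Irregular structure: undirected edges placed at random (fixed seed for repeatability).
rng = random.Random(0)
irregular = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        irregular[i][j] = irregular[j][i] = rng.randint(0, 1)

cost_regular = encoding_cost(regular)
cost_irregular = encoding_cost(irregular)
```

The regular graph compresses to far fewer bytes than the irregular one. Under the same proxy, a time step whose graph raises the cost sharply relative to the preceding steps would be a candidate change point.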

3.7.3 Matrix/tensor decomposition

These techniques represent the set of graphs as a tensor, which can be thought of as a multidimensional array, and perform tensor factorization or dimensionality reduction. To model a dynamic graph as a tensor, the easiest method is to create a dimension for each graph aspect of interest, e.g., a dimension each for source vertices, destination vertices, and time.

One of the most popular methods for matrices is Singular Value Decomposition (SVD) [192], and for higher order tensors ($\geqslant 3$ modes) it is PARAFAC [193], a generalization of SVD. The major differences among the decomposition-based methods are whether they use a matrix or a higher order tensor, the method of decomposition, and how the tensor is constructed (what information is stored).

In vertex detection logic, matrix decomposition is used to obtain an activity vector per vertex. A vertex is characterized as anomalous if its activity changes significantly between consecutive time steps.

In event detection logic, there are two major approaches: (1) Tensor decomposition approximates the original data in a reduced dimensionality, and the reconstruction error indicates how well the original data is approximated. (2) Singular values and vectors, as well as Eigenvalues and Eigenvectors, are tracked over time in order to find notable changes that reveal anomalous vertices.

In change detection logic, the activity vector of a graph, $u(t)$, is the principal component, i.e., the left singular vector corresponding to the largest singular value obtained by carrying out SVD on the weighted adjacency matrix. A change point occurs when an activity vector is significantly different from the ‘normal activity’ vector, which is derived from previous activity vectors.
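
The activity vector and its comparison with past behavior can be sketched without a linear algebra library. The sketch obtains the leading left singular vector of the weighted adjacency matrix $W$ by power iteration on $WW^{T}$ and, as a simplifying assumption, takes the mean of the previous activity vectors as the ‘normal activity’ vector; the change score is one minus the cosine similarity. The matrices and function names are hypothetical.

```python
import math

def activity_vector(W, iters=200):
    """Left singular vector for the largest singular value of W,
    approximated by power iteration on W W^T (W is a list of rows)."""
    n, m = len(W), len(W[0])
    u = [1.0] * n
    for _ in range(iters):
        t = [sum(W[i][j] * u[i] for i in range(n)) for j in range(m)]   # t = W^T u
        u = [sum(W[i][j] * t[j] for j in range(m)) for i in range(n)]   # u = W t
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return u
        u = [x / norm for x in u]
    return u

def change_score(current, history):
    """1 - cosine similarity between the current activity vector and the mean
    of the previous ones (the mean is an assumption of this sketch)."""
    avg = [sum(col) / len(history) for col in zip(*history)]
    norm = math.sqrt(sum(x * x for x in avg)) or 1.0
    cos = sum(a * b for a, b in zip(current, avg)) / norm
    return 1.0 - abs(cos)

# Hypothetical weighted adjacency matrices: activity moves from the pair (0, 1)
# to the pair (0, 2) in the second matrix.
normal = [[0, 5, 1], [5, 0, 1], [1, 1, 0]]
shifted = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]

history = [activity_vector(normal), activity_vector(normal)]
score_same = change_score(activity_vector(normal), history)
score_shift = change_score(activity_vector(shifted), history)
```

A time step is reported as a change point when its score exceeds a chosen threshold; here the shifted graph scores clearly higher than the unchanged one.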

3.7.4 Distance measures

The metrics measured in graphs, such as the number of vertices, are typically structural features. Once the summary metrics have been found for each graph, the difference or the similarity, which are inversely related, can be calculated. The algorithms differ in the metrics they choose to extract and compare, and in the methods they use to determine the anomalous values and corresponding graphs. Using distance as a metric to measure change is natural and widely used [194, 195, 196].

In edge detection, if the evolution of some edge attribute (e.g., edge weight) differs from the ‘normal’ behavior, then the corresponding edge is considered anomalous.

In sub-graph detection, a sub-graph with many ‘anomalous’ edges is considered anomalous.

In event detection, given a function $f(G_{i},G_{j})$ that measures the distance between two graphs, a time series of distance values can be constructed by applying the function to consecutive time steps in the series. Anomalous values can then be extracted from this time series using various heuristics, such as choosing the top k values or using a moving average threshold.
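
The event detection procedure can be sketched as follows. For $f(G_{i},G_{j})$ the sketch uses a simple edit distance, the number of edges present in exactly one of the two graphs, and it flags a time step when its distance exceeds a multiple of the moving average of the preceding distances. The choice of distance, the window length, and the factor are illustrative assumptions.

```python
def graph(*edges):
    """Build an undirected graph as a set of edges; "ab" denotes the edge a-b."""
    return {frozenset(e) for e in edges}

def edit_distance(g1, g2):
    """f(G_i, G_j): number of edges present in exactly one of the two graphs."""
    return len(g1 ^ g2)

def distance_series(graphs):
    """Distances between each pair of consecutive graphs in the series."""
    return [edit_distance(a, b) for a, b in zip(graphs, graphs[1:])]

def moving_average_flags(series, window=3, factor=2.0):
    """Indices t where series[t] exceeds `factor` times the mean of the previous `window` values."""
    flagged = []
    for t in range(window, len(series)):
        baseline = sum(series[t - window:t]) / window
        if series[t] > factor * max(baseline, 1.0):
            flagged.append(t)
    return flagged

# Hypothetical graph stream: small edits, then an abrupt restructuring at the fifth graph.
stream = [
    graph("ab", "bc"),
    graph("ab", "bc", "cd"),
    graph("ab", "cd"),
    graph("ab", "cd", "de"),
    graph("xy", "yz", "zx", "wx"),
    graph("xy", "yz", "zx", "wx", "wy"),
]
series = distance_series(stream)
events = moving_average_flags(series)
```

Index 3 of the distance series corresponds to the transition from the fourth to the fifth graph, which is the only transition flagged in this example.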

3.7.5 Probabilistic models

With a basis in probability theory, distributions, and scan statistics, these methods typically construct a model of what is considered ‘normal’ or expected, and any deviation from this model is flagged as anomalous.

For vertex detection, there are two main approaches: (1) creating scan statistics time series and detecting points that are several standard deviations from the mean, (2) vertex classification [29].

For edge detection, communications (edges) are modeled using a counting process, and edges that deviate from the model by a statistically significant amount are flagged [29].
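
As a simplified illustration of edge detection with a counting process, the sketch below fits a Poisson rate to the past communication counts of each edge and flags an edge when its current count is improbably high under that rate. The Poisson model, the rate floor of 0.1, and the significance level are assumptions of this sketch; the surveyed work uses Bayesian learning and predictive $p$-values instead.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X following a Poisson distribution with rate lam."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def anomalous_edges(history, current, alpha=0.001):
    """Flag edges whose current count is improbably high given their past counts.

    history: {edge: list of counts in past time steps}
    current: {edge: count in the current time step}
    """
    flagged = []
    for edge, count in current.items():
        past = history.get(edge, [])
        # Estimated rate; the floor of 0.1 keeps rarely seen edges testable (assumption).
        lam = max(sum(past) / len(past), 0.1) if past else 0.1
        if poisson_tail(count, lam) < alpha:
            flagged.append(edge)
    return flagged

# Hypothetical message counts per time step for two edges.
history = {("a", "b"): [4, 5, 6, 5], ("a", "c"): [1, 0, 1, 2]}
current = {("a", "b"): 6, ("a", "c"): 9}
flagged = anomalous_edges(history, current)
```

The edge a-b stays within its usual range, whereas nine messages on the edge a-c, which normally carries about one per time step, deviate by a statistically significant amount.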

For sub-graph detection, fixed sub-graphs (e.g., paths and stars), multi-graphs, and cumulative graphs are used to construct models of the expected behaviors. Deviations from the models signify an anomalous sub-graph [29].

For event detection, deviations from the models of the graph likelihood or the distribution of the Eigenvalues reveal when an event occurs [29, 197, 198, 199, 31, 9, 200, 201, 202, 203, 204, 205, 206, 207, 30, 208, 209, 210, 211, 212, 33].

The methods can be compared qualitatively according to the types of anomalies they find. They are partitioned accordingly below:

• Vertex methods: The online method proposed by [15] is the earliest design with available software. It models the frequency of the connections among vertices as a counting process and uses Bayesian learning and predictive $p$-values to detect anomalies [15, 213, 214]. In addition to operating on graph streams, it exploits the sparsity of the network by examining only the edges that appear in the graphs. A key advantage of this technique is that it performs both sequential analysis, in which new graphs are analyzed using the history of the graph stream, and retrospective analysis, in which the history is updated based on the new graphs that arrive. The latter is useful in the initial stages of the analysis, when very little history is available. When the algorithm ends, the output is simple to interpret, as vertices, edges, and sub-graphs are explicitly labeled as anomalous or not. The flexibility of the final part of the algorithm, in which analysis is performed only on the portions of the graph identified as anomalous and on their neighbors, opens many possibilities: anomalous community detection, identifying the most important or influential anomalous vertex, and much more.
• Edge methods: The method proposed in [15] is the only Bayesian learning technique that identifies anomalous edges explicitly. However, tensor methods such as ParCube [215] can compute the reconstruction error at arbitrary granularities, even on a per-cell basis. Hence, they are capable of identifying anomalous edges; however, the reconstruction error must then be stored for each edge, or the user must provide individual edges of interest to test.
• Sub-graph methods: NetSpot [216] was presented for finding optimal anomalous sub-graphs. Its alternating optimization approach restricts the method to working on an entire graph series simultaneously. The result of the algorithm is a set of highly anomalous sub-graphs and their corresponding time windows, which solves the issue of attribution and requires no further analysis on the part of the user. While the authors assign outlier scores to the edges in the network in their own way, it would be interesting to use an algorithm designed exclusively for assigning outlier scores to edges in a network [14, 15, 217]. Preprocessing the networks with an auxiliary edge scoring method provides a mapping from networks that do not meet the requirements of being weighted, plain, and undirected to networks that do.
• Event/change methods: DeltaCon [218] provides a graph similarity scoring function based on a number of desirable principles [215]. It has been shown to effectively distinguish between, and correctly mark as more important, the removal of ‘bridging’ edges and edges with higher weights, compared to the removal of edges that have a low weight or do not affect the overall structure. Anomalies are found by computing the similarity between adjacent graphs in the stream and flagging as outliers those graphs that are sufficiently different from their immediate neighbors. While not as robust as methods such as [15] that perform both sequential and retrospective analysis, it offers a commensurate decrease in runtime, scaling linearly with the number of edges in the networks.

There are many publicly accessible data sets for testing the available methods or for personal use as shown in Table 7.

Table 7
Data sets used by researchers

Data set	Network type	Description
Ego-Facebook [219]	Social networks	Facebook data has been anonymized by replacing the Facebook-internal ids for each user with a new value. In addition, while feature vectors from this dataset have been provided, the interpretation of those features has been obscured.
Facebook-social circles [219]	Web graphs	Nodes represent pages from berkely.edu and stanford.edu domains and directed edges represent hyperlinks between them. The data was collected in 2002.
Oregon-2 [220]	Peering	Graphs represent AS peering information inferred from Oregon route-views, looking glass data, and routing registry, all combined.
RoadNet-CA [221]	Vehicle traffic	Provides up to 10 years of real historical data on the road networks in California.
Wikipedia [218, 219]	Web links	Provides an expansive Wikipedia data set that has information such as page links, page views, and page revisions.
Email-eu-core [222]	Communication	Who-emails-whom network between about 150 former employees, mostly upper management.
Amazon movie reviews [223]	Social	Movie reviews from Amazon. The data span a period of more than 10 years, including all $\sim$ 8 million reviews up to October 2012.
DBLP [222]	Co-author	Computer science co-author network.
US patent citation [223]	Citation	A list of almost 3 million U.S. patents granted between January 1963 and December 1999, and all citations made to these patents between 1975 and 1999.

Figure 12.
Some of the applications of WSNs.

3.8 Wireless sensor networks

Wireless sensor networks (WSNs) are networks of tiny, low cost, low energy, and multifunctional sensors, which are densely deployed to monitor a phenomenon, track an object, or control a process [224]. WSNs have been proposed in many application domains, including personal, business, industrial, and military applications (as shown in Fig. 12). Examples are home automation among personal applications, sales tracking among business applications, architectural and control applications in industry, and enemy target monitoring and tracking among military applications [224, 225, 226]. WSNs are one of the most important elements of the Internet of Things (IoT) paradigm described in [227], because they act as a digital skin that provides a medium through which any computational system can access information about the physical world. Different technologies have been proposed to integrate WSNs with the IoT, such as the 6LoWPAN standard defined by the IETF [228], which allows the transmission of IPv6 packets in computationally limited networks. Sensor data analysis is of high importance to decision makers. It was reported by [229] that the reason for using a WSN is not only to collect data from the field of deployment but, more significantly, to analyze this data in a timely manner so that major decisions can be based on it. Hence, data quality is the main concern, since the data should reflect the true state of the world in WSN applications. Unfortunately, the raw measurements gathered by sensor nodes, especially in large scale WSNs, are often inaccurate and incomplete [54]. These inaccurate sensor measurements may arise for reasons related to the sensor device itself or to the sensing environment. Resource limitations of sensor devices in terms of storage, energy, processing, and bandwidth may lead to node failures and, therefore, to the reporting of anomalous readings. Other reasons related to the environment, such as anomalies and difficulties of the deployment area, may also result in erroneous data [230, 231]. In addition, malicious attacks such as denial of service, sinkhole, black hole, selective forwarding, and wormhole attacks [226, 232, 233] may also contribute to such inaccurate and low quality data. Moreover, physical interruptions such as destruction or movement of sensor devices caused by humans or animals may influence the data collection process and lead to anomalous measurements [224].

Anomaly detection techniques for traditional networks focus on the network layer itself, whereas in WSNs the data at the application layer are of greater concern [234, 235].

Anomaly detection solutions in WSNs are evaluated by their detection effectiveness and by their efficiency in using the limited network resources [229]. Detection effectiveness is measured by detection accuracy, detection rate, and false alarm rate. Detection efficiency is measured by energy consumption and memory usage.
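
The three effectiveness measures can be computed from labeled readings as in the sketch below. It uses the standard confusion-matrix definitions (detection rate as the fraction of true anomalies detected, false alarm rate as the fraction of normal readings flagged); the labels are hypothetical, with 1 denoting an anomaly and 0 a normal reading.

```python
def detection_metrics(true_labels, predicted_labels):
    """Compute accuracy, detection rate, and false alarm rate (1 = anomaly, 0 = normal)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1          # anomaly correctly detected
        elif truth == 0 and pred == 1:
            fp += 1          # normal reading raised a false alarm
        elif truth == 0 and pred == 0:
            tn += 1          # normal reading correctly passed
        else:
            fn += 1          # anomaly missed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical ground truth and detector output for eight sensor readings.
truth = [1, 0, 0, 1, 0, 0, 1, 0]
output = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = detection_metrics(truth, output)
```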

Five major requirements are important for designing effective and efficient anomaly detection models: dimension reduction, online detection, distributed detection, adaptive detection, and data correlation exploitation.

Online detection ensures that real-time anomalies are not missed, while distributed detection ensures that the limited resources are used efficiently by distributing the computational load over the network. Data reduction, which decreases the dimension of the data to improve efficiency, exploits the feature correlation in a distributed structure. Adaptive detection is crucial for real time detection in dynamic environments, where changes in the data distribution affect the detection efficiency. Exploiting the correlation among sensor data in close neighborhoods was found to improve detection efficiency by means of distributed detection within those neighborhoods [236, 237].

Anomaly detection in wireless sensor networks: In WSNs, anomalies can be defined as significant deviations of the sensed data measurements from the normal sensed data profile [5]. These anomalies arise for several reasons, among them errors in the measurements made by faulty sensor nodes, noise generated by external factors, actual events caused by changes in the sensed environment, and malicious attacks launched by compromised sensor nodes. The following depicts challenges and requirements of anomaly detection in WSNs.

Classification of anomaly detection in WSNs: We classify the existing anomaly detection models, based on the detection method used to design the model, into statistical-based, nearest neighbor based, clustering-based, classification-based, and other models (as shown in Fig. 13) [238, 239, 240].

Figure 13.

Classification of methods used in WSN anomaly detection.

3.8.1 Statistical-based anomaly detection models for WSNs

The statistical-based anomaly detection models are the earliest models designed for anomaly detection and are mostly used for one dimensional data sets [194]. If the probability of a pattern with respect to the statistical model is low, the pattern is considered an anomaly. According to [229], statistical-based models for WSNs are classified into parametric and non-parametric techniques. In the parametric techniques, it is assumed that the data is generated from a known distribution, whose parameters are then estimated from the data. In the non-parametric techniques, the underlying data distribution is not known a priori.

Palpanas et al. [241] designed a distributed deviation detection model for WSNs to eliminate unnecessary communication overhead and computational cost. The model is based on kernel density estimation, a non-parametric statistical technique. It was not evaluated experimentally and was only described theoretically to gauge the tradeoff between detection effectiveness and efficiency.
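
To illustrate the non-parametric idea behind such models, the sketch below estimates the density of a new reading from a window of recent readings with a Gaussian kernel and flags the reading when the estimated density is low. It is a single-node illustration only; the distributed aspects of [241] are not reproduced, and the bandwidth, density threshold, and temperature values are hypothetical.

```python
import math

def kde_density(x, sample, bandwidth):
    """Gaussian kernel density estimate at x from a window of recent readings."""
    coeff = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return coeff * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

def is_deviation(x, sample, bandwidth=0.5, min_density=0.01):
    """Flag x when its estimated density falls below min_density (hypothetical threshold)."""
    return kde_density(x, sample, bandwidth) < min_density

# Hypothetical window of recent temperature readings (degrees Celsius) at one node.
window = [21.0, 21.3, 20.8, 21.1, 20.9, 21.4, 21.2]
typical_flag = is_deviation(21.0, window)
outlier_flag = is_deviation(35.0, window)
```

No distribution is assumed in advance: the density is shaped entirely by the readings in the window, which is what distinguishes this approach from the parametric techniques.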

The work presented in [242] extends the kernel density estimator based model designed in [241] by adopting a cluster based structure for the WSN.

According to [243], there are five online and distributed statistical-based outlier detection methods for WSNs: Temporal and Spatial real-data-based Outlier Detection (TSOD), Temporal Outlier Detection (TOD), spatial Predicted-data-based Outlier Detection (POD), Spatial Outlier Detection (SOD), and Spatial and Temporal Integrated Outlier Detection (STIOD). It was reported that TOD lowered the communication overhead but yielded low detection accuracy. SOD, on the other hand, reached a high accuracy rate, but with high communication overhead. POD and STIOD lowered the communication overhead, but still with a low accuracy rate. TSOD is the best option among the five techniques, as it achieves better detection accuracy of outliers locally at each node.

Nearest-Neighbor-Based Anomaly Detection Models for WSNs: These models have been proposed for anomaly detection in computer networks under the assumption that normal data patterns lie in a dense neighborhood, while anomalous ones are far from their neighbors [5].

In [244], an in-network outlier detection model was designed that computes the distance similarity among data instances to discover global anomalies in a WSN. In this model, each node uses the distance similarity to discover anomalies and sends its result to its direct and next-hop neighbors. The other nodes do the same until all nodes agree on a common decision about the anomalous measurements. However, this diffusion of messages leads to high energy consumption, especially in large-scale WSNs.
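The distance-based assumption can be illustrated with a local k-nearest-neighbor score: each measurement is scored by the distance to its k-th nearest neighbor, and the highest-scoring measurements are reported. This sketch covers only the local scoring step on one node; the message exchange between neighbors described in [244] is not modeled, and the function names and readings are hypothetical.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_outliers(points, k=2, n=1):
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]

# Hypothetical (temperature, humidity) measurements held by one node.
measurements = [(20.1, 40.0), (20.3, 41.0), (19.9, 39.5), (20.2, 40.4), (35.0, 10.0)]
suspects = top_outliers(measurements, k=2, n=1)
```

The pairwise distance computation is quadratic in the number of measurements, which is one reason such methods are costly on resource-limited nodes.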

In [245], an in-network outlier cleaning and removal method for WSNs was implemented. A wavelet-based method was proposed for outlier correction, and a similarity method based on the Dynamic Time Warping (DTW) distance between neighboring nodes was proposed for outlier detection and removal. The wavelet approximation was used to correct short, occasional outliers. The authors asserted that, since short outliers are of high frequency, they could be detected by using the first few wavelet coefficients that represent the sensing series. Moreover, the use of wavelets reduced the dimension of the data and therefore decreased the communication cost in the network.

In [245], the designed model uses PCA to reduce the dimension of the data variables and then calculates the distance metric. It was claimed that the dimensionality could be reduced to one in any situation when validating with the static IBRL data set. However, this unsupervised distance-based model needs to be validated on dynamic data sets, such as environmental ones, to confirm the dimension reduction claims.
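For reference, the DTW distance used for comparing neighboring sensor series can be computed with the standard dynamic program below. This is a generic textbook formulation with an absolute-difference local cost, not the specific variant or thresholds of [245]; the node series are made-up examples.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical series from three neighboring nodes.
node_a = [20.0, 20.5, 21.0, 21.5]
node_b = [20.0, 20.0, 20.5, 21.0, 21.5]   # same shape, slightly delayed
node_c = [20.0, 20.5, 35.0, 21.5]         # contains a spike
```

Because DTW allows one sample to align with several samples of the other series, the delayed series `node_b` stays close to `node_a`, while the spike in `node_c` produces a large distance.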

3.8.2 Clustering-based anomaly detection models for WSNs

Clustering models are among the most important data mining models; they group similar patterns with the same characteristics into clusters. A cluster is considered anomalous if it is either smaller than or distant from the other clusters in the data set [229, 246, 247].

In [248], a distributed anomaly detection model based on clustering and the k-NN technique was designed. Each sensor node gathers the data and performs the clustering locally instead of transmitting all of the data to the base station or cluster head.

The model was evaluated against a baseline centralized model. It was reported that the distributed model achieves accuracy similar to that of the centralized model while notably reducing the communication overhead.

In [249], an Elliptical Summaries Anomaly Detection system (ESAD) was studied. Data measurements are gathered at individual sensors and converted to elliptical summaries using the method of [250]. To detect anomalies, the single linkage clustering algorithm is applied to extract clusters from the heterogeneous data, so that normal and anomalous measurements are grouped in different clusters.
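The clustering step can be sketched as follows, under simplifying assumptions: single linkage clustering cut at a fixed distance is equivalent to taking the connected components of the graph that links points closer than the cut distance, and clusters smaller than a minimum size are treated as anomalous. The sketch clusters raw points rather than the elliptical summaries of [249, 250], and `cut_distance` and `min_size` are hypothetical parameters.

```python
import math

def single_linkage_clusters(points, cut_distance):
    """Group point indices whose single-linkage distance is <= cut_distance."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cut_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)

def anomalous_indices(points, cut_distance, min_size):
    """Indices of points that fall into clusters smaller than min_size."""
    small = [c for c in single_linkage_clusters(points, cut_distance) if len(c) < min_size]
    return sorted(i for c in small for i in c)

points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (5.0, 5.0), (5.1, 4.9)]
anomalies = anomalous_indices(points, cut_distance=0.5, min_size=3)
```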

3.8.3 Classification-based anomaly detection models for WSNs

In [251], a Quarter Sphere Support Vector Machines (QSSVM)-based distributed anomaly detection model was designed to detect anomalous measurements in the data. The model structure is the same as that of the model designed by the same authors in [248].

An adaptive, online one-class SVM-based anomaly detection model was implemented in [252], based on the model designed in [253]. As in the models of [253, 251], the one-class QSSVM method was applied. The difference lies in the online training of the SVM, in which the normal reference model, represented by the radius $R_{j}$ at each node, is sequentially updated. Three different mechanisms for updating the normal reference model were examined: Instant Outlier Detection (IOD), Fixed-Size Time Window-Based Outlier Detection (FTWOD), and Adaptive Outlier Detection (AOD). AOD was designed to overcome the drawbacks of IOD and FTWOD: the update is performed only when a measurement affects the previous normal reference model. This mechanism therefore relies on previous decision results. Applying an SVM requires setting several parameters, whose influence on detection was ignored entirely. The same holds for the type of kernel function, which may differ from one WSN application to another.
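To make the role of the radius concrete, the sketch below fits a quarter-sphere style model under two stated assumptions that the text does not spell out: the data is used directly after centring (no kernel mapping), and the soft-margin objective is $R^2 + \frac{1}{\nu n}\sum_i \xi_i$. Under these assumptions the optimal radius is simply an order statistic of the centred norms, leaving roughly a fraction `nu` of the training points outside. The function names and data are hypothetical, and the online update mechanisms (IOD, FTWOD, AOD) are not implemented.

```python
import math

def fit_quarter_sphere(points, nu=0.1):
    """Return (centre, radius) leaving about a fraction nu of points outside."""
    dim = len(points[0])
    centre = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    norms = sorted(math.dist(p, centre) for p in points)
    n_outside = int(nu * len(points))            # points allowed beyond the radius
    radius = norms[len(points) - 1 - n_outside]
    return centre, radius

def is_outlier(x, centre, radius):
    """A measurement is anomalous if it lies strictly outside the sphere."""
    return math.dist(x, centre) > radius

# Hypothetical two-attribute measurements at one node.
data = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (10.0, 10.0)]
centre, radius = fit_quarter_sphere(data, nu=0.2)
```

A window-based update in the spirit of FTWOD could refit `centre` and `radius` on the most recent measurements only; that extension is left out here.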

3.8.4 Advantages of using anomaly detection in WSN

•
Providing data reliability and quality is one of the most important motivations for anomaly detection in WSNs, since sensor data can be corrupted or damaged for many reasons such as reading errors, faulty sensors, or malicious attacks. Event reporting is another reason for implementing anomaly detection, since many WSNs have lately been used for monitoring various kinds of phenomena, such as weather changes and fire detection, as proposed in [254, 255]. Applying anomaly detection to event detection helps in detecting a disaster or major problem at an early stage and in making decisions accordingly [229, 256, 257].
•
Sensing data are gathered in the form of data streams, which may be large volumes of real observations gathered from the environment [258]. Some WSNs are designed for gathering just one type of data, such as temperature, light, or humidity. This kind of data is called univariate data. Multivariate data come from more recent WSNs, which are designed to collect multiple types of data from the field simultaneously, usually with more than one sensor in each node gathering different types of data at the same time. In multivariate data, each type of data is called an attribute or feature. A measurement is considered anomalous if one or more of its attributes are anomalous. With univariate data, anomalies can be detected easily by observing that the single data attribute is anomalous in comparison with the attribute values of other data instances. Anomaly detection in multivariate WSNs is challenging because the individual attributes may not exhibit anomalous behavior on their own but may display anomalous behavior when considered together [211].
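The last point can be illustrated with a small two-attribute example. The Mahalanobis distance is used here as one possible multivariate score (an assumption; the text does not prescribe a particular measure): a reading whose temperature and humidity each lie within their usual ranges can still be far from the data once the correlation between the attributes is taken into account. All values are made up.

```python
def mahalanobis_2d(x, data):
    """Mahalanobis distance of a 2-D point x from the sample in data."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5

# Hypothetical (temperature, humidity) pairs: humidity falls as temperature rises.
history = [(20.0, 60.0), (22.0, 56.0), (24.0, 52.5), (26.0, 48.0), (28.0, 44.5), (30.0, 40.0)]
consistent = (22.0, 56.5)    # follows the usual relationship
inconsistent = (22.0, 44.0)  # each attribute is in range, the combination is not
```

Both test readings have a temperature between 20 and 30 and a humidity between 40 and 60, so a per-attribute range check accepts both; only the joint score separates them.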

Table 8
Limitations of WSN techniques

Methods	Constraints
Statistical based techniques	• Techniques that depend on the underlying data distribution, such as parametric techniques, are not useful because in most real-life WSN applications no prior knowledge of the data distribution is available. • Histograms do not depend on the underlying data distribution, but they are only efficient for univariate data and cannot capture the interactions among attributes in multivariate data. • Choosing the threshold is application dependent and is a complicated task, especially in continuously and dynamically changing environments.
Clustering based techniques	• The dependency on the choice of cluster width in some clustering methods makes them unsuitable for WSNs. • Clustering is computationally expensive with multivariate data, because computing the distances between all data patterns is costly, which makes it inapplicable to resource-limited devices such as sensors. • Clustering techniques cannot deal with continuous changes in data streams over time, so the normal reference model may be out of date by the time it is applied. Although recent clustering-based models [264, 252] have addressed this issue via incremental learning methods, the computational cost of such methods is too high for resource-limited devices.
Classification-based techniques	• Classification techniques such as SVM are computationally expensive and therefore quickly drain the sensors’ energy. These techniques are inefficient for building online detection models, which are desirable for some WSN applications. • Some of the models cope poorly with highly dynamic data streams. Although some adaptive classification-based models such as [265] were proposed, these models incur an excessive computational cost that quickly uses up the limited energy of the sensors.

3.8.5 Challenges using anomaly detection in WSNs

The main challenge of anomaly detection in WSNs is achieving high detection efficiency with minimum energy cost.

Other challenges that should be considered when implementing a suitable anomaly detection solution for WSNs are as follows [259]:

•
Computational and storage resource limitation: WSNs are made up of cheap sensors that are severely resource limited in terms of memory and processing. Anomaly detection in WSNs requires computational and storage resources for processing data in real time.
•
Communication overhead: Some traditional anomaly detection solutions are based on a centralized approach in which all data gathered from the sensors is delivered to the cluster head or the base station for processing. However, the cost of data transmission is several orders of magnitude higher than the cost of data processing [224].
•
Dynamic network topology change: Node mobility in some WSN applications, together with communication failures, causes the network topology to change. These changes negatively affect the reliability of the normal reference model built by the anomaly detector.
•
Network heterogeneity: Some WSN applications need to use various types of nodes or assign different jobs to different nodes. In addition, current sensor nodes may be equipped with several sensors for measuring various environmental phenomena simultaneously.
•
Dynamic streaming data: Another significant issue is the dynamic streaming nature of sensing data. No prior knowledge is available for modeling the normal distribution of the sensing data.
•
Network scalability: Some WSN applications grow over time, so nodes may be added to the network. As a result, the old normal reference model built for the network needs to be updated. The high false alarm rate resulting from this growth poses a challenge to anomaly detection. In addition, the large amount of data produced as the network grows is also a problem for real-time detection [229].
•
High dimension data: As the network size increases, the dimensionality of the gathered data may also increase. Higher data dimensionality results in a higher computational cost, which consumes the energy and memory of the sensors. Because the anomaly detection process relies on the data measurements, the increase in data dimension becomes a challenge for the efficiency of anomaly detection.
•
Attacks on various network layers: There are different types of attack on each layer of a WSN. They are discussed briefly as follows:
•
Denial of service: The limited memory and low computational capacity of wireless sensor nodes make them vulnerable to denial of service attacks [260].
•
Misdirection attack: In this type of attack, information is led along a fake path. The attack changes the routing information of the network and negatively affects connectivity. Misdirection is a network layer attack. Authentication between transmitter and receiver, multi-hop routing, etc. can be used to detect a misdirection attack [16].
•
Selective forwarding: Selective forwarding is a network layer attack. In this type of attack, a fake node acts like a legitimate node and leads packets along a wrong path, but selectively drops some of the packets so that the intrusion becomes hard to detect. Verification-based routing, multiple data flows and detection based on neighboring information can be used to identify this type of intrusion [260].
•
Sink hole attack: A sink hole attack occurs in the data link layer. It happens when an intruder compromises a sensor node or introduces a fake node into the sensor network. The attack takes effect once the forged node attracts the network traffic. Once the attack is successful, the fake node can misbehave in different ways, such as dropping all or selected packets and altering data [260].
•
Sybil attack: This happens when a malicious sensor node uses multiple identities to carry out an attack. In a wireless sensor network, the sensor nodes work cooperatively; this type of attack targets that cooperation and disturbs the routing and communication process [261].
•
Wormhole attack: Another data link layer attack is the wormhole attack. In this type of intrusion, a malicious or forged node records the information and leads it the wrong way. This attack can be carried out without knowledge of the cryptography used by the legitimate wireless sensor nodes [262, 263].
•
Hello flood attack: In wireless sensor networks, routing protocols use Hello packets for neighbor detection. In this type of attack, fake Hello packets are used to attract the sensor nodes [263]. Attackers with ample radio resources and processing capabilities can mount this type of attack. The victim node accepts the sender of the false Hello packet as a normal node.

Note that the limitations of WSN techniques are stated in Table 8.

3.9 Video anomaly detection

One of the most basic aspects of video anomaly detection is video feature representation. Considerable research has been devoted to finding the right representation for detecting anomalies in video streams accurately with an acceptable false alarm rate. However, this is very challenging because of large variations in environment and human movement [266].

To detect anomalous events automatically, some dynamics of the events have to be captured in order to detect the presence and the spatial location of an anomaly in the scene [266].

3.9.1 Challenges in modeling a good representation

Because of their high dimensionality, videos cannot be fed into a classifier directly: they contain much redundant information and hence cause high computational complexity.

The key to any successful application is choosing the right features. However, this is very challenging for the following reasons [266]:

•
Action pattern variations within the same class: The class can be a set of actions (e.g., walking, clapping) or a set of event types (e.g., normal, abnormal). There is a high variety of data within one class because of changes in style and appearance. The representation should be general enough to cover the variations among human movements, human-human and human-object interactions.
•
Environmental variations and noise: Real-world scenes contain a lot of noise and may differ as a result of illumination changes and background dynamics. The features must be capable of covering these environmental variations for the method to perform well in noisy environments.

Conventional features used for anomaly detection: There are plenty of features that can be used for anomaly detection. We discuss some of those most used in the literature.
•
Optical flow-based descriptors: [267] proposed a region-based descriptor called “Motion Context” to describe both the motion and the appearance information of a spatio-temporal segment. The work used the Edge Orientation Histogram (EOH) as appearance descriptor and the Multi-scale Histogram of Optical Flow (MHOF) as motion descriptor. For each query spatio-temporal segment, the method then looks for its best match in the training data set and determines normality using a dynamic threshold. This approach is more efficient than the authors’ previous work, which used a sparse approach.

In the same category of descriptors are the spatio-temporal video volume descriptors (HOG3D) proposed in [268]: these volumes are described by the histogram of the spatio-temporal gradient in polar coordinates [269]. In [270], 3D gradient features of each spatio-temporal cube are extracted from the video sequence and used to learn sparse combinations with allowable reconstruction errors.
•
Trajectory based sparse reconstruction: [271] utilized a trajectory-based joint sparse reconstruction framework for video anomaly detection, which depends on good tracking to extract trajectories. Inspired by [272, 267], the authors use Multi-scale HOF (MHOF) as the feature descriptor to build the basis for sparse representation. The fundamental underlying assumption of these methods is that any new feature representation of a normal/anomalous event can be approximately modeled as a (sparse) linear combination of the feature representations of previously observed events in a training dictionary [273].
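The sparse reconstruction assumption can be made concrete with a small sketch. Matching pursuit, a simple greedy method, is used here as a stand-in for the sparse solvers of [271, 273] (this substitution is an assumption): a feature vector is approximated with a few dictionary atoms, and the remaining reconstruction error serves as the anomaly score. The dictionary and feature vectors are made-up four-bin histograms.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sparse_reconstruction_error(feature, dictionary, max_atoms=2):
    """Greedy matching pursuit; returns the norm of the final residual."""
    atoms = [normalize(a) for a in dictionary]
    residual = list(feature)
    for _ in range(max_atoms):
        dots = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(dots[i]))
        # Remove the component explained by the best-matching atom.
        residual = [r - dots[best] * a for r, a in zip(residual, atoms[best])]
    return math.sqrt(sum(r * r for r in residual))

# Hypothetical dictionary learned from normal events.
dictionary = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
normal_error = sparse_reconstruction_error([2.0, 1.0, 0.0, 0.0], dictionary)
anomalous_error = sparse_reconstruction_error([0.0, 0.0, 0.0, 2.0], dictionary)
```

A feature that resembles previously observed events is reconstructed with a small residual, whereas a feature with energy in an unseen bin cannot be explained by the dictionary and keeps a large residual.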

4. Online anomaly detection

Online anomaly detection has the benefit that it lets experts take corrective actions as soon as an anomaly occurs in the sequence data. In this approach, anomalies are detected using PCA with an oversampling strategy. PCA is an unsupervised dimension reduction method that identifies patterns in data and expresses the data in a way that highlights their similarities and differences. It is a statistical method that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables [274].
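One common reading of "PCA with oversampling" is sketched below; it is an assumption, since the text gives no details. The target instance is duplicated several times, the leading principal direction is recomputed, and the score is one minus the absolute cosine between the directions with and without the duplicates. An outlier rotates the principal direction much more than a normal instance does. The leading direction is obtained by power iteration on the covariance matrix; the function names, the number of copies, and the data are hypothetical, and the sketch assumes the data is not degenerate (non-zero variance).

```python
import math

def leading_direction(data, iterations=200):
    """Leading principal direction via power iteration on the covariance matrix."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[d] for row in data) / n for d in range(dim)]
    centered = [[row[d] - mean[d] for d in range(dim)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)]
           for a in range(dim)]
    v = [1.0 / (d + 1) for d in range(dim)]   # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def oversampling_score(data, index, copies=5):
    """Change in the leading direction when instance `index` is duplicated."""
    base = leading_direction(data)
    boosted = leading_direction(data + [data[index]] * copies)
    cosine = abs(sum(a * b for a, b in zip(base, boosted)))
    return 1.0 - cosine

# Hypothetical two-attribute readings; the last one lies off the main trend.
readings = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 5.8), (3.0, 9.0)]
scores = [oversampling_score(readings, i) for i in range(len(readings))]
```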

Table 9
Platforms used in online anomaly detection

Platform/tool	Description
Hadoop Distributed File System (HDFS)	HDFS provides the underlying storage for the Hadoop cluster. It divides the data into smaller parts and distributes them across the various servers/nodes.
MapReduce	MapReduce provides the interface for the distribution of sub-tasks and the gathering of outputs. When tasks are executed, MapReduce tracks the processing of each server/node.
Pig and Pig Latin	The Pig programming language is designed to handle all types of data (structured, unstructured, etc.). It comprises two key modules: the language itself, called Pig Latin, and the runtime environment in which Pig Latin code is executed.
Hive	Hive is a runtime Hadoop support architecture that brings Structured Query Language (SQL) to the Hadoop platform. It permits SQL programmers to write Hive Query Language (HQL) statements akin to typical SQL statements.
Jaql	Jaql is a functional, declarative query language designed to process large data sets. To facilitate parallel processing, Jaql converts “high-level” queries into “low-level” queries consisting of MapReduce tasks.
Zookeeper	Zookeeper provides a centralized infrastructure with various services, providing synchronization across a cluster of servers. Big data analytics applications utilize these services to coordinate parallel processing across big clusters.
HBase	HBase is a column-oriented database management system that sits on top of HDFS. It uses a non-SQL approach.
Cassandra	Cassandra is also a distributed database system. It is a top-level project designed to handle big data distributed across many commodity servers. It is a NoSQL system that provides reliable service with no single point of failure (http://en.wikipedia.org/wiki/Apache_Cassandra).
Oozie	Oozie, an open source project, streamlines the workflow and coordination among the tasks.
Lucene	The Lucene project is used widely for text analytics/searches and has been incorporated into several open source projects. Its scope includes full text indexing and library search for use within a Java application.
Avro	Avro facilitates data serialization services. Versioning and version control are additional useful features.
Mahout	Mahout is another Apache project whose goal is to provide free implementations of distributed and scalable machine learning algorithms that support big data analytics on the Hadoop platform.

Streaming analytics: Data streams can be viewed as a continuous sequence of events happening quickly. Processing such high-speed data as it flows into the system, before it enters the database, is known as streaming analytics. Stream processing handles data as it flows into the application, which matters because vast amounts of data now flow over the network at high speed [275].

Applications of streaming analytics are found across industries, for example personalized e-commerce marketing, notifications by banks, actuators embedded in physical objects, real-time fraud detection, data and identity protection services, and analysis of data generated by sensors. The authors of [276] proposed an online anomaly detection algorithm based on the Kolmogorov-Smirnov goodness of fit test to detect anomalous access requests in the cloud environment at runtime. Flagging individual data points as anomalous can produce false alerts. Hence, they proposed detecting anomalous user requests that show up as changes in the system behavior that conflict with what is normal. The algorithm operates on windows rather than single points: an anomalous window conveys more information and represents a pattern or distribution with unusual characteristics. After receiving a window, the framework computes its cumulative distribution function and keeps the distributions of the most recent $N$ windows (the past $N$ models). A window is considered anomalous if it does not match any of the past $N$ models. In this way, the method detects anomalous access requests to unauthorized shared resources at runtime [276].
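The window-based procedure can be sketched as follows. The two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical cumulative distribution functions) is computed against each stored window, and a new window is flagged if it matches none of them. Several details are assumptions not given in [276]: a fixed threshold on the statistic replaces a proper critical value, every window (including flagged ones) is stored as a model, and the class name and request counts are hypothetical.

```python
from collections import deque

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

class WindowDetector:
    """Keeps the last `history` windows and flags windows that match none of them."""

    def __init__(self, history=3, threshold=0.5):
        self.models = deque(maxlen=history)
        self.threshold = threshold

    def observe(self, window):
        anomalous = bool(self.models) and all(
            ks_statistic(window, past) > self.threshold for past in self.models)
        self.models.append(list(window))   # assumption: every window is stored
        return anomalous

detector = WindowDetector(history=2, threshold=0.5)
results = [detector.observe(w) for w in (
    [10, 11, 12, 13],    # first window: nothing to compare against
    [10, 12, 11, 14],    # similar distribution
    [90, 95, 99, 97],    # very different distribution
)]
```

Storing only non-anomalous windows would prevent a repeated attack from gradually becoming "normal"; which policy [276] uses is not stated in the text.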

One of the biggest challenges for fraud detection systems is the tremendous growth in the number of transactions. Current fraud detection systems need to be more effective and scalable in order to handle such large amounts of incoming data. Big Data technology is therefore a natural solution to this problem. Many Big Data platforms for storing and processing data have been released in recent years. The MapReduce framework was proposed in 2004 [277]. Apache Hadoop is the most popular open-source implementation of MapReduce and DFS for large-scale data processing and storage. However, Hadoop performs poorly on iterative and online computing. Apache Spark allows users to persist data in memory and is the most popular batch-processing platform for iterative computing. Storm is the most widely used real-time stream processing system. Storm applications are submitted as topologies. These topologies usually contain two kinds of components, called spouts and bolts. A spout is the source of streams in a topology: it reads tuples from an external source and sends them into the topology. A bolt processes the data one tuple at a time. HBase is an open source distributed key-value store developed on top of the distributed storage system HDFS [278].

[278] proposed a hybrid framework with Big Data technologies to address the performance challenges faced by online credit card fraud detection systems. In a real-time system, one must consider not only the performance of data storing, model training, data sharing and fraud detection, but also the integration of these components, since any slow component could become a bottleneck for the whole system [278]. The framework proposed in [278] aims at fusing different detection algorithms to improve accuracy and uses a four-layer design to handle data storage, model training, data sharing and online detection. The framework is implemented with recent Big Data technologies, which help to build a scalable, fault-tolerant and high-performance system.

Note that platforms used in online anomaly detection are stated in Table 9.

5. Conclusion and future works

In this work, we have discussed different ways in which the problem of anomaly detection has been formulated in literature, and have attempted to provide an overview of the huge literature on various techniques. For each category of anomaly detection techniques, we have identified a unique assumption regarding the notion of normal and anomalous data. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. Ideally, a comprehensive survey on anomaly detection should allow a reader to not only understand the motivation behind using a particular anomaly detection technique, but also provide a comparative analysis of various techniques. However, the current research has been done in an unstructured fashion, without relying on a unified notion of anomalies, which makes the job of providing a theoretical understanding of the anomaly detection problem very difficult.

There are several promising directions for further research in anomaly detection. For example, the field of anomaly detection in dynamic networks is relatively young and is rapidly growing in popularity, as indicated by the number of papers published in the past five years. Another example is handling the problem of highly imbalanced data sets, especially in credit card fraud detection.

Many techniques discussed in this survey require the entire test data before detecting anomalies. Recently, techniques have been proposed that can operate in an online fashion. Such techniques not only assign an anomaly score to a test instance as it arrives, but also incrementally update the model. Another upcoming area where anomaly detection is finding more and more applicability is in complex systems.

References

1. Hawkins. Identification of outliers. Springer. 1980; 11.
2. Pal, Verma. A survey on anomaly based malware detection and demolition in false alarm rate. 2015.
3. Viji, Banu, SKZ. An improved credit card fraud detection using k-means clustering algorithm. in: International Journal of Engineering Science Invention (IJESI), One Day National Conference on “Internet of Things the Current Trend in Connected World” NCIOT. 2018; 59-64.
4. Tran, Huong, Heuchenne, HienTran, TMH. Real time data-driven approaches for credit card fraud detection. in: Proceedings of the 2018 International Conference on E-Business and Applications. ACM. 2018; 6-9.
5. Chandola, Banerjee, Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR). 2009; 41(3): 15.
6. Gao, Fan, Turaga, Parthasarathy, Han. A multi-graph spectral framework for mining multi-source anomalies. in: Graph Embedding for Pattern Analysis. Springer. 2013; 205-227.
7. Savage, Zhang, Chou, Wang. Anomaly detection in online social networks. Social Networks. 2014; 39: 62-70.
8. Chen, Hendrix, Samatova. Community-based anomaly detection in evolutionary networks. Journal of Intelligent Information Systems. 2012; 39(1): 59-85.
9. Eberle, Holder. Anomaly detection in data represented as graphs. Intelligent Data Analysis. 2007; 11(6): 663-689.
10. Akoglu, McGlohon, Faloutsos. Oddball: Spotting anomalies in weighted graphs. in: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer. 2010; 410-421.
11. Akoglu, Faloutsos. Event detection in time series of mobile communication graphs. in: Army Science Conference. 2010; 77-79.
12. Yang, Gao. Incremental local evolutionary outlier detection for dynamic social networks. in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer. 2013; 1-15.
13. Han, Lee, J-G. Temporal outlier detection in vehicle traffic data. in: Data engineering, 2009. ICDE’09. IEEE 25th International Conference on. IEEE. 2009; 1319-1322.
14. Abello, Eliassi-Rad, Devanur. Detecting novel discrepancies in communication networks. in: Data Mining (ICDM), 2010 IEEE 10th International Conference on. IEEE. 2010; 8-17.
15. Heard, Weston, Platanioti, Hand. Bayesian anomaly detection methods for social networks. The Annals of Applied Statistics. 2010; 4(2): 645-662.
16. Chhaya, Sharma, Bhagwatikar, Kumar. Wireless sensor network based smart grid communications: Cyber attacks, intrusion detection system and topology control. Electronics. 2017; 6(1): 5.
17. Mell, Grance. The NIST definition of cloud computing. National Institute of Standards and Technology. 2009; 53(6): 50.
18. Modi, Patel, Borisaniya, Patel, Rajarajan. A survey of intrusion detection techniques in cloud. Journal of Network and Computer Applications. 2013; 36(1): 42-57.
19. Vieira, Schulter, Westphall. Intrusion detection techniques in grid and cloud computing environment. IT Professional, IEEE Computer Society. 2010; 12(4): 38-43.
20. Kwon, Kim. Self-similarity based lightweight intrusion detection method for cloud computing. in: Asian Conference on Intelligent Information and Database Systems. Springer. 2011; 353-362.
21. Arshad, Townend. An abstract model for integrated intrusion detection and severity analysis for clouds. Cloud Computing Advancements in Design, Implementation, and Technologies. 2012; 1.
22. C-C, Huang, C-C. A cooperative intrusion detection system framework for cloud computing networks. in: Parallel Processing Workshops (ICPPW), 2010 39th International Conference on. IEEE. 2010; 280-284.
23. Ram. Secure cloud computing based on mutual intrusion detection system. International Journal of Computer Application. 2012; 2(1): 57-67.
24. Garfinkel, Rosenblum. A virtual machine introspection based architecture for intrusion detection. in: Ndss. 2003; 2003: 191-206.
25. Ahmed, Pal, Hossain, Bikas, MAN, Hasan. NIDS: A network based approach to intrusion detection and prevention. in: Computer Science and Information Technology-Spring Conference, 2009. IACSITSC’09. International Association of. IEEE. 2009; 141-144.
26. Leu, F-Y, Z-Y. Detecting dos and ddos attacks by using an intrusion detection and remote prevention system. in: Information assurance and security, 2009. IAS’09. Fifth International Conference on. IEEE. 2009; 251-254.
27. Jia, Wang. The research and design of intelligent IPS model based on dynamic cloud firewall linkage. International Journal of Digital Content Technology and Its Applications. 2011; 5(3): 304-309.
28. Scarfone, Mell. Guide to intrusion detection and prevention systems (idps). NIST Special Publication. 2007; 800(2007): 94.
29. Ranshous, Shen, Koutra, Harenberg, Faloutsos, Samatova. Anomaly detection in dynamic networks: A survey. Wiley Interdisciplinary Reviews: Computational Statistics. 2015; 7(3): 223-247.
30. Cheng, Tan, P-N, Potter, Klooster. Detection and characterization of anomalies in multivariate time series. in: Proceedings of the 2009 SIAM International Conference on Data Mining, SIAM. 2009; 413-424.
31. Chen, Hendrix, Guan, Tetteh, Choudhary, Semazzi, Samatova. Discovery of extreme events-related communities in contrasting groups of physical system networks. Data Mining and Knowledge Discovery. 2013; 1-34.
32. Zhang, Zhuang, Pande, Lee. Anomalous path detection with hardware support. in: Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM. 2005; 43-54.
33. Ding, Katenka, Barford, Kolaczyk, Crovella. Intrusion as (anti) social communication: Characterization and detection. in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 2012; 886-894.
34. Wang, Tang, Park, Priebe. Locality statistics for anomaly detection in time series of graphs. IEEE Transactions on Signal Processing. 2014; 62(3): 703-717.
35. Chen, Neill. Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 2014; 1166-1175.
36. Sequeira. Intrusion prevention systems: Security’s silver bullet? Business Communications Review. 2003; 33(3): 36-41.
37. Brown, Suckow, Wang. A survey of intrusion detection systems. Department of Computer Science, University of California, San Diego. 2002.
38. Roschke, Cheng, Meinel. An extensible and virtualization-compatible IDS management architecture. in: Information Assurance and Security, 2009. IAS’09. Fifth International Conference on. IEEE. 2009; 130-134.

39.

Bakshi

Dujodwala

. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. in: Communication Software and Networks, 2010. ICCSN’10. Second International Conference on. IEEE. 2010; 260-264.

40.

Mazzariello

Bifulco

Canonico

. Integrating a network IDS into an open source cloud computing environment. in: Information Assurance and Security (IAS), 2010 Sixth International Conference on. IEEE. 2010; 265-270.

41.

Zhengbing

Jun

Shirochin

. An intelligent lightweight intrusion detection system with forensics technique. in: Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 2007. IDAACS 2007. 4th IEEE Workshop on. IEEE. 2007; 647-651.

42.

Dastjerdi

Bakar

Tabatabaei

SGH

. Distributed intrusion detection in clouds using mobile agents. in: Advanced Engineering Computing and Applications in Sciences, 2009. ADVCOMP’09. Third International Conference on. IEEE. 2009; 175-180.

43.

Guan

Bao

. A CP intrusion detection strategy on cloud computing. in: International Symposium on Web Information Systems and Applications (WISA). 2009; 84-87.

44.

Ibrahim

. Anomaly network intrusion detection system based on distributed time-delay neural network (DTDNN). Journal of Engineering Science and Technology. 2010; 5(4): 457-471.

45.

Cannady

. Artificial neural networks for misuse detection. in: National Information Systems Security Conference. 1998; 368-381.

46.

Moradi

Zulkernine

. A neural network based system for intrusion detection and classification of attacks. in: Proceedings of the 2004 IEEE International Conference on Advances in Intelligent Systems-theory and Applications. 2004.

47.

Grediaga

Ibarra

García

Ledesma

Brotóns

. Application of neural networks in network control and information security. in: International Symposium on Neural Networks. Springer. 2006; 208-213.

48.

Han

X-L

Ren

L-Y

. Using data mining to discover signatures in network-based intrusion detection. in: Machine Learning and Cybernetics, 2002 Proceedings 2002 International Conference on. IEEE. 2002; 13-17.

49.

Varma

PRK

Kumari

Kumar

. Feature selection using relative fuzzy entropy and ant colony optimization applied to real-time intrusion detection system. Procedia Computer Science. 2016; 85: 503-510.

50.

Tillapart

Thumthawatworn

Santiprabhob

. Fuzzy intrusion detection system. AU JT. 2002; 6(2): 109-114.

51.

Ganeshkumar

Pandeeswari

. Adaptive neuro-fuzzy-based anomaly detection system in cloud. International Journal of Fuzzy Systems. 2016; 18(3): 367-378.

52.

Zhengbing

Zhitang

Junqi

. A novel network intrusion detection system (NIDS) based on signatures search of data mining. in: Proceedings of the 1st International Conference on Forensic Applications and Techniques in Telecommunications, Information, and Multimedia and Workshop, ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering). 2008; 45.

53.

Yang

D-Z

Shen

F-C

. A novel rule-based intrusion detection system using data mining. in: Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on. IEEE. 2010; 169-172.

54.

Han

Pei

Kamber

. Data mining: Concepts and techniques. Elsevier. 2011.

55.

Han

Kamber

Pei

. Data mining: Concepts and techniques. The Morgan Kaufmann Series in Data Management Systems. 2006.

56.

Manekar

Waghmare

. Intrusion detection system using support vector machine (SVM) and particle swarm optimization (PSO). International Journal of Advanced Computer Research. 2014; 4(3): 808.

57.

Shams

Rizaner

. A novel support vector machine based intrusion detection system for mobile ad hoc networks. Wireless Networks. 2017; 1-9.

58.

Akilarasu

Shalinie

. Wormhole-free routing and DoS attack defense in wireless mesh networks. Wireless Networks. 2016; 1-10.

59.

Holland

. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. MIT Press. 1992.

60.

Dhanalakshmi

Babu

. Intrusion detection using data mining along fuzzy logic and genetic algorithms. International Journal of Computer Science and Network Security. 2008; 8(2): 27-32.

61.

Gong

Zulkernine

Abolmaesumi

. A software implementation of a genetic algorithm based approach to network intrusion detection. in: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, 2005 and First ACIS International Workshop on Self-Assembling Wireless Networks. SNPD/SAWN 2005. Sixth International Conference on. IEEE. 2005; 246-253.

62.

Traore

. Detecting new forms of network intrusion using genetic programming. Computational Intelligence. 2004; 20(3): 475-494.

63.

Hansen

Lowry

Meservy

McDonald

. Genetic programming for prevention of cyberterrorism through dynamic and evolving intrusion detection. Decision Support Systems. 2007; 43(4): 1362-1374.

64.

Xia

Hariri

Yousif

. An efficient network intrusion detection method based on information theory and genetic algorithm. in: Performance, Computing, and Communications Conference, 2005. IPCCC 2005. 24th IEEE International. IEEE. 2005; 11-17.

65.

Zhao

J-L

Zhao

J-F

J-J

. Intrusion detection based on clustering genetic algorithm. in: Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on. IEEE. 2005; 3911-3914.

66.

De Castro

Timmis

. Artificial immune systems: A new computational intelligence approach. Springer Science & Business Media. 2002.

67.

Jerne

. Towards a network theory of the immune system. in: Annales d’immunologie. 1974; 1-2: 373.

68.

Dasgupta

Attoh-Okine

. Immunity-based systems: A survey. in: Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation, 1997 IEEE International Conference on. IEEE. 1997; 369-374.

69.

Forrest

Hofmeyr

Somayaji

Longstaff

. A sense of self for Unix processes. in: Security and Privacy, 1996 Proceedings 1996 IEEE Symposium on. IEEE. 1996; 120-128.

70.

Aickelin

Bentley

Cayzer

Kim

McLeod

. Danger theory: The link between AIS and IDS? in: International Conference on Artificial Immune Systems. Springer. 2003; 147-155.

71.

Matzinger

. An innate sense of danger. in: Seminars in Immunology, Elsevier. 1998; 5: 399-415.

72.

Igbe

Darwish

Saadawi

. Distributed network intrusion detection systems: An artificial immune system approach. in: Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2016 IEEE First International Conference on. IEEE. 2016; 101-106.

73.

Botha

Von Solms

Perry

Loubser

Yamoyany

. The utilization of artificial intelligence in a hybrid intrusion detection system. in: Proceedings of the 2002 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on Enablement Through Technology, South African Institute for Computer Scientists and Information Technologists. 2002; 149-155.

74.

Katar

. Combining multiple techniques for intrusion detection. Int J Comput Sci Network Security. 2006; 6(2B): 208-218.

75.

Mabu

Wang

Hirasawa

. Integrated fuzzy GNP rule mining with distance-based classification for intrusion detection system. in: Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on. IEEE. 2012; 1569-1574.

76.

Shamshirband

Amini

Anuar

Kiah

MLM

Teh

Furnell

. D-FICCA: A density-based fuzzy imperialist competitive clustering algorithm for intrusion detection in wireless sensor networks. Measurement. 2014; 55: 212-226.

77.

Zojaji

Atani

Monadjemi

. A survey of credit card fraud detection techniques: Data and technique oriented perspective. arXiv preprint arXiv:1611.06439. 2016.

78.

Worobec

. Fraud the facts 2016. Financial Fraud Action UK, Tech Rep. 2016.

79.

Patidar

Sharma

. Credit card fraud detection using neural network. International Journal of Soft Computing and Engineering (IJSCE). 2011; 1: 32-38.

80.

UK FFA, Association lUC. Fraud the Facts 2011. The definitive overview of payment industry fraud and measures to prevent it. Available online at: http://www.theukcardsassociation.org.uk/files/ukca/fraud_the_facts_2010.pdf. 2011.

81.

Vardhani

Priyadarshini

Narasimhulu

. CNN data mining algorithm for detecting credit card fraud. in: Soft Computing and Medical Bioinformatics. Springer. 2019; 85-93.

82.

Kumari

Mishra

. Analysis of credit card fraud detection using fusion classifiers. in: Computational Intelligence in Data Mining. Springer. 2019; 111-122.

83.

Hasheminejad

Salimi

. FDiBC: A novel fraud detection method in bank club based on sliding time and scores window. Journal of AI and Data Mining. 2018; 6(1): 219-231.

84.

Ravisankar

Ravi

Rao

Bose

. Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems. 2011; 50(2): 491-500.

85.

Syeda

Zhang

Y-Q

Pan

. Parallel granular neural networks for fast credit card fraud detection. in: Fuzzy Systems, 2002. FUZZ-IEEE’02. Proceedings of the 2002 IEEE International Conference on. IEEE. 2002; 572-577.

86.

Cheng

Zhang

. Credit card fraud detection using convolutional neural networks. in: International Conference on Neural Information Processing. Springer. 2016; 483-490.

87.

Kohonen

. The self-organizing map. Proceedings of the IEEE. 1990; 78(9): 1464-1480.

88.

Vesanto

Alhoniemi

. Clustering of the self-organizing map. IEEE Transactions on Neural Networks. 2000; 11(3): 586-600.

89.

Castro

Timmis

. Artificial immune systems as a novel soft computing paradigm. Soft Computing-A Fusion of Foundations, Methodologies and Applications. 2003; 7(8): 526-544.

90.

Hofmeyr

Forrest

. An immunological model of distributed detection and its application to computer security. Citeseer. 1999.

91.

Gadi

MFA

Wang

do Lago

. Credit card fraud detection with artificial immune system. in: International Conference on Artificial Immune Systems. Springer. 2008; 119-131.

92.

Tuo

Ren

Liu

Lei

. Artificial immune system for fraud detection. in: Systems, Man and Cybernetics, 2004 IEEE International Conference on. IEEE. 2004; 1407-1411.

93.

Soltani

Akbari

Javan

. A new user-based model for credit card fraud detection based on artificial immune system. in: Artificial Intelligence and Signal Processing (AISP), 2012 16th CSI International Symposium on. IEEE. 2012; 29-33.

94.

RamaKalyani

UmaDevi

. Fraud detection of credit card payment system by genetic algorithm. International Journal of Scientific & Engineering Research. 2012; 3(7): 1-6.

95.

Raj

SBE

Portia

. Analysis on credit card fraud detection methods. in: Computer, Communication and Electrical Technology (ICCCET), 2011 International Conference on. IEEE. 2011; 152-156.

96.

Ghosh

Reilly

. Credit card fraud detection with a neural-network. in: System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International Conference on. IEEE. 1994; 621-630.

97.

Chen

R-C

Chen

T-S

Lin

C-C

. A new binary support vector system for increasing detection rate of credit card fraud. International Journal of Pattern Recognition and Artificial Intelligence. 2006; 20(02): 227-239.

98.

Şahin

Duman

. Detecting credit card fraud by decision trees and support vector machines. 2011.

99.

Sherly

. A comparative assessment of supervised data mining techniques for fraud prevention. TIST Int J Sci Tech Res. 2012; 1(16).

100.

Lee

. Bayesian network approach to predict mobile churn motivations: Emphasis on general Bayesian network, Markov blanket, and what-if simulation. in: International Conference on Future Generation Information Technology. Springer. 2010; 304-313.

101.

Thornton

Gustafsson

Blumenstein

Hine

. Robust character recognition using a hierarchical Bayesian network. AI 2006: Advances in Artificial Intelligence. 2006; 1259-1264.

102.

Przytula

Dash

Thompson

. Evaluation of Bayesian networks used for diagnostics. in: Proc IEEE Aerospace Conf. 2003; 1-12.

103.

Leonard

. The development of a rule based expert system model for fraud alert in consumer credit. European Journal of Operational Research. 1995; 80(2): 350-356.

104.

Bentley

Kim

Jung

G-H

Choi

J-U

. Fuzzy Darwinian detection of credit card fraud. in: The 14th Annual Fall Symposium of the Korean Information Processing Society, 14th October. 2000.

105.

. A Markov chain model of temporal behavior for anomaly detection. in: Proceedings of the 2000 IEEE Systems, Man, and Cybernetics Information Assurance and Security Workshop, West Point, NY. 2000; 169.

106.

Winston

Goldberg

. Operations research: Applications and algorithms. Duxbury Press Boston. 2004; 3.

107.

Kaur

Singh

. A survey of data mining and social network analysis based anomaly detection techniques. Egyptian Informatics Journal. 2016; 17(2): 199-216.

108.

Knox

. Algorithms for mining distance-based outliers in large datasets. in: Proceedings of the International Conference on Very Large Data Bases, Citeseer. 1998; 392-403.

109.

Angiulli

Pizzuti

. Fast outlier detection in high dimensional spaces. in: European Conference on Principles of Data Mining and Knowledge Discovery. Springer. 2002; 15-27.

110.

Hautamaki

Karkkainen

Franti

. Outlier detection using k-nearest neighbour graph. in: Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. IEEE. 2004; 430-433.

111.

Breunig

Kriegel

H-P

Sander

. LOF: Identifying density-based local outliers. in: ACM SIGMOD Record. ACM. 2000; 2: 93-104.

112.

Mishra

Chawla

. A comparative study of local outlier factor algorithms for outliers detection in data streams. in: Emerging Technologies in Data Mining and Information Security. Springer. 2019; 347-356.

113.

Jin

Tung

Han

Wang

. Ranking outliers using symmetric neighborhood relationship. in: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer. 2006; 577-593.

114.

Jain

Pamula

. Two-step anomaly detection approach using clustering algorithm. in: International Conference on Advanced Computing Networking and Informatics. Springer. 2019; 513-520.

115.

Chenaghlou

Moshtaghi

Leckie

Salehi

. Online clustering for evolving data streams with online anomaly detection. in: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer. 2018; 508-521.

116.

Aytekin

Cricri

Aksu

. Clustering and unsupervised anomaly detection with L2 normalized deep auto-encoder representations. in: 2018 International Joint Conference on Neural Networks (IJCNN). IEEE. 2018; 1-6.

117.

Han

. CLARANS: A method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering. 2002; 14(5): 1003-1016.

118.

Kaufman

Rousseeuw

. Clustering large applications (Program CLARA). Finding Groups in Data: An Introduction to Cluster Analysis. 2008; 126-163.

119.

Karypis

Han

E-H

Kumar

. Chameleon: Hierarchical clustering using dynamic modeling. Computer. 1999; 32(8): 68-75.

120.

. An improved BIRCH clustering algorithm and application in thermal power. in: Web Information Systems and Mining (WISM), 2010 International Conference on. IEEE. 2010; 53-56.

121.

Kumar

Reddy

ARM

. A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognition. 2016; 58: 39-48.

122.

Viswanath

Pinkesh

. l-DBSCAN: A fast hybrid density based clustering method. in: Pattern Recognition, 2006. ICPR 2006. 18th International Conference on. IEEE. 2006; 912-915.

123.

Ruiz

Spiliopoulou

Menasalvas

. C-DBSCAN: Density-based clustering with constraints. in: International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing. Springer. 2007; 216-223.

124.

Kisilevich

Mansmann

Keim

. P-DBSCAN: A density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos. in: Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & application. ACM. 2010; 38.

125.

Kryszkiewicz

Lasek

. TI-DBSCAN: Clustering with DBSCAN by means of the triangle inequality. in: International Conference on Rough Sets and Current Trends in Computing. Springer. 2010; 60-69.

126.

Lulli

Dell’Amico

Michiardi

Ricci

. NG-DBSCAN: Scalable density-based clustering for arbitrary data. Proceedings of the VLDB Endowment. 2016; 10(3): 157-168.

127.

Hou

Gao

. Dsets-DBSCAN: A parameter-free clustering algorithm. IEEE Transactions on Image Processing. 2016; 25(7): 3182-3193.

128.

Deng

. Discovering cluster-based local outliers. Pattern Recognition Letters. 2003; 24(9): 1641-1650.

129.

Kohonen

. The self-organizing map. Neurocomputing. 1998; 21(1): 1-6.

130.

Kanungo

Mount

Netanyahu

Piatko

Silverman

. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002; 24(7): 881-892.

131.

Bahmani

Moseley

Vattani

Kumar

Vassilvitskii

. Scalable k-means++. Proceedings of the VLDB Endowment. 2012; 5(7): 622-633.

132.

K-L

Huang

H-K

Tian

S-F

. Improving one-class SVM for anomaly detection. in: Machine Learning and Cybernetics, 2003 International Conference on. IEEE. 2003; 3077-3081.

133.

Lumini

Nanni

. Ensemble of on-line signature matchers based on overcomplete feature generation. Expert Systems with Applications. 2009; 36(3): 5291-5296.

134.

Wong

Ray

Stephens

Lewis

. Artificial immune systems for the detection of credit card fraud: An architecture, prototype and preliminary results. Information Systems Journal. 2012; 22(1): 53-76.

135.

Dal Pozzolo

Caelen

Johnson

Bontempi

. Calibrating probability with undersampling for unbalanced classification. in: Computational Intelligence, 2015 IEEE Symposium Series on. IEEE. 2015; 159-166.

136.

Engelen

P-J

Liedekerke

. An ethical analysis of regulating insider trading. Discussion Paper Series/Tjalling C Koopmans Research Institute. 2006; 6(5).

137.

Agarwal

. An empirical Bayes approach to detect anomalies in dynamic multidimensional arrays. in: Data Mining, Fifth IEEE International Conference on. IEEE. 2005; 8.

138.

Kulkarni

Mani

Domeniconi

. Network-based anomaly detection for insider trading. arXiv preprint arXiv:1702.05809. 2017.

139.

Ferdousi

Maeda

. Unsupervised outlier detection in time series data. in: Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on. IEEE. 2006; x121-x121.

140.

Bolton

Hand

. Unsupervised profiling methods for fraud detection. Credit Scoring and Credit Control VII. 2001; 235-255.

141.

Jung

Chung

. Sequential pattern profiling based bio-detection for smart health service. Cluster Computing. 2015; 18(1): 209-219.

142.

Kim

S-H

Chung

K-Y

. 3D simulator for stability analysis of finite slope causing plane activity. Multimedia Tools and Applications. 2014; 68(2): 455-463.

143.

Kim

J-Y

Chung

K-Y

Jung

J-J

. Single tag sharing scheme for multiple-object RFID applications. Multimedia Tools and Applications. 2014; 68(2): 465-477.

144.

Liu

. A Bayesian discriminating features method for face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2003; 25(6): 725-740.

145.

Joudaki

Rashidian

Minaei-Bidgoli

Mahmoodi

Geraili

Nasiri

Arab

. Using data mining to detect health care fraud and abuse: A review of literature. Global Journal of Health Science. 2014; 7(1): 194.

146.

Shin

Park

Lee

Jhee

. A scoring model to detect abusive billing patterns in health insurance claims. Expert Systems with Applications. 2012; 39(8): 7441-7450.

147.

Liou

F-M

Tang

Y-C

Chen

J-Y

. Detecting hospital fraud and claim abuse through diabetic outpatient services. Health Care Management Science. 2008; 11(4): 353-358.

148.

Pedersen

Quinlan

. Who’s who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy. The American Journal of Human Genetics. 2017; 100(3): 406-413.

149.

Kirlidog

Asuk

. A fraud detection approach with data mining in health insurance. Procedia-Social and Behavioral Sciences. 2012; 62: 989-994.

150.

Kumar

Ghani

Mei

Z-S

. Data mining to predict and prevent errors in health insurance claims processing. in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 2010; 65-74.

151.

Gutiérrez

Hervás-Martínez

Martínez-Estudillo

. Logistic regression by means of evolutionary radial basis function neural networks. IEEE Transactions on Neural Networks. 2011; 22(2): 246-263.

152.

Liu

Vasarhelyi

. Healthcare fraud detection: A survey and a clustering model incorporating geo-location information. in: 29th World Continuous Auditing and Reporting Symposium (29WCARS), Brisbane, Australia. 2013.

153.

Ekina

Leva

Ruggeri

Soyer

. Application of Bayesian methods in detection of healthcare fraud. Chemical Engineering Transactions. 2013; 33.

154.

Tang

Mendis

BSU

Murray

Sutinen

. Unsupervised fraud detection in Medicare Australia. in: Proceedings of the Ninth Australasian Data Mining Conference, Australian Computer Society, Inc. 2011; 121: 103-110.

155.

van Capelleveen

. Outlier based predictors for health insurance fraud detection within US Medicaid. University of California, San Diego. 2013.

156.

Shouman

Turner

Stocker

. Applying k-nearest neighbour in diagnosing heart disease patients. International Journal of Information and Education Technology. 2012; 2(3): 220.

157.

Liu

D-Y

Chen

H-L

Yang

X-E

L-N

Liu

. Design of an enhanced fuzzy k-nearest neighbor classifier based computer aided diagnostic system for thyroid disease. Journal of Medical Systems. 2012; 36(5): 3243-3254.

158.

Zuo

W-L

Wang

Z-Y

Liu

Chen

H-L

. Effective detection of Parkinson’s disease using an adaptive fuzzy k-nearest neighbor approach. Biomedical Signal Processing and Control. 2013; 8(4): 364-373.

159.

Khan

Choi

Shin

Kim

. Predicting breast cancer survivability using fuzzy decision trees for personalized healthcare. in: Engineering in Medicine and Biology Society, 2008. EMBS 2008. 30th Annual International Conference of the IEEE. IEEE. 2008; 5148-5151.

160.

Chien

Pottie

. A universal hybrid decision tree classifier design for human activity classification. in: Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE. IEEE. 2012; 1065-1068.

161.

Moon

Kang

S-Y

Jitpitaklert

Kim

. Decision tree models for characterizing smoking patterns of older adults. Expert Systems with Applications. 2012; 39(1): 445-451.

162.

Chang

C-L

Chen

C-H

. Applying decision tree and neural network to increase quality of dermatologic diagnosis. Expert Systems with Applications. 2009; 36(2): 4035-4041.

163.

Soliman

THA

Sewissy

AbdelLatif

. A gene selection approach for classifying diseases based on microarray datasets. in: Computer Technology and Development (ICCTD), 2010 2nd International Conference on. IEEE. 2010; 626-631.

164.

Fei

S-W

. Diagnostic study on arrhythmia cordis based on particle swarm optimization-based support vector machine. Expert Systems with Applications. 2010; 37(10): 6748-6752.

165.

Huang

C-L

Liao

H-C

Chen

M-C

. Prediction model building and feature selection with support vector machines in breast cancer diagnosis. Expert Systems with Applications. 2008; 34(1): 578-587.

166.

Avci

. A new intelligent diagnosis system for the heart valve diseases by using genetic-SVM classifier. Expert Systems with Applications. 2009; 36(7): 10618-10626.

167.

Abdi

Giveki

. Automatic detection of erythemato-squamous diseases using PSO–SVM based on association rules. Engineering Applications of Artificial Intelligence. 2013; 26(1): 603-608.

168.

Yumusak

Temurtas

. Chest diseases diagnosis using artificial neural networks. Expert Systems with Applications. 2010; 37(12): 7648-7655.

169.

Das

Turkoglu

Sengur

. Effective diagnosis of heart disease through neural networks ensembles. Expert Systems with Applications. 2009; 36(4): 7675-7680.

170.

Gunasundari

Baskar

. Application of artificial neural network in identification of lung diseases. in: Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on. IEEE. 2009; 1441-1444.

171.

Beard

West

. Using Bayesian statistics in health psychology: A comment on Depaoli et al. Health Psychology Review (just-accepted). 2017; 1-5.

172.

Curiac

D-I

Vasile

Banias

Volosencu

Albu

. Bayesian network model for diagnosis of psychiatric diseases. in: Information Technology Interfaces, 2009. ITI’09. Proceedings of the ITI 2009 31st International Conference on. IEEE. 2009; 61-66.

173.

Agarwal

. Weighted support vector regression approach for remote healthcare monitoring. in: Recent Trends in Information Technology (ICRTIT), 2011 International Conference on. IEEE. 2011; 969-974.

174.

Lenert

Lin

Olshen

Sugar

. Clustering in the service of the public’s health. 1999.

175.

Belciug

Salem

A-B

Gorunescu

. Clustering-based approach for detecting breast cancer recurrence. in: Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on. IEEE. 2010; 533-538.

176.

Balasubramanian

Umarani

. An analysis on the impact of fluoride in human health (dental) using clustering data mining technique. in: Pattern Recognition, Informatics and Medical Engineering (PRIME), 2012 International Conference on. IEEE. 2012; 370-375.

177.

Escudero

Zajicek

Ifeachor

. Early detection and characterization of Alzheimer’s disease in clinical scenarios using bioprofile concepts and k-means. in: Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE. IEEE. 2011; 6470-6473.

178.

Butler

. Membership size, communication activity, and sustainability: A resource-based model of online social structures. Information Systems Research. 2001; 12(4): 346-362.

179.

Heidemann

Klier

Probst

. Identifying key users in online social networks: A pagerank based approach. 2010.

180.

Nazir

Raza

Chuah

C-N

. Unveiling Facebook: A measurement study of social network based applications. in: Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement. ACM. 2008; 43-56.

181.

Cui

J-T

J-F

. Social influence study in online networks: A three-level review. Journal of Computer Science and Technology. 2015; 30(1): 184-199.

182.

Vanetti

Binaghi

Carminati

Carullo

Ferrari

. Content-based filtering in on-line social networks. in: International Workshop on Privacy and Security Issues in Data Mining and Machine Learning. Springer. 2010; 127-140.

183.

Bhat

Abulaish

. Using communities against deception in online social networks. Computer Fraud & Security. 2014; 2014(2): 8-16.

184.

Viswanath

Bashir

Crovella

Guha

Gummadi

Krishnamurthy

Mislove

. Towards detecting anomalous user behavior in online social networks. in: Usenix Security. 2014.

185.

Xiao

Freeman

Hwa

. Detecting clusters of fake accounts in online social networks. in: Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security. ACM. 2015; 91-101.

186.

Getoor

Diehl

. Link mining: A survey. ACM SIGKDD Explorations Newsletter. 2005; 7(2): 3-12.

187.

Liben-Nowell

Kleinberg

. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology. 2007; 58(7): 1019-1031.

188.

Ying

Barbar

. Spectrum based fraud detection in social networks. in: Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE. 2011; 912-923.

189.

Agarwal

Branson

Belongie

. Higher order learning with graphs. in: Proceedings of the 23rd International Conference on Machine Learning. ACM. 2006; 17-24.

190.

Girvan

Newman

. Community structure in social and biological networks. Proc Natl Acad Sci USA. 2001; 99(cond-mat/0112110): 8271-8276.

191.

Freeman

. A set of measures of centrality based on betweenness. Sociometry. 1977; 35-41.

192.

Golub

Reinsch

. Singular value decomposition and least squares solutions. Numerische Mathematik. 1970; 14(5): 403-420.

193.

Harshman

. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. 1970.

194.

Hodge

Austin

. A survey of outlier detection methodologies. Artificial Intelligence Review. 2004; 22(2): 85-126.

195.

Chen

Miao

Zhang

. Neighborhood outlier detection. Expert Systems with Applications. 2010; 37(12): 8745-8749.

196.

Papadimitriou

Kitagawa

Gibbons

Faloutsos

. LOCI: Fast outlier detection using the local correlation integral. in: Data Engineering, 2003. Proceedings. 19th International Conference on. IEEE. 2003; 315-326.

197.

Saligrama

Chen

. Video anomaly detection based on local statistical aggregates. in: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE. 2012; 2112-2119.

198.

Tran

Navasca

Luo

. Video detection anomaly via low-rank and sparse decompositions. in: Image Processing Workshop (WNYIPW), 2012 Western New York. IEEE. 2012; 17-20.

199.

Tantipathananandh

Berger-Wolf

. Finding communities in dynamic social networks. in: Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE. 2011; 1236-1241.

200.

Klema

Laub

. The singular value decomposition: Its computation and some applications. IEEE Transactions on Automatic Control. 1980; 25(2): 164-176.

201.

Gupta

Gao

Aggarwal

Han

. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering. 2014; 26(9): 2250-2267.

202.

Barnathan

Megalooikonomou

Faloutsos

Faro

Mohamed

. TWave: High-order analysis of functional MRI. Neuroimage. 2011; 58(2): 537-548.

203.

Kapoor

Wang

J-C

Wetherill

Bertelsen

Hinrichs

Budde

Agrawal

Bucholz

Dick

. A meta-analysis of two genome-wide association studies to identify novel loci for maximum number of alcoholic drinks. Human Genetics. 2013; 132(10): 1141-1151.

204.

Rossi

Gallagher

Neville

Henderson

. Modeling dynamic behavior in large evolving graphs. in: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM. 2013; 667-676.

205. Aggarwal, Subbian. Event detection in social streams. in: Proceedings of the 2012 SIAM International Conference on Data Mining. SIAM. 2012; 624-635.

206. Miller, Arcolano, Beard, Kepner, Schmidt, Bliss, Wolfe. A scalable signal processing architecture for massive graph analysis. in: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE. 2012; 5329-5332.

207. Shelton. Intrusion detection using continuous time Bayesian networks. Journal of Artificial Intelligence Research. 2010; 39: 745-774.

208. Miller, Arcolano, Bliss. Efficient anomaly detection in dynamic, attributed graphs: Emerging phenomena and big data. in: Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on. IEEE. 2013; 179-184.

209. Chandola, Banerjee, Kumar. Anomaly detection for discrete sequences: A survey. IEEE Transactions on Knowledge and Data Engineering. 2012; 24(5): 823-839.

210. Malliaros, Megalooikonomou, Faloutsos. Fast robustness estimation in large social graphs: Communities and anomaly detection. in: Proceedings of the 2012 SIAM International Conference on Data Mining. SIAM. 2012; 942-953.

211. Aggarwal. Outlier detection for high dimensional data. in: ACM SIGMOD Record. ACM. 2001; 2: 37-46.

212. Bogdanov, Mongiovì, Singh. Mining heavy subgraphs in time-evolving networks. in: Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE. 2011; 81-90.

213. Gupta, Gao, Sun, Han. Integrating community matching and outlier detection for mining evolutionary community outliers. in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 2012; 859-867.

214. Gupta, Gao, Sun, Han. Community trend outlier detection using soft temporal pattern mining. in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer. 2012; 692-708.

215. Papalexakis, Faloutsos, Sidiropoulos. ParCube: Sparse parallelizable tensor decompositions. in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer. 2012; 521-536.

216. Mongiovì, Bogdanov, Ranca, Papalexakis, Faloutsos, Singh. NetSpot: Spotting significant anomalous regions on dynamic networks. in: Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM. 2013; 28-36.

217. Chakrabarti. AutoPart: Parameter-free graph partitioning and outlier detection. in: European Conference on Principles of Data Mining and Knowledge Discovery. Springer. 2004; 112-124.

218. Koutra, Vogelstein, Faloutsos. DeltaCon: A principled massive-graph similarity function. in: Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM. 2013; 162-170.

219. Leskovec, McAuley. Learning to discover social circles in ego networks. in: Advances in Neural Information Processing Systems. 2012; 539-547.

220. Leskovec, Kleinberg, Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. in: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM. 2005; 177-187.

221. Leskovec, Lang, Dasgupta, Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics. 2009; 6(1): 29-123.

222. Paranjape, Benson, Leskovec. Motifs in temporal networks. in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM. 2017; 601-610.

223. McAuley, Leskovec. From amateurs to connoisseurs: Modeling the evolution of user expertise through online reviews. in: Proceedings of the 22nd International Conference on World Wide Web. ACM. 2013; 897-908.

224. Akyildiz, Sankarasubramaniam, Cayirci. Wireless sensor networks: A survey. Computer Networks. 2002; 38(4): 393-422.

225. Arampatzis, Lygeros, Manesis. A survey of applications of wireless sensors and wireless sensor networks. in: Intelligent Control, 2005. Proceedings of the 2005 IEEE International Symposium on, Mediterranean Conference on Control and Automation. IEEE. 2005; 719-724.

226. Yick, Mukherjee, Ghosal. Wireless sensor network survey. Computer Networks. 2008; 52(12): 2292-2330.

227. Alcaraz, Najera, Lopez, Roman. Wireless sensor networks and the internet of things: Do we need a complete integration? in: 1st International Workshop on the Security of the Internet of Things (SecIoT’10). 2010.

228. Montenegro, Kushalnagar, Hui, Culler. Transmission of IPv6 packets over IEEE 802.15.4 networks. 2007.

229. Zhang, Meratnia, Havinga. Outlier detection techniques for wireless sensor networks: A survey. IEEE Communications Surveys & Tutorials. 2010; 12(2): 159-170.

230. Bettencourt, Hagberg, Larkey. Separating the wheat from the chaff: Practical anomaly detection schemes in ecological applications of distributed sensor networks. in: International Conference on Distributed Computing in Sensor Systems. Springer. 2007; 223-239.

231. Elnahrawy. Research directions in sensor data streams: Solutions and challenges. Rutgers University, Tech Rep DCIS-TR-527. 2003; 2: D3.

232. Krontiris, Benenson, Giannetsos, Freiling, Dimitriou. Cooperative intrusion detection in wireless sensor networks. in: European Conference on Wireless Sensor Networks. Springer. 2009; 263-278.

233. Ioannis, Dimitriou, Freiling. Towards intrusion detection in wireless sensor networks. in: Proc. of the 13th European Wireless Conference. 2007; 1-10.

234. Tsai. A framework of machine learning based intrusion detection for wireless sensor networks. in: Sensor Networks, Ubiquitous and Trustworthy Computing, 2008. SUTC’08. IEEE International Conference on. IEEE. 2008; 272-279.

235. Farooqi, Khan. Intrusion detection systems for wireless sensor networks: A survey. in: Communication and Networking. Springer. 2009; 234-241.

236. Wang, Deshpande. Predictive modeling-based data collection in wireless sensor networks. in: Wireless Sensor Networks. Springer. 2008; 34-51.

237. Deshpande, Guestrin, Madden, Hellerstein, Hong. Model-driven data acquisition in sensor networks. in: Proceedings of the Thirtieth International Conference on Very Large Data Bases. VLDB Endowment. 2004; 30: 588-599.

238. Zidi, Moulahi, Alaya. Fault detection in wireless sensor networks through SVM classifier. IEEE Sensors Journal. 2018; 18(1): 340-347.

239. Zamry, Zainal, Rassam. Unsupervised anomaly detection for unlabelled wireless sensor networks data. International Journal of Advances in Soft Computing & Its Applications. 2018; 10(2).

240. Emadi, Mazinani. A novel anomaly detection algorithm using DBSCAN and SVM in wireless sensor networks. Wireless Personal Communications. 2018; 98(2): 2025-2035.

241. Palpanas, Papadopoulos, Kalogeraki, Gunopulos. Distributed deviation detection in sensor networks. ACM SIGMOD Record. 2003; 32(4): 77-82.

242. Subramaniam, Palpanas, Papadopoulos, Kalogeraki, Gunopulos. Online outlier detection in sensor data using non-parametric models. in: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment. 2006; 187-198.

243. Zhang, Hamm, Meratnia, Stein, Van De Voort, Havinga. Statistics-based outlier detection for wireless sensor networks. International Journal of Geographical Information Science. 2012; 26(8): 1373-1392.

244. Branch, Giannella, Szymanski, Wolff, Kargupta. In-network outlier detection in wireless sensor networks. Knowledge and Information Systems. 2013; 34(1): 23-54.

245. Zhuang, Chen. In-network outlier cleaning for data collection in sensor networks. in: CleanDB. 2006.

246. Moshtaghi, Havens, Bezdek, Park, Leckie, Rajasegarar, Keller, Palaniswami. Clustering ellipses for anomaly detection. Pattern Recognition. 2011; 44(1): 55-69.

247. Moshtaghi, Rajasegarar, Leckie, Karunasekera. Anomaly detection by clustering ellipsoids in wireless sensor networks. in: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2009 5th International Conference on. IEEE. 2009; 331-336.

248. Rajasegarar, Leckie, Palaniswami, Bezdek. Distributed anomaly detection in wireless sensor networks. in: Communication Systems, 2006. ICCS 2006. 10th IEEE Singapore International Conference on. IEEE. 2006; 1-5.

249. Bezdek, Rajasegarar, Moshtaghi, Leckie, Palaniswami, Havens. Anomaly detection in environmental monitoring networks [application notes]. IEEE Computational Intelligence Magazine. 2011; 6(2): 52-58.

250. Rajasegarar, Bezdek, Leckie, Palaniswami. Elliptical anomalies in wireless sensor networks. ACM Transactions on Sensor Networks (TOSN). 2009; 6(1): 7.

251. Rajasegarar, Leckie, Palaniswami, Bezdek. Quarter sphere based distributed anomaly detection in wireless sensor networks. in: Communications, 2007. ICC’07. IEEE International Conference on. IEEE. 2007; 3864-3869.

252. Moshtaghi, Leckie, Karunasekera, Bezdek, Rajasegarar, Palaniswami. Incremental elliptical boundary estimation for anomaly detection in wireless sensor networks. in: Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE. 2011; 467-476.

253. Yang, Meratnia, Havinga. An online outlier detection technique for wireless sensor networks using unsupervised quarter-sphere support vector machine. in: Intelligent Sensors, Sensor Networks and Information Processing, 2008. ISSNIP 2008. International Conference on. IEEE. 2008; 151-156.

254. Thuc K-X, Insoo. A collaborative event detection scheme using fuzzy logic in clustered wireless sensor networks. AEU-International Journal of Electronics and Communications. 2011; 65(5): 485-488.

255. Bahrepour, Meratnia, Poel, Taghikhaki, Havinga. Distributed event detection in wireless sensor networks for disaster management. in: Intelligent Networking and Collaborative Systems (INCOS), 2010 2nd International Conference on. IEEE. 2010; 507-512.

256. Baig. Pattern recognition for detecting distributed node exhaustion attacks in wireless sensor networks. Computer Communications. 2011; 34(3): 468-484.

257. Baig, Khan. Fuzzy logic-based decision making for detecting distributed node exhaustion attacks in wireless sensor networks. in: Future Networks, 2010. ICFN’10. Second International Conference on. IEEE. 2010; 185-189.

258. Gaber. Data stream processing in sensor networks. in: Learning From Data Streams. Springer. 2007; 41-48.

259. Rassam, Zainal, Maarof. Advancements of data anomaly detection research in wireless sensor networks: A survey and open issues. Sensors. 2013; 13(8): 10087-10122.

260. Shukla, Kumari. Security threats and defense approaches in wireless sensor networks: An overview. International Journal of Application or Innovation in Engineering & Management (IJAIEM). 2013; 2.

261. Mohammadi, Jadidoleslamy. A comparison of link layer attacks on wireless sensor networks. arXiv preprint arXiv:1103.5589. 2011.

262. Granjal, Monteiro, Silva. Security for the internet of things: A survey of existing protocols and open research issues. IEEE Communications Surveys & Tutorials. 2015; 17(3): 1294-1312.

263. Diaz, Sanchez. Simulation of attacks for security in wireless sensor network. Sensors. 2016; 16(11): 1932.

264. Moshtaghi, Bezdek, Havens, Leckie, Karunasekera, Rajasegarar, Palaniswami. Streaming analysis in wireless sensor networks. Wireless Communications and Mobile Computing. 2014; 14(9): 905-921.

265. Zhang, Meratnia, Havinga. Adaptive and online one-class support vector machine-based outlier detection techniques for wireless sensor networks. in: Advanced Information Networking and Applications Workshops, 2009. WAINA’09. International Conference on. IEEE. 2009; 990-995.

266. Chong, Tay. Modeling representation of videos for anomaly detection using deep learning: A review. arXiv preprint arXiv:1505.00523. 2015.

267. Cong, Yuan, Tang. Video anomaly search in crowded scenes via spatio-temporal motion context. IEEE Transactions on Information Forensics and Security. 2013; 8(10): 1590-1599.

268. Klaser, Marszałek, Schmid. A spatio-temporal descriptor based on 3D-gradients. in: BMVC 2008-19th British Machine Vision Conference. British Machine Vision Association. 2008; 275: 271-210.

269. Javan Roshtkhari, Levine. Online dominant and anomalous behavior detection in videos. in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013; 2611-2618.

270. Shi, Jia. Abnormal event detection at 150 fps in MATLAB. in: Proceedings of the IEEE International Conference on Computer Vision. 2013; 2720-2727.

271. Monga, Bala, Fan. Adaptive sparse representations for video anomaly detection. IEEE Transactions on Circuits and Systems for Video Technology. 2014; 24(4): 631-645.

272. Cong, Yuan, Liu. Sparse reconstruction cost for abnormal event detection. in: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE. 2011; 3449-3456.

273. Morris, Trivedi. A survey of vision-based trajectory learning and analysis for surveillance. IEEE Transactions on Circuits and Systems for Video Technology. 2008; 18(8): 1114-1127.

274. Callegari, Gazzarrini, Giordano, Pagano, Pepe. A novel PCA-based network anomaly detection. in: Communications (ICC), 2011 IEEE International Conference on. IEEE. 2011; 1-5.

275. Rajeshwari, Babu. Real-time credit card fraud detection using streaming analytics. in: Applied and Theoretical Computing and Communication Technology (iCATccT), 2016 2nd International Conference on. IEEE. 2016; 439-444.

276. Smrithy, Balakrishnan. A statistical technique for online anomaly detection for big data streams in cloud collaborative environment. in: Computer and Information Technology (CIT), 2016 IEEE International Conference on. IEEE. 2016; 108-111.

277. Dean, Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM. 2008; 51(1): 107-113.

278. Dai, Yan, Tang, Zhao, Guo. Online credit card fraud detection: A hybrid framework with big data technologies. in: Trustcom/BigDataSE/ISPA, 2016 IEEE. IEEE. 2016; 1644-1651.

279. Zamini, Montazer. Credit card fraud detection using autoencoder based clustering. in: 2018 9th International Symposium on Telecommunications (IST). IEEE. 2018.

