An improved k-NN anomaly detection framework based on locality sensitive hashing for edge computing environment

Abstract

Large-scale deployment of wireless sensor networks in various fields brings great benefits. With the increasing volume of sensor data, traditional data collection and processing schemes are gradually becoming unable to meet the requirements of actual scenarios. As data quality is vital to data mining and value extraction, this paper presents a distributed anomaly detection framework which combines cloud computing and edge computing. The framework consists of three major components: k-nearest neighbors, locality sensitive hashing, and cosine similarity. Locality sensitive hashing reduces the computation cost and processing time of the traditional k-nearest neighbors algorithm. An initial anomaly detection result is given by the combination of k-nearest neighbors and locality sensitive hashing. To further improve the accuracy of anomaly detection, a second anomaly test based on cosine similarity is provided. Extensive experiments are conducted to evaluate the performance of our proposal, with six popular methods used for comparison. Experimental results show that our model has advantages in terms of accuracy, delay, and energy consumption.

Keywords

Anomaly detection; k-nearest neighbors; locality sensitive hashing; cosine similarity; edge computing

1. Introduction

With the rapid development of the Internet of Things (IoT), the number of IoT devices involved in the manufacturing processes of a smart factory keeps increasing. Meanwhile, the amount of sensor data generated by various sensors deployed in a smart factory has exploded in recent years. The sensor data fall into two major categories: 1) state data, which represent real-time status at a specific timestamp (e.g., temperature, humidity), and 2) accumulated data, which denote the accumulated amount during a time period (e.g., energy consumption, uptime). Both types of data possess a strict sequential order. Thus, they can be treated as time series [1]. Typically, a time series is able to demonstrate both real-time changes and trends over time for a variable.

In general, anomaly in time series falls into the following three categories [2].

• Point anomaly [3]: point anomaly refers to a point which is different from other points. This may be the result of sensor malfunction or noise caused by external factors. Point anomaly is also called outlier.
• Pattern anomaly [4]: pattern anomaly refers to a significant difference between a segment pattern and other segment patterns. In other words, pattern anomaly is caused by a sudden change.
• Sequence anomaly [5]: sequence anomaly refers to the non-compliance of a subsequence to other subsequences, such as subsequences generated by different DoS attack mechanisms.

Both point anomaly and pattern anomaly are abnormal behaviors that appear within an individual time series, while sequence anomaly is abnormal behavior that appears between sequences. In addition, there are other descriptions and terms which convey anomaly (e.g., concept drift). In [6], the authors propose a concept drift detection model for data streams. The model is based on ensemble classifiers. The authors point out that “change in the underlying distribution that data points come from”, namely concept drift, is inevitable in data streams.

Traditionally, the collected sensor data in a smart factory are transmitted to a cloud computing center for anomaly detection. Then, the detection result and relevant decision are sent back to the factory. However, with the explosion of data, cloud-based data processing methods inevitably suffer from performance degradation caused by several crucial factors (e.g., latency, bandwidth consumption, and jitter).

As a complement of cloud computing, edge computing [7] selectively offloads computing and storage tasks to the edge of a network. Namely, data can be processed on edge nodes. This new computing paradigm possesses several advantages [8]: 1) the latency of data transmission is reduced, 2) the amount of data transmitted to the cloud is reduced, and 3) data security and privacy are enhanced. As edge computing is able to meet the demand of mobility, real-time capability, reliability, security and privacy for a network, it has extensive application scenarios [9].

For IoT devices in a smart factory, the collected sensor data show the system status in real time. The monitoring, control, and decision-making for manufacturing processes can be achieved by analyzing the data. However, data anomaly often leads to improper instructions and wrong operations. The validity and quality of the data are of great importance. Thus, a timely and accurate anomaly detection for the data plays a vital role in enhancing the stability and security of manufacturing processes [10]. In general, anomaly is inevitable during the collection and transmission of the sensor data. When the original data are transmitted to the cloud, the reliability of data analysis cannot be guaranteed. Based on the concept of edge computing, anomaly detection of the original data can be performed on edge nodes. In this case, the quality of data transmitted to the cloud is improved. In addition, the amount of data which need to be transmitted is reduced. Thus, the network performance metrics latency and bandwidth consumption are improved.

For anomaly detection, the comparison of edge computing and cloud computing is depicted in Fig. 1. The traditional cloud computing mode transmits the collected data to the cloud via wireless networks and backbone networks. The anomaly detection result produced by the cloud is sent back to the smart factory. In general, the transmission delay is much larger than the calculation delay. Thus, the real-time requirements of a smart factory can hardly be met. In contrast, the edge computing mode is able to selectively offload certain tasks which are originally performed in the cloud to the edge nodes, such as data preprocessing, anomaly detection, real-time decision-making, and privacy policy enforcement.

Table 1
Representative literature in five categories

Category	Subcategory	Scheme
Statistical-based		[11] [12] [13] [14] [15]
Clustering-based	Hierarchical-based	[16] [17] [18] [19] [20]
Clustering-based	Partitional-based	[21] [22] [23]
Distance-based		[24] [25] [26] [27] [28] [29]
Density-based		[30] [31] [32] [33] [34] [35]
Classification-based	Neural network-based	[36] [37] [38]
Classification-based	SVM-based	[39] [40] [41]
Classification-based	Isolation-based	[42] [43] [44]

Figure 1.
Comparison of edge computing and cloud computing.

In this paper, we propose an improved k-NN anomaly detection framework based on locality sensitive hashing (LSH). The adoption of LSH provides a data preprocessing operation which facilitates the subsequent operation performed by the k-nearest neighbors algorithm. Specifically, the key innovation is that finding neighboring elements in a large set $X$ is transformed into finding neighboring elements in a small subset of $X$. Thus, computational efficiency can be significantly improved, especially for high-dimensional data. Different operations are assigned to the cloud and the edge nodes based on their specific computation intensities. This design is aimed at further improving both computational efficiency and real-time responsiveness. In addition, a second anomaly test is developed based on the concept of cosine similarity (CS), which is designed to give more accurate anomaly detection results.

The remainder of this paper is structured as follows. Section 2 reviews representative anomaly detection methods in five categories. The advantages and disadvantages of these methods are summarized. Section 3 presents our improved k-NN anomaly detection framework which is enhanced by locality sensitive hashing. The three major components, k-NN, LSH, and CS, are elaborated in turn. Section 4 evaluates the proposed model with extensive experiments. The numerical results are analyzed by comparing our method with six popular algorithms. Finally, conclusions and future work are presented in Section 5.

2. Related works

This section presents a brief review of existing anomaly detection methods. Table 1 shows representative literature in five categories.

2.1 Statistical-based methods

The basic idea of a statistical-based method is rooted in a hypothesis: the collected data are assumed to follow a certain distribution model (e.g., Gaussian distribution, Poisson distribution). If a data sample fits the distribution model, it is considered to be normal. Otherwise, it is declared an outlier. The advantage of this type of method is that mature theoretical models are ready to use. In general, high availability and accuracy can be achieved.
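As an illustration of this idea only, the following Python sketch fits a Gaussian model to reference data and flags a sample that deviates too far from the mean. The function names, the three-standard-deviation threshold, and the temperature readings are assumptions made for this example and are not taken from any of the cited methods.

```python
import statistics


def fit_gaussian(normal_samples):
    """Estimate the mean and standard deviation of reference (normal) data."""
    mean = statistics.mean(normal_samples)
    std = statistics.pstdev(normal_samples)
    return mean, std


def is_outlier(x, mean, std, z_threshold=3.0):
    """Flag x when it lies more than z_threshold standard deviations from the mean.

    The threshold of 3 is an illustrative assumption, not a value from the paper.
    """
    return abs(x - mean) > z_threshold * std


# Hypothetical temperature readings used as the reference data.
reference = [20.0, 20.5, 19.5, 20.2, 19.8]
mean, std = fit_gaussian(reference)
print(is_outlier(25.0, mean, std), is_outlier(20.3, mean, std))
```

The sketch also makes the drawback of this family visible: the result depends entirely on how well the assumed distribution and its estimated parameters describe the data.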

In [11], a Markov-based online anomaly detection method is proposed. The anomaly detection is performed by a model with quantized amplitudes which are given by Markov chain. In [12], a local statistical scan for excessive communication of dynamic network subdomain is used to detect abnormal activities. In [13], a histogram-based anomaly detection method is developed. For stream data, a histogram of different characteristics is generated. The univariate characteristic density is modeled using histogram with fixed or dynamic facet width. Then, the anomaly detection is performed by calculating the anomaly score of a data sample. In [14], the authors propose to perform anomaly detection by using median and median absolute deviation. In addition, a prior dimension reduction is conducted based on principal component analysis. In [15], the authors develop a deep autoencoding Gaussian mixture model to perform unsupervised anomaly detection. A low-dimensional representation of an input data sample is fed into the Gaussian mixture model.

Drawback: Most statistical-based methods require multiple experiments to obtain prior information (e.g., data distribution, parameter estimation, and confidence interval). In general, this prior information can hardly be obtained in advance. In addition, this type of method is not suitable for high-dimensional data.

2.2 Clustering-based methods

The fundamental idea of a clustering-based method is as follows. The collected data are divided into different clusters based on a certain criterion (e.g., distance). The similarity of data within the same cluster is made as high as possible, while the similarity of data between different clusters is made as low as possible [45]. Finally, a data sample which lies in no cluster is declared an anomaly. The advantages of this type of method are algorithmic simplicity, fast processing, and no need for class labels or prior knowledge.

2.2.1 Hierarchical clustering-based methods

Hierarchical clustering-based methods divide data into different hierarchies. Thus, a tree-shaped clustering structure is obtained. The common strategies are bottom-up aggregation and top-down split. In [46], the authors make a comparison of similarity measures for categorical data in hierarchical clustering. Criteria are provided to determine in which situations a certain similarity measure is recommended for use. In [16], the authors propose an efficient data clustering method called BIRCH. Data are processed based on a tree structure. Each leaf node stores one cluster which is denoted by a cluster center and a radius. A data sample is assigned to the nearest leaf node. At last, data which are not assigned to any leaf node are declared as anomalies. In [18], the authors mainly deal with code anomalies, namely code fragments which are not typical. The acquisition of code vector representation and anomaly detection based on the vectorized data are discussed. In [19], the authors point out that traditional non-incremental hierarchical clustering algorithms for anomaly detection possess the disadvantages of low effectiveness and instability. The proposal in [19] dynamically adjusts the optimum clustering number based on a self-defined criterion. The manual picking of the clustering number is avoided. This mitigates the above problems of traditional non-incremental hierarchical clustering methods. In [20], the authors propose a hybrid grid-based hierarchical clustering method based on Hausdorff distance [47]. Different versions of hierarchical clustering algorithms are used to cluster the grid-based trajectories based on the pairwise Hausdorff distances. The proposal is suitable for measuring the similarity between trajectories of different lengths.

2.2.2 Partitional clustering-based methods

Partitional clustering-based methods divide the collected data into several clusters. Initial centers of these clusters are randomly selected. Then, heuristic algorithms perform iterative relocations for the collected data until a clear clustering result is achieved. The celebrated k-means algorithm was initially proposed in [21]. It randomly selects $k$ data samples as the centers of $k$ clusters. The remaining data are assigned to the nearest cluster. Then, the average value of data samples within each cluster is calculated to update the center. The above process is repeated until the criterion function converges, namely until no center needs to be updated. At this point, a data sample which does not belong to any cluster is deemed to be an anomaly. In the original k-means algorithm, the selection of initial centers is crucial to the quality of the subsequent clustering result. In [23], the authors develop an agglomerative hierarchical clustering algorithm to identify outliers in a hybrid dataset with numeric and categorical attributes. A similarity measure is designed to deal with both numeric and categorical attributes of data. In [48], the authors combine the information entropy with the original k-means algorithm for the purpose of optimizing the selection of initial centers and automatically determining the number of clusters.

Drawback: The performance of most clustering-based methods completely relies on the quality of clustering result. In addition, the time complexity of high-dimensional data processing is quite high.

2.3 Distance-based methods

Distance-based methods calculate the distance between data samples by using a distance function. For data sample $x$ , the distances between $x$ and the remaining data samples of the collected data are compared with a predefined threshold. A distance which is larger than the threshold indicates $x$ is an anomaly. The advantage of this type of methods is that there is no need to estimate the distribution of data in advance. In addition, the performance of distance-based methods for high-dimensional data is better than that of clustering-based methods.

In [24], the authors propose a linear regressive model for time series anomaly detection. Several algorithms related to the Huber-skip and least trimmed squares estimators are developed to perform anomaly detection. In [25], non-parametric methods for accurate periodicity detection are developed. Periodic distance measures for time series are used to perform anomaly detection. Among various distance-based methods, the most widely used idea is the k-nearest neighbors [26]. Methods based on this idea perform anomaly detection based on the proximity between data samples. It is simple and does not require any training. However, the computation cost of the k-NN is high due to the calculation of distance between data samples. In [27], a k-NN-SVM method is proposed for anomaly detection. The indexing of training set is conducted based on the idea of R*-tree. Labels of data chosen by the k-NN determine whether a further training based on the SVM is needed. Although the training time is reduced, this proposal is not efficient for processing high-dimensional data due to large amount of computation cost introduced by both the k-NN and the SVM. In [28], the authors develop an efficient method to detect anomalies in the log data. N-gram and frequent pattern mining (FPM) are used. A labeled log data sample set is obtained from historical logs by using clustering and self-training method.

Drawback: The performance of most distance-based methods is influenced by parameter setting, and parameter selection is not an easy task.

2.4 Density-based methods

As the distance-based methods mentioned above take no account of the local sparsity of data, density-based methods which involve both proximity and local sparsity of data are developed. The advantage of this type of method is that the imbalance of local sparsity of data can be handled well. This contributes to more accurate detection of local anomalies.

In [30], a local outlier factor (LOF) model is proposed. The local density of a data sample is compared with the densities of its neighboring data samples. The density is calculated based on the distance between data samples. The larger the distance is, the smaller the density is, and vice versa. As LOF-based models possess high time complexity, they are not suitable for processing large amount of high-dimensional data [49]. In [31], an unsupervised density-based anomaly detection method is developed. The local anomaly score of a data sample is used to represent the degree of its deviation from other data samples. In [32], a connectivity-based outlier factor (COF) model is proposed. It improves the effectiveness of the original LOF model when a pattern itself has a similar neighborhood density of an outlier. The detection accuracy of COF is superior to that of LOF. However, the execution time of the COF algorithm is often larger than that of LOF. In [33], the authors study multiple types of traffic data for the purpose of extracting different features. A grid-based LOF algorithm is developed to detect abnormal areas in the city of Beijing. In [34], the authors propose a fast outlier detection algorithm for data streams. The proposal is able to reduce the computation cost of the LOF by using Z-score pruning. In [35], the authors develop a flexible parametric probability measure for large scale anomaly detection in mixed numerical and categorical input space. Low likelihood values are identified as anomalies.

Drawback: For most density-based methods, the time complexity of an algorithm significantly increases with the dimension increment of data.

2.5 Classification-based methods

Based on different core components, classification-based methods can be divided into two categories: multi-class classifiers and one-class classifiers. The former type gives different labels to data which belong to different classes. The latter type sets an identifiable perimeter which encloses normal data. The advantage of classification-based methods is that there are multiple mature mathematical packages which can be employed to perform anomaly detection. The accuracy of anomaly detection is considered to be fairly stable.

2.5.1 Neural network-based methods

Neural network-based methods simulate the human nervous system to conduct computing and self-learning. Anomaly detection is performed based on considerable prior training. The advantage of this type of method is that no prior information is needed. No statistical assumption for the original data is needed, thus this type of method is resistant to distractors. In [36], an anomaly detection model based on convolutional neural network (CNN) is proposed. The extracted features are converted to a binary vector using one-hot encoding. Then, the CNN model is trained for the purpose of anomaly detection. In [37], different deep neural networks are used (e.g., CNN, recurrent neural network) in the training phase. The acoustic emission training is used for dimension reduction which reduces computation cost. In [38], several of the most advanced deep learning techniques for anomaly detection in cyber security are demonstrated. Specifically, the discussion is focused on the multilayer perceptron (MLP) deep learning method.

2.5.2 SVM-based methods

The core component of SVM-based methods is a linear classifier which is defined in feature space. An SVM equipped with a kernel trick can be considered as a non-linear classifier. In general, an SVM-based method generates an area which covers normal data. A data sample that lies outside the area is declared an anomaly. As the performance of SVM-based methods is sensitive to the result of parameter selection, considerable improvements have been proposed. For instance, a data-driven hyperparameter optimization of one-class SVM for anomaly detection is proposed in [39]. In [40], the authors develop a linear one-class SVM based on an unsupervised deep belief network. The proposal is aimed at anomaly detection of high-dimensional and large-scale data. In [41], the authors propose an anomaly detection scheme based on a generalized support vector data description. The basic idea is that hyperspheres are created to accommodate data samples in different classes. Hence, outliers can be identified.

2.5.3 Isolation-based methods

The basic idea of isolation-based methods is that, because anomalies are few and different, they are more susceptible to isolation than normal data. The advantage of this type of method is that the intrinsic characteristics of an outlier are taken into full account during the construction of an anomaly detector [42]. In [43], an isolation-based distributed outlier detection framework using nearest neighbor ensembles is proposed. The proposal is mainly based on isolation forest (iForest) [44] and LOF.

Drawback: Most classification-based methods are not sensitive enough to anomalies which are close to normal data.

3. An improved k-NN anomaly detection framework based on LSH

3.1 Basic model

Our model is a distributed anomaly detection framework based on edge computing. As shown in Fig. 2, the cloud server and the edge nodes cooperate with each other. Specifically, there are three major components: k-NN, LSH, and CS. The traditional k-NN algorithm is enhanced by locality sensitive hashing (LSH) [50]. The cloud is responsible for both the generation and the update of the hash table. The latest hash table is sent to the edge nodes. The anomaly detection of data collected by various sensors is performed on the edge nodes with k-NN. In this anomaly detection process, the hash table is used to reduce the volume of data which are used by k-NN. Finally, the anomaly detection results are further tested based on cosine similarity.

Figure 2.

k-NN anomaly detection framework based on LSH.

3.2 Algorithm flow

The interaction of the three major components mentioned above is illustrated in Fig. 3. The brief flow chart is aimed at presenting the relation between the three algorithms. Thus, only key operations are depicted. A test data sample $x_{i}$ is first handled by LSH. Actual buckets are obtained based on the hash values of data samples in the labeled training set $Q$. If there is no bucket for the hash value of $x_{i}$, $x_{i}$ is deemed to be abnormal. Otherwise, a new set $Q^{\prime}$ is obtained from the particular bucket which accommodates the hash value of $x_{i}$. Then, k-NN is applied. The distances between $x_{i}$ and the data samples in set $Q^{\prime}$ are obtained. If the majority of the $k$ nearest neighbors of $x_{i}$ are normal, $x_{i}$ is deemed to be normal. Otherwise, the combination of LSH and k-NN indicates that $x_{i}$ is abnormal. However, CS is introduced to perform a further test. The cosine similarity values for normal data samples in set $Q$ are calculated. If the cosine similarity value for $x_{i}$ is within the normal range of cosine similarity values obtained, $x_{i}$ is deemed to be normal. Otherwise, it is deemed to be abnormal.

Figure 3.

Algorithm flow chart.

3.3 K-nearest neighbors

Anomaly detection based on the k-NN algorithm measures the proximity between data samples. For data sample $x_{i}$, the most similar data samples (i.e., the nearest $k$ data samples in the sample space) are considered. If the majority of these samples are anomalies, then the data sample $x_{i}$ is considered to be an anomaly, and vice versa.

An $m$ -dimensional dataset which contains $n$ data samples can be denoted as

$\displaystyle X=\left\{x_{i}\mid x_{i}\in R^{m}\right\},i=1,2,\ldots,n.$ (1)

In specific, data sample $x_{i}$ is

$\displaystyle x_{i}=\left[x_{i1},x_{i2},\ldots,x_{im}\right]^{T}.$ (2)

The distance between two data samples $x_{i}$ and $x_{j}$ can be calculated based on the Euclidean distance, namely

$\displaystyle d(x_{i},x_{j})=\sqrt{\sum_{d=1}^{m}(x_{id}-x_{jd})^{2}}.$ (3)

The $k$ nearest neighbors of $x_{i}$ can be denoted by set $N_{k}(x_{i})$ . We denote the classes of normal data and abnormal data as $c_{1}$ and $c_{2}$ , respectively. Thus, the class set can be denoted as $C=\left\{c_{1},c_{2}\right\}$ . Based on the majority voting, the class of $x_{i}$ can be determined as

$\displaystyle c(x_{i})=\mathop{\arg\max}\limits_{c_{j}}\sum_{x_{s}\in N_{k}(x_{i})}I(y_{s}=c_{j}),$ (4)

where $s=1,2,\ldots,k$ indexes the neighbors, $j=1,2$ indexes the classes, and $y_{s}$ is the class label of neighbor $x_{s}$. $I(\cdot)$ is an indicator function. When $y_{s}$ is equal to $c_{j}$, $I(\cdot)$ is 1. When $y_{s}$ is not equal to $c_{j}$, $I(\cdot)$ is 0.

Based on the above analysis, the detailed k-NN algorithm is shown in Algorithm 3.3, where $Q$ is the training set.

Algorithm 3.3: $k$-$NN(k,Q,x_{i})$
$D\leftarrow\varnothing$
$C\leftarrow\left\{c_{1},c_{2}\right\}$ // class set
for $j=1$ to $\left|Q\right|$ do
  $D\leftarrow D\cup\left\{d(x_{i},q_{j})\right\}$ // distances between $x_{i}$ and data samples in the labeled training set $Q$
end for
$L\leftarrow\textit{sort}(D,\textit{ascending})$ // an ordered list of the distances
$N_{k}(x_{i})\leftarrow\textit{top}(L,k)$ in $Q$ // the nearest $k$ data samples of $x_{i}$
if $c(x_{i})==c_{1}$ then
  return 1 // $x_{i}$ is deemed to be normal
else
  return 0 // $x_{i}$ is deemed to be abnormal
end if
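The steps of Algorithm 3.3 can be sketched in Python as follows. This is a minimal illustration rather than the implementation evaluated in this paper: the function names, the representation of $Q$ as a list of (vector, label) pairs, and the rule that a tied vote counts as abnormal are all assumptions made for the example.

```python
import math
from collections import Counter

NORMAL, ABNORMAL = 1, 0  # mirror the return values of Algorithm 3.3


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors, as in (3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_is_normal(k, training_set, x):
    """Majority vote over the k nearest labeled samples, as in (4).

    training_set is a list of (vector, label) pairs (assumed layout).
    A tie is treated as ABNORMAL, which is an assumption of this sketch.
    """
    ranked = sorted(training_set, key=lambda item: euclidean(x, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return NORMAL if votes[NORMAL] > votes[ABNORMAL] else ABNORMAL


# Hypothetical labeled training set Q.
Q = [((0, 0), NORMAL), ((0, 1), NORMAL), ((1, 0), NORMAL),
     ((10, 10), ABNORMAL), ((10, 11), ABNORMAL)]
print(knn_is_normal(3, Q, (0.5, 0.5)), knn_is_normal(3, Q, (10, 10.5)))
```

Sorting all $\left|Q\right|$ distances is what makes plain k-NN expensive on large training sets, which motivates restricting the search to a subset of $Q$.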

3.4 Locality sensitive hashing

Locality sensitive hashing is another nearest neighbors-based algorithm for large amount of high-dimensional data. A distinct feature of this hashing algorithm is that it is sensitive to location. The basic idea of LSH is that two neighboring data samples in the original sample space are still neighboring in high probability after being hashed. As LSH possesses rapid indexing ability, it is suitable for the incremental indexing of dynamic dataset. The cost of index update is considered to be low.

For a given set $S$, we consider two arbitrary elements $p,q\in S$. A family of hash functions $H=\left\{h:S\rightarrow U\right\}$ is said to be $(r,c\cdot r,p_{1},p_{2})$-sensitive if, for a hash function $h$ drawn uniformly at random from $H$, the following two conditions hold:

$\displaystyle P(h(p)=h(q))\geqslant p_{1},\quad\left\|p,q\right\|\leqslant r,$ (5)

$\displaystyle P(h(p)=h(q))\leqslant p_{2},\quad\left\|p,q\right\|>c\cdot r,$ (6)

where $c>1$, $r>0$, and $p_{1}>p_{2}$. $\left\|\cdot\right\|$ is the distance between two elements. In other words, elements which are close to each other are mapped to the same hash bucket with a high probability. Conversely, elements which are far away from each other are mapped to the same hash bucket with a low probability. As we employed the Euclidean distance in Section 3.3, a corresponding locality sensitive hash function can be formulated as

$\displaystyle\textit{hash}(x_{i})=\left\lfloor\frac{\left|x_{i}\cdot l+b\right|}{a}\right\rfloor,$ (7)

where $x_{i}$ is an $m$ -dimensional data sample, $a$ is the width of the bucket, and $b$ is a random variable in $[0,a]$ . $l$ is an $m$ -dimensional vector, and $l\sim N^{m}(0,1)$ . In (7), it can be considered that $x_{i}$ is projected to a random line which is composed of several equal-length segments. The length of each segment is $a$ . These segments represent buckets.
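A minimal Python sketch of the hash function in (7) is given below. The factory function `make_hash` and its `seed` parameter are hypothetical names introduced here for reproducibility of the example. The sketch assumes that $l$ and $b$ are drawn once and then shared by all data samples, so that every sample is projected onto the same random line.

```python
import math
import random


def make_hash(m, a, seed=None):
    """Build one hash function of the form floor(|x . l + b| / a), as in (7).

    m: dimension of the data samples
    a: bucket width
    seed: optional seed so that the example is repeatable (assumption)
    """
    rng = random.Random(seed)
    l = [rng.gauss(0.0, 1.0) for _ in range(m)]  # l ~ N^m(0, 1)
    b = rng.uniform(0.0, a)                      # random offset in [0, a]

    def hash_fn(x):
        projection = sum(xi * li for xi, li in zip(x, l))
        return math.floor(abs(projection + b) / a)

    return hash_fn


# Two nearby samples and one distant sample (hypothetical values).
h = make_hash(m=3, a=4.0, seed=7)
print(h((1.0, 2.0, 3.0)), h((1.1, 2.0, 3.0)), h((40.0, -25.0, 60.0)))
```

Nearby samples usually, but not always, land in the same bucket. This is the source of the few misclassifications that a single projection can introduce.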

The projection of all data samples in set $X$ results in a hash table which contains several buckets. Though there might be a few misclassifications, data samples within the same bucket are considered to be neighboring. Thus, set $X$ is split into multiple mutually disjoint subsets. When set $X$ acts as a training set, the operation of finding neighboring elements in $X$ is transformed into finding neighboring elements in a specific subset of $X$. Thus, both the computation cost and the processing time are reduced. This data preprocessing based on locality sensitive hashing facilitates the subsequent operations performed by the k-nearest neighbors algorithm. As the locality sensitive hashing operation is likely to generate a large hash table which requires considerable indexing space, in our proposal, the hash table is generated in the cloud. Moreover, as new data are continuously collected by the edge nodes and transmitted to the cloud, the periodic update of the hash table is also conducted in the cloud. In other words, the edge nodes always use the latest version of the hash table which is sent to them by the cloud. Based on the above analysis, the detailed LSH algorithm is shown in Algorithm 3.4.

Algorithm 3.4: $LSH(Q,X)$
$H\leftarrow\varnothing$
$B_{no}\leftarrow\varnothing$
for $j=1$ to $\left|Q\right|$ do
  $h\leftarrow\textit{hash}(q_{j})$
  $H\leftarrow H\cup\left\{h\right\}$ // hash values of data samples in the labeled training set $Q$
end for
$h_{\max}=\mathop{\max}\limits_{\forall h\in H}\left\{h\right\}$
for $j=0$ to $\left|h_{\max}\right|$ do
  $B_{j}\leftarrow\varnothing$ // possible buckets
end for
for $j=1$ to $\left|H\right|$ do
  switch $(h_{j})$ // actual buckets
    case $0$: $B_{0}\leftarrow B_{0}\cup\left\{h_{j}\right\}$, break
    case $1$: $B_{1}\leftarrow B_{1}\cup\left\{h_{j}\right\}$, break
    $\ldots$
    case $h_{\max}$: $B_{h_{\max}}\leftarrow B_{h_{\max}}\cup\left\{h_{j}\right\}$, break
end for
for $j=0$ to $h_{\max}$ do
  if $B_{j}\neq\varnothing$ then
    $B_{no}\leftarrow B_{no}\cup\left\{j\right\}$ // the numbers of actual buckets
  end if
end for
for $i=1$ to $\left|X\right|$ do
  $j\leftarrow\textit{hash}(x_{i})$
  if $(\left\{j\right\}\cap B_{no})\neq\varnothing$ then // there is a bucket whose number equals the hash value of $x_{i}$
    $Q^{\prime}\leftarrow Q_{B_{j}}$ // a new training set $Q^{\prime}$ whose elements are with hash values in bucket $B_{j}$
    if $k-NN(Q^{\prime},x_{i})==1$ then
      $X_{n}\leftarrow X_{n}\cup\left\{x_{i}\right\}$ // $x_{i}$ is added to the normal set $X_{n}$
    else
      $X_{a}\leftarrow X_{a}\cup\left\{x_{i}\right\}$ // $x_{i}$ is added to the abnormal set $X_{a}$
    end if
  else // there is no bucket whose number equals the hash value of $x_{i}$
    // $x_{i}$ is deemed to be an anomaly
    $X_{a}\leftarrow X_{a}\cup\left\{x_{i}\right\}$
  end if
end for

In Algorithm 3.4, the maximum value of all hashes in set $H$ is denoted as $h_{\max}$. As the hash values lie in $\left\{0,1,\ldots,h_{\max}\right\}$, there are at most $h_{\max}+1$ buckets. The hashes belonging to each bucket are denoted by sets $B_{0}$, $B_{1}$, $\ldots$, and $B_{h_{\max}}$, respectively. However, some elements of set $\left\{0,1,\ldots,h_{\max}\right\}$ might not occur in an actual computation. Thus, set $B_{no}$ is introduced to hold the numbers of the buckets which actually exist. If $\textit{hash}(x_{i})\notin B_{no}$, there is no bucket for $\textit{hash}(x_{i})$. In this case, $x_{i}$ is deemed to be an anomaly. If $\textit{hash}(x_{i})\in B_{no}$, the training samples corresponding to the hashes in $B_{\textit{hash}(x_{i})}$ form set $Q^{\prime}$, which is used as the input of $k-NN(\cdot)$ given by Algorithm 3.3. Thus, the replacement of set $Q$ by set $Q^{\prime}$ reduces the computation cost and the processing time. The anomaly detection of $x_{i}$ is subsequently performed by $k-NN(\cdot)$. Finally, abnormal data and normal data are denoted as $X_{a}$ and $X_{n}$, respectively.
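The bucket construction and lookup of Algorithm 3.4 can be sketched in Python as follows. A dictionary keyed by hash value stands in for the sets $B_{0},\ldots,B_{h_{\max}}$ and $B_{no}$, since only non-empty buckets are ever stored. The names `build_buckets` and `lsh_filter`, the toy hash function, and the `classify` callback (a stand-in for $k-NN(\cdot)$ with $k$ already fixed) are assumptions of this sketch.

```python
from collections import defaultdict


def build_buckets(training_set, hash_fn):
    """Group labeled training samples by hash value; only actual buckets exist."""
    buckets = defaultdict(list)
    for vector, label in training_set:
        buckets[hash_fn(vector)].append((vector, label))
    return dict(buckets)


def lsh_filter(buckets, hash_fn, test_samples, classify):
    """Split test_samples into (normal, abnormal) lists.

    classify(candidates, x) must return 1 for normal and 0 for abnormal;
    it plays the role of k-NN applied to the reduced set Q'.
    """
    normal, abnormal = [], []
    for x in test_samples:
        candidates = buckets.get(hash_fn(x))  # Q', or None if no bucket exists
        if candidates is None:
            abnormal.append(x)                # no bucket: deemed an anomaly
        elif classify(candidates, x) == 1:
            normal.append(x)
        else:
            abnormal.append(x)
    return normal, abnormal


# Toy one-dimensional example with a hypothetical hash of bucket width 5.
toy_hash = lambda v: int(v[0] // 5)
majority = lambda cands, x: 1 if 2 * sum(lbl for _, lbl in cands) > len(cands) else 0
Q = [((1.0,), 1), ((2.0,), 1), ((3.0,), 1), ((11.0,), 0)]
table = build_buckets(Q, toy_hash)
print(lsh_filter(table, toy_hash, [(2.5,), (12.0,), (50.0,)], majority))
```

In the proposed framework the table would be built in the cloud and the lookup would run on the edge nodes. The sketch keeps both in one process for brevity.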

3.5 Cosine similarity

Though the above combination of k-NN and locality sensitive hashing can efficiently produce an anomaly detection result, the performance of anomaly detection relies significantly on the parameter $k$. To further ensure the accuracy of anomaly detection, we introduce cosine similarity for a second test. The basic idea of cosine similarity is to measure the cosine of the angle between two vectors. For data samples $x_{i}$ and $x_{j}$, the cosine similarity between them can be calculated as

$\displaystyle\textit{sim}(x_{i},x_{j})=\cos(\theta)=\frac{x_{i}\cdot x_{j}}{\left\|x_{i}\right\|\left\|x_{j}\right\|}=\frac{\sum_{d=1}^{m}(x_{id}\cdot x_{jd})}{\sqrt{\sum_{d=1}^{m}(x_{id})^{2}}\cdot\sqrt{\sum_{d=1}^{m}(x_{jd})^{2}}},$ (8)

where $\theta$ is the angle between $x_{i}$ and $x_{j}$ . As the value of cosine is in $[-1,1]$ , a value which is close to 1 indicates that $x_{i}$ and $x_{j}$ are similar to each other, while a value which is close to -1 indicates that $x_{i}$ and $x_{j}$ are not similar to each other. The detailed cosine similarity algorithm is shown in Algorithm 3.5.

Algorithm 3.5: $\textit{CS}(Q_{n},X_{n},X_{a},\alpha)$
for $j=1$ to $\left|Q_{n}\right|$ do
    $s\leftarrow\textit{sim}(q_{j},\alpha)$
    $S\leftarrow S\cup\left\{s\right\}$ // cosine similarity values of normal data
end for
$s_{\min}\leftarrow\mathop{\min}\limits_{\forall s\in S}\left\{s\right\}$ // lower bound of normal cosine similarity values
$s_{\max}\leftarrow\mathop{\max}\limits_{\forall s\in S}\left\{s\right\}$ // upper bound of normal cosine similarity values
for $j=1$ to $\left|X_{a}\right|$ do
    $s_{j}\leftarrow\textit{sim}(x_{j},\alpha)$ // cosine similarity value of possible anomaly $x_{j}$ given by LSH and k-NN
    if $s_{j}\geqslant s_{\min}$ && $s_{j}\leqslant s_{\max}$ then // a cosine similarity value within the normal range indicates $x_{j}$ is normal
        $X_{a}\leftarrow X_{a}\setminus\left\{x_{j}\right\}$
        $X_{n}\leftarrow X_{n}\cup\left\{x_{j}\right\}$
    end if
end for
return $X_{a}$

In Algorithm 3.5, parameter $\alpha$ is an $m$ -dimensional vector $\left[1,1,\ldots,1\right]^{\mathrm{T}}$ . The normal data in the training set $Q$ (i.e., $Q_{n}$ ) are used to calculate a range of cosine values, namely $[s_{\min},s_{\max}]$ . For data sample $x_{j}\in X_{a}$ , if similarity $s_{j}$ is within the range of $[s_{\min},s_{\max}]$ , it is deemed to be normal. Otherwise, $x_{j}$ is deemed to be an anomaly. Finally, data in set $X_{a}$ are declared as abnormal.
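The second test can be sketched as follows. This is a minimal illustration under stated assumptions: data samples are plain sequences of numbers, the function names are hypothetical, and a zero vector raises an error because Eq. (8) is undefined for it (the paper does not discuss that case).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, as in Eq. (8)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def second_test(Q_n, X_n, X_a):
    """Re-check the possible anomalies in X_a against the normal range.

    Q_n holds the normal training samples. Returns (normal, abnormal) where
    samples of X_a whose similarity to alpha lies in [s_min, s_max] have
    been moved to the normal list.
    """
    m = len(Q_n[0])
    alpha = [1.0] * m  # the all-ones reference vector of Algorithm 3.5
    S = [cosine_similarity(q, alpha) for q in Q_n]
    s_min, s_max = min(S), max(S)

    normal = list(X_n)
    abnormal = []
    for x in X_a:
        if s_min <= cosine_similarity(x, alpha) <= s_max:
            normal.append(x)
        else:
            abnormal.append(x)
    return normal, abnormal
```

For example, with `Q_n = [(1.0, 1.0), (1.0, 2.0)]` the normal range is roughly $[0.949, 1]$, so a candidate `(2.0, 3.0)` is moved back to the normal set while `(1.0, -1.0)`, whose similarity to $\alpha$ is 0, stays abnormal.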

4. Numerical results

4.1 Parameter setting and performance metrics

In our experiments, we employ the hardware and software settings used in one of our previous works [51]. Though the edge-cloud collaboration architecture proposed in [51] is aimed at detecting pattern anomalies of time series in wireless sensor networks, the experimental environment can be used for the evaluation of our model in this paper. In [51], experiments were focused on the performance analysis of the proposed edge-cloud collaboration scheme, where the numbers of edge nodes and cloud nodes are 15 and 1, respectively. In this paper, to facilitate the performance analysis of edge computing, we use 30 edge nodes and 1 cloud node. Specifically, the 30 edge nodes are implemented on the MSP430 single-chip computer equipped with the nRF905 wireless module. The MSP430 platform can operate at 25 MHz and provides 100 KB of RAM. The cloud node is an HP Z6 G4 workstation with 32 2.3 GHz cores and 32 GB RAM, and it runs Debian Stretch 9.4.0 [52]. On the 30 edge nodes, the MSP430 single-chip computers run the open source operating system FreeRTOS [53]. For simplicity, the messaging protocols, message formats, and related requirements are based on the TCP/IP protocols. Specifically, the communication between edge nodes and the cloud node is implemented based on the HTTP protocol. There is an Apache HTTP server [54] running on the cloud node.

Different values of parameter $k$ for the k-NN algorithm lead to different outputs. In general, a small $k$ is prone to overfitting, while a large $k$ tends to produce a loose decision boundary. To determine an appropriate value of $k$, we conducted extensive experiments using 10-fold cross validation. Two well-known datasets are used. One is the IBRL dataset [55], which we used for the evaluation of the mobile edge-cloud collaboration outlier detection framework proposed in [42]. The other is the KDD CUP 99 dataset [56]. As depicted in Fig. 4, experimental results show that $k=9$ yields the best accuracy of anomaly detection.
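The selection of $k$ by 10-fold cross validation can be illustrated with the following sketch. It is not the experimental code of this paper: it assumes a plain majority-vote k-NN with Euclidean distance, labeled samples given as `(vector, label)` pairs that have already been shuffled, and hypothetical function names.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training samples nearest to x."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(samples, k, folds=10):
    """Fraction of samples classified correctly over all folds.

    Fold f uses every folds-th sample starting at index f as its test part
    and the remaining samples as its training part.
    """
    correct = 0
    for f in range(folds):
        test = samples[f::folds]
        train = [s for i, s in enumerate(samples) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(samples)


def best_k(samples, candidates, folds=10):
    """Candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cv_accuracy(samples, k, folds))
```

Calling `best_k(samples, range(1, 16))` would scan $k=1,\ldots,15$; ties are resolved in favor of the first candidate reaching the maximum.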

Figure 4.

Accuracy of anomaly detection for different $k$ .

The experimental results illustrated in Figs 5–8 correspond to the experiments conducted with the KDD CUP 99 dataset, which has been widely used for evaluating the performance of anomaly detection algorithms. In our experiments, the dataset is normalized beforehand to facilitate the comparison between data of different dimensions. Our proposal is compared with three other algorithms: fixed-width clustering (FWC) [57], k-NN [58], and one-class SVM [59]. To facilitate the presentation, we denote our model by LKC. The width of the fixed-width clustering algorithm is set to $w=40$. For the one-class SVM algorithm, the expected ratio of anomalies and the width of radial basis function (RBF) are set to $v=1\%$ and $\sigma^{2}=12$, respectively. The parameter $k$ of the traditional k-NN algorithm is set to $k=1,000$. Such a large value is used because, without the operation of LSH, k-NN works on the whole set $Q$ rather than a small subset of $Q$ (i.e., $Q^{\prime}$). The following six performance metrics are considered: ROC, precision, recall, F1-score, accuracy, and delay.

4.2 Experimental results and analysis

For confidence level $\delta\in\left\{80\%,85\%,90\%,95\%\right\}$, the true positive rate (TPR), false positive rate (FPR), and area under curve (AUC) of the above four algorithms are shown in Table 2. With the same confidence level, the performances of one-class SVM and LKC are similar to each other. However, LKC is superior to one-class SVM in terms of the FPR. When $\delta=95\%$, the TPR of one-class SVM decreases sharply. The TPR of the traditional k-NN drops by 68.08 percentage points (from 91.23% to 23.15%) when $\delta$ increases from 80% to 85%. Though the performance of FWC is superior to those of one-class SVM and the traditional k-NN, the performance of LKC is superior to FWC in terms of the TPR. In addition, the ROC curves of the four algorithms are depicted in Fig. 5 based on the results in Table 2. When the confidence level $\delta=95\%$, LKC is superior to the other three algorithms in terms of the AUC, and the performance of the traditional k-NN is the worst.
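For reference, the TPR and FPR are computed from the confusion counts as $\textit{TPR}=\textit{TP}/(\textit{TP}+\textit{FN})$ and $\textit{FPR}=\textit{FP}/(\textit{FP}+\textit{TN})$. The sketch below assumes that anomalies are the positive class and that labels are given as booleans. The function name and the convention of returning 0 for an empty class are illustrative choices, not taken from the paper.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for boolean labels where True marks an anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    # Assumed convention: a rate over an empty class is reported as 0.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```

Evaluating the detector at several thresholds and plotting the resulting (FPR, TPR) pairs gives an ROC curve of the kind shown in Fig. 5.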

Table 2
TPR, FPR, and AUC with different confidence levels

Algorithm	Confidence level ( $\%$ )	TPR ( $\%$ )	FPR ( $\%$ )	Variance	AUC
FWC	80	92.78	10.08	17.07	–
	85	65.29	1.87	8.49	–
	90	46.83	1.09	4.31	–
	95	27.98	0.45	1.51	0.9384
k-NN	80	91.23	7.87	16.60	–
	85	23.15	6.45	10.65	–
	90	11.08	4.23	2.44	–
	95	5.33	1.98	0.53	0.8946
One-class SVM	80	98.32	10.09	19.20	–
	85	91.26	5.69	16.61	–
	90	66.83	3.94	8.82	–
	95	5.12	3.66	0.51	0.9481
LKC	80	99.88	1.65	19.78	–
	85	98.76	1.48	9.36	–
	90	97.53	1.15	5.92	–
	95	89.68	0.96	1.96	0.9886

Figure 5.

ROC curves of the four algorithms.

The precision, recall, and F1-score of the four algorithms are shown in Fig. 6. The overall performance of one-class SVM is the worst. FWC performs well in terms of precision, but poorly in terms of both recall and F1-score. Though the performance of the traditional k-NN is similar to that of LKC in terms of the above three metrics, LKC is still the best. In addition, the traditional k-NN incurs a large delay, which is shown in Fig. 8 later in this section.
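As a reminder of how the three metrics relate, the following sketch computes them from confusion counts, with anomalies assumed to be the positive class. The function name and the zero-denominator convention are illustrative assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Since F1 is a harmonic mean, it is pulled toward the lower of the two values, which is why an algorithm with good precision but poor recall also scores poorly on F1.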

Figure 6.

Precision, recall, and F1-score of the four algorithms.

Figure 7.

Delay of anomaly detection for LKC and FWC in pure edge computing and cloud computing.

Moreover, LKC and FWC are further investigated in terms of delay. Experiments are conducted for $10\%$, $20\%$, $30\%$, $40\%$, and $50\%$ of the data in the dataset, respectively. To obtain a more general result, the values shown in Fig. 7 are averages over 20 individual experiments. On the whole, the delays of both LKC and FWC increase monotonically with the data volume. For both edge computing and cloud computing, the delays of LKC are smaller than those of FWC. Thus, LKC is considered superior to FWC in terms of delay. In addition, when $10\%$ of the data are used, the delays of LKC under edge computing and cloud computing are 75 ms and 327 ms, respectively. Compared with cloud computing, edge computing achieves an improvement of about $76.9\%$. Similarly, when $50\%$ of the data are used, edge computing achieves an improvement of about $65.9\%$. Moreover, as the data volume increases, the delay under edge computing grows more mildly than under cloud computing.

For Fig. 7, the corresponding experiments are conducted for pure edge computing and pure cloud computing. In other words, the operations of both LKC and FWC are performed either entirely on edge nodes or entirely in the cloud. In our proposal, hash tables are constructed in the cloud. Thus, further experiments of the four algorithms are conducted based on the collaboration of edge nodes and cloud. As shown in Fig. 8, when $10\%$ of the data are used, LKC shortens the delay by $48.0\%$ to $65.8\%$ compared to the other three algorithms. When $10\%$, $20\%$, and $30\%$ of the data are used, the delays of LKC are 13 ms, 27 ms, and 41 ms, respectively. All three values are less than 50 ms. Thus, it can be considered that LKC is able to meet the real-time requirements of a smart factory in most cases.

Figure 8.

Delay of anomaly detection for the four algorithms in collaboration of edge computing and cloud computing.

The experimental results illustrated in Fig. 9 correspond to the experiments conducted with the IBRL dataset. The dataset contains sensor data collected from 54 sensors deployed in the Intel Berkeley Research Lab, including humidity, temperature, light, and voltage values. Like the KDD CUP 99 dataset, the IBRL dataset is also an important dataset in the field of anomaly detection. To further investigate the performance of our proposal, we consider another three recent models (KNN-DK [60], Deep k-NN [61], and kNN-TSAD [62]) which are based on the idea of k-NN. In [60], a modified k-NN classifier which uses dynamic $k$ is proposed. The proposal takes the class imbalance in a dataset into account. In [61], a deep k-NN defense scheme against clean-label data poisoning attacks is proposed. The implementation of the scheme considers real-world datasets with class imbalance. In addition, the authors also present a simple guideline for the selection of $k$ . In [62], a fast kNN-based approach for time sensitive anomaly detection is developed. The main idea of this model is that sliding window and locality sensitive hashing are combined to observe the data distribution as well as the value of $k$ .

Figure 9.

Energy consumption and accuracy for different $k$ .

As shown in Fig. 9a, the values of the average energy consumption are normalized. The baseline of the normalization is the actual average energy consumption of edge nodes for LKC when $k=4$. It can be observed that the four curves increase monotonically overall. As $k$ increases, more nearest neighbors are involved. Consequently, the computation cost increases, and so does the average energy consumption. Moreover, when $k$ is large (e.g., $k>8$), the increase is sharper than when $k$ is small (e.g., $k<8$).

As shown in Fig. 9b, the overall trends of the four curves are similar to each other. When $k$ is small (e.g., $k<8$), the accuracy increases monotonically with $k$. Beyond a certain inflection point (e.g., $k=8,9,10$), the accuracy decreases monotonically as $k$ increases. It can be observed that the accuracy of different algorithms peaks at different values of $k$. On the left side of an inflection point, a larger $k$ corresponds to a better accuracy than a smaller $k$. In this case, involving more nearest neighbors makes the result of anomaly detection more reliable. On the contrary, on the right side of an inflection point, a larger $k$ corresponds to a worse accuracy than a smaller $k$. In this case, involving more nearest neighbors brings in excessive data samples which introduce more distraction. Thus, the result of anomaly detection becomes less reliable.

5. Conclusions and future work

This paper proposed an improved k-NN anomaly detection method based on locality sensitive hashing. The anomaly detection is conducted on edge nodes. The data preprocessing performed by locality sensitive hashing significantly reduces the amount of data handled by k-NN, which in turn speeds up the execution of k-NN. In addition, the adoption of edge computing reduces the delay of anomaly detection and the bandwidth consumption of data transmission. Moreover, the second test based on cosine similarity further improves the accuracy of anomaly detection. Compared to six existing methods, our proposal achieves higher accuracy, lower latency, and less energy consumption.

However, there is still room for improvement. Multi-source heterogeneous data generated in a smart factory often possess spatial-temporal correlativity. In the future, the optimization of our proposal should incorporate the analysis of spatial-temporal correlativity. In addition, though our proposal works well for univariate and multivariate datasets, the situation becomes more complicated for a multi-modal dataset. As a multi-modal dataset often possesses more than one mode of distribution/anomaly, a normal data sample in one mode may become an anomaly in another mode. Similarly, an abnormal data sample in one mode may become normal in another mode. In our proposal, the LSH algorithm classifies training data into different buckets based on the hash values of the data. The subsequent k-NN algorithm works within an individual bucket. Thus, we hold that if the modes contained in a dataset can be clearly separated, the result obtained by the LSH algorithm handles the “multi-modal” feature gracefully. In this case, such a multi-modal dataset is not a problem for our proposal. On the contrary, if the modes contained in a dataset cannot be distinctly separated, the effectiveness of the LSH algorithm may be noticeably compromised, as may that of the subsequent k-NN algorithm. In this case, such a multi-modal dataset can be considered unmanageable for our proposal. Thus, an extra component which handles the multi-modal feature of a dataset should be developed for our proposal in the future.

Acknowledgments

The authors would like to thank the anonymous reviewers whose comments and suggestions greatly helped them improve the quality and presentation of this article. Cong Gao wants to thank his beloved mother, Miling Shen and family for their endless support and encouragement.

This work is partly supported by the Scientific Research Program of the Science and Technology Department of Shaanxi Province, China (2023-YBGY-211), the Scientific Research Program of Shaanxi Provincial Education Department, China (21JP115), the Scientific Research Program of the Science and Technology Bureau of Xi’an, China (22GXFW0129), the Scientific Research Program of the Science and Technology Bureau of Yulin, China (CXY-2022-162), and the Key Research and Development Programs of the Science and Technology Department of Shaanxi Province, China (2021ZDLGY06-03).

References

1. Chatfield, The analysis of time series: Theory and practice, Springer, 2013.

2. Ren, Liu and Pedrycz, A piecewise aggregate pattern representation approach for anomaly detection in time series, Knowledge-Based Systems 135 (2017), 29–39.

3. Cauteruccio, Fortino, Guerrieri, Liotta, Mocanu D.C., Perra, Terracina and Vega M.T., Short-long term anomaly detection in wireless sensor networks based on machine learning and multi-parameterized edit distance, Information Fusion 52 (2019), 13–30.

4. Ren, Wang, Huang, Kou, Xing, Yang, Tong and Zhang, Time-series anomaly detection service at Microsoft, In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2019, pp. 3009–3017.

5. Pedrycz and Jamal, Multivariate time series anomaly detection: A framework of Hidden Markov Models, Applied Soft Computing 60 (2017), 229–240.

6. Dehghan, Beigy and ZareMoodi, A novel concept drift detection method in data streams using ensemble classifiers, Intelligent Data Analysis 20(6) (2016), 1329–1350.

7. Shi, Cao, Zhang and Xu, Edge computing: Vision and challenges, IEEE Internet of Things Journal 3(5) (2016), 637–646.

8. Gao, Chen, Zhang, Chen, Wang and Yang, Edge computing: Development and challenges, Journal of Xi’an University of Posts and Telecommunications 26(4) (2021), 7–19.

9. Mehnaz and Bertino, Privacy-preserving real-time anomaly detection using edge computing, In Proceedings of 2020 IEEE 36th International Conference on Data Engineering (ICDE), IEEE, 2020, pp. 469–480.

10. Zhou, Wen and Yang, Fault isolation based on k-nearest neighbor rule for industrial processes, IEEE Transactions on Industrial Electronics 63(4) (2016), 2578–2586.

11. Ozkan and Kozat S.S., Online anomaly detection under markov statistics with controllable type-I error, IEEE Transactions on Signal Processing 64(6) (2015), 1435–1445.

12. Wang, Tang, Park and Priebe C.E., Locality statistics for anomaly detection in time series of graphs, IEEE Transactions on Signal Processing 62(3) (2013), 703–717.

13. Goldstein and Dengel, Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm, KI-2012: Poster and Demo Track, 2012, pp. 59–63.

14. Rousseeuw P.J. and Hubert, Anomaly detection by robust statistics, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8(2) (2018), e1236.

15. Zong, Song, Min M.R., Cheng, Lumezanu, Cho and Chen, Deep autoencoding gaussian mixture model for unsupervised anomaly detection, In Proceedings of 2018 6th International Conference on Learning Representations (ICLR), OpenReview, 2018, pp. 1–19.

16. Zhang, Ramakrishnan and Livny, BIRCH: An efficient data clustering method for very large databases, ACM Sigmod Record 25(2) (1996), 103–114.

17. Guha, Rastogi and Shim, CURE: An efficient clustering algorithm for large databases, ACM Sigmod Record 27(2) (1998), 73–84.

18. Bryksin, Petukhov, Smirenko and Povarov, Detecting anomalies in Kotlin code, In 2018 Companion Proceedings for the ISSTA/ECOOP 2018 Workshops, ACM, 2018, pp. 10–12.

19. Shi, Zhao, Zhong, Shen and Ding, An improved agglomerative hierarchical clustering anomaly detection method for scientific data, Concurrency and Computation: Practice and Experience 33(6) (2021), e6077.

20. Ding, Wang and Li, Anomaly detection in large-scale trajectories using hybrid grid-based hierarchical clustering, International Journal of Robotics and Automation 33(5) (2018), 474–480.

21. Hartigan J.A. and Wong M.A., Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society. Series C (Applied Statistics) 28(1) (1979), 100–108.

22. Lei, Zhu, Chen, Lin and Yang, Automatic PAM clustering algorithm for outlier detection, Journal of Software 7(5) (2012), 1045–1051.

23. Mazarbhuiya F.A., AlZahrani M.Y. and Georgieva, Anomaly detection using agglomerative hierarchical clustering algorithm, In Proceedings of 2018 International Conference on Information Science and Applications (ICISA), Springer, 2018, pp. 475–484.

24. Nielsen and Johansen, Asymptotic theory of outlier detection algorithms for linear time series regression models: Rejoinder, Scandinavian Journal of Statistics: Theory and Applications 43(2) (2016).

25. Vlachos and Castelli, On periodicity detection and structural periodic similarity, In Proceedings of the 2005 SIAM International Conference on Data Mining (ICDM), SIAM, 2005, pp. 449–460.

26. Arya, Mount D.M., Netanyahu N.S., Silverman and Wu A.Y., An optimal algorithm for approximate nearest neighbor searching fixed dimensions, Journal of the ACM (JACM) 45(6) (1998), 891–923.

27. Chen, Zhang and Wu, Incremental k-NN SVM method in intrusion detection, In Proceedings of 2017 IEEE 8th International Conference on Software Engineering and Service Science (ICSESS), IEEE, 2017, pp. 712–717.

28. Ying, Wang, Zhao, Shang, Huang, Cheng, Yang and Geng, An improved KNN-based efficient log anomaly detection method with automatically labeled samples, ACM Transactions on Knowledge Discovery from Data (TKDD) 15(3) (2021), 1–22.

29. and Zhang, A discriminative metric learning based anomaly detection method, IEEE Transactions on Geoscience and Remote Sensing 52(11) (2014), 6844–6857.

30. Breunig M.M., Kriegel H.-P., R.T. and Sander, LOF: Identifying density-based local outliers, In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (ICMD), ACM, 2000, pp. 93–104.

31. Zhang, Lin and Karim, Adaptive kernel density-based anomaly detection for nonlinear systems, Knowledge-Based Systems 139 (2018), 50–63.

32. Tang, Chen, A.W.-C. and Cheung D.W., Enhancing effectiveness of outlier detections for low density patterns, In Proceedings of the 2002 Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2002, pp. 535–548.

33. Wang and Du, Spatio-temporal anomaly detection in traffic data, In Proceedings of 2nd International Symposium on Computer Science and Intelligent Control (ISCSIC), 2018, pp. 1–5.

34. Yang, Zhou, Shu and Zhang, A fast and efficient local outlier detection in data streams, In Proceedings of 2019 International Conference on Image, Video and Signal Processing (IVSP), ACM, 2019, pp. 111–116.

35. Eiras-Franco, Martinez-Rego, Guijarro-Berdinas, Alonso-Betanzos and Bahamonde, Large scale anomaly detection in mixed numerical and categorical input spaces, Information Sciences 487 (2019), 115–127.

36. Kwon, Natarajan, Suh S.C., Kim and Kim, An empirical study on network anomaly detection using convolutional neural networks, In Proceedings of 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), IEEE, 2018, pp. 1595–1598.

37. Naseer, Saleem, Khalid, Bashir M.K., Han, Iqbal M.M. and Han, Enhanced network anomaly detection based on deep neural networks, IEEE Access 6 (2018), 48231–48246.

38. Teoh, Chiew, Franco E.J., Benjamin and Goh, Anomaly detection in cyber security attacks on networks using MLP deep learning, In Proceedings of 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), IEEE, 2018, pp. 1–5.

39. Tran K.P., Huong T.T. et al., Data driven hyperparameter optimization of one-class support vector machines for anomaly detection in wireless sensor networks, In Proceedings of 2017 IEEE International Conference on Advanced Technologies for Communications (ATC), IEEE, 2017, pp. 6–10.

40. Erfani S.M., Rajasegarar, Karunasekera and Leckie, High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning, Pattern Recognition 58 (2016), 121–134.

41. Turkoz, Kim, Son, Jeong M.K. and Elsayed E.A., Generalized support vector data description for anomaly detection, Pattern Recognition 100 (2020), 107119.

42. Gao, Song, Wang and Chen, A mobile edge-cloud collaboration outlier detection framework in wireless sensor networks, IET Communications 15(15) (2021), 2007–2020.

43. Wang, Song and Gao, An isolation-based distributed outlier detection framework using nearest neighbor ensembles for wireless sensor networks, IEEE Access 7 (2019), 96319–96333.

44. Liu F.T., Ting K.M. and Zhou, Isolation forest, In Proceedings of 2008 IEEE 8th International Conference on Data Mining (ICDM), IEEE, 2008, pp. 413–422.

45. and Tian, A comprehensive survey of clustering algorithms, Annals of Data Science 2(2) (2015), 165–193.

46. Šulc and Řezanková, Comparison of similarity measures for categorical data in hierarchical clustering, Journal of Classification 36(1) (2019), 58–72.

47. Taha A.A. and Hanbury, An efficient algorithm for calculating the exact Hausdorff distance, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(11) (2015), 2153–2163.

48. Yin and Zhang, Parallel implementing improved k-means applied for image retrieval and anomaly detection, Multimedia Tools and Applications 76(16) (2017), 16911–16927.

49. Salehi, Leckie, Bezdek J.C., Vaithianathan and Zhang, Fast memory efficient local outlier detection in data streams, IEEE Transactions on Knowledge and Data Engineering 28(12) (2016), 3246–3260.

50. Datar, Immorlica, Indyk and Mirrokni V.S., Locality-sensitive hashing scheme based on p-stable distributions, In Proceedings of the twentieth annual Symposium on Computational Geometry (SCG), ACM, 2004, pp. 253–262.

51. Gao, Yang, Chen, Wang and Wang, An edge-cloud collaboration architecture for pattern anomaly detection of time series in wireless sensor networks, Complex & Intelligent Systems 7 (2021), 2453–2468.

52. Debian – The Universal Operating System, https://www.debian.org/, [Online; accessed 20-Nov-2021].

53. FreeRTOS – Market leading RTOS (Real Time Operating System) for embedded systems with Internet of Things extensions, https://www.freertos.org/, [Online; accessed 20-Nov-2021].

54. The Apache HTTP Server Project, https://httpd.apache.org/, [Online; accessed 20-Nov-2021].

55. Madden, Intel lab data, online dataset, http://db.csail.mit.edu/labdata/labdata.html, 2004. [Online; accessed 20-Nov-2021].

56. Tavallaee, Bagheri and Ghorbani A.A., A detailed analysis of the KDD CUP 99 data set, In Proceedings of 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), IEEE, 2009, pp. 1–6.

57. Rajasegarar, Leckie and Palaniswami, Hyperspherical cluster based distributed anomaly detection in wireless sensor networks, Journal of Parallel and Distributed Computing 74(1) (2014), 1833–1847.

58. Foroushani Z.A. and Li, Intrusion detection system by using hybrid algorithm of data mining technique, In Proceedings of 2018 ACM 7th International Conference on Software and Computer Applications (ICSCA), ACM, 2018, pp. 119–123.

59. Zhang and Gong, An anomaly detection model based on one-class SVM to detect network intrusions, In Proceedings of 2015 IEEE 11th International Conference on Mobile Ad-hoc and Sensor Networks (MSN), IEEE, 2015, pp. 102–107.

60. Hoque, Bhattacharyya D.K. and Kalita J.K., KNN-DK: A modified k-NN classifier with dynamic k nearest neighbors, In Advances in Applications of Data-Driven Computing, Springer, 2021, pp. 21–34.

61. Peri, Gupta, Huang W.R., Fowl, Zhu, Feizi, Goldstein and Dickerson J.P., Deep k-NN defense against clean-label data poisoning attacks, In Proceedings of European Conference on Computer Vision (ECCV), Springer, 2020, pp. 55–70.

62. Zhao, Wang, Hou and Huang, A fast kNN-based approach for time sensitive anomaly detection over data streams, In Proceedings of International Conference on Computational Science (ICCS), Springer, 2019, pp. 59–74.
