Improving data classification accuracy in sensor networks using hybrid outlier detection in HAR

Abstract

Managing and mining mobile sensor data has become an active research topic in several fields of computer science, such as distributed systems, database systems, and data mining. The main objective of sensor-based applications is to make real-time decisions, which has proved very challenging because of the severely resource-constrained computing environment and the enormous volume of data generated by Wireless Sensor Networks (WSNs). This challenge motivates the sensor research community to explore new data mining techniques for extracting information from the large, continuous raw data streams obtained from WSNs. Traditional data mining methods are not directly suited to WSNs because of the dynamic nature of sensor data and the presence of anomalies or outliers. This work provides an overview of how traditional outlier detection algorithms are revised and implemented for Human Activity Recognition (HAR). Based on the limitations of the existing techniques, a hybrid outlier detection method is proposed.

Keywords

Classification, Data mining, Human activity, Outlier detection, Sensor data

1 Introduction

Real-time applications of Wireless Sensor Networks (WSNs) yield a vast amount of dynamic raw sensor data. This raw data can be used in many applications once it is transformed into usable information through data classification. However, WSNs face challenging issues such as limited power supply, random deployment, a lossy communication environment, and a high failure rate. These issues make traditional data mining techniques unsuitable, because such techniques are centralized and computationally expensive [4 , 22]. Therefore, new algorithms have been developed, and a few existing data mining algorithms have been modified, to handle the data generated by sensor networks. This work focuses on outlier detection and sensor data classification. Classification of sensor data means identifying the classes of events or observations on the basis of a training dataset [28 , 38]. Classification is a popular topic in information retrieval because it applies to many research areas, such as pattern recognition, image classification, text classification, and web page classification. Still, almost all classification techniques are susceptible to outliers and lose efficiency when outliers or anomalies are present in the dataset. Existing research has shown that a dataset with less noise or fewer outliers yields better classification results than a dataset with outliers. An outlier in a sensor network is defined as an observation or data point that is considerably dissimilar to, or inconsistent with, the remainder of the sensor data [2 , 28]. Because a wireless sensor network may include a huge number of sensors, the possibility of error is higher than in traditional networks, which makes WSNs more prone to anomalies. Outlier points should be detected and removed because they strongly affect sensor data analysis [21 , 49].
Detecting and removing outlier data points increases the quality of the measured information, improves the robustness of sensor data analysis in the presence of noise and defective sensors, and reduces the communication overhead caused by unwanted data. In many applications, anomalies are more interesting than normal data [14]. A standard case in which attention focuses on the anomalies or outliers rather than on normal data or inliers is fraud detection in online transactions, where the anomalous records are the ones most likely to reveal fraud. When a dataset contains a relatively large number of features or attributes, identifying and removing its outliers becomes a very difficult task.

Nowadays, mobile sensor technologies are gaining interest in different research domains because they use significantly miniaturized electronic components with low power consumption, which makes them ideal for Human Activity Recognition (HAR) applications in both indoor and outdoor environments [1 , 15]. These applications allow users to perform any physical activity naturally while providing good results in practical settings such as health rehabilitation, respiratory and muscular activity assessment, sports, and safety [23 , 50]. However, in real-time environments, the collected sensor data are affected by several factors, including sensor data alignment, data loss, and noise, among other experimental constraints, which degrade data quality and model accuracy. Mobile phones are now equipped with an assortment of sensors, such as accelerometers, gyroscopes, and magnetic field sensors, and in recent years many researchers have started using smartphones for human activity recognition [8 , 24]. The Human Activity Recognition problem is difficult because, unlike normal activities, outlier data are extremely scarce. Therefore, it is a challenging task to design an outlier detection and removal model that can improve the performance of an activity recognition system. Because of the dynamic and resource-constrained nature of sensor networks, existing outlier detection and removal methods cannot be applied directly to the HAR model. The main aim of this work is to develop an outlier detection method for sensor data used in human activity recognition.

This article is organized as follows: Section 2 reviews related work, Section 3 describes the motivations and contributions, Section 4 explains the proposed methodology, Section 5 presents the experimental setup for HAR, Section 6 reports the experimental analysis, Section 7 discusses the results, and Section 8 concludes the paper with future research directions.

2 Related works

This section reviews outlier detection techniques for WSNs, grouped by methodology. In statistical-based outlier detection methods, a data point is declared an outlier, based on a distance measure, if the probability that it was generated by the assumed data model is very low [39, 51]. These techniques usually incur low communication and computational cost, as they declare the most distant points outliers based on the data distribution. In nearest-neighbor-based outlier detection techniques, a data point is flagged as an outlier when it lies far from its neighbors. These methods require the Euclidean distance between each pair of data points, which leads to high computational complexity [38]. In clustering-based outlier detection algorithms, data points are grouped into clusters and a distance measure is used to identify outliers [7 , 43]. In classification-based outlier detection, support vector machine (SVM) and Bayesian network classification models are used [29, 37]. The computational complexity of clustering-based and classification-based techniques is higher than that of statistical-based techniques.

Human Activity Recognition is an emerging and challenging area of research [1 , 40]. Many machine-learning-based studies on activity recognition use accelerometers [8 , 29]. Most existing studies in activity recognition focus on major activities, such as sitting, standing, sleeping, running, walking and jumping [5 , 50]. In the HAR literature, studies have used both a single sensor and a combination of multiple sensors. A few works proposed physical activity monitoring systems based on multiple sensors. These systems are able to recognize sitting, standing and lying body postures, as well as periods of walking, with the aim of monitoring elderly people in their daily lives [16, 20]. It has also been reported that the use of multiple sensors does not significantly improve recognition of typical physical activities. Many works exist on outlier detection in sensor networks [13 , 51], but further research is still needed, particularly in activity recognition.

Most existing HAR systems focus on detecting human activities without considering data outliers. Though many machine-learning-based outlier detection techniques have been proposed for various applications in WSNs, studies on outlier detection for human activity recognition in WSNs are rare. Based on the literature reviewed, the focus of this work is on outlier detection in wireless sensor networks for human activity recognition. An outlier detection method is proposed and evaluated on two datasets collected in different environments. The effect of outlier detection is empirically evaluated using standard classification methods.

3 Motivations and contributions

Real-world applications such as smart home monitoring, patient monitoring, and sports activities have increased the accessibility of sensors and raised interest in Human Activity Recognition (HAR). In particular, the prevalence of mobile devices with sensors, such as smartphones, offers advanced capabilities for recognizing human activities in wireless sensor networks. However, the raw sensor data collected from the wireless source are quite often inaccurate and unreliable because of the design constraints of sensor devices in terms of lifetime, energy, memory, bandwidth, and computational power, the dynamic nature of the network, and the harshness of the node deployment environment. These inaccuracies are referred to as anomalies or outliers. One way to ensure the quality of mobile sensor data is to detect and remove outlier data. Outlier detection is a particularly important task in various applications of wireless sensor networks [37 , 42]. Recent research has contributed various outlier detection methods for sensor data. Traditional outlier detection methods applied to sensor data suffer from a high false positive rate, especially when the collected sensor information is biased toward normal data and anomalous events are rare. It is essential for a data analyst to detect outlier data in order to take proper action. Human Activity Recognition is one of the application areas in which abnormal activities are more often present. This motivates us to explore outlier detection methods for HAR.

The basic idea of the proposed outlier detection method is to combine the advantages of two or more standard outlier detection methods. The authors of [11] proposed a hybrid outlier method named FUCOD (FUzzy Combination of Outlier Detection techniques), which combines distance-based, density-based, cluster-based and distribution-based outlier methods. Another hybrid approach, Pruning-based K-Nearest Neighbor (PB-KNN), was proposed by Yan et al.; it integrates the density-based method, the cluster-based method and the KNN algorithm to conduct effective outlier detection [45]. Two more hybrid combinations were proposed in [46]: the density-based outlier detection method LOF combined with k-means clustering, and LOF combined with the density-based DBSCAN clustering. Aggarwal et al. [2] discussed how outlier ensemble methods can address the bias-variance problem and described many outlier combination or aggregation methods. These existing methods suggested the idea of removing outliers using hybrid methods [2 , 46]. Such combined outlier detection methods can be applied to a classification model, and the performance of the model can be compared on the dataset with and without outlier removal.

In [31] (Nivetha, Venkatalakshmi; 2018), we proposed an efficient outlier detection and removal algorithm, Hybrid Outlier Detection (HOD), that is specifically useful in supervised classification problems. This hybrid model combines the advantages of two or more standard distance-based and density-based outlier detection techniques. It was proposed to detect and remove anomalies or outliers in mobile sensor data in order to enhance classification accuracy. In that earlier work, we classified human activities such as walking, jogging, standing, sitting, walking upstairs and walking downstairs. The experimental analysis for classification was carried out on two different activity recognition datasets collected from different environments. The HOD method was implemented and tested with the Support Vector Machine (SVM) classifier and the Radial Basis Function Neural Network (RBFN) classifier. The performance of the SVM classifier was superior to that of the RBFN classifier [31]. To further enhance the performance of HOD, an improved HOD (IHOD) method is proposed here with the SVM classifier.

The foremost focus of this proposed work is to evaluate various individual outlier detection methods for online and offline human activity recognition data over WSNs.

4 Proposed methodology

In this section, the proposed approach is explained in detail. Many recent works on machine-learning-based activity recognition consider only the activities and do not account for outliers [20 , 48]. In this work, a variant of the hybrid outlier detection approach is designed to detect outliers. The methodology consists of the following five steps:

Data collection: Two different datasets, DS1 and DS2, are used for analysis. These two datasets differ in properties such as the number of activities, instances, and features, and the environment used for data collection.

Feature extraction: Raw continuous time-series data from the mobile accelerometer cannot be directly applied to the classification model. Therefore the collected data are subjected to a feature extraction process. The transformed dataset represented using feature extraction is used for further processing.

Outlier Detection and Removal: The density based, distance based, cluster-based and the proposed improved Hybrid Outlier Detection methods are applied to both datasets DS1 and DS2.

Classification: The extracted feature set is passed to a Support Vector Machine (SVM) to classify the activity labels.

Evaluation: The various evaluation metrics are computed to show the effectiveness of the outlier removal in wireless sensor data for human activity recognition.
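
The flow of steps 3 to 5 can be illustrated with a short Python sketch. This is not the implementation used in this work (which relies on the RapidMiner tool); it assumes scikit-learn, uses a single LOF detector as a stand-in for the outlier detection step, and the function name `run_pipeline`, the 70/30 split and the RBF kernel are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import SVC


def run_pipeline(X, y, n_neighbors=10, seed=0):
    """Remove outliers, train an SVM, and return (test accuracy, samples kept)."""
    # Step 3 (stand-in): score every sample with one detector (LOF here).
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)
    scores = -lof.negative_outlier_factor_  # larger = more anomalous

    # Drop samples whose score falls outside the IQR fences.
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    keep = (scores >= q1 - 1.5 * iqr) & (scores <= q3 + 1.5 * iqr)
    X_clean, y_clean = X[keep], y[keep]

    # Step 4: train an SVM on the cleaned feature set (assumed 70/30 split).
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_clean, y_clean, test_size=0.3, random_state=seed, stratify=y_clean
    )
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)

    # Step 5: evaluate on the held-out part.
    return accuracy_score(y_te, clf.predict(X_te)), int(keep.sum())
```

Calling `run_pipeline(X, y)` on a feature matrix `X` and a label vector `y` returns the held-out accuracy together with the number of samples that survived outlier removal, so the same call can be repeated with other detectors for comparison.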

4.1 Hybrid outlier detection (proposed)

In our proposed hybrid approach, the dataset (DS) consists of activity samples that represent the activities performed, where ‘x_i’ represents the accelerometer readings and ‘y_i’ represents the activity label for sample ‘i’. The accelerometer values in the dataset are transformed into a feature set ‘S’. The input datasets are subjected to standard outlier detection methods (O_j). For the j^th outlier detection method employed, the outlier score of each activity sample is calculated. In general, ‘M’ outlier scores can be obtained from ‘K’ different outlier detection methods. In the second phase, the outlier scores stored in the vectors A₀, A₁, ..., A_M are averaged and sorted in decreasing order. The outliers are identified and removed using the standard Inter Quartile Range (IQR) method [44] with the following steps:

Step 1. Find the first quartile (FQ) and the third quartile (TQ) of the averaged outlier scores.

Step 2. Find the Inter Quartile Range (IQR) of the outlier scores, IQR = TQ - FQ.

Step 3. Find LI = FQ - 1.5*IQR and HI = TQ + 1.5*IQR.

Step 4. The outlier score values outside the interval (LI, HI) are declared as outliers.
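
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `iqr_outlier_mask` is hypothetical, and the quartiles are computed with NumPy's default linear interpolation, which the text does not specify.

```python
import numpy as np


def iqr_outlier_mask(avg_scores):
    """Return a boolean mask that is True where a score lies below LI or above HI."""
    scores = np.asarray(avg_scores, dtype=float)
    fq, tq = np.percentile(scores, [25, 75])   # Step 1: first and third quartiles
    iqr = tq - fq                              # Step 2: inter quartile range
    li = fq - 1.5 * iqr                        # Step 3: lower fence
    hi = tq + 1.5 * iqr                        #         upper fence
    return (scores < li) | (scores > hi)       # Step 4: flag scores beyond the fences
```

For example, with the scores `[1, 2, 3, 4, 5, 6, 7, 8, 100]` the quartiles are 3 and 7, the fences are -3 and 13, and only the last sample is flagged.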

The pseudo code of the proposed improved Hybrid outlier detection method is shown in Fig. 1.

Fig.1

Pseudo code of improved Hybrid Outlier Detection (IHOD).

5 Experimental setup for activity recognition

The datasets for outlier detection are collected and preprocessed for classification. The steps involved in data collection and preparation are described in this section. To establish a comparative baseline for the hybrid outlier methods, we use the same two datasets, DS1 and DS2, as in our earlier work [31]. DS1 is the offline HAR dataset available in the UCI Machine Learning Repository. DS2 was constructed in a real-time environment using the MATLAB Mobile package, which can obtain data from the built-in sensors of an Android device.

5.1 Feature extraction

Standard classification techniques cannot be applied directly to raw accelerometer data. Instead, the raw time-series data are transformed into examples. The data in both DS1 and DS2 are divided into ten-second segments, and informative features are generated from the two hundred raw accelerometer readings contained in each segment, where each reading contains the x, y, and z values corresponding to the three dimensions. A total of forty-six summary features were generated, though these are all variants of just six basic features.
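
A minimal Python sketch of this windowing and summarization is given here for illustration. It is an assumption-laden example rather than the extraction code used in this work: the function names are hypothetical, the ten bins per axis are taken as equal-width over the range of each window, the "Resultant" feature is read as the mean Euclidean magnitude of each reading, and the peak (dominant frequency) features are omitted because their exact definition is not given in the text.

```python
import numpy as np


def segment(readings, size=200):
    """Split an (n, 3) accelerometer stream into consecutive non-overlapping windows."""
    readings = np.asarray(readings, dtype=float)
    n_windows = len(readings) // size
    return [readings[k * size:(k + 1) * size] for k in range(n_windows)]


def window_features(window):
    """Summarize one (n_readings, 3) window into a dict of named features."""
    window = np.asarray(window, dtype=float)
    feats = {}
    for i, axis in enumerate("XYZ"):
        v = window[:, i]
        # Fraction of readings falling into each of 10 equal-width bins.
        counts, _ = np.histogram(v, bins=10)
        for b, c in enumerate(counts):
            feats[f"{axis}{b}"] = c / len(v)
        feats[f"{axis}avg"] = v.mean()
        feats[f"{axis}absoldev"] = np.abs(v - v.mean()).mean()
        feats[f"{axis}standdev"] = v.std()
    # Assumed reading of "Resultant": mean of sqrt(x^2 + y^2 + z^2) per reading.
    feats["Resultant"] = np.sqrt((window ** 2).sum(axis=1)).mean()
    return feats
```

Applying `window_features` to every window returned by `segment` gives one feature row per ten-second segment, which can then be stacked into the feature set used for outlier detection and classification.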

The features obtained for DS1 and DS2 are described in Table 1. The description of the datasets DS1 and DS2 used for outlier detection and activity classification, obtained after feature extraction, is given in Table 2. After the feature extraction process, feature selection techniques or an ensemble of feature selection methods can also be employed to select the most relevant attributes or classes [35, 52].

Table 1
Feature Extraction of HAR datasets

Feature	No. of. features	Description
Unique_id	1	Unique identifier
User	1	User Id
X0 ... X9, Y0 ... Y9, Z0 ... Z9	30	The fraction of accelerometer samples
Xavg, Yavg, Zavg	3	Average x, y and z values
Xpeak, Ypeak, Zpeak	3	Dominant frequency
Xabsoldev, Yabsoldev, Zabsoldev	3	Average absolute deviations
Xstanddev, Ystanddev, Zstanddev	3	Standard deviations
Resultant	1	Average of the square roots of the sum of the values

Table 2

Description of datasets

Properties	Dataset-1 (DS1)	Dataset-2 (DS2)
No. of. instances	5418	2360
No. of. attributes	46	46
No. of. class labels	6	3
Type of class attributes	Polynomial	Polynomial
Nature of attributes	Real	Real
Class attributes	Standing, Sitting, Laying, Walking, Moving upstairs and Moving Downstairs	Walking, Sitting and Standing
Dataset size	5418×46	2360×46

6 Experimental analysis

The main goal of this experimental analysis is to compare the performance of the proposed hybrid outlier detection methods with existing methods for human activity recognition. Three experimental analyses were conducted with datasets DS1 and DS2. The first analysis aimed at detecting outliers using three standard outlier detection methods: the distance-based, density-based, and cluster-based K-means methods. The second analysis detected outliers using the hybrid outlier detection method. For a deeper analysis of the proposed hybrid approach, two different hybrid methods are employed. The first, IHOD1, combines the distance-based and density-based outlier detection methods. In IHOD1, distance-based KNN-OD is employed first, and the outlier score of each activity sample is calculated from the distance measure and stored as a vector (A₀). Similarly, the LOF-based density OD method is employed, and the outlier score of each activity sample is calculated from the density score and stored as a vector (A₁). In the second phase, the outlier scores stored in the vectors A₀ and A₁ are averaged. The final outlier scores are then sorted in descending order, and the outliers are removed using the IQR method. The second method, improved hybrid outlier detection method-2 (IHOD2), combines all three outlier detection methods: distance-based KNN, density-based LOF, and clustering-based K-means. In IHOD2, KNN-OD is employed first, and the outlier score of each activity sample is calculated from the distance measure and stored as a vector (A₀). Similarly, the LOF-OD method is employed, and the outlier score of each activity sample is calculated and stored as a vector (A₁).
The K-means cluster-based OD method is then applied to calculate the outlier scores as a vector (A₂). In the second phase, the outlier scores stored in the vectors A₀, A₁ and A₂ are averaged. The outlier scores are then sorted in decreasing order, and outliers are removed using the IQR method. The outlier detection methods are implemented using the RapidMiner tool, with its default parameter values. The value of ‘K’ specifies the number of neighbors for the distance-based and density-based methods. We first choose K = 10 for all the outlier detection methods, and the proposed method is then repeated with K values of 5, 20, 30, 40, 50, and 100. For the cluster-based method, the optimal number of clusters (K) for K-means is determined using the standard silhouette method, which measures how close each data point is to its own cluster compared with the neighboring clusters. The silhouette value lies in the range [-1, 1]; the higher the value, the better the cluster formation.
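
The score combination described above can be sketched with scikit-learn as follows. This is an illustrative approximation, not the RapidMiner workflow used in the experiments, and several choices are assumptions: the KNN score is taken as the distance to the k-th nearest neighbor, the K-means score as the distance to the assigned cluster center, and each score vector is min-max scaled before averaging so that detectors with different units contribute equally (the text only states that the scores are averaged). All function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def _minmax(s):
    """Rescale a score vector to [0, 1]; a constant vector maps to zeros."""
    s = np.asarray(s, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def knn_scores(X, k=10):
    """Distance-based score: distance to the k-th nearest neighbor."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 because each point finds itself
    dist, _ = nn.kneighbors(X)
    return dist[:, -1]


def lof_scores(X, k=10):
    """Density-based score: local outlier factor (larger = more anomalous)."""
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    return -lof.negative_outlier_factor_


def kmeans_scores(X, n_clusters=8, seed=0):
    """Cluster-based score: distance to the assigned cluster center."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    return np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)


def hybrid_scores(X, k=10, n_clusters=None):
    """Average KNN and LOF scores (IHOD1-style); add K-means if n_clusters is set (IHOD2-style)."""
    vectors = [knn_scores(X, k), lof_scores(X, k)]        # A0, A1
    if n_clusters is not None:
        vectors.append(kmeans_scores(X, n_clusters))      # A2
    return np.mean([_minmax(v) for v in vectors], axis=0)
```

The averaged vector returned by `hybrid_scores` would then be passed to the IQR rule of Section 4.1 to decide which samples to remove. One design caveat of the cluster-based score is that an isolated point may form its own cluster and receive a score of zero, which is one reason to combine it with the other two detectors rather than use it alone.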

The number of clusters for DS1 and DS2 is determined as 12 and 8, respectively, as shown in Fig. 2. The confusion matrices obtained for activity classification on dataset DS1 without outlier detection and with KNN, LOF, K-means, IHOD1 and IHOD2 are shown in Tables 4–6. The confusion matrices of all outlier detection methods for dataset DS2 are shown in Tables 7 and 8. The third experimental analysis classifies the activity labels for all OD methods using the standard Support Vector Machine (SVM) classifier on the two datasets. SVM was chosen for human activity recognition because it does not rely on a set of human-readable rules [37, 39].

Fig.2

The optimal number of clusters for DS1 and DS2 using Silhouette methods.
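
The silhouette-based choice of the number of clusters can be illustrated with the following Python sketch. It assumes scikit-learn and a hypothetical helper `best_k_by_silhouette`; the candidate range is an arbitrary example, and the sketch is not the procedure that produced Fig. 2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, candidates=range(2, 16), seed=0):
    """Return (best_k, scores), where scores maps each k to its mean silhouette value."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)  # mean silhouette, in [-1, 1]
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Plotting `scores` against `k` gives a curve of the kind summarized in Fig. 2, with the chosen number of clusters at its maximum.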

Table 3

Comparison with existing work

S.No	Description	HOD	IHOD1 & IHOD2
1	Classification Method	SVM, RBFN	SVM
2	Outlier Detection	KNN, LOF	KNN, LOF, K - means clustering
3	Number of Neighbors ‘K’	10	10, 20, 30, 40, 50, 100
4	Outliers removed	top - n	Removed using IQR method
5	Parameters evaluated	Classification Accuracy, Precision, recall, ROC	Classification Accuracy, Precision, recall, ROC, time and space complexity

Table 4

Confusion Matrix of without OD and KNN-OD for DS1

Activity	Without outlier detection						Distance based KNN - OD
	S	U	St	W	D	J	S	U	St	W	D	J
Sitting (S)	262	0	0	0	4	1	1573	2	23	3	0	2
Upstairs (U)	13	264	12	38	126	16	34	2033	132	139	5	8
Standing (St)	22	0	226	0	0	0	7	27	411	92	2	9
Walking (W)	0	244	1	2001	243	30	10	19	64	292	3	7
Downstairs (D)	1	71	7	26	143	9	1	0	1	1	296	0
Jogging (J)	0	53	0	16	12	1567	0	0	1	1	0	220
Misclassification rate	14.5%						10.95%

Table 5

Confusion Matrix of LOF - OD and K-means cluster based OD for DS1

Activity	Density based LOF OD						Cluster based K-means OD
	S	U	St	W	D	J	S	U	St	W	D	J
Sitting (S)	1584	9	3	3	19	0	1576	22	3	0	2	2
Upstairs (U)	7	346	21	7	21	1	32	2025	13	144	5	6
Standing (St)	15	95	2031	4	2035	9	9	33	408	92	2	5
Walking (W)	16	69	20	4	18	4	7	20	62	289	2	7
Downstairs (D)	0	4	0	2	0	1	0	1	1	0	296	0
Jogging (J)	0	3	2	9	0	1	1	0	1	0	1	22
Misclassification rate	9.28%						10.04%

Table 6

Confusion Matrix of IHOD1 and IHOD2 for DS1

Activity	IHOD1						IHOD2
	S	U	St	W	D	J	S	U	St	W	D	J
Sitting (S)	1588	5	9	4	0	2	1588	5	9	4	0	2
Upstairs (U)	18	2052	36	41	3	4	18	2052	36	41	3	4
Standing (St)	11	12	480	137	0	3	11	12	480	137	0	3
Walking (W)	4	9	100	345	5	5	4	9	100	345	5	5
Downstairs (D)	0	0	1	0	297	0	0	0	1	0	297	0
Jogging (J)	1	0	2	1	1	232	1	0	2	1	1	232
Misclassification rate	8.04 %						8.14 %

Table 7

Confusion Matrix of without outlier detection, KNN - OD, and LOF - OD for DS2

Activity	without outlier Detection			Distance based KNN - OD			Density based LOF – OD
	S	St	W	S	St	W	S	St	W
Standing (S)	156	1	0	224	6	1	224	6	1
Sitting (St)	0	0	0	9	402	113	9	402	113
Walking (W)	81	619	2071	4	212	1957	4	212	1957
Misclassification rate	23.2%			14.18%			9.02 %

Table 8

Confusion Matrix of K-means cluster based OD, IHOD1 and IHOD2 for DS2

Activity	Cluster based K-means OD			IHOD1			IHOD2
	S	St	W	S	St	W	S	St	W
Standing (S)	204	5	71	66	2	0	66	2	0
Sitting (St)	1	274	5	0	158	19	0	158	19
Walking (W)	32	341	1995	1	38	594	1	38	594
Misclassification rate	9.8%			6.22%			6.22%

7 Results and discussions

In addition to the misclassification rate obtained from the confusion matrices of the classification results, several performance measures are evaluated on DS1 and DS2: classification accuracy, precision (or correctness), recall (or detection rate), and time and space complexity.

7.1 Classification accuracy

The classification accuracy is higher for both datasets with the IHOD1 and IHOD2 methods than with the individual outlier detection methods. The confusion matrices for activity classification with SVM without outlier detection and with distance-based KNN, density-based LOF, cluster-based K-means OD, IHOD1 and IHOD2 are shown in Tables 4–8 for DS1 and DS2. In Tables 6 and 8, the misclassification rates of IHOD1 and IHOD2 are reduced to 8.04 % and 8.14 % for DS1 and to 6.22 % for both methods on DS2, which is lower than for the individual OD methods. The misclassification rate is also considerably lower for classifiers with individual outlier detection methods than for the classifier without outlier detection, because the outlier methods remove the noisy instances. Among the individual outlier detection methods, the density-based LOF method has the lowest misclassification rate, and the overall classification accuracy is highest for IHOD1 and IHOD2. The classification accuracy for DS1 and DS2 for K = 10 is calculated from the misclassification rate using Classification Accuracy = 100 - Misclassification rate, and is shown in Table 9. To find the optimum K value, the classification accuracy for K = 5, 10, 20, 30, 40, 50, 100 on the standard dataset DS1 is shown in Fig. 3. The results show that the proposed hybrid methods achieve higher classification accuracy than the other methods and that, with K = 50, the hybrid methods achieve higher accuracy than with K = 5, 10, 20, 30, 40, or 100.
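
The relation between a confusion matrix, the misclassification rate and the classification accuracy can be written as a short Python sketch. The function names are hypothetical, and the 3×3 matrix in the usage note is a made-up example, not one of the matrices reported in Tables 4–8.

```python
import numpy as np


def misclassification_rate(cm):
    """Percentage of samples that lie off the diagonal of a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    return 100.0 * (cm.sum() - np.trace(cm)) / cm.sum()


def classification_accuracy(cm):
    """Classification Accuracy = 100 - Misclassification rate."""
    return 100.0 - misclassification_rate(cm)
```

For the hypothetical matrix `[[50, 2, 0], [3, 40, 5], [0, 0, 100]]`, 190 of 200 samples lie on the diagonal, so the misclassification rate is 5.0 % and the accuracy is 95.0 %.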

Fig.3

Classification Accuracy of ODM’s for K = 5, 10, 20, 30, 40, 50, 100 for DS1.

Table 9

Classification Accuracy for K = 10 (DS1 and DS2)

S. No	Outlier Detection	Accuracy (%)
		DS1	DS2
1	Without Outlier detection	85.53	76.8
2	Distance Based KNN	89.05	85.82
3	Density based LOF	90.72	90.98
4	Cluster Based K- means	89.96	90.02
5	IHOD1	91.96	93.78
6	IHOD2	91.86	93.78

7.2 Precision and recall

Precision (or correctness) is defined as the proportion of correctly classified activities to the total number of activities classified. From the results in Table 10, it is found that SVM without outlier removal leads to low precision for DS1 and DS2. This implies that a large number of outliers has a negative impact on the precision of the classifier. Among all OD methods, the classifier achieves the highest precision with IHOD1 and IHOD2 compared with the individual outlier detection methods used for DS1 and DS2. With IHOD1, the precision of SVM is 92.11 % for DS1 and 93.8 % for DS2. With IHOD2, the precision of SVM is 92.5 % for DS1 and 92.24 % for DS2. The results also indicate that the classifier with the hybrid outlier detection method is more strongly related to the activity classes than the classifiers with individual outlier detection methods. In general, hybrid outlier detection based classifiers predict the activities accurately, with high correctness.

Recall (or detection rate, or completeness) is defined as the proportion of correctly classified positive activities to the total number of positive activities. From the results in Table 10, it is found that SVM without outlier removal leads to a low detection rate for DS1 and DS2. This implies that a large number of outliers has a negative impact on the detection rate of the classifiers. Among the individual outlier detection methods, the classifiers show the highest detection rate for the density-based outlier detection method irrespective of the datasets used. Among all outlier detection models employed, the hybrid methods predict the most activities, with high recall values. For DS1, IHOD1 obtained a detection rate of 89.38 % and IHOD2 obtained 89.31 %. For DS2, IHOD1 obtained a detection rate of 89.57 % and IHOD2 obtained 89.71 %.
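
For a multi-class confusion matrix, precision and recall can be computed per class and then averaged. The Python sketch below shows one common way to do this; it assumes that rows hold the true classes and columns the predicted classes, and it uses macro-averaging, neither of which is stated in the text. The function name is hypothetical.

```python
import numpy as np


def macro_precision_recall(cm):
    """Macro-averaged precision and recall (%) from a square confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column totals: samples assigned to each class
    actual = cm.sum(axis=1)     # row totals: samples that truly belong to each class
    # Classes that are never predicted (or never occur) contribute 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return 100.0 * precision.mean(), 100.0 * recall.mean()
```

For the hypothetical two-class matrix `[[8, 2], [1, 9]]`, the per-class recalls are 0.8 and 0.9, giving a macro recall of 85.0 %, and the per-class precisions are 8/9 and 9/11.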

Table 10
Precision and Recall for DS1 and DS2

S.No	Outlier Detection	Precision (%)		Recall (%)	
		DS1	DS2	DS1	DS2
1.	Without Outlier detection	79.9	62	74.5	72.7
2.	Distance Based KNN	88.72	85.9	84.5	74
3.	Density based LOF	88.9	91.9	87.37	85.2
4.	Cluster Based K- means	88.69	88.3	88.71	84.9
5.	IHOD1	92.11	93.8	89.38	89.57
6.	IHOD2	92.5	92.24	89.31	89.71

7.3 Visual evaluation

Receiver operating characteristic (ROC) curves are used to compare the performance of classifiers visually. A ROC curve is plotted with the false alarm rate on the X-axis and the detection rate on the Y-axis. The misclassification rates in Tables 4–8 and the precision and recall values in Table 10 show that SVM performs best with the hybrid outlier detection methods, irrespective of the dataset. In addition to the quality metrics evaluated so far, the performance of SVM with the outlier detection methods is also evaluated visually using ROC curves. From the ROC curves of DS1 and DS2 in Figs. 4 and 5, it can be observed that the hybrid outlier detection method IHOD2 combined with the SVM classifier outperforms IHOD1. Furthermore, applying SVM to the original dataset (with outliers) yields a much worse ROC curve than SVM with the hybrid methods. Similarly, the proposed methods (IHOD1 and IHOD2) were able to alleviate the effect of noisy features more effectively. From the results obtained, it is found that among all the OD methods, the prediction models based on hybrid outlier detection perform well in all aspects. IHOD2 performs outlier detection better than IHOD1 because it combines three OD methods. Among the datasets used, a drastic increase in detection rate and precision after outlier detection is observed for DS2. A possible reason is the environment used for data collection and the quality of the sensing device used. Outliers may also occur in an environment with clutter and variable lighting.

Fig.4

ROC of DS1.

Fig.5

ROC of DS2.
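
For reference, a ROC curve of the kind shown in Figs. 4 and 5 can be computed for one activity class against all others with the following Python sketch. It assumes scikit-learn, an RBF-kernel SVM and a 70/30 train/test split; the function name `one_vs_rest_roc` is hypothetical, and the sketch is not the procedure that produced the figures.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def one_vs_rest_roc(X, y, positive_label, seed=0):
    """Return (fpr, tpr, area) for one activity class against all others."""
    y_bin = (np.asarray(y) == positive_label).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y_bin, test_size=0.3, random_state=seed, stratify=y_bin
    )
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    scores = clf.decision_function(X_te)   # signed distance to the decision boundary
    fpr, tpr, _ = roc_curve(y_te, scores)  # false alarm rate and detection rate
    return fpr, tpr, auc(fpr, tpr)
```

Plotting `tpr` against `fpr` for the dataset before and after outlier removal allows the two curves to be compared on the same axes; a larger area under the curve indicates better separation of the chosen activity.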

7.4 Time and space complexity

The time complexity of the proposed hybrid algorithm depends on the size of the dataset (s), the size of the attribute set (a) and the number of detection algorithms combined (n). The algorithm has a time complexity of O(a*n*s) in the worst case. To demonstrate the scalability of the proposed hybrid algorithm, Figs. 6 and 7 show the execution time of the two hybrid algorithms on the two datasets as the size of the dataset is increased. As can be seen, the execution time of both algorithms increases with the size of the dataset. However, the execution time of the individual outlier detection algorithms increases dramatically with the size of the dataset, whereas that of the proposed hybrid algorithm increases linearly. This analysis shows that IHOD1 and IHOD2 are linear in the size of the dataset, which gives the algorithm good scalability. Hence, it can be claimed that the proposed hybrid OD method is scalable and suitable for very large datasets in real-time applications.

Fig.6

Time and Space Complexity of DS1.

Fig.7

Time and Space Complexity of DS2.
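To make the O(a*n*s) bound concrete, the following Python sketch combines n detectors over a attributes and s rows and counts how many values are scanned. It is only an illustration under stated assumptions: the three detectors (z-score, mean absolute deviation, fixed bounds) and the majority-vote rule are simple stand-ins chosen because each is a single linear pass; they are not the methods combined in IHOD1 or IHOD2, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Detector = Callable[[Sequence[float]], List[bool]]


def zscore_detector(k: float = 2.0) -> Detector:
    """Flag values more than k standard deviations from the column mean."""
    def detect(column: Sequence[float]) -> List[bool]:
        mu = sum(column) / len(column)
        sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column))
        return [sigma > 0 and abs(x - mu) > k * sigma for x in column]
    return detect


def mean_abs_dev_detector(k: float = 3.0) -> Detector:
    """Flag values more than k mean absolute deviations from the column mean."""
    def detect(column: Sequence[float]) -> List[bool]:
        mu = sum(column) / len(column)
        mad = sum(abs(x - mu) for x in column) / len(column)
        return [mad > 0 and abs(x - mu) > k * mad for x in column]
    return detect


def bounds_detector(low: float, high: float) -> Detector:
    """Flag values outside an assumed plausible sensor range [low, high]."""
    def detect(column: Sequence[float]) -> List[bool]:
        return [x < low or x > high for x in column]
    return detect


def hybrid_outliers(rows: List[List[float]],
                    detectors: List[Detector]) -> Tuple[List[bool], int]:
    """Majority vote over detectors; also return the number of values scanned."""
    s, a = len(rows), len(rows[0])
    votes = [0] * s
    scanned = 0
    for detect in detectors:                      # n detectors
        row_flagged = [False] * s
        for j in range(a):                        # a attributes
            column = [row[j] for row in rows]     # s values
            scanned += len(column)
            for i, flag in enumerate(detect(column)):
                row_flagged[i] = row_flagged[i] or flag
        for i in range(s):
            if row_flagged[i]:
                votes[i] += 1
    flags = [2 * v > len(detectors) for v in votes]
    return flags, scanned


# Assumed toy data: 10 rows, 2 attributes, the last row has an extreme value.
rows = [[1.0, 5.0], [1.1, 5.1], [0.9, 4.9], [1.0, 5.0], [1.2, 5.2],
        [0.8, 4.8], [1.0, 5.0], [1.1, 5.1], [0.9, 4.9], [9.0, 5.0]]
detectors = [zscore_detector(), mean_abs_dev_detector(), bounds_detector(0.0, 8.0)]
flags, scanned = hybrid_outliers(rows, detectors)
print(flags)
print(scanned)  # n * a * s = 3 * 2 * 10
```

Because each stand-in detector makes a constant number of passes over a column, the total work grows in proportion to a*n*s. A detector with a superlinear cost per column, such as one based on pairwise distances, would break this bound.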

A comparative analysis with existing methods is shown in Table 11. The results show that the classification accuracy of IHOD1 and IHOD2 is higher than that of the existing methods.

Table 11

Comparison of Existing Methods

S. No	Author	Classifier	Outlier detection	Accuracy (%)
1	Kose et al. [25]	Naïve Bayes	Clustering	88.8
2	Gomes J. et al. [20]	SVM	Distance based	91.1
3	Vo et al. [43]	SVM	Distance based	90.5
4	Zhao K et al. [50]	Decision tree	Density based	86.5
5	Our Proposed Approach	SVM	IHOD1	92.4 (DS1), 92.36 (DS2)
5	Our Proposed Approach	SVM	IHOD2	93.7 (DS1), 94.2 (DS2)

8 Conclusion and future work

An improved hybrid outlier detection approach for combining outlier detection algorithms was proposed. Experiments on two datasets indicate that the proposed improved hybrid outlier detection method can achieve much better detection performance than the individual outlier detection methods. The general nature of the proposed work allows the combining schemes to be applied to a number of combinations of outlier detection algorithms. The experiments performed confirm that the proposed technique can be very effective for anomaly detection and that classification accuracy increases when outlier detection methods are used. Future work is expected to characterize the combining schemes more completely, particularly on large and high-dimensional databases with feature selection or feature reduction methods. The proposed work can be used to detect abnormal human activities in home monitoring and health care monitoring. The following are some possible threats to the validity of the proposed method. This work was carried out under class imbalance; a suitable sampling method may be adopted to balance the class distribution before outlier detection is applied. For online data collection, the various hardware limitations should be taken into account. Although a reasonable number of activities were recorded, the performance of the proposed outlier detection methods must be analyzed with a larger number of activities. The performance of the proposed method must also be analyzed for applications other than monitoring daily human activities.
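As an illustration of the class-balancing step mentioned among the threats to validity, the sketch below shows random oversampling, which is only one of several possible sampling methods and is not a method evaluated in this work. The function name `random_oversample`, the activity labels, and the data are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Tuple


def random_oversample(samples: List[list], labels: List[Hashable],
                      seed: int = 0) -> Tuple[List[list], List[Hashable]]:
    """Duplicate randomly chosen minority-class samples until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Assumed toy data: four "walk" windows and a single "fall" window.
samples = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = ["walk", "walk", "walk", "walk", "fall"]
bal_x, bal_y = random_oversample(samples, labels)
print(Counter(bal_y))
```

Duplicated samples change local densities, so the interaction between such a sampling step and density-based or distance-based outlier detectors would need to be checked experimentally.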

References

1. M.R. Alam, M.B.I. Reaz, M.A.M. Ali, S.A. Samad, F.H. Hashim and M.K. Hamzah, Human Activity Classification for Smart Home: A Multiagent Approach, in Proceedings of the IEEE Symposium on Industrial Electronics and Applications (ISIEA 2010), Penang, Malaysia, 2010, pp. 511–514.

2. C.C. Aggarwal and Sathe, Theoretical Foundations and Algorithms for Outlier Ensembles, ACM SIGKDD Explorations Newsletter 17(1) (2015), 24–47.

3. A.S. Evani, Sreenivasan, J.S. Sudesh, Prakash and Bapat, Activity recognition using wearable sensors for healthcare, in Proceedings of Sensor Technologies and Applications (SENSORCOMM) (2013), 173–177.

4. Alsheikh et al., Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications, IEEE Communications Surveys & Tutorials (2014), 1996–2018.

5. Alshurafa, Xu, J.J. Liu, et al., Designing a robust activity recognition framework for health and exergaming using wearable sensors, Journal of Biomedical and Health Informatics 18 (2014), 1636–1646.

6. Althloothi, M.H. Mahoor, Zhang and R.M. Voyles, Human activity recognition using multi-features and multiple kernel learning, Journal of Pattern Recognition 5 (2014), 1800–1812.

7. Al-Zoubi, Al-Dahoud and A.A. Yahya, New outlier detection method based on fuzzy clustering, WSEAS Transactions on Information Science and Applications (2010), 681–690.

8. Bayat, Pomplun and D.A. Tran, A Study on Human Activity Recognition Using Accelerometer Data from Smartphones, Procedia Computer Science (2014), 450–457.

9. Bhaduri and Stolpe, Distributed Data Mining in Sensor Networks, in: Aggarwal C., Managing and Mining Sensor Data, Springer, Boston (2013), 211–236.

10. Cateni, Colla, Vannucci and Vannocci, A Procedure for Building Reduced Reliable Training Datasets from Real-World Data, International Conference on Artificial Intelligence and Applications AIA (IASTED 2014), Austria, (2014), 17–19.

11. Cateni, Colla and Nastasi, A multivariate fuzzy system applied for outliers detection, Journal of Intelligent and Fuzzy Systems 24(4) (2013), 889–903.

12. Cheng, Chen and Shen, A framework for daily activity monitoring and fall detection based on surface electromyography and accelerometer signals (2013), 38–45.

13. Cheng and Zhu, Lightweight anomaly detection for wireless sensor networks, Journal of Distributed Sensors Networks (2015), doi: 10.1155/2015/653232

14. Arifoglu and Bouchachia, Activity Recognition and Abnormal Behaviour Detection with Recurrent Neural Networks, The 14th International Conference on Mobile Systems and Pervasive Computing (MobiSPC 2017) (2017), 86–93.

15. De, Bharti, S.K. Das and Chellappan, Multimodal wearable sensing for fine-grained activity recognition in healthcare, IEEE Transactions on Internet Computing 19 (2015), 26–35.

16. W.-Y. Deng, Q.-H. Zheng and Z.-M. Wang, Cross-person activity recognition using reduced kernel extreme learning machine, Journal of Neural Networks 53 (2014), 1–7.

17. Duan, Xu, Liu and Lee, Cluster-based outlier detection, Annals of Operations Research (2009), 151–168.

18. Fawzy and H.M. Mokhtar, O. Outliers detection and classification in wireless sensor networks, Egyptian Informatics (2013), 157–164.

19. Gao, A.K. Bourke and Nelson, Evaluation of accelerometer-based multi-sensor versus single-sensor activity recognition systems, Journal of Medical Engineering and Physics 36 (2014), 779–785.

20. Gomes, Krishnaswamy, Gaber, Sousa and Emars, A Personalized Mobile Activity Recognition System, in Proceedings of the IEEE International Conference on Mobile Data Management (MDM), Bengaluru, Karnataka, (2012), 316–319.

21. S.A. Haque, Rahman and S.M. Aziz, Sensor Anomaly Detection in Wireless Sensor Networks for Healthcare, Sensors (2015), 8764–8786.

22. Jurdak, X.R. Wang, Obst and Valencia, Wireless Sensor Network Anomalies: Diagnosis and Detection Strategies, Intelligent Systems Reference Library (2011), 309–325.

23. Vishwakarma, Kapoor and Dhiman, A unified framework for human activity recognition: An approach using spatial edge distribution and R-transform, International Journal of Electronic & Communication (2016), 341–353.

24. T.S. Kim, J.H. Cho and J.T. Kim, Mobile Motion Sensor-Based Human Activity Recognition and Energy Expenditure Estimation in Building Environments, Smart Innovations, Systems and Technology (2013), 987–993.

25. Kose, O.D. Incel and Ersoy, Online Human Activity Recognition on Smart Phones, in Proceedings of the Workshop on Mobile Sensing: From Smart phones and Wearables to Big Data, Beijing, China (2012), pp. 11–15.

26. Lara and Labrador, A mobile platform for real-time Human Activity Recognition, Proceedings of the IEEE Consumer Communications and Networking Conference, Las Vegas, USA, (2012), pp. 667–671.

27. Lu, Qing-Ling and Yi-Ju, Activity Recognition in Smart Homes, Multimedia Tools and Applications (2016), doi: 10.1007/s11042-016-4197-1

28. Mahmood, Shi, Khatoon and Xiao, Data Mining Techniques for Wireless Sensor Networks: A Survey, Journal of Distributed Sensor Networks (2013), 1–24.

29. Mannini and A.M. Sabatini, Machine Learning Methods for Classifying Human Physical Activity from On-Body Accelerometers, Sensors (2010), 1154–1175.

30. E.A. Mosabbeb, Cabral, F. De la Torre and Fathy, Multi-label Discriminative Weakly-Supervised Human Activity Recognition and Localization, in Proceedings of Computer Vision - ACCV 2014, Springer International Publishing (2014), pp. 241–258.

31. Nivetha and Venkatalakshmi, Hybrid outlier detection (HOD) method in sensor data for human activity classification, Journal of Intelligent Data Analysis (2018), 245–260, doi: 10.3233/IDA-163329

32. J.C. Quiroz, Banerjee, S.M. Dascalu and S.L. Lau, Feature Selection for Activity Recognition from Smartphone Accelerometer Data, Journal of Intelligent Automation & Soft Computing (2017), 1–9.

33. Rajasegarar, Leckie, Palaniswami, J.C. Bezdek and Palaniswami, Anomaly detection in wireless sensor networks, IEEE Transactions on Wireless Communications 15 (2008), 34–40.

34. Rajasegarar, Gluhak, Ali, Nati, Moshtaghi, Leckie and Palaniswami, Ellipsoidal neighborhood outlier factor for distributed anomaly detection in resource-constrained networks, Journal of Pattern Recognition 47 (2014), 2867–2879.

35. San-Segundo, J.M. Montero, Barra-Chicote, Fernández and J.M. Pardo, Feature extraction from smartphone inertial signals for human activity segmentation, Signal Processing (2016), 359–372.

36. Shahid, Haider, Saad and Qaisar, Characteristics and classification of outlier detection techniques for wireless sensor networks in harsh environments: A survey, Journal of Artificial Intelligence Review (2015), 193–228.

37. Shahid, Haider, Saad and Qaisar, One-class support vector machines: Analysis of outlier detection for wireless sensor networks in harsh environments, Artificial Intelligence Review (2013), doi: 10.1007/s10462-013-9395-x

38. Sharma, Rajpoot and L.K. Sharma, Nearest Neighbour Classification for Wireless Sensor Network Data, Journal of Computer Trends and Technology (2011), 41–43.

39. Sheng, Li, Mao and Jin, Outlier detection in sensor networks, Proc. of MobiHoc’07, Québec, Canada, 2007, pp. 219–228.

40. Shoaib et al., A Survey of Online Activity Recognition Using Mobile Phones, Journal of Sensors (2015), 2059–2085.

41. Su, Long, Qiu, Li and Liu, Anomaly Detection of Single Sensors Using OCSVM_KNN, in Big Data Computing and Communications, Springer, Cham, Switzerland, (2015), pp. 217–230.

42. Subramaniam, Palpanas, Papadopoulos, V. Kalogeraki and Gunopulos, Online outlier detection in sensor data using non-parametric models, in Proceedings of VLDB’06 International Conference on Very Large Databases, Korea, (2006), pp. 187–198.

43. Q.V. Vo, M.T. Hoang and Choi, Personalization in Mobile Activity Recognition System Using K-Medoids Clustering Algorithm, Journal of Distributed Sensor Networks, (2013), 1–12. www.mathwords.com/o/outlier.htm

44. Yan, You, Ji, Yin and Yang, A hybrid outlier detection method for health care big data, Proceedings - 2016 IEEE International Conferences on Big Data and Cloud Computing, BDCloud, Social Computing and Networking, SocialCom and Sustainable Computing and Communications, SustainCom, (2016), pp. 157–162.

45. N.A. Yousri, M.A. Ismail and M.S. Kamel, Fuzzy outlier analysis: a combined clustering - outlier detection approach, Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, 2007, pp. 412–418.

46. Yu, Junsong and Zicheng, Propagative Hough voting for human activity recognition, Computer Vision - ECCV 2012, Springer Berlin Heidelberg (2012), pp. 693–706.

47. Yu, Jibin and Jiang, An Improved ARIMA-Based Traffic Anomaly Detection Algorithm for Wireless Sensor Networks, Journal of Distributed Sensors Networks (2016), 1–9.

48. Zhang, Meratnia and Havinga, Outlier Detection Techniques for Wireless Sensor Networks: A Survey, IEEE Transactions on Communications and Surveys (2010), pp. 159–170.

49. Zhao, Du, Li, Zhang, Liu and Xu, Healthy: A Diary System Based on Activity Recognition Using Smartphone, in Proceedings of Mobile Ad-Hoc and Sensor Systems (MASS), Hangzhou, China, (2013), pp. 290–294.

50. Zhang, Shi, Gao and Li, Unsupervised outlier detection in sensor networks using aggregation tree, in Proc. of ADMA - Advanced Data Mining and Applications, Harbin, China, (2007), pp. 158–169.

51. Zimek, R.J.G.B. Campello and Sander, Ensembles for unsupervised outlier detection, ACM SIGKDD Explorations Newsletter 15(1) (2014), 11–22.