Abstract
Nowadays, advances in sensing and communication technologies make it possible to collect large amounts of sensor data; however, building a reliable computational model that accurately recognises human activities still requires annotated sensor data. Acquiring high-quality, detailed, continuous annotations is a challenging task. In this paper, we explore the solution space of sharing annotated activities across different datasets to improve recognition accuracy. The main challenge is to resolve the heterogeneity in feature and activity spaces between datasets; that is, each dataset can have a different number of sensors using heterogeneous sensing technologies, be deployed in a different environment, and record different activities performed by different users. To address this challenge, we have designed and developed sharing-data and sharing-classifiers algorithms that use a knowledge model to enable computationally efficient feature space remapping, and uncertainty reasoning to enable effective classifier fusion. We have validated the algorithms on three third-party real-world datasets and demonstrated their effectiveness in recognising activities with annotations on as little as 0.1% of each dataset.
Keywords
Introduction
Sensor-based human activity recognition (HAR) plays a significant role in many applications, enabling customised services that suit people’s current context. Various machine learning techniques, including recent deep learning techniques, have been applied to HAR for feature extraction [2] and sensor fusion [16], achieving promising accuracy in recognising daily activities. However, these techniques rely heavily on labelled training data to build a robust computational model.
Labelling sensor data with activities is a time-consuming and costly task. Reducing the reliance on training data has long been a challenging research topic, and different approaches have been attempted, including unsupervised learning [32], active learning [1], co-training [6], and transfer learning [27]. Most of these approaches reduce the annotation effort to some degree, but the annotation burden on individual users remains heavy and does not scale to a large number of users.
To tackle this challenge directly, we propose SLearn, an approach that relieves the annotation burden on individual users via cross learning on scarce and partially annotated data from multiple users, while still achieving satisfactory activity recognition accuracy. The hypothesis is that as long as each user contributes a small number of labelled examples (even if these examples do not cover the complete set of activity types), SLearn can learn from the annotated examples across all users and build an activity recognition model for each user that recognises all the activities. For example, if user A’s data only contain annotations on ‘sleeping’ and user B’s only contain annotations on ‘preparing meal’, SLearn takes both users’ data and builds activity models that recognise these two activities for both users. In this way, SLearn reduces the labelling required from individual users while still building a robust activity recognition model.

Three smart home deployments observing the same set of activities but with different spatial layouts and a different number of sensors. Each table shows the averaged activation ratios of sensors (e.g., S1 or S2, mapping to red dots in each house setting) for each activity class (e.g., ‘Leave home’ or ‘Sleep’).
The challenge faced by SLearn is the heterogeneity of different datasets. Take Fig. 1 as an example, where each smart home has a different spatial layout and is deployed with a different number of sensors using different sensing technologies. Transferring activity models across these smart homes, especially mapping and/or aligning the features extracted from their sensors, is a daunting task, as their feature spaces can be completely disparate. Here we focus on binary sensors, including RFID sensors, door sensors, and pressure sensors. Transfer learning across datasets of binary sensors is currently under-explored: most transfer learning techniques for sensor-based activity recognition focus on accelerometer data, which have a uniform format [27].
In our earlier work [30], we explored preliminary sharing-data and sharing-classifiers algorithms that integrate knowledge- and data-driven approaches. These algorithms achieved promising results, reaching reasonably good accuracy with a small amount of training data from each dataset. However, they struggle when datasets are highly heterogeneous and sensor data are noisy, which makes it uncertain which activity label to select. To tackle this challenge and improve the accuracy of shared learning, we use recent advances in uncertainty reasoning to support robust classifier fusion in the face of conflict and uncertainty among classifiers from different datasets. We also design a new hybrid approach that combines sharing data and sharing classifiers so that each compensates for the other’s limitations and the risk of negative transfer is reduced. We evaluate the algorithms on three third-party datasets and demonstrate their effectiveness by comparing them against state-of-the-art classification techniques.
The rest of the paper is organised as follows. Section 2 describes the SLearn approach, and Section 3 introduces the evaluation methodology and experimental setup. Section 4 presents the evaluation and discusses the strengths and limitations of the SLearn algorithms. Section 5 reviews the mainstream work on addressing the scarcity of labelled data and identifies how it differs from SLearn, and Section 6 concludes with suggested future work.
We hypothesise that, through shared learning across different datasets, we can build an activity model that accurately recognises these users’ activities with only a small amount of training data. Let
In the following, we explore the solution space to address this question in two directions: sharing classifiers and sharing training data. For both, the key enabler is feature space remapping; that is, transforming sensor features from one dataset to another.
Feature space remapping
Different feature remapping strategies have been proposed [18], and a promising approach is a meta-feature-based mapping function for event-driven sensors in smart home environments [5]. The authors define a range of meta-features for each sensor; for example, the average sensor event frequency over 1-hour periods and over 3/8/24-hour periods, the mean and standard deviation of the time between this sensor’s event and the next sensor event, and the probability that the next event comes from the same sensor. These meta-features are used as a heuristic to guide the mapping process. This is a data-driven approach to feature space remapping, but its performance is affected by the activity routines of different users. For example, one user might often have breakfast at 6am while another has it at 9am, or one user prefers showering before breakfast while another prefers the reverse, so the sensors might be mismatched on the time scale. In addition, the density of the sensor deployment and the frequency and sensitivity with which sensors report events affect any mapping based on intervals between events. For example, one environment can be more densely deployed with sensors, so that the time between events reported by different sensors is significantly shorter than in another environment with far fewer sensors.
To reduce the impact of such differences between datasets, we adopt a more general approach: semantics-based feature mapping. We use common knowledge, whose generality across different smart home datasets has been demonstrated [32]. The principle of semantics-based feature mapping is to compute the similarity between a pair of sensors based on where they are deployed and which object they are attached to. Both location and object concepts are organised in an ontological hierarchy, to which a conceptual similarity measure [28] is applied to quantify how close two concepts are. For example, a sensor in House A can be described with
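To make the meta-features of [5] concrete, the sketch below computes a simplified version of three of them for one sensor from a time-ordered event log: the average event count per 1/3/8/24-hour window, the mean and standard deviation of the time to the next event, and the probability that the next event comes from the same sensor. It is an illustration under our own simplifying assumptions (for example, windows are counted over the span of the log), not the exact definitions in [5]; the function and feature names are hypothetical.

```python
import math
from statistics import mean, pstdev


def sensor_meta_features(events, sensor, windows_h=(1, 3, 8, 24)):
    """Simplified meta-features for one sensor (hypothetical sketch, not [5]'s code).

    events: list of (timestamp_in_seconds, sensor_id), sorted by time.
    """
    span_h = max((events[-1][0] - events[0][0]) / 3600.0, 1e-9)
    own = [i for i, (_, s) in enumerate(events) if s == sensor]
    feats = {}
    for w in windows_h:
        # Average number of this sensor's events per w-hour window of the log.
        n_windows = max(1, math.ceil(span_h / w))
        feats[f"freq_{w}h"] = len(own) / n_windows
    # Time from each of this sensor's events to the next event (from any sensor),
    # and whether that next event was reported by the same sensor.
    followers = [i for i in own if i + 1 < len(events)]
    gaps = [events[i + 1][0] - events[i][0] for i in followers]
    same = [events[i + 1][1] == sensor for i in followers]
    feats["gap_mean"] = mean(gaps) if gaps else 0.0
    feats["gap_std"] = pstdev(gaps) if gaps else 0.0
    feats["p_next_same"] = sum(same) / len(same) if same else 0.0
    return feats


log = [(0, "door"), (60, "door"), (120, "fridge"), (3600, "door"), (3660, "bed")]
door_features = sensor_meta_features(log, "door")
```

Because every feature depends on when and how often a sensor fires, two users with different routines can produce very different values for equivalent sensors, which is the weakness discussed above.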
Given an instance
Using Definition 1, we can perform sensor feature remapping. That is, a value
Figure 2 illustrates an example of the above process. Assume that there are two simplified datasets I and
We extract sensor feature from sensor events as

An example of sensor feature space remapping.
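Because the exact measure in [28] and Definition 1 are not reproduced in this text, the following is only a sketch of semantics-based remapping under explicit assumptions. It uses the Wu-Palmer measure (twice the depth of the least common subsumer divided by the sum of the two concepts’ depths) as one possible conceptual similarity, averages location and object similarity into a sensor similarity, and remaps an instance by copying each active source sensor’s value to its most similar target sensor. The toy ontologies, sensor IDs, and function names are hypothetical.

```python
# Toy ontologies (hypothetical): child concept -> parent concept.
LOCATION = {"Kitchen": "Location", "Bedroom": "Location", "Hall": "Location"}
OBJECT = {
    "Fridge": "Appliance", "Microwave": "Appliance", "Appliance": "Object",
    "Bed": "Furniture", "Cupboard": "Furniture", "Furniture": "Object",
    "FrontDoor": "Door", "Door": "Object",
}


def ancestors(concept, parent):
    """Return [concept, parent, grandparent, ..., root]."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def wu_palmer(a, b, parent):
    """Wu-Palmer similarity; the root has depth 1. Assumed measure, not necessarily [28]."""
    pa, pb = ancestors(a, parent), ancestors(b, parent)
    common = set(pb)
    lcs = next((c for c in pa if c in common), None)
    if lcs is None:
        return 0.0
    return 2 * len(ancestors(lcs, parent)) / (len(pa) + len(pb))


def sensor_similarity(s, t):
    """Average of location and object similarity (our assumption)."""
    (loc_s, obj_s), (loc_t, obj_t) = s, t
    return 0.5 * (wu_palmer(loc_s, loc_t, LOCATION) + wu_palmer(obj_s, obj_t, OBJECT))


def remap(instance, source, target):
    """Map a binary feature vector from the source to the target sensor space."""
    out = {t: 0 for t in target}
    for s, value in instance.items():
        if not value:
            continue
        best = max(target, key=lambda t: sensor_similarity(source[s], target[t]))
        out[best] = max(out[best], value)
    return out


HOUSE_I = {"s1": ("Kitchen", "Fridge"), "s2": ("Bedroom", "Bed"), "s3": ("Hall", "FrontDoor")}
HOUSE_J = {"t1": ("Kitchen", "Microwave"), "t2": ("Bedroom", "Cupboard"), "t3": ("Hall", "FrontDoor")}
remapped = remap({"s1": 1, "s2": 0, "s3": 1}, HOUSE_I, HOUSE_J)
```

Here the fridge sensor in House I is mapped to the microwave sensor in House J because both share the Kitchen location and the Appliance parent concept, even though the objects themselves differ.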

To share classifiers, we design an uncertainty-driven algorithm: when the classifier of the current dataset cannot confidently infer an activity label for a given example, we acquire labels from the classifiers of the other datasets. This is inspired by active learning, which identifies uncertain examples and queries human operators for their annotation. The difference is that we query the classifiers of the other datasets rather than human operators.
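A minimal sketch of the uncertainty test that could trigger querying the other datasets’ classifiers, assuming the current classifier outputs a probability vector over activities. It implements the three uncertainty sampling measures that this paper later uses for active learning (least confidence, margin of confidence, and information entropy); normalising entropy by log K and the 0.5 threshold are our own illustrative assumptions, and the function names are hypothetical.

```python
import math


def least_confidence(p):
    """1 minus the probability of the most likely class."""
    return 1.0 - max(p)


def margin(p):
    """1 minus the gap between the two most likely classes."""
    top = sorted(p, reverse=True)
    return 1.0 - (top[0] - top[1]) if len(top) > 1 else 0.0


def entropy(p):
    """Shannon entropy normalised to [0, 1] by log K (our assumption)."""
    if len(p) < 2:
        return 0.0
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))


def needs_other_classifiers(p, threshold=0.5, measure=entropy):
    """Hypothetical rule: defer to other datasets when the local classifier is unsure."""
    return measure(p) > threshold
```

For instance, a prediction of [0.9, 0.05, 0.05] would be kept locally, whereas [0.4, 0.35, 0.25] would be passed on to the other datasets’ classifiers.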
We describe the process in Algorithm
We then perform feature space remapping to convert the example into the feature space of each of the other datasets, so that their classifiers can label it. Once the other classifiers complete their inference, we collect and integrate their results to make a final decision. Existing approaches to combining classifiers include ensemble methods such as boosting, bagging, and stacking. However, our training data are small, so ensemble methods that require large amounts of training data may not be applicable. Instead, we borrow a recent advance in uncertainty reasoning: classifier fusion with contextual reliability evaluation (CRE) [11]. CRE characterises a refined reliability of a classifier on individual classes and thus allows more effective integration of classifiers.
Classifier fusion with contextual reliability evaluation
CRE is built on top of a formal mathematical theory of evidence, Dempster-Shafer Theory (DST), which propagates uncertainty values and consequently provides an indication of the certainty of inferences. In the following, we will briefly introduce the basics of DST, its application to classification, and then move on to CRE.
Basics of DST
Let
Associated with a BBA are the belief and plausibility functions, which define the lower and upper bounds of the imprecise probability represented by the BBA:
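The belief and plausibility functions have standard definitions: Bel(A) sums the masses of all non-empty focal sets contained in A, and Pl(A) sums the masses of all focal sets that intersect A. The sketch below represents a BBA as a Python dict from frozensets of labels to masses; the activity labels and function names are illustrative.

```python
OMEGA = frozenset({"sleep", "toilet", "leave"})  # illustrative frame of discernment

# A BBA: masses on subsets of OMEGA, summing to 1, with no mass on the empty set.
m = {
    frozenset({"sleep"}): 0.6,
    frozenset({"sleep", "toilet"}): 0.3,
    OMEGA: 0.1,
}


def bel(m, A):
    """Belief: total mass of non-empty focal sets B with B a subset of A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B and B <= A)


def pl(m, A):
    """Plausibility: total mass of focal sets B that intersect A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B & A)
```

With this BBA, Bel({sleep}) = 0.6 and Pl({sleep}) = 1.0, so the imprecise probability of ‘sleep’ lies in [0.6, 1.0].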
DST application to classification
In a classification problem, each hypothesis in Ω refers to a class of interest. For an input example X, a BBA
The results of multiple classifiers can be combined through the above Dempster’s rule, which assumes that all the sources are completely reliable. However, each classifier is only partially reliable, as it rarely infers the correct label for every example. To capture the reliability of each classifier, Shafer’s discounting method is often applied. Shafer defined an evidential operation that discounts part of the mass of belief in a BBA to total ignorance according to a discounting factor α (
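The sketch below shows Dempster’s rule of combination and Shafer’s discounting applied to two classifiers’ outputs. Two choices are our own assumptions: classifier probabilities are turned into BBAs whose focal sets are singletons, and α is treated as the discount rate, i.e., the fraction of mass moved to the whole frame Ω (α = 0 leaves a BBA unchanged); some papers use the complementary convention. The function names are hypothetical.

```python
from itertools import product


def from_probs(probs):
    """Turn classifier probabilities {label: p} into a BBA on singletons (assumption)."""
    return {frozenset([c]): p for c, p in probs.items()}


def discount(m, alpha, omega):
    """Shafer discounting: scale masses by (1 - alpha) and move alpha to omega."""
    omega = frozenset(omega)
    out = {A: (1 - alpha) * v for A, v in m.items() if A != omega}
    out[omega] = (1 - alpha) * m.get(omega, 0.0) + alpha
    return out


def dempster(m1, m2):
    """Dempster's rule: conjunctive combination normalised by 1 - conflict."""
    combined, conflict = {}, 0.0
    for (B, x), (C, y) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            combined[A] = combined.get(A, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}


FRAME = {"sleep", "toilet"}
m1 = from_probs({"sleep": 0.8, "toilet": 0.2})
m2 = discount(from_probs({"sleep": 0.3, "toilet": 0.7}), 0.5, FRAME)
fused = dempster(m1, m2)
```

Discounting the second classifier by α = 0.5 lets the undiscounted first classifier dominate: the fused mass on ‘sleep’ is 0.52/0.69 ≈ 0.75, compared with 0.24/0.38 ≈ 0.63 when neither classifier is discounted.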
Contextual reliability evaluation
CRE focuses on estimating the reliability of each classifier’s inference
CRE – Reliability
Given an input example X in a dataset D (
To do so, we find the top close neighbours
In the end, the reliability profile
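The exact CRE formulation in [11] is not reproduced in this text, so the following is only a simplified illustration of the underlying idea: estimating how reliable a classifier is for each class in the neighbourhood of an input example X. It assumes Euclidean distance, takes the k nearest labelled examples, and uses the classifier’s per-class precision among them as the reliability, with a hypothetical default value when the classifier predicts a class for none of the neighbours. All names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reliability_profile(x, labelled, predict, classes, k=3, default=0.5):
    """Per-class reliability of `predict` around x (simplified sketch, not CRE itself).

    labelled: list of (features, true_label); predict: features -> label.
    """
    nearest = sorted(labelled, key=lambda ex: euclidean(x, ex[0]))[:k]
    profile = {}
    for c in classes:
        # Among neighbours the classifier labels as c, how many truly are c?
        hits = [true == c for feats, true in nearest if predict(feats) == c]
        profile[c] = sum(hits) / len(hits) if hits else default
    return profile


def toy_classifier(features):
    return "sleep" if features[0] < 1 else "toilet"


LABELLED = [((0.0,), "sleep"), ((0.1,), "sleep"), ((0.2,), "toilet"), ((5.0,), "toilet")]
profile = reliability_profile((0.05,), LABELLED, toy_classifier, ["sleep", "toilet"])
```

With very few labelled examples, the neighbourhood is small and the default value dominates for many classes, which mirrors the limitation discussed in Section 4.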
CRE – Discounting factor
The second part of CRE is to estimate the discounting factor for each classifier. Dempster’s combination rule in Equation (1) can produce unreasonable results when the sources are highly conflicting [23]. To address this problem, CRE proposes a compatibility measure based on the plausibility functions of each classifier. The compatibility criterion assesses whether two classifiers share any common inference. Given two BBAs
The compatibility-based discounting factor discounts classifiers in highly conflicting situations, which leads to a more effective application of Dempster’s rule. The derived BBA
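The problem with highly conflicting sources can be seen in Zadeh’s classic counter-example, reproduced below with activity labels. Two classifiers that each assign only 0.01 of their mass to ‘toilet’ end up fully certain of ‘toilet’ after Dempster’s rule, because the normalisation by 1 − K (K being the conflicting mass) discards the 0.9999 of mass on which they disagree. This illustrates the failure mode that compatibility-based discounting is meant to mitigate; it is not an implementation of CRE’s compatibility measure, and the function name is hypothetical.

```python
from itertools import product


def dempster_with_conflict(m1, m2):
    """Dempster's rule that also returns the conflicting mass K."""
    combined, conflict = {}, 0.0
    for (B, x), (C, y) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            combined[A] = combined.get(A, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict


m_a = {frozenset({"sleep"}): 0.99, frozenset({"toilet"}): 0.01}
m_b = {frozenset({"leave"}): 0.99, frozenset({"toilet"}): 0.01}
fused, K = dempster_with_conflict(m_a, m_b)
```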


A classic approach to dealing with a small amount of training data is to leverage unlabelled data, which is similar to the active learning approach mentioned in Section 5.2. That is, for each dataset, we train a classifier on its labelled data and then use it to iteratively infer labels for its unlabelled examples for T rounds or until the algorithm converges. In each iteration, we select the top k most confident examples to enlarge the labelled data pool and update the classifier. However, given our assumption that the labelled data may be too scarce to cover the whole set of activities of interest, this basic approach can only assign labels that have been observed in the training data. We therefore need to leverage labels from the other datasets. This gives rise to Algorithm
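The basic self-training loop described above can be sketched as follows. For brevity it uses a nearest-centroid classifier with a softmax over negative distances as a stand-in for the base classifiers (an assumption; the evaluation in this paper uses NB, RF, LR, NN, and SVM). Each round adds the k most confident unlabelled examples with their predicted labels, for at most T rounds or until the pool is empty. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_centroids(X, y):
    """Stand-in classifier: the mean feature vector of each observed label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(model, x):
    """Return (label, confidence), confidence = softmax over negative distances."""
    scores = {c: -euclidean(x, mu) for c, mu in model.items()}
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, rounds=5, k=2):
    """Run at most `rounds` (T) iterations, adding the top-k confident examples each time."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not pool:  # converged: every unlabelled example has been absorbed
            break
        model = fit_centroids(X_lab, y_lab)
        ranked = sorted(pool, key=lambda x: predict(model, x)[1], reverse=True)
        for x in ranked[:k]:
            X_lab.append(x)
            y_lab.append(predict(model, x)[0])
        pool = ranked[k:]
    return fit_centroids(X_lab, y_lab), y_lab


model, labels = self_train(
    [(0, 0), (10, 10)], ["sleep", "leave"],
    [(1, 0), (0, 1), (9, 10), (10, 9), (5, 20)],
)
```

Because every new label comes from the current model, `self_train` can never output an activity that was absent from the initial labels, which is exactly the limitation that motivates borrowing labels from other datasets.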
A hybrid of sharing data and classifiers
The advantage of the

The objective of the evaluation is to assess how accurately SLearn recognises activities through shared learning on datasets that are only partially annotated with activities of interest. More specifically, we aim to answer the following questions:
Does SLearn outperform the classic supervised and semi-supervised learning techniques in HAR?
Which SLearn algorithm is more effective and under what circumstances?
Does CRE outperform DST, in terms of supporting robust classifier fusion in the face of high conflict and uncertainty?
We measure two types of accuracy: overall accuracy, the ratio of examples whose learnt label matches the true label; and class accuracy, the ratio of learnt labels matching the true labels within each class, averaged across classes.
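The two metrics can be computed as below. Class accuracy is taken as the per-class recall averaged over the classes present in the ground truth, which is our reading of the definition above; the toy labels mimic an imbalanced dataset and the function names are illustrative.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracy(y_true, y_pred):
    """Per-class accuracy (recall), averaged over the classes in y_true."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += t == p
    return sum(correct[c] / total[c] for c in total) / len(total)


y_true = ["sleep"] * 8 + ["toilet", "leave"]
y_pred = ["sleep"] * 10  # always predicts the dominant activity
```

Predicting the dominant class everywhere yields an overall accuracy of 0.8 but a class accuracy of only 1/3, which is why both metrics are reported.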
Datasets
To test SLearn, we need datasets that are collected in different environments but share a common set of activities, to allow for shared learning. We therefore use the three HAR datasets collected by the University of Amsterdam [26], which are widely used in the literature. These datasets were collected in real-world houses with different spatial layouts, different numbers of sensors, and different degrees of inherent noise. We believe that experimenting on these datasets gives us a comprehensive view of the effectiveness of the proposed technique.
These three datasets record the same set of 7 activities, including leaving the house, preparing breakfast or dinner, and sleeping. The activity distribution is presented in Fig. 3, which shows that the occurrence of activities is imbalanced and each house has a dominant activity. All three houses are equipped only with binary sensors, and a more detailed description of the datasets can be found in [8]. The House A dataset consists of 14 state-change sensors attached to household objects such as doors, cupboards, and toilet flushes, while the other two datasets contain more than 20 sensors, including reed switches to measure whether doors and cupboards are open or closed; pressure mats to measure sitting on a couch or lying in bed; mercury contacts to detect the movement of objects (e.g., drawers); passive infrared sensors to detect motion in a specific area; and float sensors to measure toilet flushing. Sensor metadata are provided along with the datasets, some of which are presented in Fig. 3. The metadata allow us to map sensors onto Location and Object ontological concepts, which facilitates the feature space remapping process.

Activity class distributions and sensor metadata of the three datasets used for experimental evaluation.
As SLearn aims to explore how little training data is required to build a robust activity model, we use shallow learning techniques as base classifiers, including Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Neural Networks (NN), and Support Vector Machine (SVM).
Evaluation process
The key objective of SLearn is to integrate partially annotated, heterogeneous datasets to recognise a more completely covered set of activities. To this end, we evaluate the algorithms across training data ratios from 0.1% to 80%. A ratio of 0.1% is the extreme case for the chosen datasets: the training data for each dataset then contain only 1 or 2 examples. This ratio may be too small to train a classifier properly, but we include it to build a comprehensive performance profile of SLearn and to determine which training percentage is appropriate for SLearn.
The hypothesis of SLearn is that if the examples from each dataset cover different activities, e.g., ‘leave home’ from House A and ‘use toilet’ from House B, SLearn will be able to recognise both activities on the test data in Houses A and B, whereas a traditional supervised activity recognition algorithm will only be able to recognise ‘leave home’ in House A and ‘use toilet’ in House B. We start with a low training ratio and systematically increase it to find the optimal ratio of training data with which SLearn achieves reasonable accuracy.
Results and discussion
This section discusses the evaluation results to address the research questions in Section 3.
SLearn model configuration
Our first experiment is to understand the performance of
The results show that no technique or distance measure significantly outperforms the others, except that SVM performs the worst. This suggests that


Comparison of overall and class accuracies of CRE- and DST-based classifier fusions.
In our previous work [30], we have evaluated
In some cases, DST works better, especially when the training data are very small. For example, on House B, when the percentage of training data is 0.1%, DST-based Algorithm
Comparison of SLearn and baseline techniques
Figure 6 compares overall and class accuracies of

Comparison of overall and class accuracies of
We can see that even when the ratio of training data is only 0.1%, the overall accuracies of the baseline approaches can be as high as 50%, especially on Houses B and C. Inspecting the inference results, we find that on these two datasets, if the training data only contain the high-frequency activities, then both the NB and RF classifiers only predict these two activities, so the overall accuracy simply reflects the actual class distribution. This also explains why on House B, Algorithms

Confusion matrices on House A with
To gain a deeper insight, we plot the confusion matrices of
We have experimented with three uncertainty sampling strategies for active learning: least confidence, margin of confidence, and information entropy; the information entropy strategy produces the best accuracy. The performance of active learning with the information entropy uncertainty sampling strategy is presented in Fig. 6, including the number of queries (
Comparison between SLearn algorithms
When the training data is small, sharing data works better than sharing classifiers. As presented in Fig. 6, Algorithm
Algorithm
Challenges in activity recognition datasets
In the experiments, we encountered two challenges in these activity recognition datasets: (1) imbalanced class distribution and (2) the variety of patterns in activities. The imbalanced class distribution often leads to high overall accuracies, as the training data are more likely to contain examples from the dominant, frequent activities. However, it does not help improve class accuracies. In the future, sampling techniques can be applied to either under-sample frequent activities or over-sample infrequent activities. Recent data augmentation techniques [4,14,24] can also be applied to simulate data samples and augment data with variations.
Each activity can have a variety of patterns, and some activities can have similar or even identical patterns. Often a large amount of training data is needed to learn the differences well enough to distinguish them, which is a particular challenge for SLearn. We try to address this challenge by introducing classifier fusion based on contextual reliability evaluation. However, when the training data are too small, the algorithm fails to locate enough neighbours to draw a reliable conclusion, and when the neighbours are labelled with a different activity, the reliability scores are compromised.
Comparison with active learning algorithms
Figure 6 shows that active learning outperforms the SLearn algorithms. In addition to its high query ratio, another reason for its better performance is that active learning assumes that true labels are always available for each user’s own sensor data. On the one hand, this allows sensor data to be annotated for individual users, improving their own activity recognition models. On the other hand, it gives better coverage of the activity space than our proposed methods, in that the labels that we share in
Knowledge engineering effort
The smart home ontologies that we use are lightweight and generic, and apply to various smart home environments without any modification, as demonstrated in our previous work [32,33]. To perform feature space remapping, we use the sensor deployment file to map sensors to their corresponding location and object concepts.
However, not all environments have well-recorded sensor deployment files, in which case some manual labelling of sensors is required. Also, if a sensor is removed or moved, or a new sensor is introduced, the sensors need to be remapped, and this effort is unavoidable in our current design. This limits the applicability of our knowledge-driven approach.
Related work
Activity recognition has been an active research topic in the last decade, and a large number of knowledge- and data-driven techniques have been proposed [31]. Among them, different approaches have been designed to address the scarcity of activity annotations, including unsupervised learning, active learning, and transfer learning. In the following, we compare and contrast these approaches with SLearn.
Unsupervised learning
Unsupervised learning automatically partitions and characterises sensor data into patterns that can be mapped to different activities without the need for annotated training data. Pattern mining and clustering are the two most commonly used techniques supporting unsupervised activity recognition. Gu et al. apply emerging patterns to mine the sequential patterns of interleaved and concurrent activities [7]. Rashidi et al. propose a method to discover activity patterns and then manually group them into activity definitions [20]. Based on these patterns, they create a boosted version of a Hidden Markov Model (HMM) to represent the activities and their variations in order to recognise activities in real time. Similarly, Ye et al. combine sequential mining and clustering algorithms to discover representative sensor events for activities. Different from the work in [20], they apply generic ontologies to automatically map the discovered sequential sensor patterns to activity labels through a semantic matching process [32]. Yordanova et al. also apply domain knowledge in rule-based systems to generate probabilistic models for activity recognition [9,35].
Taking a different route, researchers have also applied web mining and information retrieval techniques to extract common-sense knowledge about activities and objects from online documents; that is, which objects are used to perform a daily activity and how much each object contributes to identifying this activity [17,29,34,36]. During the reasoning process, the mined objects are mapped to sensor events and an appropriate activity is recognised.
SLearn is not a classic activity recognition problem, in which a model is trained on labelled instances for each dataset and used to recognise unlabelled instances. In SLearn, the training data are assumed to be incomplete in that they might not cover all the activities of interest. SLearn is not an unsupervised learning technique either, as it still relies on labelled training data, but the labels can come from different datasets.
Active learning
Active learning, also called ‘query learning’, is a subfield of machine learning motivated by scenarios in which there is a large amount of unlabelled data but only a limited, insufficient amount of labelled data. As the labelling process is tedious, time-consuming and expensive in real-world applications, active learning methods are employed to alleviate the labelling effort by selecting the most informative instances to be annotated [21].
Alemdar et al. apply active learning strategies to select the most uncertain instances for annotation; that is, the instances that lie at the boundaries between activity classes [1]. The annotated instances are then used to iteratively update an HMM to infer daily activities. The active learning strategies improve recognition accuracies compared to random selection. Cheng et al. apply a density-weighted method that combines uncertainty and density measures into an objective function to select the most representative instances for user annotation [3].
SLearn differs from active learning [21] in that the latter still needs to query a user or a human operator, whereas SLearn queries labels from the other datasets. However, we could use the uncertainty sampling strategies of active learning to determine when an algorithm should leverage training data or classifiers from other datasets.
Transfer learning
Wang et al. [27] propose stratified transfer learning to improve the accuracy of cross-domain activity recognition. Here, different domains correspond to different positions where the accelerometers are placed. The key idea is to exploit the intra-affinity of classes to perform intra-class knowledge transfer.
Maekawa et al. [12] propose an unsupervised approach to recognise physical activities from accelerometer data. They use information about users’ characteristics, such as height and gender, to compute the similarity between users, and then find and adapt models for new users from similar users. van Kasteren et al. [25] propose a manual mapping between sensors in different households and learn the parameters of a target model using the EM algorithm to transfer the probabilities of HMMs from the source to the target. Similarly, Rashidi et al. [19] learn sensor mappings based on the sensors’ locations and roles in activity models. A sensor’s role is characterised by mutual information, which measures the mutual dependence between an activity and the sensor and indicates the relevance of the sensor for predicting the corresponding activity. Feuz et al. [5] propose a data-driven approach to automatically map sensors based on their meta-features, which mainly describe when a sensor reports and the time intervals between events reported by this sensor and others.
SLearn rests on a different assumption from the above works. They assume that a complete model (that is, one containing all the activities of interest) can be learnt on a source domain, whereas we assume that each domain has only a small fraction of its data annotated (that is, the annotated activities can be a subset of the activities of interest in that domain), and we do not designate any domain as a source or target domain. However, transfer learning techniques such as feature remapping can be applied to SLearn. In particular, our approach is most similar to the last three works above [25,19,5], as we also focus on sensor mappings to support sharing sensor data across multiple datasets. The difference is that we use a knowledge-driven approach, where sensors are modelled in location and object ontologies whose generality across different households and sensing technologies has been demonstrated in other works [32].
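For reference, the mutual information between a binary sensor’s activation and the activity label can be computed from empirical frequencies as below. This only illustrates the quantity; how Rashidi et al. [19] estimate and use it in their mapping may differ. The toy data and function name are illustrative.

```python
import math
from collections import Counter


def mutual_information(sensor_on, activities):
    """Empirical mutual information (in bits) between sensor state and activity."""
    n = len(activities)
    joint = Counter(zip(sensor_on, activities))
    p_s = Counter(sensor_on)
    p_a = Counter(activities)
    mi = 0.0
    for (s, a), count in joint.items():
        p_sa = count / n
        mi += p_sa * math.log2(p_sa / ((p_s[s] / n) * (p_a[a] / n)))
    return mi


# A bed sensor that fires exactly during 'sleep' versus an unrelated sensor.
informative = mutual_information([1, 1, 0, 0], ["sleep", "sleep", "toilet", "leave"])
independent = mutual_information([1, 0, 1, 0], ["sleep", "sleep", "toilet", "toilet"])
```

In this toy example, the bed sensor carries 1 bit of information about the activity, whereas the sensor whose activation is independent of the activity carries none.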
Conclusion and future work
This paper proposes a set of SLearn algorithms to address the scarcity of activity annotations by leveraging annotations across different datasets, which can have different sensing technologies, different sensor deployments, and different users, as long as they share compatible sensor features and activities of interest.
The evaluation results consistently demonstrate a significant improvement in recognition accuracy when training data are small. SLearn can help scale activity recognition applications by leveraging each user’s sporadic annotations.
Future work is to resolve the heterogeneity of activity labels in different datasets. For example, given two datasets with annotations on ‘preparing breakfast’ and ‘preparing dinner’, can we combine them to recognise ‘making a meal’ on a third dataset? To address this question, we will first define the semantic similarity between different activity labels [32] by specifying each activity with the common location and object ontologies. We can then look into an instance-transfer approach that factors in activity similarity. For example, we will start with a recent approach capable of borrowing examples from similar but different classes by boosting the loss function with a similarity weight for a target class in the objective function [10]. We will also adapt the current approach to accelerometer-based activity recognition.
