Abstract
One application of Ambient Intelligence (AmI) that supports people in their daily activities is the smart home, which has become a popular topic for research over the past 10 years. The smart home can support the inhabitant in a variety of ways, such as watching for potential risks, detecting any abnormality, adapting the home for environmental conditions and inducing behavioural change. This often requires the smart home to recognise the behaviours of the inhabitant. In this paper, we introduce a method that can accurately recognise the inhabitant’s behaviours. This includes both the segmentation of the sensor stream and the identification of behaviours. We demonstrate our algorithm on sensor data from real smart homes.
Introduction
It is a well-reported fact that the populations of the world are aging. In Europe, for example, it has been reported that the number of people aged 65 years and over is projected to increase from 10% of the entire population in 1950 to more than 25% in 2050. Besides the decrease in birth rate, the global life expectancy at birth is projected to increase from 58 years in 1970–1975 to 75 years in 2045–2050 [12]. The group of elderly (aged 65 years and above) is the fastest growing segment of the world’s population.
Older adults are more frequently subject than younger people to physical disabilities and cognitive impairments such as a diminished sense of touch, slower reactions, physical weakness and memory problems. It is clearly impossible to rely solely on increasing the number of caregivers for the elderly, since even now it is difficult and expensive to find care [10]. Additionally, many people are choosing to stay in their own homes as long as possible, and hope to remain independent.
This has led to a large number of monitoring systems such as the ‘smart homes’, with the aim being to support the inhabitants, typically the elderly or cognitively impaired who are living alone, by monitoring their behaviours and detecting potentially dangerous behaviours. In this context, the ‘smart’ part of the smart home is the ability of the system to learn to use the information provided by the sensors (which are installed in the home) to recognise behaviours and identify deviations from the norm.
In a smart home, the observed behaviours are likely to be the standard human behaviours of living, and the observations will depend upon the sensors that the house is equipped with. There are a variety of sensors that could be used to collect information about the inhabitant’s behaviours. These could range from cameras and body-worn sensors to touch and motion sensors. Another sensing method, which is the focus of this paper, is to recognise behaviours through unobtrusive sensors that are attached to household objects such as microwave ovens, cupboards, taps, etc. Such sensors are activated when the inhabitant performs his/her daily activities in the home (e.g., opening or closing the microwave oven’s door could activate the sensor attached to the microwave oven) and this generates a sequence of sensor observations, which corresponds to the order of events involved in carrying out a particular behaviour. For example, a sensor sequence such as closing the bathroom door, turning on the heater, opening the shower door, etc., may correspond to the events of a showering behaviour.
Since sensor observations from the home in some way represent the behaviours of a human inhabitant, the behaviour recognition problem can be viewed as a task of finding a mapping between the sensor information and the behaviours of the inhabitant. In such a smart home, the behaviour itself is performed by the inhabitant, and the sensors detect evidence of that behaviour.
Given a set of behaviour labels that are mapped to sensor information, we can train any supervised machine learning method. However, data from a sensor stream consist of an unending sequence of sensor readings in which the start and end of each behaviour are unknown. The challenge is therefore to segment the sensor stream into appropriate pieces that represent individual behaviours before any classification can be performed.
Much of the work reported in the literature assumes that behaviours have already been segmented. During training, the pre-segmented sequences of sensor readings that correspond to each behaviour are used to train the classifier. The problem arises during testing, where a segmented sequence may not necessarily represent the entire sequence of sensor readings that corresponds to the behaviour of the inhabitant. This can affect the overall recognition performance.
Some approaches address the problem of segmentation using a fixed window length. However, since each behaviour can be described by a different number of sensor readings, it is unlikely that all of the observations in a fixed-length window belong to one behaviour, and the other behaviours in the window then go unrecognised. It is therefore inappropriate to rely on a fixed window length.
This paper addresses this problem by introducing a method that can accurately perform segmentation and behaviour recognition simultaneously on the sensor stream. To evaluate the effectiveness of our proposed method, we use two different smart home datasets and compare the labels produced by our method with the labels assigned by a human.
Related work
One of the commonly used methods for behaviour recognition is the naïve Bayes classifier. Among the earlier work that used the naïve Bayes classifier is the work of Tapia, Intille and Larson [11]. Segmentation is performed by using a set of feature windows, each representing one behaviour, and the length of each feature window was determined by the average duration that the inhabitant took to carry out that behaviour. The naïve Bayes classifier was used to calculate the probability of the current behaviour by shifting the feature windows over the sensor sequences, where the class with the highest likelihood was selected as the label for the behaviour. However, this may lead to inaccurate segmentation for sequences of behaviours with different durations. Yang, Dinh and Chen [15] used 3 windows of fixed length to segment the sensor stream, where each window was set to 0.2 seconds. They applied the naïve Bayes classifier on each window to recognise human physical activity such as walking, sitting, etc. based on accelerometers. However, determining the correct size of the window is important to accurately perform segmentation.
Although the strong independence assumption in the naïve Bayes classifier makes it a tractable approach for learning, correlations among sensors are common in a smart home, which can affect the overall performance of the classifier.

An example of two HMMs, one representing the grooming behaviour and the other representing the showering behaviour. The nodes with double lines refer to sensors that are shared between the two behaviours.
The hidden Markov model (HMM) and its variants, which build temporal information directly into the model, are also popular methods for behaviour recognition [2]. The HMM is a probabilistic graphical model that uses a set of hidden states to classify a sequence of observations over time. The hidden states represent the behaviours of the inhabitant and the observations are the sensor readings. The transition between states allows sequences of behaviours to be modelled. However, an HMM becomes difficult to model when the number of states grows.
Some approaches address this problem by using a set of HMMs and performing segmentation with a window that slides across the sensor stream. Gaikwad and Narawade [3] recognised behaviours using a set of HMMs and applied a fixed window size to segment the sensor stream, with the final behaviour type being determined by majority vote. However, using a pre-determined window size may result in inaccurate segmentation since sequences in the window may contain more than one behaviour.
Kalra et al. [5] used a set of Markov chains, each representing a behaviour, and segmented the sensor stream using a variable window length. The maximum likelihood value for each behaviour model is first computed, and segmentation is performed when the rate of change between two successive maximum likelihood values exceeds a predefined threshold. The threshold is determined empirically from the training data set, which means that the training data need to be a good representation of the behaviours in order to determine the threshold accurately.
In common with our method, Kellokumpu, Pietikäinen and Heikkilä [6] use a set of HMMs, one for each behaviour, and apply the forward algorithm to monitor likelihood values. However, they do not use a sliding time window, preferring multiple window sizes and thresholding in order to separate out the behaviours.
Some methods represent behaviours hierarchically and use a variant of the HMM such as the hierarchical HMM to recognise behaviours at varying levels of abstraction [8,13]. In this model, the top-level represents the individual behaviours to be recognised (e.g., cooking or making coffee) and the low-level represents a set of individual behaviour events that arise from the sensor values. As this model attempts to recognise a complete model of behaviours, it requires more data for training and has a higher computational complexity, which often becomes intractable for learning.
There are also works [4,7,14] that use the Conditional Random Field (CRF), which relaxes the independence assumption between sensor observations, for behaviour recognition. However, inference in a general CRF is computationally intractable and often relies on approximation techniques such as Markov Chain Monte Carlo (MCMC).
We approach the behaviour recognition problem by using a set of hidden Markov models [9], each of which recognises one behaviour (e.g., we have one HMM to represent the ‘showering’ behaviour, another HMM to represent the ‘doing laundry’ behaviour, etc.), and they compete to explain the current observations. In our work, the observations are the sensors that are triggered and the hidden states are the events that arise from the observations. For example, a sensor sequence such as closing the bathroom door, turning on the exhaust fan, etc., may correspond to the events of a showering behaviour. Figure 1 shows an example of two HMMs, one representing the ‘grooming’ behaviour and the other representing the ‘showering’ behaviour. The double lines shown in the figure refer to the sensors that are shared between these two behaviours. Since there are variations in the behaviours, the HMMs in our work are ergodic, which means that in each HMM every state is reachable from every other state in a finite number of steps. The HMMs were each trained using the standard Expectation-Maximisation (EM) algorithm [1] on segmented labelled data.
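The competition between behaviour models can be sketched as follows. This is a minimal illustration, assuming discrete sensor symbols and small hand-set parameters; the names `forward_likelihood` and `compete`, the sensor names and all probabilities are hypothetical and are not the trained models used in this paper. The likelihood of a window under each model is computed with the standard forward algorithm (initialisation, induction, termination), and the model with the highest likelihood wins.

```python
from typing import Dict, List, Sequence, Tuple

# An HMM is (initial probabilities pi, transition matrix A, emission tables B).
# B[state] maps a sensor symbol to its emission probability in that state.
HMM = Tuple[List[float], List[List[float]], List[Dict[str, float]]]


def forward_likelihood(hmm: HMM, obs: Sequence[str]) -> float:
    """Return P(obs | hmm) using the forward algorithm (obs must be non-empty)."""
    pi, A, B = hmm
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i].get(obs[0], 0.0) for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j].get(o, 0.0)
            for j in range(n)
        ]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)


def compete(hmms: Dict[str, HMM], obs: Sequence[str]) -> str:
    """Return the name of the HMM that best explains the observations."""
    return max(hmms, key=lambda name: forward_likelihood(hmms[name], obs))


# Two toy ergodic HMMs (every transition has non-zero probability).
# 'bathroom_door' is a sensor shared between the two behaviours.
uniform = [[0.5, 0.5], [0.5, 0.5]]
hmms: Dict[str, HMM] = {
    "showering": (
        [0.6, 0.4],
        uniform,
        [{"bathroom_door": 0.7, "shower_door": 0.3},
         {"heater": 0.5, "shower_door": 0.5}],
    ),
    "grooming": (
        [0.6, 0.4],
        uniform,
        [{"bathroom_door": 0.6, "cabinet": 0.4},
         {"tap": 0.7, "cabinet": 0.3}],
    ),
}

window = ["bathroom_door", "heater", "shower_door"]
winner = compete(hmms, window)
```

In this toy example the ‘grooming’ model cannot emit the heater observation, so its likelihood is zero and the ‘showering’ model wins the window. For long windows a practical implementation would work with scaled or log probabilities to avoid numerical underflow.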
Once we have trained a set of HMMs
Many methods reported in the literature used a fixed window length to partition the sensor stream. However, the choice of the size of this window is important, because it is unlikely that all of the observations in the window belong to one behaviour, and so the HMM chosen to represent the window will, at best, represent only some of the behaviours in it. To see the importance of the problem, consider the three different cases shown in Fig. 2. In each, a behaviour w takes up much of the window and is the winning behaviour. However, its location in the window differs, and we want to ensure that other behaviours in the window are also recognised.

A behaviour w does not need to take up the entire window. Even assuming that the actions in a behaviour are contiguous, it could be (a) at the start of the window, (b) in the middle, or (c) at the end. If the entire window is classified as one behaviour, then a potentially large number of behaviours are missed. D is the window size and
We present an alternative solution to this problem. We use a variable window length that moves over the sequence of observations, where it has the ability to automatically configure the window size based on the sensor observations. We first slide a window of length t across the sensor stream, presenting the t observations in the window to the sets of trained HMMs for competition. A winning HMM
The computation of the conditional probability in the forward algorithm involves 3 steps: (1) initialisation, (2) induction and (3) termination. Let
The initialisation step computes the joint probability of state

Illustration of the forward variable
The termination step is to calculate the probability of the observations given the model

The solid line above the observation sequence shows the possible representations of a winning sequence using the α values. For simplicity, the α value is quantised into the set
By monitoring the forward variable (α) for each sensor observation, we can determine how well the ‘winning’ HMM matches a given observation sequence, i.e., choosing an optimal state sequence that explains the sensor observation. The changes in α value signify a ‘change’ of behaviour from the observation stream. When
If the
The second case occurs when the winning behaviour best describes observations that fall in the middle of the window, e.g.,
With regard to the remaining sequence (
Since the two cases have been considered when the winning behaviour is at the beginning and middle of the window, the only possibilities remaining are that the behaviour is at the end and either stops during the window (Fig. 4(a)) or does not (Fig. 4(c)). The first case is already dealt with, and in the second case, we could simply classify the behaviours in the window as w and start a new one at the end of the current window. If we do so, then we will have two examples of the same behaviour abutting one another. Instead we extend the size of the window (shown as a dashed arrow in Fig. 4(c)) and continue to calculate the α value for each observation until α drops.
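The case analysis above (winning behaviour at the start of the window, in the middle, or still running at the end) can be sketched with a small helper. This is only an illustration under stated assumptions: `locate_winner` and `threshold` are hypothetical names, the per-observation scores stand in for the α values of the winning HMM, and a simple threshold stands in for the criterion on α used in this paper.

```python
from typing import Optional, Sequence, Tuple


def locate_winner(
    alphas: Sequence[float], threshold: float
) -> Optional[Tuple[int, int, bool]]:
    """Locate the winning behaviour inside one window.

    Returns (start, end, extend): the behaviour covers window positions
    start..end-1, and extend is True when it is still running at the end
    of the window, so the window should be extended rather than closed.
    Returns None if no observation scores above the threshold.
    """
    high = [a > threshold for a in alphas]
    if not any(high):
        return None
    start = high.index(True)        # first observation explained by the winner
    end = start
    while end < len(high) and high[end]:
        end += 1                    # advance until the score drops
    extend = end == len(high)       # no drop seen before the window ended
    return start, end, extend


at_start = locate_winner([0.9, 0.8, 0.7, 0.0, 0.0], threshold=0.5)
in_middle = locate_winner([0.0, 0.0, 0.8, 0.9, 0.0, 0.0], threshold=0.5)
runs_to_end = locate_winner([0.0, 0.0, 0.6, 0.7], threshold=0.5)
```

When `extend` is true, the caller would append further observations to the window and continue scoring until the score drops, which avoids splitting one behaviour into two abutting instances.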
The algorithm to simultaneously perform segmentation and behaviour recognition is as follows:
(1) Initialisation
  a set of trained HMMs,
  sensor stream
  D = window size, s = 1, e = D
(2) Competition among HMMs
  slide a window across sensor stream so each HMM gets inputs
  calculate
  compute winning HMM such that
(3) Calculate likelihood for each sensor observation on the window
  For each sensor i in
    if
      update
      reinitialise window,
      return to step (2)
    if
      extend window,
      repeat step (3)
    if
      classify
      update
      reinitialise window,
      rerun HMM competition on this new sequence,
To demonstrate our system, we used two different smart home datasets. The first dataset, which is also the primary dataset used in all the experiments, is obtained from the MIT PlaceLab [11]. The second dataset is obtained from van Kasteren [14]. These datasets were annotated with behaviours by the subject living in the home.
For the MIT PlaceLab [11] dataset, data were collected using a set of 77 state-change sensors that were installed in an apartment for a period of 16 days. From the total of 16 days, we used 14 days for training and the remaining 2 days for testing. Since we want to ensure that every behaviour (particularly the doing laundry and washing dishes behaviours, which do not occur daily) is seen in the test set, we take pairs of days for testing. The main metric that we are interested in is recognition accuracy, which is the ratio of the total number of behaviours correctly identified by the algorithm over the total number of behaviours used for testing. We repeated the process 8 times, with the final recognition accuracy being calculated by averaging the accuracies of the runs.
In [14], the dataset was collected over a period of 24 days, where 14 state-change sensors were used. Since the number of behaviours that occurred in each day is relatively small, we used 20 days for training and the remaining 4 days for testing. We repeated the process 6 times and the final recognition accuracy is calculated by averaging the accuracies of the runs.
Table 1 shows the different training-testing splits along with the number of behaviour examples and sensor observations that we used for training and testing. We also tried other numbers of days in order to investigate the amount of training data required to train the HMMs. Each HMM was trained on the relevant segmented, labelled data in the training set using the standard Expectation-Maximisation (EM) algorithm.
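The evaluation protocol for the 16-day dataset can be sketched as below. This is an illustration only: the function names are hypothetical, holding out consecutive pairs of days is an assumption (the pairing actually used is not specified here), and the accuracy function assumes that predicted and annotated behaviours have already been aligned one-to-one.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_two_out_splits(
    days: Sequence[int],
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training days, test days), holding out one pair of days at a time."""
    for i in range(0, len(days), 2):
        test = list(days[i:i + 2])
        train = list(days[:i]) + list(days[i + 2:])
        yield train, test


def recognition_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Correctly identified behaviours divided by all behaviours in the test set."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


splits = list(leave_two_out_splits(range(1, 17)))   # 16 days -> 8 runs
example = recognition_accuracy(
    ["showering", "grooming", "laundry", "showering"],
    ["showering", "grooming", "dishes", "showering"],
)
```

The final figure would then be the mean of the per-run accuracies over the 8 splits.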
The different training-testing splits that we tried, along with the total number of behaviour examples and sensor observations used in each training and testing set

Illustration of competition between HMMs based on two different behaviours: toileting/showering and preparing meal/snack/beverages. The example is based on the first 85 observations of the 5th test set of MIT PlaceLab dataset. There are other behaviours in this example but they are omitted for clarity. The y-axis shows the different states (events) that arise from the individual sensor values. The ‘unrecognised observation’ label means that none of the states in the particular HMM recognises the sensor observations.
To test our system, we conducted five separate experiments. In the first, we looked at the accuracy of the algorithm in recognising behaviours based on the HMM competition and variable window length, while in the second we compared the algorithm with a fixed window length. In the third experiment, we evaluated the effectiveness of our proposed method against a baseline naïve Bayes classifier. In the fourth experiment, we looked at the effects of window size on the efficiency and accuracy of the algorithm. In the fifth experiment, we looked at how much training data was required for accurate results.
Results on supervised learning based on competition between HMMs and variable window length on different datasets and training/testing splits
The aim of this experiment is to test the efficacy of our method based on competition between HMMs and a variable window length. In this experiment, we used a default window size of 10, and ran the entire algorithm over the sensor observations on different test sets. The results of sliding this window over the data are shown in Fig. 5, which displays the outputs of the algorithm, with the winning behaviour at each time being clearly visible.
As the figure shows, we can determine that the subject is toileting/showering between observations
The recognition accuracy results across different test sets are shown in Table 2. Our method achieved an overall accuracy of 91% on the MIT PlaceLab dataset. A low recognition accuracy is observed on the 8th test set. This is mainly due to the number of cleaning/putting away groceries behaviours in that test set, which our algorithm is not able to identify because they involve the same sensors that other behaviours use.
We also tested on the van Kasteren dataset and our method achieved an overall accuracy of 89.5%. A low recognition accuracy is observed on the 1st test set, for the same reason observed on MIT PlaceLab dataset. Our algorithm is not able to identify the ‘going to bed’ behaviour since it involves the same sensors as the ‘toileting’ and ‘showering’ behaviours.
The results show that our method based on competition between HMMs and variable window length can be used to recognise and segment the sensor stream into behaviours.
This experiment is designed to compare the algorithm with the fixed window length. In this experiment, we used fixed window lengths of 5 and 10, and ran the algorithm over the sensor stream on different test sets for each evaluation. For the fixed window length, each window is shifted over the sensor stream with increments according to the window size used in the experiment. For example, for a fixed window length of 5, the window is shifted over the sensor stream in increments of 5 sensor observations each time. In this experiment, we used the MIT PlaceLab dataset.
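The fixed-window baseline can be illustrated with a short sketch. The function names and the toy annotation below are hypothetical; the point is only to show how non-overlapping windows are cut from the stream and how often a single window spans more than one annotated behaviour.

```python
from typing import List, Sequence


def fixed_windows(stream: Sequence[str], size: int) -> List[Sequence[str]]:
    """Cut the stream into consecutive windows, shifting by the window size."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def mixed_window_count(truth: Sequence[str], size: int) -> int:
    """Count windows whose observations belong to more than one behaviour."""
    return sum(len(set(w)) > 1 for w in fixed_windows(truth, size))


# Toy per-observation annotation: three behaviours of different lengths.
truth = ["showering"] * 3 + ["grooming"] * 4 + ["preparing meal"] * 3

windows_of_5 = fixed_windows(truth, 5)
mixed_5 = mixed_window_count(truth, 5)
mixed_10 = mixed_window_count(truth, 10)
```

Because each fixed window receives a single label, every mixed window necessarily loses at least one of the behaviours it contains.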
Comparison results between the variable window length and fixed window length based on window size = 5 and window size = 10
The results are shown in Table 3. We have conducted a significance test to compare the recognition accuracy of fixed window length and our variable window length method on window size = 10. An F-test was first carried out to determine the equivalence of the variances for these two methods. The test statistic is
A Student’s t-test is conducted to test the alternative hypothesis that the average recognition accuracy of the variable window length method is significantly higher than the average recognition accuracy of the fixed window length. The test statistic is
The problem with a fixed window length is that it can identify only one winning behaviour per window, so the other behaviours in the window are not identified. Another problem is determining the correct window size, since it affects the recognition performance. The size of a fixed window is often determined empirically. As can be seen in Table 3, a fixed window of size 5 achieves an accuracy of 65%, which drops to 57% when a window of size 10 is used. The accuracy of our proposed method is consistent even when different window sizes are used: we still achieve 91% recognition accuracy with window sizes of both 5 and 10.
In this experiment, we used the naïve Bayes classifier as a baseline to test how effective our proposed method is. The naïve Bayes classifier is a graphical model where the features are independent given the behaviour class. We trained two different naïve Bayes classifiers. The first assumes that the data from the sensor stream have been segmented according to the behaviours of the inhabitant, i.e., the start and end of each behaviour are known.
As can be seen in Table 4, the naïve Bayes classifier trained and tested on pre-segmented data achieves an accuracy of 94% on the MIT PlaceLab dataset and 98% on the van Kasteren dataset, and our proposed method (based on competition between HMMs and variable window length) is comparable to it. The higher accuracy of this classifier relative to our proposed method is expected, since it performs classification directly on the segmented data.
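The test statistics used in the significance tests for this comparison can be computed as in the sketch below. It is an illustration under stated assumptions: the function names are hypothetical, the sample values are placeholders rather than the accuracies from Table 3, and the t statistic shown is the pooled (equal-variance) form, which is appropriate only if the F-test does not reject equality of variances.

```python
import math
from statistics import mean, variance
from typing import Sequence


def f_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Ratio of the two sample variances."""
    return variance(a) / variance(b)


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


# Placeholder samples, chosen only so the arithmetic is easy to follow.
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, 4.0, 6.0]
f_value = f_statistic(sample_a, sample_b)
t_value = pooled_t_statistic(sample_a, sample_b)
```

Each statistic is then compared with the critical value of the F or t distribution for the corresponding degrees of freedom (here `na + nb - 2` for the t-test).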
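A naïve Bayes classifier over pre-segmented data can be sketched as follows. This is a minimal illustration and not the exact baseline used in the experiments: it assumes binary features (whether each sensor fired during the segment) with Laplace smoothing, and the function names, sensor names and training examples are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Model = Dict[str, Tuple[float, Dict[str, float]]]


def train_nb(segments: Sequence[Set[str]], labels: Sequence[str],
             sensors: Iterable[str]) -> Model:
    """Estimate P(behaviour) and P(sensor fired | behaviour) from segments."""
    sensors = list(sensors)
    class_counts = Counter(labels)
    fired: Dict[str, Counter] = defaultdict(Counter)
    for segment, label in zip(segments, labels):
        for sensor in set(segment):
            fired[label][sensor] += 1
    model: Model = {}
    for label, n in class_counts.items():
        prior = n / len(labels)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = {s: (fired[label][s] + 1) / (n + 2) for s in sensors}
        model[label] = (prior, probs)
    return model


def classify_nb(model: Model, segment: Set[str]) -> str:
    """Return the behaviour with the highest posterior log-probability."""
    def log_posterior(label: str) -> float:
        prior, probs = model[label]
        total = math.log(prior)
        for sensor, p in probs.items():
            # Features are treated as independent given the behaviour class.
            total += math.log(p if sensor in segment else 1.0 - p)
        return total
    return max(model, key=log_posterior)


sensors: List[str] = ["shower_door", "heater", "microwave", "fridge"]
segments = [{"shower_door", "heater"}, {"shower_door"},
            {"microwave", "fridge"}, {"fridge"}]
labels = ["showering", "showering", "cooking", "cooking"]
model = train_nb(segments, labels, sensors)
```

With this toy model, `classify_nb(model, {"heater", "shower_door"})` selects ‘showering’ and `classify_nb(model, {"microwave"})` selects ‘cooking’.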
The second naïve Bayes classifier performs both segmentation and behaviour recognition simultaneously, where the data are presented to the classifier by using a window that slides over the data. The size of the window is 3, which is determined by taking the average number of sensor observations that described the behaviours. Our proposed method performs better than the baseline naïve Bayes classifier on both datasets, with an accuracy of 91% on MIT PlaceLab dataset and 89.5% on van Kasteren dataset.
Comparison between our proposed method and the baseline naïve Bayes classifier
Although our proposed method uses a variable window length that has the ability to automatically configure its window based on the sensor observations, we want to know if the initial window size has any effect on computational and recognition performance. For this reason, this experiment examines whether different window sizes affect computation time and recognition accuracy. Window sizes of 2, 5, 7, 10, 20, 30, 40 and 50 were used, each with 10 runs on all test datasets of the MIT PlaceLab. The number of behaviour examples used in these test datasets is shown in Table 1(a).
The results exhibited in Fig. 6 clearly show that computation time grows approximately linearly with the size of window, and therefore a shorter window length is preferred in order to keep the computational costs low. Although the results presented in the figure are based on the 2nd test set, a similar trend was observed in all other test sets.

Boxplot for computational time (in sec) across different window sizes based on 10 runs each on 2nd test set. For each box, the central line is the median, and the edges of the box are the 25th and 75th percentiles.
We also evaluated these different window sizes for recognition accuracy, i.e., the ratio of the total number of behaviours correctly identified by the algorithm over the total number of behaviours used for testing. Table 5 shows the results of using different window sizes. As the table shows, the results are not significantly different across the different window sizes.
The effect of using different window sizes on recognition accuracy
Recognition accuracy on different size of training data on different training-test sets
The objective of this experiment is to analyse the amount of training data needed to train the HMMs. The most important thing is that every behaviour is seen several times in the training set to ensure that the HMM acquires a good representation of that behaviour. From the total of 16 days of MIT PlaceLab data, we tried different splits of the data, from 5 days for training (and 11 days for testing) through 8 days, and 11 days for training. The results on recognition accuracy are presented in Table 6.
As the table shows, the size of training data does not have an impact on recognition accuracy. Even when trained on 5 days, with a small number of behaviour examples per behaviour, we are still able to achieve an accuracy of 94%. It seems that the proposed method does not need that large a set of training data, although this may not be true for more complicated behaviours.
Note that the accuracies presented in Table 2 are based on the leave-two-out cross-validation method, while the accuracies presented in Table 6 are obtained by training on 5, 8 and 11 days. In the leave-two-out cross-validation method (Table 2), the number of behaviours in each test set is lower since it consists of only 2 days of data, and thus the weight of a single misclassification is much higher. This explains why the accuracy in Table 2 is slightly lower.
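The effect of test-set size on the weight of a single misclassification can be made concrete with a short calculation. The counts below are hypothetical and are not taken from Table 1; they only illustrate the arithmetic.

```python
def accuracy(n_behaviours: int, n_errors: int) -> float:
    """Recognition accuracy for a test set with the given number of errors."""
    return (n_behaviours - n_errors) / n_behaviours


small_test_set = accuracy(10, 1)    # one error among 10 behaviours
large_test_set = accuracy(100, 1)   # one error among 100 behaviours
```

A single error costs ten percentage points in the smaller test set but only one in the larger, so averages over small test sets are more sensitive to individual mistakes.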
Discussion
Our algorithm worked well, achieving recognition accuracies of around 90% on both datasets. However, it is still instructive to see if there are consistent reasons for the misclassifications that did occur. We identified one main reason for misclassification, which is that individual sensor observations can belong to several behaviours. There are two places where this can be a problem.
The first is when the end of one behaviour contains observations that could be in the start of the next. For example, the last event for preparing lunch could be to put the leftover food in the fridge. After preparing lunch, the inhabitant proceeds to make a cup of coffee, and the first event to make a cup of coffee is to take the milk from the fridge (see observation

(a) Misclassification (at
The second place that this can be seen is where the winning behaviour is not at the start of the window, but those behaviours at the start could be interpreted as being part of that behaviour. This can be seen in Fig. 7(b) where observations
One way to reduce misclassification is to add extra information, for example by augmenting the current algorithm with temporal information. Although our method performs well in recognising the behaviours of the inhabitant, achieving an average accuracy of 91%, the algorithm may take a long time to run if there is a large number of competing HMMs. This can be addressed by incorporating spatial information. Since the locations of the sensors implicitly provide some spatial information about where a behaviour occurs, we can use this information to reduce the number of competing HMMs.
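The idea of using sensor locations to prune the competition can be sketched as follows. This is an illustration of the suggestion only, not part of the evaluated system; the function name, the room assignments and the behaviour-to-room mapping are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def candidate_behaviours(
    window: Sequence[str],
    sensor_room: Dict[str, str],
    behaviour_rooms: Dict[str, Set[str]],
) -> List[str]:
    """Keep only behaviours that can occur in a room seen in the window."""
    rooms = {sensor_room[s] for s in window if s in sensor_room}
    return [b for b, allowed in behaviour_rooms.items() if allowed & rooms]


sensor_room = {
    "shower_door": "bathroom",
    "heater": "bathroom",
    "microwave": "kitchen",
}
behaviour_rooms = {
    "showering": {"bathroom"},
    "grooming": {"bathroom"},
    "preparing meal": {"kitchen"},
}

candidates = candidate_behaviours(
    ["shower_door", "heater"], sensor_room, behaviour_rooms
)
```

Only the HMMs for the returned behaviours would then take part in the competition for that window, which reduces the number of likelihood computations.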
The current study assumes that actions in a behaviour are contiguous, and that all of the separate parts of the behaviour are different instances of that behaviour. This may not be the case in the real environment, as behaviours are normally interleaved: a person may well make a beverage at the same time as preparing lunch, which could be done while the laundry was running. This system will not deal with these behaviours in any sensible way, which is left for future work.
Algorithms for behaviour recognition generally fall into two categories: those that are based on an explicit representation of behaviours together with the events that characterise them, and those that mine them from sensor streams. The second has the advantage that we do not need to know what events constitute a behaviour; it is therefore the approach preferred by many researchers, and the one taken in the work described in this paper.
This paper has presented a system that performs behaviour recognition and segmentation of the sensor stream based on competition between a set of trained hidden Markov models and a variable window length. Our experimental results show that our method works effectively, achieving an average accuracy of 91% on the MIT PlaceLab dataset and 89.5% on the van Kasteren dataset. We have also compared the variable window length with a fixed window length and shown that the variable window length works better. We have evaluated our method against a baseline naïve Bayes classifier and have shown that our method performs better. We have investigated different window sizes, and found that relatively short ones work best. The experimental results also show that our method does not need a large amount of training data.
