A supervised learning approach for behaviour recognition in smart homes

Abstract

One application of Ambient Intelligence (AmI) that supports people in their daily activities is the smart home, which has become a popular topic for research over the past 10 years. The smart home can support the inhabitant in a variety of ways, such as watching for potential risks, detecting any abnormality, adapting the home for environmental conditions and inducing behavioural change. This often requires the smart home to recognise the behaviours of the inhabitant. In this paper, we introduce a method that can accurately recognise the inhabitant’s behaviours. This includes both the segmentation of the sensor stream and the identification of behaviours. We demonstrate our algorithm on sensor data from real smart homes.

Keywords

Behaviour recognition, activity segmentation, hidden Markov model, smart home

1. Introduction

It is a well-reported fact that the populations of the world are aging. In Europe, for example, it has been reported that the number of people aged 65 years and over is projected to increase from 10% of the entire population in 1950 to more than 25% in 2050. Besides the decrease in birth rate, the global life expectancy at birth is projected to increase from 58 years in 1970–1975 to 75 years in 2045–2050 [12]. The group of elderly (aged 65 years and above) is the fastest growing segment of the world’s population.

Older adults are more frequently subject to physical disabilities and cognitive impairments such as diminished sense of touch, slower ability to react, physical weakness and memory problems than younger people. It is clearly impossible to rely solely on increasing the number of caregivers for the elderly, since even now it is difficult and expensive to find care [10]. Additionally, many people are choosing to stay in their own homes as long as possible, and hope to remain independent.

This has led to a large number of monitoring systems such as the ‘smart homes’, with the aim being to support the inhabitants, typically the elderly or cognitively impaired who are living alone, by monitoring their behaviours and detecting potentially dangerous behaviours. In this context, the ‘smart’ part of the smart home is the ability of the system to learn to use the information provided by the sensors (which are installed in the home) to recognise behaviours and identify deviations from the norm.

In a smart home, the observed behaviours are likely to be the standard human behaviours of living, and the observations will depend upon the sensors that the house is equipped with. There are a variety of sensors that could be used to collect information about the inhabitant’s behaviours. These sensors range from cameras and body-worn sensors to touch and motion sensors. Another sensing method, which is the focus of this paper, is to recognise behaviours through unobtrusive sensors that are attached to household objects such as microwave ovens, cupboards, taps, etc. Such sensors are activated when the inhabitant performs his/her daily activities in the home (e.g., opening or closing the microwave oven’s door could activate the sensor attached to the microwave oven) and this generates a sequence of sensor observations, which corresponds to the order of events involved in carrying out a particular behaviour. For example, a sensor sequence such as closing the bathroom door, turning on the heater, opening the shower door, etc., may correspond to the events of a showering behaviour.

Since sensor observations from the home in some way represent the behaviours of a human inhabitant, the behaviour recognition problem can be viewed as a task of finding a mapping between the sensor information and the behaviours of the inhabitant. In such a smart home, the behaviour itself is performed by the inhabitant, and the sensors detect evidence of that behaviour.

Given a set of behaviour labels that are mapped to sensor information, we can train any supervised machine learning method. However, data from a sensor stream consists of an unending sequence of sensor readings where the start and end of a behaviour are unknown. This poses a challenge to segment the sensory stream into appropriate pieces that represent individual behaviours before any classification can be performed.

Much of the work reported in the literature assumes that behaviours have been segmented. During training, the pre-segmented sequences of sensor readings that correspond to a behaviour are used to train the classifier. The problem arises during testing, where a segmented sequence may not necessarily represent the entire sequence of sensor readings that corresponds to the behaviour of the inhabitant. This can affect the overall recognition performance.

Some approaches address the problem of segmentation using a fixed window length. However, each behaviour can be described by a different number of sensor readings, so it is unlikely that all of the sensor readings in a fixed-length window belong to one behaviour, and this results in other behaviours in the window not being recognised.

This paper addresses this problem by introducing a method that can accurately perform segmentation and behaviour recognition simultaneously on the sensor stream. To evaluate the effectiveness of our proposed method, we use two different smart home datasets and compare the labels produced by our method with the labels assigned by a human.

2. Related work

One of the commonly used methods for behaviour recognition is the naïve Bayes classifier. Among the earlier work that used the naïve Bayes classifier is the work of Tapia, Intille and Larson [11]. Segmentation is performed by using a set of feature windows, each representing one behaviour, and the length of each feature window was determined by the average duration that the inhabitant took to carry out that behaviour. The naïve Bayes classifier was used to calculate the probability of the current behaviour by shifting the feature windows over the sensor sequences, where the class with the highest likelihood was selected as the label for the behaviour. However, this may lead to inaccurate segmentation for sequences of behaviours with different durations. Yang, Dinh and Chen [15] used 3 windows of fixed length to segment the sensor stream, where each window was set to 0.2 seconds. They applied the naïve Bayes classifier on each window to recognise human physical activity such as walking, sitting, etc. based on accelerometers. However, determining the correct size of the window is important to accurately perform segmentation.

Although the strong independence assumption in the naïve Bayes classifier makes it a tractable approach for learning, correlations among sensors are common in a smart home, which can affect the overall performance of the classifier.

Fig. 1.

An example of two HMMs: one represents the grooming behaviour and the other represents the showering behaviour. The nodes with double lines refer to sensors that are shared between the two behaviours.

The hidden Markov model (HMM) and its variants, which all build the temporal information directly into the model, are also popular methods for behaviour recognition [2]. The HMM is a probabilistic graphical model that uses a set of hidden states to classify a sequence of observations over time. The hidden states represent the behaviours of the inhabitant and the observations are the sensor readings. The transition between states allows sequences of behaviours to be modelled. However, an HMM becomes difficult to model when the number of states grows.

Some approaches address this problem by using a set of HMMs and performing segmentation using a window that slides across the sensor stream. The work of Gaikwad and Narawade [3] recognised behaviours using a set of HMMs and applied a fixed window size to segment the sensor stream, with the final behaviour type being determined by majority vote. However, using a pre-determined window size may result in inaccurate segmentation since sequences in the window may contain more than one behaviour.

Kalra et al. [5] used a set of Markov chains, each representing a behaviour, and segmented the sensor stream using a variable window length. The maximum likelihood value for each behaviour model is first computed, and segmentation is performed when the rate of change between two successive maximum likelihood values exceeds a predefined threshold value. The threshold is determined empirically from the training data set, which means that the data need to provide a good representation of the behaviours in order to accurately determine the threshold value.

In common with our method, Kellokumpu, Pietikäinen and Heikkilä [6] use a set of HMMs, one for each behaviour, and apply the forward algorithm to monitor likelihood values. However, they do not use a sliding time window, preferring multiple window sizes and thresholding in order to separate out the behaviours.

Some methods represent behaviours hierarchically and use a variant of the HMM such as the hierarchical HMM to recognise behaviours at varying levels of abstraction [8,13]. In this model, the top-level represents the individual behaviours to be recognised (e.g., cooking or making coffee) and the low-level represents a set of individual behaviour events that arise from the sensor values. As this model attempts to recognise a complete model of behaviours, it requires more data for training and has a higher computational complexity, which often becomes intractable for learning.

There are also works [4,7,14] that use the Conditional Random Field (CRF), which relaxes the independence assumption between sensor observations, for behaviour recognition. However, CRFs in general are computationally intractable and often rely on approximation techniques such as Markov Chain Monte-Carlo (MCMC) for inference.

3. Our proposed method

We approach the behaviour recognition problem by using a set of hidden Markov models [9] that each recognises a behaviour (e.g., we have one HMM to represent the ‘showering’ behaviour, another HMM to represent ‘doing laundry’ behaviour, etc.) and they compete to explain the current observations. In our work, the observations are the sensors that are triggered and the hidden states are the events that arise from the observations. For example, a sensor sequence such as closing the bathroom door, turning on the exhaust fan, etc., may correspond to the events of a showering behaviour. Figure 1 shows an example of two HMMs, one HMM represents the ‘grooming’ behaviour and another represents the ‘showering’ behaviour. The double lines shown in the figure refer to the sensors that are shared between these two behaviours. Since there are variations in the behaviours, the HMMs in our work are ergodic, which means that in each HMM, every state is reachable from every other state, in a finite number of steps. The HMMs were each trained using the standard Expectation-Maximisation (EM) algorithm [1] on segmented labelled data.

Once we have trained a set of HMMs $λ_{1}, λ_{2}, \dots, λ_{m}$ , our next task is to recognise the behaviours from the sensor stream. The data that is presented to the HMMs is chosen from the sensor stream using a variable window that moves over the sequence.

Many methods reported in the literature used a fixed window length to partition the sensor stream. However, the choice of the size of this window is important, because it is unlikely that all of the observations in the window belong to one behaviour, and so the HMM chosen to represent the window will, at best, represent only some of the behaviours in the sequence. To see the importance of the problem, consider the three different cases shown in Fig. 2. In each, a behaviour w takes up much of the window and is the winning behaviour. However, its location in the window differs, and we want to ensure that other behaviours in the window are also recognised.

Fig. 2.

A behaviour w does not need to take up the entire window. Even assuming that the actions in a behaviour are contiguous, it could be (a) at the start of the window, (b) in the middle, or (c) at the end. If the entire window is classified as one behaviour, then a potentially large number of behaviours are missed. D is the window size and $O_{1}, O_{2}, \dots, O_{T}$ is the observation sequence.

We present an alternative solution to this problem. We use a variable window length that moves over the sequence of observations and automatically configures the window size based on the sensor observations. We first slide a window of length T across the sensor stream, presenting the T observations in the window to the set of trained HMMs for competition. A winning HMM $λ_{winner}$ is chosen as the HMM that maximises the likelihood of the T observations in the window, $O_{1}, O_{2}, \dots, O_{T}$, i.e., $λ_{winner} = \arg\max_{j} P (O_{1}, O_{2}, \dots, O_{T} | λ_{j})$. Since we want to ensure that the majority of behaviours in the window are recognised, we perform a re-segmentation using the forward algorithm. This is achieved by calculating the likelihood of each sensor observation in the window according to the model of the winning HMM.
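The competition step can be illustrated with a minimal sketch. The two toy models below (`showering` and `laundry`), their parameters and the integer sensor ids are hypothetical values chosen for illustration and are not taken from the datasets. Each model is a tuple `(pi, A, B)` of initial, transition and emission probabilities, and the winner is the model with the largest window likelihood.

```python
def sequence_likelihood(obs, pi, A, B):
    """P(O | model) for a discrete HMM, computed with the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def compete(obs, models):
    """Return the name of the model that maximises P(obs | model)."""
    return max(models, key=lambda name: sequence_likelihood(obs, *models[name]))


# Hypothetical two-state models over four sensors (ids 0..3).
uniform = [[0.5, 0.5], [0.5, 0.5]]
models = {
    "showering": ([0.5, 0.5], uniform, [[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]]),
    "laundry":   ([0.5, 0.5], uniform, [[0.0, 0.0, 0.9, 0.1], [0.0, 0.0, 0.1, 0.9]]),
}

winner = compete([0, 1, 1], models)   # window containing only bathroom sensors
```

With exact zeros in the emission matrices, a window that mixes sensors from both behaviours has zero likelihood under every model, so a practical implementation would probably need some form of smoothing; the text does not specify one.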

The computation of the conditional probability in the forward algorithm involves 3 steps: (1) initialisation, (2) induction and (3) termination. Let $α_{t} (i) = P (O_{1}, O_{2}, \dots, O_{t}, q_{t} = S_{i} | λ)$ , the probability that we have seen the first t observations and ended up in state $S_{i}$ .

The initialisation step computes the joint probability of state $S_{i}$ and the initial observation $O_{1}$:

$α_{1} (i) = π_{i} b_{i} (O_{1}), \quad i = 1, 2, \dots, N, \qquad (1)$

where $π_{i}$ is the initial probability of state $S_{i}$ and $b_{i} (O_{1})$ is the probability of observing $O_{1}$ given that the current state is $S_{i}$.

The induction step involves the computation of the forward variable $α_{t + 1} (j)$. As illustrated in Fig. 3, $α_{t + 1} (j)$ is the probability that we have seen the first $t + 1$ observations and ended up in state $S_{j}$, where

$α_{t + 1} (j) = \left[ \sum_{i = 1}^{N} α_{t} (i) a_{i j} \right] b_{j} (O_{t + 1}), \quad t = 1, 2, \dots, T - 1, \quad j = 1, 2, \dots, N. \qquad (2)$

Here $a_{i j}$ is the probability of transition from state $S_{i}$ at time t to state $S_{j}$ at time $t + 1$. The product $α_{t} (i) a_{i j}$ is the probability that $O_{1}, O_{2}, \dots, O_{t}$ are observed and state $S_{j}$ is reached at time $t + 1$ from state $S_{i}$ at time t. Since it is possible to reach state $S_{j}$ at time $t + 1$ from any of the N states at time t, the probability of being in $S_{j}$ at time $t + 1$ having observed $O_{1}, O_{2}, \dots, O_{t}$ is the summation of $α_{t} (i) a_{i j}$ over all the N states at time t. Multiplying this by the probability $b_{j} (O_{t + 1})$ gives the value of $α_{t + 1} (j)$. The computation iterates over $t = 1, 2, \dots, T - 1$ following Eq. (2).

Fig. 3.

Illustration of the forward variable $α_{t + 1} (j)$ and the computation of $α_{t} (i)$ upon the lattice structure of observations $O_{t}$ and states $S_{i}$ . This figure is modified from [9] to illustrate how the forward variable is computed in the context of this paper.

The termination step is to calculate the probability of the observations given the model, $P (O | λ)$, which is just the sum of all $α_{T} (i)$:

$P (O | λ) = \sum_{i = 1}^{N} α_{T} (i). \qquad (3)$
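The three steps of the forward algorithm can be written compactly as follows. This is a minimal sketch for a discrete HMM. The parameter values are hypothetical, and observations are given as 0-based symbol indices rather than the 1-based indices used in the text.

```python
def forward(obs, pi, A, B):
    """Return (alphas, likelihood) for a discrete HMM.

    obs     : list of observation symbol indices O_1..O_T
    pi[i]   : initial probability of state S_i
    A[i][j] : transition probability a_ij
    B[j][k] : emission probability b_j(k)
    """
    n = len(pi)
    # Initialisation, Eq. (1): alpha_1(i) = pi_i * b_i(O_1)
    alphas = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    # Induction, Eq. (2): alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
    for o in obs[1:]:
        prev = alphas[-1]
        alphas.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                       for j in range(n)])
    # Termination, Eq. (3): P(O | model) = sum_i alpha_T(i)
    return alphas, sum(alphas[-1])


# Hypothetical two-state, two-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
alphas, likelihood = forward([0, 1], pi, A, B)
```

For this model, $α_{1} = (0.3, 0.04)$ and $α_{2} = (0.113, 0.1026)$, so $P (O | λ) = 0.2156$. For long observation sequences the unscaled products underflow, so implementations usually rescale α at each step or work with logarithms.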

Fig. 4.

The solid line above the observation sequence shows the possible representations of a winning sequence using the α values. For simplicity, the α value is quantised into the set $\{0, 1\}$. D is the default window size and $O_{1}, O_{2}, \dots, O_{T}$ is the observation sequence. The original observation sequence is shown as the down curly brace. For details, see the text.

By monitoring the forward variable (α) for each sensor observation, we can determine how well the ‘winning’ HMM matches a given observation sequence, i.e., choosing an optimal state sequence that explains the sensor observation. The changes in α value signify a ‘change’ of behaviour from the observation stream. When $α > 0$, it means the model of the ‘winning’ HMM recognises the sensor observations, while $α = 0$ means that the winning HMM does not ‘recognise’ the sensor observation, i.e., none of the states in the HMM recognises the sensor observation. Figure 4 shows an interpretation of Fig. 2 in terms of the α values computed by the forward algorithm applied to one particular HMM, the one selected as the ‘winner’ for this window. To simplify the illustration, the α value in the figure is quantised into the set $\{0, 1\}$: $α = 1$ means that the winning HMM recognises the sensor observation, and $α = 0$ means that it does not.

If $α > 0$ at the beginning of the observation sequence, then it is likely that the case in Fig. 4(a) is occurring. Following Fig. 4(a), we see that there is a drop in α value between observations $O_{5}$ and $O_{6}$, which suggests that the behaviour has changed. We can therefore classify $O_{1}, O_{2}, \dots, O_{5}$ as belonging to the winning behaviour w (i.e., grooming), and then initialise a new window of default size $(D)$ at $O_{6}$. When D is initialised, all the observations within D are fed to the HMMs for competition and the process iterates.

The second case occurs when the winning behaviour best describes observations that fall in the middle of the window, e.g., $O_{4}, O_{5}, \dots, O_{8}$ in Fig. 4(b). Since the winning behaviour (w) does not describe observations $O_{1}$, $O_{2}$ and $O_{3}$, the probability for these three observations is low (i.e., $α = 0$) and we observe a jump in the α value at $O_{4}$. When this is observed, a new window $(D_{2})$ is initialised that contains only the three observations (i.e., $O_{1}$, $O_{2}$, $O_{3}$) that are not explained by behaviour w. The HMM competition is then rerun on this window, where the winning behaviour for this window is ‘doing laundry’.

With regard to the remaining sequence ($O_{4}$ and onwards), it would be possible to use HMM w (i.e., making coffee) and continue to monitor the α values. However, it was observed that sometimes there may be an overlap in individual sensor activations (e.g., $O_{4}$ and $O_{5}$) between the ‘preparing snack’ and ‘making coffee’ behaviours. If we continue to use the winning HMM and monitor the α values, an additional step would be required by the algorithm to calculate the α value on $O_{4}$ and $O_{5}$, which are not even described by the winning HMM. For this reason, a new window of default size is started at $O_{4}$ and the HMM competition is rerun on this sequence.

Since the two cases have been considered when the winning behaviour is at the beginning and middle of the window, the only possibilities remaining are that the behaviour is at the end and either stops during the window (Fig. 4(a)) or does not (Fig. 4(c)). The first case is already dealt with, and in the second case, we could simply classify the behaviours in the window as w and start a new one at the end of the current window. If we do so, then we will have two examples of the same behaviour abutting one another. Instead we extend the size of the window (shown as a dashed arrow in Fig. 4(c)) and continue to calculate the α value for each observation until α drops.

The algorithm to simultaneously perform segmentation and behaviour recognition is as follows:

1. Initialisation
   - a set of trained HMMs, $λ_{1}, λ_{2}, \dots, λ_{m}$, where m = total number of behaviour classes
   - sensor stream $O_{1}, O_{2}, \dots, O_{T}$, where T = length of sensor stream
   - D = window size, s = 1, e = D

2. Competition among HMMs
   - slide a window across the sensor stream so that each HMM gets inputs $(O_{s}, O_{s + 1}, \dots, O_{e})$
   - calculate $P (O_{s}, O_{s + 1}, \dots, O_{e} | λ_{j})$ according to the model of each HMM, $λ_{1}, λ_{2}, \dots, λ_{m}$
   - compute the winning HMM such that $λ_{winner} = \arg\max_{j} P (O_{s}, O_{s + 1}, \dots, O_{e} | λ_{j})$

3. Calculate likelihood for each sensor observation in the window
   - for each sensor observation i in $O_{s}, O_{s + 1}, \dots, O_{e}$, calculate the likelihood $α_{i}$ (using Eqs (1) and (2)) according to the winning HMM, $λ_{winner}$
   - if $α_{i} = 0$, repeat step (3) until $α_{i} > 0$
     - update $s = i$
     - reinitialise window, $e = s + D$
     - return to step (2)
   - if $α_{i} > 0$ and the current sensor observation is at the end of the window (i.e., $i = e$)
     - extend window, $e = i + D$
     - repeat step (3)
   - if $α_{i} > 0$ and the current sensor observation is not at the end of the window (i.e., $i < e$), repeat step (3) until $α_{i} = 0$
     - classify $O_{s}, O_{s + 1}, \dots, O_{i - 1}$ as $λ_{winner}$
     - update $s = i$
     - reinitialise window, $e = s + D$
     - rerun HMM competition on this new sequence, $O_{s}, O_{s + 1}, \dots, O_{e}$, by returning to step (2)
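The steps above can be sketched in code under some loudly stated assumptions that are not from the paper. With an unscaled forward pass, α stays at zero once any observation is unrecognised, so this sketch uses a scaled forward pass with a small emission floor `EPS`. It treats an observation as having "α = 0" when its per-step score falls below `THRESH`. Observations that precede the part explained by the winner are labelled by rerunning the competition on them alone. The toy models, sensor ids and the 0-based, end-exclusive indexing are hypothetical.

```python
import math

EPS = 1e-6     # assumed floor on emission probabilities (not from the paper)
THRESH = 1e-4  # assumed cut-off below which an observation counts as "alpha = 0"


def step_scores(obs, pi, A, B):
    """Scaled forward pass: one score per observation, P(O_t | O_1..O_{t-1}, model)."""
    n = len(pi)
    scores, alpha = [], None
    for o in obs:
        if alpha is None:
            alpha = [pi[j] * max(B[j][o], EPS) for j in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * max(B[j][o], EPS)
                     for j in range(n)]
        c = sum(alpha)
        alpha = [a / c for a in alpha]  # rescale so that alpha sums to 1
        scores.append(c)
    return scores


def compete(obs, models):
    """Step (2): the model with the highest log-likelihood on the window wins."""
    return max(models,
               key=lambda m: sum(math.log(c) for c in step_scores(obs, *models[m])))


def segment(stream, models, D):
    """Return (start, end_exclusive, label) triples covering the whole stream."""
    segments, s = [], 0
    while s < len(stream):
        e = min(s + D, len(stream))
        winner = compete(stream[s:e], models)
        while True:  # step (3): score each observation under the winning model
            ok = [c > THRESH for c in step_scores(stream[s:e], *models[winner])]
            if not (all(ok) and e < len(stream)):
                break
            e = min(e + D, len(stream))  # winner explains everything: extend window
        if ok[0]:
            # winner explains the start of the window: cut where the score drops
            cut = s + (ok.index(False) if False in ok else len(ok))
            segments.append((s, cut, winner))
        else:
            # leading observations are unexplained: rerun the competition on them only
            cut = s + (ok.index(True) if True in ok else len(ok))
            segments.append((s, cut, compete(stream[s:cut], models)))
        s = cut
    return segments


uniform = [[0.5, 0.5], [0.5, 0.5]]
models = {
    "showering": ([0.5, 0.5], uniform, [[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]]),
    "laundry":   ([0.5, 0.5], uniform, [[0.0, 0.0, 0.9, 0.1], [0.0, 0.0, 0.1, 0.9]]),
}
stream = [2, 3, 0, 1, 0, 1, 1, 2, 3]
result = segment(stream, models, D=5)
```

On this toy stream the first window is won by `showering` even though it starts with two laundry observations, which corresponds to the case in Fig. 4(b). The second window is extended because `showering` explains all of it, as in Fig. 4(c). Recomputing the scores after each extension keeps the sketch short but is not efficient.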

4. Evaluation method

To demonstrate our system, we used two different smart home datasets. The first dataset, which is also the primary dataset used in all the experiments, is obtained from the MIT PlaceLab [11]. The second dataset is obtained from van Kasteren [14]. These datasets were annotated with behaviours by the subject living in the home.

For the MIT PlaceLab [11] dataset, data were collected using a set of 77 state-change sensors that were installed in an apartment for a period of 16 days. From the total of 16 days, we used 14 days for training and the remaining 2 days for testing. Since we want to ensure that every behaviour (particularly the doing laundry and washing dishes behaviours, which do not occur daily) is seen in the test set, we take pairs of days for testing. The main metric that we are interested in is recognition accuracy, which is the ratio of the total number of behaviours correctly identified by the algorithm to the total number of behaviours used for testing. We repeated the process 8 times, with the final recognition accuracy being calculated by averaging the accuracies in each run.

In [14], the dataset was collected over a period of 24 days, where 14 state-change sensors were used. Since the number of behaviours that occurred each day is relatively small, we used 20 days for training and the remaining 4 days for testing. We repeated the process 6 times and the final recognition accuracy is calculated by averaging the accuracies in each run.

Table 1 shows the different training-testing splits along with the number of behaviour examples and sensor observations that we used for training and testing. We also tried other numbers of days in order to investigate the amount of training data required to train the HMMs. The HMMs were each trained on the relevant segmented, labelled data in the training set using the standard Expectation-Maximisation (EM) algorithm.

Table 1
The different training-testing splits that we tried, along with the total number of behaviour examples and sensor observations used in each training and testing set

(a) MIT PlaceLab Dataset

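| Training-test set | Behaviour examples (training) | Behaviour examples (testing) | Sensor observations (training) | Sensor observations (testing) |
|---|---|---|---|---|
| 1st set | 279 | 31 | 1672 | 133 |
| 2nd set | 256 | 54 | 1456 | 349 |
| 3rd set | 290 | 20 | 1688 | 117 |
| 4th set | 277 | 33 | 1561 | 244 |
| 5th set | 261 | 49 | 1563 | 242 |
| 6th set | 276 | 34 | 1592 | 213 |
| 7th set | 273 | 37 | 1625 | 180 |
| 8th set | 258 | 52 | 1478 | 327 |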

(b) van Kasteren Dataset

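| Training-test set | Behaviour examples (training) | Behaviour examples (testing) | Sensor observations (training) | Sensor observations (testing) |
|---|---|---|---|---|
| 1st set | 277 | 42 | 1137 | 181 |
| 2nd set | 239 | 80 | 983 | 335 |
| 3rd set | 266 | 53 | 1140 | 178 |
| 4th set | 270 | 49 | 1086 | 232 |
| 5th set | 288 | 31 | 1197 | 121 |
| 6th set | 255 | 64 | 1047 | 271 |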

5. Experimental results

Fig. 5.

Illustration of competition between HMMs based on two different behaviours: toileting/showering and preparing meal/snack/beverages. The example is based on the first 85 observations of the 5th test set of MIT PlaceLab dataset. There are other behaviours in this example but they are omitted for clarity. The y-axis shows the different states (events) that arise from the individual sensor values. The ‘unrecognised observation’ label means that none of the states in the particular HMM recognises the sensor observations.

To test our system, we conducted five separate experiments. In the first, we looked at the accuracy of the algorithm in recognising behaviours based on the HMM competition and variable window length, while in the second we compared the algorithm with a fixed window length. In the third experiment, we compared our proposed method against a baseline naïve Bayes classifier. In the fourth experiment, we looked at the effects of window size on the efficiency and accuracy of the algorithm. In the fifth experiment, we looked at how much training data was required for accurate results.

5.1. Experiment 1: Competition between HMMs and variable window length

Table 2
Results on supervised learning based on competition between HMMs and variable window length on different datasets and training/testing splits

(a) MIT PlaceLab Dataset

| Test set | No. of behaviour examples for testing | No. of behaviours correctly identified | Recognition accuracy |
|---|---|---|---|
| 1st set | 31 | 28 | 90% |
| 2nd set | 54 | 49 | 91% |
| 3rd set | 20 | 19 | 95% |
| 4th set | 33 | 30 | 91% |
| 5th set | 49 | 45 | 92% |
| 6th set | 34 | 31 | 91% |
| 7th set | 37 | 35 | 95% |
| 8th set | 52 | 44 | 85% |
| Average | | | 91% |
| Standard deviation | | | 3.2 |

(b) van Kasteren Dataset

| Test set | No. of behaviour examples for testing | No. of behaviours correctly identified | Recognition accuracy |
|---|---|---|---|
| 1st set | 42 | 36 | 86% |
| 2nd set | 80 | 72 | 90% |
| 3rd set | 53 | 49 | 92% |
| 4th set | 49 | 43 | 88% |
| 5th set | 31 | 28 | 90% |
| 6th set | 64 | 58 | 91% |
| Average | | | 89.5% |
| Standard deviation | | | 2.2 |

The aim of this experiment is to test the efficacy of our method based on competition between HMMs and a variable window length. In this experiment, we used a default window size of 10, and ran the entire algorithm over the sensor observations on different test sets. The results of sliding this window over the data are shown in Fig. 5, which displays the outputs of the algorithm, with the winning behaviour at each time being clearly visible.

As the figure shows, we can determine that the subject is toileting/showering between observations $O_{5}$ and $O_{20}$ , followed by preparing meal/snack/beverages between observation $O_{23}$ and $O_{27}$ . The y-axis shows the different states (events) that arise from the individual sensor values when performing a behaviour. The ‘unrecognised observation’ label shown in the figure means that none of the states in the particular HMM recognises the sensor observations. This is observed when $α = 0$ .

The recognition accuracy results across different test sets are shown in Table 2. Our method achieved an overall accuracy of 91% on MIT PlaceLab dataset. A low recognition accuracy is observed on the 8th test set. This is mainly due to the number of cleaning/putting away groceries behaviours that are observed in that test set where our algorithm is not able to identify this behaviour since it involves the same sensors that other behaviours would use.
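As a worked check of the metric defined in Section 4, the per-set accuracies and their average can be recomputed from the counts in Table 2(a). The snippet is only an illustration. The use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

# Counts taken from Table 2(a), MIT PlaceLab dataset.
tested = [31, 54, 20, 33, 49, 34, 37, 52]
correct = [28, 49, 19, 30, 45, 31, 35, 44]

# Recognition accuracy per test set, in percent.
accuracy = [100.0 * c / t for c, t in zip(correct, tested)]

mean_accuracy = statistics.mean(accuracy)
std_accuracy = statistics.stdev(accuracy)  # sample standard deviation (assumed)
```

The mean comes out at roughly 91% and the sample standard deviation at roughly 3.2, in line with the last two rows of the table.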

We also tested on the van Kasteren dataset and our method achieved an overall accuracy of 89.5%. A low recognition accuracy is observed on the 1st test set, for the same reason observed on MIT PlaceLab dataset. Our algorithm is not able to identify the ‘going to bed’ behaviour since it involves the same sensors as the ‘toileting’ and ‘showering’ behaviours.

The results show that our method based on competition between HMMs and variable window length can be used to recognise and segment the sensor stream into behaviours.

5.2. Experiment 2: Comparison between variable window length and fixed window length

This experiment is designed to compare the algorithm with the fixed window length. In this experiment, we used a fixed window length of sizes 5 and 10, and ran the algorithm over the sensory stream on different test sets for each evaluation. For the fixed window length, each window is shifted over the sensor stream with increments according to the window size used in the experiment. E.g., for fixed window length of size 5, the window is shifted over the sensor stream with increments of 5 sensor observations each time. In this experiment, we used the MIT PlaceLab dataset.
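The fixed-window baseline can be sketched as follows: the window advances by its own size and every observation in a window receives the label of the winning HMM. The emission floor `EPS`, the toy models and the sensor ids are assumptions made for illustration and are not part of the experimental setup.

```python
import math

EPS = 1e-6  # assumed floor on emission probabilities (not from the paper)


def log_likelihood(obs, pi, A, B):
    """log P(obs | model) via a scaled forward pass."""
    n = len(pi)
    total, alpha = 0.0, None
    for o in obs:
        if alpha is None:
            alpha = [pi[j] * max(B[j][o], EPS) for j in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * max(B[j][o], EPS)
                     for j in range(n)]
        c = sum(alpha)
        alpha = [a / c for a in alpha]
        total += math.log(c)
    return total


def fixed_window_labels(stream, models, size):
    """Label consecutive, non-overlapping windows with the winning model."""
    labels = []
    for s in range(0, len(stream), size):
        window = stream[s:s + size]
        winner = max(models, key=lambda m: log_likelihood(window, *models[m]))
        labels.append((s, s + len(window), winner))
    return labels


uniform = [[0.5, 0.5], [0.5, 0.5]]
models = {
    "showering": ([0.5, 0.5], uniform, [[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]]),
    "laundry":   ([0.5, 0.5], uniform, [[0.0, 0.0, 0.9, 0.1], [0.0, 0.0, 0.1, 0.9]]),
}
stream = [2, 3, 0, 1, 0, 1, 1, 2, 3]
labels = fixed_window_labels(stream, models, size=3)
```

In this toy stream the showering observations at positions 2 and 6 fall into windows won by `laundry`, so they are absorbed into the wrong behaviour. This is the failure mode the experiment is designed to measure.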

Table 3
Comparison results between the variable window length and fixed window length based on window size = 5 and window size = 10

Recognition accuracy:

| Test set | Variable window length (window size = 5) | Fixed window length (window size = 5) | Variable window length (window size = 10) | Fixed window length (window size = 10) |
|---|---|---|---|---|
| 1st set | 90% | 55% | 90% | 48% |
| 2nd set | 91% | 69% | 91% | 54% |
| 3rd set | 95% | 75% | 95% | 75% |
| 4th set | 91% | 76% | 91% | 58% |
| 5th set | 92% | 49% | 92% | 51% |
| 6th set | 91% | 68% | 91% | 62% |
| 7th set | 95% | 70% | 95% | 59% |
| 8th set | 85% | 60% | 85% | 46% |
| Average | 91% | 65% | 91% | 57% |
| Std. deviation | 3.2 | 9.6 | 3.2 | 9.3 |


The results are shown in Table 3. We have conducted a significance test to compare the recognition accuracy of the fixed window length and our variable window length method with window size = 10. An F-test was first carried out to determine the equivalence of the variances for these two methods. The test statistic is $F = \left( \frac{S_{1}^{2}}{S_{2}^{2}} \right) \left( \frac{σ_{2}^{2}}{σ_{1}^{2}} \right) = 0.1183$ with p-value = 0.005. Thus, the null hypothesis of equal variances is rejected.

A Student’s t-test is conducted to test the alternative hypothesis that the average recognition accuracy of the variable window length method is significantly higher than the average recognition accuracy of the fixed window length. The test statistic is $T = \frac{{\bar{x}}_{1} - {\bar{x}}_{2}}{\sqrt{\frac{S_{1}^{2}}{n_{1}} + \frac{S_{2}^{2}}{n_{2}}}} = \frac{34}{3.48} = 9.8$ with p-value = $2.5 \times 10^{-6}$. Thus, the null hypothesis is rejected and we can conclude that the average accuracy of our variable window length method is significantly higher than the average accuracy of the fixed window length and that our approach of re-segmentation improves the recognition results, with an average accuracy of 91%.
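The two test statistics can be reproduced from the rounded summary values in Table 3 (window size = 10). This is only an arithmetic check. Under the null hypothesis the ratio $σ_{2}^{2} / σ_{1}^{2}$ is taken to be 1, and the small difference from the reported F value presumably comes from using rounded standard deviations here.

```python
import math

n1 = n2 = 8                       # number of test sets per method
mean_var, sd_var = 91.0, 3.2      # variable window length (Table 3)
mean_fix, sd_fix = 57.0, 9.3      # fixed window length, window size = 10 (Table 3)

# F statistic: ratio of the two sample variances.
F = sd_var ** 2 / sd_fix ** 2

# t statistic with unpooled variances, as in the formula in the text.
standard_error = math.sqrt(sd_var ** 2 / n1 + sd_fix ** 2 / n2)
T = (mean_var - mean_fix) / standard_error
```

This gives F ≈ 0.118, a standard error of about 3.48 and T ≈ 9.8.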

The problem with a fixed window length is that it can only identify one winning behaviour, which results in other behaviours in the window not being identified. Another problem is determining the correct window size, since it affects the recognition performance. The size of the fixed window is often determined empirically. As can be seen in Table 3, the fixed window achieves an accuracy of 65% with window size = 5, which drops to 57% with window size = 10. The accuracy of our proposed method is consistent across window sizes: it achieves 91% recognition accuracy with window sizes of both 5 and 10.

5.3. Experiment 3: Comparison with the baseline method

In this experiment, we used the naïve Bayes classifier as a baseline to test how effective our proposed method is. The naïve Bayes classifier is a graphical model where the features are independent given the behaviour class. We trained two different naïve Bayes classifiers. The first assumes that the data from the sensor stream has been segmented according to the behaviours of the inhabitant, i.e., the start and end of a behaviour is known.

As can be seen in Table 4, our proposed method (based on competition between HMMs and variable window length) is comparable to this naïve Bayes classifier, which achieves an accuracy of 94% on the MIT PlaceLab dataset and 97.5% on the van Kasteren dataset. The higher accuracy of the classifier is expected, since it performs classification directly on the segmented data.

The second naïve Bayes classifier performs both segmentation and behaviour recognition simultaneously, where the data are presented to the classifier by using a window that slides over the data. The size of the window is 3, which is determined by taking the average number of sensor observations that described the behaviours. Our proposed method performs better than the baseline naïve Bayes classifier on both datasets, with an accuracy of 91% on MIT PlaceLab dataset and 89.5% on van Kasteren dataset.
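A naïve Bayes baseline of this kind can be sketched as below. The training examples, sensor ids, add-one smoothing and the choice to treat each sensor observation in a window as an independent feature are all assumptions for illustration; the text does not specify the exact feature representation.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples, n_sensors):
    """examples: list of (label, [sensor ids]). Returns (log_prior, log_like)."""
    label_counts = Counter(label for label, _ in examples)
    sensor_counts = defaultdict(Counter)
    for label, obs in examples:
        sensor_counts[label].update(obs)
    total = sum(label_counts.values())
    log_prior = {c: math.log(k / total) for c, k in label_counts.items()}
    log_like = {}
    for c in label_counts:
        denom = sum(sensor_counts[c].values()) + n_sensors  # add-one smoothing (assumed)
        log_like[c] = [math.log((sensor_counts[c][v] + 1) / denom)
                       for v in range(n_sensors)]
    return log_prior, log_like


def classify(window, log_prior, log_like):
    """Pick the class with the highest posterior for the observations in the window."""
    return max(log_prior,
               key=lambda c: log_prior[c] + sum(log_like[c][v] for v in window))


# Hypothetical segmented training data over four sensors (ids 0..3).
examples = [
    ("showering", [0, 1, 1]),
    ("showering", [0, 1]),
    ("laundry", [2, 3]),
    ("laundry", [3, 2, 2]),
]
log_prior, log_like = train_nb(examples, n_sensors=4)

stream = [0, 1, 1, 2, 3, 2]
size = 3  # window size used in the experiment
labels = [classify(stream[s:s + size], log_prior, log_like)
          for s in range(0, len(stream), size)]
```

Because the features are treated as independent given the class, the order of the observations inside a window has no effect on the label, unlike in the HMM-based method.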

Table 4

Comparison between our proposed method and the baseline naïve Bayes classifier

(a) MIT PlaceLab Dataset
Test Sets	Recognition Accuracy

	Competition between HMMs with variable window length	Naïve Bayes Classifier on Segmented Sensor Stream	Naïve Bayes Classifier on Unsegmented Sensor Stream
1st Set	90%	96%	80%
2nd Set	91%	91%	83%
3rd Set	95%	95%	77%
4th Set	91%	94%	80%
5th Set	92%	96%	77%
6th Set	91%	91%	75%
7th Set	95%	97%	88%
8th Set	85%	93%	81%
Average	91%	94%	80%
Std. Deviation	3.2	2.3	4.1

(b) van Kasteren Dataset
Test Sets	Recognition Accuracy

	Competition between HMMs with variable window length	Naïve Bayes Classifier on Segmented Sensor Stream	Naïve Bayes Classifier on Unsegmented Sensor Stream
1st Set	86%	100%	85%
2nd Set	90%	95%	82%
3rd Set	92%	98%	83%
4th Set	88%	100%	91%
5th Set	90%	97%	83%
6th Set	91%	95%	86%
Average	89.5%	97.5%	85%
Std. Deviation	2.2	2.3	3.3

5.4. Experiment 4: Size of default window

Although our proposed method uses a variable window length that automatically adapts its window to the sensor observations, we want to know whether the initial (default) window size affects computational performance and recognition accuracy. Window sizes of 2, 5, 7, 10, 20, 30, 40 and 50 were used, each with 10 runs on all test datasets of the MIT PlaceLab. The number of behaviour examples in these test datasets is shown in Table 1(a).

The results exhibited in Fig. 6 clearly show that computation time grows approximately linearly with the size of window, and therefore a shorter window length is preferred in order to keep the computational costs low. Although the results presented in the figure are based on the 2nd test set, a similar trend was observed in all other test sets.

Fig. 6.

Boxplot for computational time (in sec) across different window sizes based on 10 runs each on 2nd test set. For each box, the central line is the median, and the edges of the box are the 25th and 75th percentiles.

We also evaluated these different window sizes for recognition accuracy, i.e., the ratio of the total number of behaviours correctly identified by the algorithm over the total number of behaviours used for testing. Table 5 shows the results of using different window sizes. As the table shows, the results are not significantly different across the different window sizes.

Table 5

The effect of using different window sizes on recognition accuracy

Test Sets	Recognition Accuracy on Different Window Sizes

	2	5	7	10	20	30	40	50
1st Set	90%	90%	90%	90%	90%	90%	90%	90%
2nd Set	91%	91%	91%	91%	91%	89%	89%	89%
3rd Set	95%	95%	95%	95%	90%	90%	90%	90%
4th Set	91%	91%	91%	91%	91%	91%	91%	91%
5th Set	92%	92%	92%	92%	92%	92%	92%	92%
6th Set	91%	91%	91%	91%	91%	91%	91%	91%
7th Set	95%	95%	95%	95%	95%	97%	97%	97%
8th Set	85%	85%	85%	85%	85%	85%	83%	83%
Average	91%	91%	91%	91%	91%	91%	90%	90%
Std. Deviation	3.2	3.2	3.2	3.2	2.8	3.3	3.9	3.9
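The recognition accuracy defined above, together with the reported average and standard deviation, can be reproduced from the per-set counts in Table 2. The sketch below assumes that the reported spread is the sample standard deviation of the per-set percentages; the paper does not state which estimator was used, and the variable names are illustrative.

```python
from statistics import mean, stdev

# (behaviour examples tested, behaviours correctly identified) per test set,
# copied from the MIT PlaceLab results in Table 2.
results = [(31, 28), (54, 49), (20, 19), (33, 30), (49, 45), (34, 31), (37, 35), (52, 44)]

def recognition_accuracy(tested, correct):
    """Percentage of test behaviours that were correctly identified."""
    return 100.0 * correct / tested

accuracies = [recognition_accuracy(n, k) for n, k in results]
average = mean(accuracies)
spread = stdev(accuracies)  # assumption: sample standard deviation (n - 1 denominator)

for (n, k), acc in zip(results, accuracies):
    print(f"{k}/{n} = {acc:.0f}%")
print(f"average = {average:.0f}%, std. deviation = {spread:.1f}")
```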

Table 6

Recognition accuracy on different size of training data on different training-test sets

Training Set		Test Set		Recognition Accuracy

No. of Days	No. of Behaviour Examples	No. of Days	No. of Behaviour Examples
5 Days	95	11 Days	215	94%
8 Days	138	8 Days	172	94%
11 Days	203	5 Days	107	93%
Average				94%
Standard Deviation				0.6

5.5. Experiment 5: Size of training data

The objective of this experiment is to analyse the amount of training data needed to train the HMMs. The most important requirement is that every behaviour is seen several times in the training set, so that the HMM acquires a good representation of that behaviour. From the total of 16 days of MIT PlaceLab data, we tried three splits: 5 days for training and 11 days for testing, 8 days for each, and 11 days for training and 5 days for testing. The results on recognition accuracy are presented in Table 6.

As the table shows, the size of the training data has little impact on recognition accuracy, which stays at 93–94% across the three splits. Even when trained on 5 days, with a small number of examples per behaviour, we still achieve an accuracy of 94%. The proposed method therefore does not appear to need a large training set, although this may not hold for more complicated behaviours.

Note that the accuracies presented in Table 2 are based on leave-two-out cross validation, while those presented in Table 6 are obtained by training on 5, 8 and 11 days. In leave-two-out cross validation (Table 2), each test set contains fewer behaviours, since it consists of only 2 days of data, and thus the weight of a single misclassification is much higher. This explains why the accuracy in Table 2 is slightly lower.
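To make the weighting concrete: the leave-two-out test sets listed in Table 1(a) contain between 20 and 54 behaviour examples, whereas the test sets in Table 6 contain between 107 and 215. The accuracy lost to a single misclassification is therefore roughly

$$
\frac{1}{20} = 5\%, \qquad \frac{1}{54} \approx 1.9\%, \qquad \frac{1}{107} \approx 0.9\%, \qquad \frac{1}{215} \approx 0.5\%
$$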

5.6. Discussion

Our algorithms worked very well, producing over 90% recognition. However, it is still instructive to see if there are consistent reasons for the misclassifications that did occur. We identified one main reason for misclassification, which is that individual sensor observations can be in several behaviours. There are two places where this can be a problem.

The first is when the end of one behaviour contains observations that could be in the start of the next. For example, the last event for preparing lunch could be to put the leftover food in the fridge. After preparing lunch, the inhabitant proceeds to make a cup of coffee, and the first event to make a cup of coffee is to take the milk from the fridge (see observation $O_{5}$ in Fig. 7(a)). This will not pose a problem if the second behaviour (i.e., making coffee) happens immediately after the first (i.e., preparing lunch). However, if the second behaviour happened two hours after the first, that would be a totally different unrelated behaviour.

Fig. 7.

(a) Misclassification (at $O_{5}$ , shown in dashed arrow) occurs when the end of one behaviour contains the observations that could be in the start of the next behaviour. (b) Observations $O_{1}$ , $O_{2}$ and $O_{3}$ (shown in dashed box) are misclassified as ‘preparing meal’. This occurs when the winning behaviour (i.e., ‘preparing meal’) is not at the start of the window, but behaviours at the start could be interpreted as being part of that behaviour. Solid arrows represent ground truth and dashed arrows represent the predicted output.

The second place that this can be seen is where the winning behaviour is not at the start of the window, but those behaviours at the start could be interpreted as being part of that behaviour. This can be seen in Fig. 7(b) where observations $O_{1}$ , $O_{2}$ and $O_{3}$ are misclassified as ‘preparing meal’. It was experimentally observed that this was more likely to happen where the size of the window was large, because more behaviours were observed.

One way to reduce misclassification is to add extra information, for example by augmenting the current algorithm with temporal information. Although our method recognises the behaviours of the inhabitant well, achieving an average accuracy of 91%, the algorithm may take a long time to run if there is a large number of competing HMMs. This can be addressed by incorporating spatial information: since the locations of the sensors implicitly indicate where a behaviour occurs, we can use them to reduce the number of competing HMMs.
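The temporal and spatial ideas in the paragraph above can be sketched as follows. This is a minimal illustration and not part of the evaluated system: the function names, the 30-minute gap threshold, the sensor-to-room map and the behaviour-to-room map are all hypothetical.

```python
def split_on_time_gap(events, max_gap):
    """Split (timestamp_in_seconds, sensor) events wherever the pause exceeds max_gap."""
    segments, current = [], []
    for timestamp, sensor in events:
        if current and timestamp - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append((timestamp, sensor))
    if current:
        segments.append(current)
    return segments

def candidate_behaviours(sensors, sensor_room, behaviour_rooms):
    """Keep only behaviours that can occur in a room whose sensors fired."""
    rooms = {sensor_room[s] for s in sensors}
    return sorted(b for b, allowed in behaviour_rooms.items() if rooms & allowed)

# Hypothetical home layout and behaviour set.
SENSOR_ROOM = {"fridge": "kitchen", "kettle": "kitchen",
               "washer": "laundry room", "shower": "bathroom"}
BEHAVIOUR_ROOMS = {"preparing lunch": {"kitchen"}, "making coffee": {"kitchen"},
                   "doing laundry": {"laundry room"}, "bathing": {"bathroom"}}

# Two bursts of kitchen activity separated by two hours.
events = [(0, "fridge"), (60, "kettle"), (7260, "fridge"), (7300, "kettle")]
segments = split_on_time_gap(events, max_gap=30 * 60)
for segment in segments:
    sensors = [sensor for _, sensor in segment]
    print(sensors, candidate_behaviours(sensors, SENSOR_ROOM, BEHAVIOUR_ROOMS))
```

In this toy example the two-hour pause separates the fridge events into different segments, and only the two kitchen behaviours remain as candidates, so only their HMMs would need to compete.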

The current study assumes that the actions in a behaviour are contiguous, and that all of the separate parts of a behaviour are different instances of that behaviour. This may not be the case in a real environment, as behaviours are normally interleaved: a person may well make a beverage while preparing lunch, all while the laundry is running. The current system does not handle such interleaved behaviours in any sensible way; this is left for future work.

6. Conclusions

Algorithms for behaviour recognition generally fall into two categories: those that are based on an explicit representation of behaviours together with the events that characterise them, and those that mine them from sensor streams. The second has the advantage that we do not need to know what events constitute a behaviour; it is therefore the approach preferred by many researchers, and the one taken in this paper.

This paper has presented a system that performs behaviour recognition and segmentation of the sensor stream based on competition between a set of trained Hidden Markov Models and a variable window length. Our experimental results show that the method works effectively, achieving an average accuracy of 91% on the MIT PlaceLab dataset and 89.5% on the van Kasteren dataset. We have also compared the variable window length with a fixed window length and shown that the variable window length works best. We have evaluated our method against a baseline naïve Bayes classifier and shown that our method performs better. We have investigated different window sizes and found that relatively short ones work best. The experimental results also show that our method does not need a large amount of training data.

References

1. Dempster, Laird and Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological) 39(1) (1977), 1–38.
2. M.-A. Dragan and Mocanu, Human activity recognition in smart environments, in: Proc. of the 19th International Conference on Control Systems and Computer Science (CSCS), 2013, pp. 495–502.
3. M.S.K. Gaikwad and Narawade, HMM classifier for human activity recognition, Computer Science & Engineering: An International Journal (CSEIJ) 2(4) (2012), 27–36.
4. Q.-B. Gao and S.-L. Sun, Trajectory-based human activity recognition using Hidden Conditional Random Fields, in: International Conference on Machine Learning and Cybernetics (ICMLC), 2012, pp. 1091–1097.
5. Kalra, Zhao, Soto and Milios, Detection of daily living activities using a two-stage Markov model, Journal of Ambient Intelligence and Smart Environments 5 (2013), 273–285.
6. Kellokumpu, Pietikäinen and Heikkilä, Human activity recognition using sequences of postures, in: Machine Vision Applications, 2005, pp. 570–573.
7. Nazerfard, Das, Holder and Cook, Conditional random fields for activity recognition in smart environments, in: Proc. of the ACM International Conference on Health Informatics (SIGHIT), 2010, pp. 282–286.
8. Pinquier, Karaman, Letoupin, Guyot, Megret, Benois-Pineau, Gaestel and J.-F. Dartigues, Strategies for multiple feature fusion with Hierarchical HMM: Application to activity recognition from wearable audiovisual sensors, in: Proc. of the 21st International Conference on Pattern Recognition (ICPR), 2012, pp. 3192–3195.
9. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. of the IEEE 77(2) (1989), 257–286.
10. Simoens, Villeneuve and Hurst, Tackling nurse shortages in OECD countries, OECD Health Working Papers No. 19, 2005.
11. Tapia, Intille and Larson, Activity recognition in the home using simple and ubiquitous sensors, in: Pervasive, 2004, pp. 158–175.
12. United Nations, World population prospects: The 2006 revision, http://www.un.org/esa/population/publications/wpp2006/.
13. van Kasteren, Englebienne and Kröse, Hierarchical activity recognition using automatically clustered actions, in: Proc. of the Second International Joint Conference on AmI 2011, pp. 16–18.
14. van Kasteren, Noulas, Englebienne and Kröse, Accurate activity recognition in a home setting, in: UbiComp, 2008, pp. 1–9.
15. Yang, Dinh and Chen, Implementation of a wearerable real-time system for physical activity recognition based on Naive Bayes classifier, in: International Conference on Bioinformatics and Biomedical Technology (ICBBT), 2010, pp. 101–105.

Table 1

(a) MIT PlaceLab Dataset
Training-test Sets	No. of Behaviour Examples		No. of Sensor Observations

	Training	Testing	Training	Testing
1st set	279	31	1672	133
2nd set	256	54	1456	349
3rd set	290	20	1688	117
4th set	277	33	1561	244
5th set	261	49	1563	242
6th set	276	34	1592	213
7th set	273	37	1625	180
8th set	258	52	1478	327

(b) van Kasteren Dataset
Training-test Sets	No. of Behaviour Examples		No. of Sensor Observations

	Training	Testing	Training	Testing
1st set	277	42	1137	181
2nd set	239	80	983	335
3rd set	266	53	1140	178
4th set	270	49	1086	232
5th set	288	31	1197	121
6th set	255	64	1047	271

Table 2

(a) MIT PlaceLab Dataset
Test Sets	No. of Behaviour Examples for Testing	No. of Behaviours Correctly Identified	Recognition Accuracy
1st set	31	28	90%
2nd set	54	49	91%
3rd set	20	19	95%
4th set	33	30	91%
5th set	49	45	92%
6th set	34	31	91%
7th set	37	35	95%
8th set	52	44	85%
Average			91%
Standard Deviation			3.2

Table 3

Test Sets	Recognition Accuracy

	Window Size = 5		Window Size = 10

	Variable Window Length	Fixed Window Length	Variable Window Length	Fixed Window Length
1st Set	90%	55%	90%	48%
2nd Set	91%	69%	91%	54%
3rd Set	95%	75%	95%	75%
4th Set	91%	76%	91%	58%
5th Set	92%	49%	92%	51%
6th Set	91%	68%	91%	62%
7th Set	95%	70%	95%	59%
8th Set	85%	60%	85%	46%
Average	91%	65%	91%	57%
Std. Deviation	3.2	9.6	3.2	9.3