Privacy implications of room climate data

Abstract

Smart heating applications promise to increase energy efficiency and comfort by collecting and processing room climate data. While it has been suspected that the sensed data may leak crucial personal information about the occupants, this belief has up until now not been supported by evidence.

In this work, we investigate privacy risks arising from the collection of room climate measurements. We assume that an attacker has access to the most basic measurements only: temperature and relative humidity. We train machine learning classifiers to predict the presence and number of room occupants and to discriminate between different types of activities. On data that was collected at three different locations, we show that occupancy can be detected from data measured by a single sensor with up to $93.5 %$ accuracy. One can even distinguish between the cases that no, one, or two persons are present with up to $66.4 %$ accuracy. Moreover, the four actions reading, working on a PC, standing, and walking can be discriminated with up to $56.8 %$ accuracy, which is likewise clearly better than guessing ( $25 %$ ). Constraining the set of actions yields even higher prediction rates. For example, we discriminate standing and walking occupants with $96.3 %$ accuracy. In addition, we show that the accuracy can be increased in most cases if an attacker has access to measurements from two different sensors located in the same room.

Our results provide evidence that even the leakage of such ‘inconspicuous’ data as temperature and relative humidity can seriously violate privacy.

Keywords

Privacy, room climate, occupancy detection, activity recognition

1. Introduction

The vision of the Internet of Things (IoT) is to enhance work processes, energy efficiency, and living comfort by interconnecting actuators, mobile devices, and sensors. These networks of embedded technologies enable applications such as smart heating, home automation, and smart metering, among many others. Sensors are of crucial importance in these applications. Data gathered by sensors is used to represent the current state of the environment; in smart heating, for instance, sensors measure the room climate. Using this information and a user-defined configuration of the target room climate, the application regulates heating, ventilation, and air conditioning.

While the collection of room climate data is obviously essential to enable smart heating, it may at the same time pose the risk of privacy violations. Consequently, it is commonly believed among security experts that leaking room climate data may result in privacy violations and hence that the data needs to be cryptographically protected [5,15,46]. However, these claims have not been supported by scientific evidence so far. Thus, one could question whether in practice additional effort for protecting the data would be justified.

The current situation with room climate data is comparable to the area of smart metering [22,27,30,52]. In 1989, Hart [22] was the first to draw attention to the fact that smart metering appliances can be exploited as surveillance devices. Since then, research has shown far-reaching privacy violations through fine-granular power consumption monitoring, ranging from occupancy and everyday activities detection [35] up to recognizing which program a TV was displaying [18].

Various techniques have been proposed over the years to mitigate privacy risks of smart metering [4,28,43,44,52]. This issue has become such a grave concern that the German Federal Office for Information Security published a protection profile for smart meters in 2014 [2]. By considering privacy implications of smart heating, we hope to initiate consumer protection research and policy debate in this area, analogous to the developments in smart metering described above.

1.1. Research questions

In this work, we are the first to investigate room climate data from the perspective of possible privacy violations. More precisely, we address the following research questions:

Occupancy detection: Can an attacker determine the presence of a person in a room using only room climate data, i.e., temperature and relative humidity?

Occupancy estimation: Can an attacker determine the number of persons present in a room using only this room climate data?

Activity recognition: Can an attacker recognize activities of the occupant in the room using only the temperature and relative humidity data?

Our threat scenario targets buildings with multiple rooms that are similar in size, layout, furnishing, and positions of the sensors. These properties are typical for office buildings, dormitories, cruise ships, and hotels, among others. Assuming that an attacker is able to train a classifier that recognizes pre-defined activities, possible privacy violations are, e.g., tracking presence and working practices of employees in offices, or the disclosure of lifestyle and intimate activities in private spaces. All these situations present intrusions into the privacy of the occupants. In contrast to surveillance cameras and motion sensors, the occupant does not expect to be monitored. Also, legal restrictions regarding privacy might apply to surveillance cameras and motion sensors but not to room climate sensors.

1.2. Experiments

To evaluate these threats, we present experiments that consider occupancy detection, occupancy estimation, and activity recognition based on the analysis of room climate data from a privacy perspective. We measured room climate data in three office-like rooms and distinguished between the activities reading, standing, walking, and working on a laptop. The data was collected from sensors that measure temperature and relative humidity at a regular time interval of a few seconds.

Reflecting the most restrictive scenario, we analyzed how much information can be derived from the measurements of a single sensor only. Although we assume that in smart heating applications, only one sensor per room is most likely to be installed, each room was equipped with several sensors in order to evaluate any impact of the position of the sensor in the room. Additionally, we combined and evaluated data from two sensors simultaneously to investigate whether increasing the available data affects classification noticeably.

In our procedure, occupants performed a pre-defined sequence of tasks in the experimental space. In sum, we collected almost 115 hours of room climate sensor data from a total of 36 participants. The collected room climate data was analyzed using an off-the-shelf machine learning classification algorithm.

This work is an enhanced version of [36] and extends the analysis significantly as follows:

We investigate not only if the presence of persons can be detected but also if the number of persons can be estimated (occupancy estimation). The new results are displayed in Table 6 and are discussed in Section 5.3.

We analyze for all considered scenarios (including occupancy estimation) whether combining measurements taken from two sensors instead of one improves the overall results. The new results are displayed in Table 7 and Table 8 and are discussed in Section 5.5.

An overview of the additional contributions can be found in Table 1.

Table 1
Overview of the scenarios analyzed in [36] and this work

	Occupancy detection	Activity recognition	Occupancy estimation
Single sensor	[36] and this work	[36] and this work	This work
Two sensors	This work	This work	This work

1.3. Summary of results

Evaluating our collected room climate data, the attacker detects the presence of a person with detection rates of up to $93.5 %$ depending on location and the sensor position, which is significantly higher than guessing ( $50 %$ ). Discriminating between one and two persons present in the room is not a similarly easy task for the attacker as classification results are not above $68 %$ and some are around $48 %$ , which is in the region of guessing ( $50 %$ ). Consequently, differentiating between absence, presence of one, and of two persons yields results lying in between these two with rates up to $72.2 %$ . The attacker can distinguish between four activities (reading, standing, walking, and working on a laptop) with detection rates up to $56.8 %$ , which is also significantly better than guessing ( $25 %$ ). We can also distinguish between three activities (sitting, standing and walking) with detection rates up to $81.0 %$ , as opposed to $33.3 %$ if guessing. Furthermore, we distinguish between standing and walking with detection rates up to $96.3 %$ . Additionally, evaluating the performance of pairs of sensors predominantly shows an increase of detection rates. However, the exact difference fluctuates from almost none to approximately $10 %$ . Thus, we show that the fears of privacy violation by leaking room climate data are well justified. Furthermore, we analyze the influence of the room size, positions of the sensor, and amount of the measured sensor data on the accuracy. In summary, we provide the first steps in verifying the common belief that room climate data leaks privacy-sensitive information.

1.4. Outline

The remainder of this paper is organized as follows. In Section 2, we give an overview of related work. Section 3 presents the threat model considered in this work. In Section 4, we introduce the experimental design and methods. The results and complementary observations of our experiments are presented and discussed in Sections 5 to 7, respectively. We draw conclusions in Section 8.

2. Related work

Over the last decade, several experiments have been conducted to detect occupancy in sensor-equipped spaces and to recognize people’s activities as summarized in Table 2. Activity recognition has been considered for basic activities, such as leaving or arriving at home, or sleeping [32], as well as for more detailed views, including toileting, showering, and eating [49].

Most of the previous research uses types of sensors that are different from temperature and relative humidity. For example, CO₂ represents a useful source for occupancy detection and estimation [51]. Additionally, sensors detecting motion based on passive infrared (PIR) [1,6,9,16,19,21,31,39,48,54], sound [14,17,19,39], barometric pressure [33], and door switches [6,11,12,16,34,48,53] are utilized for occupancy estimation. For evaluation, different machine learning techniques are used, e.g., HMM [9,16,31,32,49,51], ARHMM [1,21], ANN [9,14,31,34], Naïve Bayes [55], and decision trees [3,16,17,19,53,55].

Table 2
Overview of previous experiments on occupancy detection (D), occupancy estimation (E), which aims at determining the number of people in a room, and activity recognition (A) with a focus on selected sensors; AML denotes acoustic, motion, and lighting sensors

Work	Target	Rel. humidity	Temperature	CO₂	Ventilation	AML	Switches	Wearables
van Kasteren et al., 2008 [49]	A	○	○	○	○	○	●	○
Lam et al., 2009 [31]	E	○	○	●	○	●	○	○
Dong et al., 2010 [9]	E	●	●	●	○	●	○	○
Lu et al., 2010 [32]	A	○	○	○	○	●	●	○
Hailemariam et al., 2011 [19]	D	○	○	○	○	●	○	○
Han et al., 2012 [21]	E	●	●	●	●	●	○	○
Zhang et al., 2012 [54]	E	●	●	●	○	●	○	○
Ekwevugbe et al., 2013 [14]	E	○	○	●	○	●	○	○
Ebadat et al., 2013 [11]	E	○	●	●	●	○	○	○
Ai et al., 2014 [1]	E	●	●	●	●	●	●	○
Wörner et al., 2014 [51]	D	○	○	●	○	○	○	○
Yang et al., 2014 [53]	D/E	●	●	●	○	●	●	○
Masood et al., 2015 [33]	E	○	●	●	○	○	○	○
Ebadat et al., 2015 [12]	E	○	●	●	●	○	●	○
Candanedo & Feldheim, 2016 [3]	D	●	●	●	○	●	○	○
Mehr et al., 2016 [34]	A	○	○	○	○	○	●	○
Sprint et al., 2016 [48]	A	○	●	○	○	●	●	○
Cicirelli et al., 2016 [6]	A	○	○	○	○	●	●	●
Pedersen et al., 2017 [39]	D	●	●	●	○	●	○	○
Fan et al., 2017 [16]	A	●	●	○	●	●	●	●
Ghaffarzadegan et al., 2017 [17]	D/E	○	○	○	○	●	○	○
Zimmermann et al., 2018 [55]	D/E	●	●	●	○	○	○	○
This work	D/E/A	●	●	○	○	○	○	○

Occupancy detection. The authors of [19] present a system aiming at occupancy detection. They conclude that using only PIR-based sensors delivers the most reliable results for their cubicle office setting. Using the information collected by a Netatmo weather station with a granularity of five minutes, [51] provides evidence that the station’s CO₂ sensor data leaks information about a room’s occupancy. Detecting the presence of one or two persons in a two-person office is presented in [3]. The authors used temperature, humidity, light, and CO₂ sensors and achieved the best results when combining data from only two of those; they conclude that any combination including the lighting sensor, as well as the time of day, helps overall performance. A single-person office and a three-room dormitory (shared by two persons), each equipped with temperature, humidity, CO₂, volatile organic compound, PIR, and acoustics sensors, are used for occupancy detection in [39]. As seen before, CO₂ combined with PIR is a suitable solution for determining the state of a room.

Occupancy estimation. Detecting occupancy levels ranging from zero to three in an open-plan office setting is presented in [31]. To this end, the authors use several different sensor types, i.e., CO, CO₂, total volatile organic compounds, small particulates, acoustics, illumination, motion, temperature, and relative humidity sensors, although the latter two do not contribute to their findings. The authors deem the CO₂ and acoustics information the most valuable. This work is reaffirmed and extended by [9] with occupancy levels ranging from zero to four. In [54], the authors use the same workplace and sensor node settings. Here, relative humidity and temperature contribute to their results. In an office setting, [21] presents a system dedicated to detecting occupancy levels ranging from zero to six occupants while, at the same time, providing a measure of indoor air quality. They deploy sensor nodes comprising PIR, CO₂, air velocity, temperature, and relative humidity sensors. Continuing this work, the authors of [1] re-used the previously developed system to increase its performance for occupancy estimation. In [14], a sensor node network is deployed in an open-plan office. They evaluated the usefulness of acoustics, (desktop PC case) temperature, CO₂, volatile organic compound, PIR, temperature, and relative humidity sensors. For their scenario, estimating occupancy levels ranging from zero to five, they focused on acoustics, (desktop PC case) temperature, CO₂, and PIR as the best combination. In [11], estimating occupancy levels from zero to four in an office setting is presented, using HVAC actuation, temperature, and CO₂ sensors. This work is extended by [12], which additionally uses the information from deployed door switches. The authors of [53] evaluate the usefulness of various sensor types in an office building. For occupancy detection and estimation, they find CO₂, door status, and lighting levels the most valuable, thereby excluding the deployed temperature and relative humidity sensors. In [33], the authors aim at estimating occupancy levels of a classroom for 35 people, with at most 22 occupants being present, utilizing CO₂, temperature, and barometric pressure sensors. The latter in particular was deemed useful. However, they targeted five different levels of occupancy of five values each, i.e., 0–4, 5–9, and so forth. Another interesting approach is to derive (the level of) occupancy from acoustics sensors only in a residential and commercial setting as presented in [17]. They use audio data from a living room with levels of occupancy from zero to three and a single-person office with occupancies ranging from zero to two. Using an indoor air quality system including CO₂, volatile organic compound, temperature, and relative humidity sensors to detect occupancy levels in four student dormitories is shown in [55]. The occupancy levels ranged from zero to three, and temperature and humidity were found to be far less valuable than CO₂.

Activity recognition. The authors of [49] deployed 14 state-change nodes (or digital switches) that were installed in key positions in a residential apartment. For instance, one node was installed as a toilet flush sensor, which would notice and report toileting activity. Their goal was to automatically detect daily living activities via a strategically deployed sensor/switch network. Similarly, [34] presents a system to detect routine activities of daily living using data collected from 77 state-change sensors deployed in a residential apartment. In [32], the main goal is to achieve savings by autonomously regulating the heating of a residential HVAC system. By using information from deployed motion (PIR) sensors and door switches, a central unit makes a decision about the current occupancy state. They further use historical data to establish a baseline occupancy-sleep-absence pattern, serving the purpose of heating the home before the occupants are expected to return home. Another use case for activity recognition is ambient assisted living. In [48], the idea is to detect health or behavior change in a residential apartment. For this, they use motion, lighting, temperature sensors, and door switches to detect activities of daily living. Likewise, the authors of [6] develop a framework allowing recognizing activities in a residential home. For better resolution, they complement PIR sensors and switches with wearables. Using temperature, humidity, small particulates, and PIR sensors, door switches, wearables, and HVAC events, [16] offers activity recognition for ambient assisted living as a service.

This work. In contrast to previous work, our results rely exclusively on temperature¹ and relative humidity. Previously published experimental results involved other or additional types of sensors, such as CO₂, acoustics, motion, or lighting (the latter three are referred to as AML in Table 2), door switches or states of appliances (also gathered with the help of switches), such as water taps or WC flushes. For this reason, our detection results are not directly comparable to these works.

¹
Note that in [3], additional classification results using temperature as the only predictor are reported in the range of $67 %$ to $87 %$ . The underlying distribution of absence ( $79 %$ , $64 %$ , and $79 %$ ) and presence ( $21 %$ , $36 %$ , and $21 %$ ) in their three datasets is unbalanced and, thus, may have biased the classifier.
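The concern about unbalanced data in [3] can be made concrete with a short calculation. The sketch below is only an illustration: the function name `majority_baseline` is hypothetical, and the class shares are the absence and presence percentages quoted above for the three datasets of [3]. It computes the accuracy of a trivial classifier that always predicts the more frequent class.

```python
def majority_baseline(class_shares):
    """Accuracy of a classifier that always predicts the most frequent class."""
    return max(class_shares)

# (absence share, presence share) of the three datasets in [3], as quoted above.
datasets = [(0.79, 0.21), (0.64, 0.36), (0.79, 0.21)]
baselines = [majority_baseline(shares) for shares in datasets]

for shares, baseline in zip(datasets, baselines):
    print(f"absence={shares[0]:.2f} presence={shares[1]:.2f} -> baseline {baseline:.2f}")
```

Under these shares, always predicting absence would already be correct in 64% to 79% of the cases, so the reported temperature-only range of 67% to 87% is better read against that baseline than against 50%.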

3. Threat model

The overall aim of our work is to understand the potential privacy implications if room climate data is accessed by other parties. Consequently, we consider in general a threat model where the attacker is an outsider, i.e., a party that is not present in the room, who gets access to the room climate data. The goal of the attacker is to extract information from the data that otherwise would not be known to her. Obviously, the term “information” is subject to manifold interpretations. Moreover, the type and granularity of measurements most likely affect what kind of information can be extracted. Therefore, we need to make more explicit what information an attacker is aiming for and what kind of measurements are at her disposal.

In our work, we focus on the question whether basic information is extractable. That is, the attack goal we consider is to gain information, without the occupants' consent, about the state of occupancy, i.e., if and how many persons are present in the room, as well as about the activity of the occupants. With respect to the available data, it is obvious that the more information an attacker can gather, the more likely she can deduce privacy-harming information from the measurements. However, in practice only few sensors will be installed within one room, especially as one sensor node per room is sufficient to monitor the room climate, and these sensors will measure a very limited set of data only. Therefore, we base our analysis on the attacker model that considers a room climate system where only few sensor nodes are used and only basic data is measured.

That is, the main focus of our experiments will be on the case that only one single sensor is present. If it turns out that a single sensor already provides sufficient data to derive privacy-harming information about the occupants, the situation may get even worse when more sensors are accessible. To examine this, we will also consider the case that an attacker can make use of the measurements of two different sensors to analyze if and how much the attacker performs better compared to the 1-sensor-case.

In our threat model, we assume that this sensor node takes only the two most basic measurements, temperature and relative humidity. These data are the fundamental properties to describe room climate. Note that our restricted data is in contrast to existing work (cf. Table 2 and Section 2), which based its experiments on more types of measurements or used data that is less common to characterize room climate.

We consider a sensor system that measures the climate of a room, denoted as target location. At the target location, temperature and relative humidity sensors are installed that report the measured values in regular intervals to a central database. We consider an attacker model where the attacker has access to this database and aims to derive information about the occupants at the target location. Furthermore, we assume that the attacker has access to either the target location itself, or rooms similar in size, layout, sensor positions, and furniture. Such situations arise, for example, in office buildings, hotels, cruise ships, and student dormitories. These locations, denoted as training location, are used to train the classifier, i.e., a machine learning algorithm that learns from input data labeled with the ground truth. As the attacker has full control over the training location, she can freely choose what actions are taking place during the measurements. For example, she could take measurements while no persons are present at the training location, or while some persons are present and execute predefined activities.

There are various scenarios, in which an attacker has incentives to collect and analyze room climate data. For example, the management of a company aims at observing the presence and working practices of employees in the offices. In another case, a provider of private spaces (hotels, dormitories, etc.) wants to disclose lifestyle and intimate activities in these spaces. This information may be utilized for targeted advertising or sold to insurance companies. In any case, the evaluation of room climate data provides the attacker with the possibility to undermine the privacy of the occupants.

The procedure of these attacks is as follows: First, the attacker collects training data at a training location, which might be the target location or other rooms similar in size, layout, sensor positions, and furniture. The attacker also records the ground truth for all events that shall be distinguished. Examples of events are number of persons present in the room (including the case that the room is empty), or different activities such as working, walking, and sleeping. The training data is recorded with a sample rate of a few seconds and split into windows (i.e., a temperature curve and a relative humidity curve) of same time lengths, usually one to three minutes. Using the collected training data, the attacker trains a machine learning classifier. After the classifier is trained, it can be used to classify windows of climate data from the target location to determine the events. The classifier works on previously collected data, thus reconstructing past events, and also on live-recorded data, thus determining current events “on-the-fly” at the target location.
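The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the classifier used in this work: the features (mean level and net change per signal), the nearest-centroid classifier, and all function names are stand-ins chosen for brevity, and the two training windows are synthetic values invented for the example.

```python
from statistics import mean

def features(window):
    """window: list of (temperature, rel_humidity) samples in time order."""
    temps = [t for t, _ in window]
    hums = [h for _, h in window]
    # Mean level and net change over the window for each signal (assumed features).
    return (mean(temps), mean(hums), temps[-1] - temps[0], hums[-1] - hums[0])

def train(labelled_windows):
    """Return one centroid (mean feature vector) per ground-truth event label."""
    by_label = {}
    for window, label in labelled_windows:
        by_label.setdefault(label, []).append(features(window))
    return {label: tuple(mean(col) for col in zip(*vecs))
            for label, vecs in by_label.items()}

def classify(centroids, window):
    """Assign the window to the event whose centroid is closest."""
    f = features(window)
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(f, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# Synthetic training windows recorded at the (hypothetical) training location.
empty = [(21.0, 40.0)] * 15
occupied = [(21.0 + 0.02 * i, 40.0 + 0.1 * i) for i in range(15)]
centroids = train([(empty, "absent"), (occupied, "present")])

# Windows later read from the target location's database.
target_a = [(21.0 + 0.02 * i, 40.1 + 0.1 * i) for i in range(15)]
target_b = [(21.1, 40.0)] * 15
print(classify(centroids, target_a), classify(centroids, target_b))
```

The same `classify` call works on stored windows (reconstructing past events) and on the most recent window (near real-time inference). A realistic attacker would use more training windows and would normalize the features.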

Note that we do not claim that analyzing room climate data is the most effective attack to gain the information specified above. It is our ambition to demonstrate that room climate data potentially reveals privacy-harming information and hence needs to be protected. In other words: the question is not “What is the best attack to determine occupancy and/or activities?” but “Should room climate data be considered private data?”. Note that our experiments show that occupancy and activities can be decided in almost real-time, i.e., after measuring the room temperature and humidity for a small time window. Even if an attacker gets access to such data only at some later point in time, the risk of privacy violation remains, e.g., the measurements may allow a user profile to be derived.

4. Experimental design and methods

We conducted a study to investigate the feasibility of detecting and estimating occupancy as well as inferring activities in an office environment from temperature and relative humidity. From March to April 2016, we performed experiments at two locations simultaneously, Location A and Location B, with a distance of approximately 200 km between them. In addition, from January to February 2017, we conducted further experiments at a third location, denoted as Location C, which is located in the same building as Location B.

4.1. Experimental setup and tasks

The experimental spaces at the three locations are different in size, layout, and positions of the sensors. Thus, each target location is also the training location in our study. At Location A, the room has a floor area of 16.5 m² and was equipped with room climate sensors at four positions as shown in Fig. 1(ii). At Location B, the room has a floor area of 30.8 m², i.e., roughly twice as much as at Location A, and had room climate sensors installed at three positions as illustrated in Fig. 1(i). Location C has a floor area of 13.9 m² and was equipped with room climate sensors at five positions as shown in Fig. 1(iii). In all locations, the room climate sensors measured temperature and relative humidity. The number of deployed sensors varied due to limitations of hardware availability.

Fig. 1.

Floor plans of the experiment spaces including sensor node locations, h indicates the node’s height.

Our goal was to determine to which extent the presence and activities of occupants influence the room climate data. Therefore, we measured temperature and relative humidity during phases of absence as well as phases of presence. If occupants were present, these persons had to perform one task or a sequence of tasks. We defined the following experimental tasks (see also Fig. 2):

Read

Sit on an office chair next to a desk and read.

Stand

Stand in the middle of the room, try to avoid movements.

Walk

Walk slowly and randomly through the room.

Work

Sit on an office chair next to a desk and use a laptop, which is located on the desk.

Fig. 2.

The defined tasks performed by participants at Location A.

To eliminate confounding factors, we defined location default settings applying to all locations. Essentially, all windows were required to remain closed and no person was allowed in the room when not in use for the experiment. The rooms have radiators for heating, which were adjusted to a constant level. At Locations A and B, we used shutters fixed in such positions that enough light was provided for reading and working.

4.2. Sensor data collection

We used a homogeneous hardware and software setup at all locations for data collection, which is described in the following.

4.2.1. Hardware

At each location, we set up a sensor network consisting of several Moteiv Tmote Sky sensor nodes with an integrated IEEE 802.15.4-compliant radio [7] as well as an integrated temperature and relative humidity sensor. The nodes have the Contiki operating system [10] version 2.7 installed. In addition, we deployed a webcam that took pictures in a 3-second interval at Location A. These were used for verification during the data collection phase only, and were not given to the classification algorithms.

4.2.2. Software

For sensor data collection, we customized the Collect-View application included in Contiki 2.7, which provides a graphical user interface to manage the sensor network. For our purposes, we implemented an additional control panel offering a customized logging system. The measurement settings of the Collect-View application were set to a report interval of 4 seconds with a variance of 1 second, i.e., each sensor node reported its current values in a time interval of $4 \pm 1$ seconds. The variance is a feature provided by Collect-View to decrease the risk of packet collisions during over-the-air transmissions.
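Because of the randomized report interval, the timestamps of a node are not equally spaced. The sketch below simulates this. It assumes that each interval is drawn uniformly from [3, 5] seconds, which is an assumption made for illustration only; the source states $4 \pm 1$ seconds but not the distribution. The function names are hypothetical and are not part of Collect-View or Contiki.

```python
import random

def report_times(n, base=4.0, jitter=1.0, seed=None):
    """Simulate n report timestamps (seconds) with a jittered interval.

    Assumption: each interval is uniform in [base - jitter, base + jitter].
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.uniform(base - jitter, base + jitter)
        times.append(t)
    return times

def samples_per_window_bounds(window_s, base=4.0, jitter=1.0):
    """Rough lower and upper bound on reports falling into one window."""
    return int(window_s // (base + jitter)), int(window_s // (base - jitter))

times = report_times(1000, seed=1)
gaps = [b - a for a, b in zip([0.0] + times, times)]
print(min(gaps), max(gaps), samples_per_window_bounds(60))
```

One practical consequence is that a window of fixed duration contains a varying number of samples from one sensor (roughly 12 to 20 for 60 seconds), so any feature computed per window has to tolerate a variable sample count.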

4.2.3. Collected data

We structured data collection in units and aimed for a good balance between presence and absence as well as the different tasks among all units, as this is needed for the later analysis using machine learning. Each unit has a fixed duration t during which exactly one or two persons were present ( $t \in \{10, 30, 60\}$ , in minutes) and executed predefined activities. In the presence of two occupants, both performed the same activities simultaneously, which allows us to investigate whether it is possible to distinguish the number of persons in a room. If the presence time was t minutes, then the absence time before and after it was each set to $\frac{t}{2} + 5$ minutes, where 5 minutes served as buffer. This accounts both for the equal distribution of presence time and absence time and for the fact that temperature and humidity settle within a 15-minute period after the 60-minute presence of one person. In Section 4.3, we present a detailed description of the experimental procedure.
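The timing rule for a unit can be worked through for the three presence times. The sketch below is only an illustration of the stated formula; the function name `unit_schedule` is hypothetical.

```python
def unit_schedule(t):
    """Return (absence_before, presence, absence_after) in minutes for a unit.

    The absence phase on each side lasts t/2 + 5 minutes, where the
    5 minutes serve as buffer.
    """
    absence = t / 2 + 5
    return absence, t, absence

for t in (10, 30, 60):
    before, presence, after = unit_schedule(t)
    print(f"t={t}: absence {before} + presence {presence} + absence {after} "
          f"= {before + presence + after} min")
```

For t = 60 this gives 35 minutes of absence on each side, which exceeds the 15-minute settling period mentioned above. For t = 10 each absence phase lasts 10 minutes.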

Overall, we collected around 115 hours of sensor data, 66 hours with at least one person being present. A more extensive overview of the amount of measured sensor data is shown in Table 3. To encourage replication and further investigations, all collected sensor data is available as open data sets on GitHub.²

²
https://github.com/IoTsec/Room-Climate-Datasets

Table 3

Measured sensor data of all locations (in hours)

Variable	Value	Recorded Time [h]

		Location A	Location B	Location C
Occupancy	0	20:38:26	15:21:00	13:21:42
	1	14:41:56	11:33:06	13:44:29
	2	14:51:05	11:41:22	–
Task (one occupant)	Read	4:46:13	2:56:44	3:19:47
	Stand	2:45:27	2:34:20	3:28:27
	Walk	2:43:53	2:37:12	3:20:05
	Work	4:03:33	3:00:20	3:20:52
Task (two occupants)	Read	4:26:43	2:58:43	–
	Stand	2:51:08	2:37:11	–
	Walk	2:43:23	2:38:20	–
	Work	4:27:57	2:58:24	–

4.3. Experimental procedure

The participants were assigned to at least one experimental unit with fixed presence times and tasks, and provided with a script for their actions (that is, for how long and in which order the tasks should be performed). Every participant performed each unit twice, with the same tasks, but possibly on different days and in a permuted chronological order. Tasks were performed in blocks of 10, 20, or 30 minutes. Thus, 10-minute units contained only one task of 10 minutes; 30-minute units consisted of either three tasks of 10 or one task of 10 plus one of 20 minutes; 60-minute units were composed of either two tasks of 20 plus two of 10, or one task of 10, 20, and 30 minutes each.

At the beginning of the presence time for each unit, i.e., the time period where one or two persons had to be present, the experimental supervisor unlocked the room door to let the participants in. The participants started with the first task and were instructed by phone (at Locations A and C) or through the glass pane (at Location B) when it was time to change activities or to leave the room.

Table 4
Demographic data of participants, μ denotes the average, σ denotes the standard deviation

Characteristic		Location A	Location B	Location C
Gender	f:	3	2	5
	m:	11	10	5
Weight [kg]	μ:	74.9	81.7	63.1
	σ:	8.0	12.1	10.0
Height [cm]	μ:	175.9	178.4	170.7
	σ:	9.2	5.3	9.3
Age	μ:	33.7	30.3	25.6
	σ:	8.2	4.8	2.8

4.4. Participants and ethical principles

For participating in the experiment, 14 subjects volunteered at Location A, 12 subjects at Location B, and 10 subjects at Location C as shown in Table 4. Demographic data of participants was collected in order to facilitate replication and future experiments. All subjects gave their written informed consent after the study protocol was approved by the data protection office.³

³ Ethical review boards at both locations only consider medical experiments.

We assigned each participant a random ID. All collected sensor data as well as the demographic data are linked only to this ID.

4.5. Classifier design

We used classification to predict occupancy and activities in the rooms. We adopt an approach that has successfully been used in several applications of biosignal processing, namely extraction of a number of statistical descriptors with subsequent feature selection [25,29].

The features use measurements from short time windows. We experimented with windows of different lengths, namely 60 s, 90 s, 120 s, 150 s, and 180 s. The offset between two consecutive windows was set to 30 s. We excluded all windows where only a part of the measurements belongs to the same activity.
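The windowing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the sample layout (timestamp, temperature, relative humidity, label), the function name, and the 10-second sampling interval in the usage note are hypothetical; only the window lengths, the 30 s offset, and the exclusion of mixed-activity windows come from the text.

```python
# Illustrative sliding-window segmentation (names and data layout are assumptions).
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float, str]  # (time in s, temperature, rel. humidity, label)


def sliding_windows(samples: Sequence[Sample],
                    length_s: float = 180.0,
                    offset_s: float = 30.0) -> List[Tuple[str, List[Sample]]]:
    """Cut a time-ordered recording into windows and keep only single-label windows."""
    if not samples:
        return []
    windows = []
    start = samples[0][0]
    end_of_data = samples[-1][0]
    while start + length_s <= end_of_data + 1e-9:
        chunk = [s for s in samples if start <= s[0] < start + length_s]
        labels = {s[3] for s in chunk}
        # Discard windows that straddle a change of activity (mixed labels).
        if chunk and len(labels) == 1:
            windows.append((labels.pop(), chunk))
        start += offset_s
    return windows
```

For instance, a 600-second recording sampled every 10 seconds that switches from one activity to another at 300 seconds yields five pure windows per activity at a window length of 180 s; all windows overlapping the switch are dropped.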

The feature set was composed from a number of statistical descriptors that were computed on temperature and humidity measurements within these windows. These are mean value, variance, skewness, kurtosis, number of nonzero values, entropy, difference between maximum and minimum value of the window (i.e., value range), correlation between temperature and humidity, and mean and slope of the regression line for the measurement window before the current window. Additionally, we subtracted from the measurements their least-square linear regression line, and computed all of the listed statistics on the subtraction residuals. Feature selection was performed using a sequential forward search [50, Ch. 7.1 & 11.8], with an inner leave-one-subject-out cross-validation [23, Ch. 7] to determine the performance of each feature set. For classification, we used the Naïve Bayes classifier. To avoid a bias in the results, we randomly selected identical numbers of windows per class for training, validation, and testing. For implementation, we used the ECST software [45], which wraps the WEKA library [20].
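To make the feature set more concrete, the following sketch computes the listed descriptors for one window in plain Python. It is an illustration only and not the ECST/WEKA pipeline used in the study. Several details are assumptions: the entropy is taken as the Shannon entropy of a 10-bin histogram, the regression runs over sample indices, and all function and feature names are hypothetical.

```python
# Illustrative feature extraction for one window (pure Python; names are assumptions).
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

EPS = 1e-12  # variances below this are treated as zero


def mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def central_moment(x: Sequence[float], k: int) -> float:
    m = mean(x)
    return sum((v - m) ** k for v in x) / len(x)


def linear_fit(y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y = a + b*i over sample indices i; returns (a, b)."""
    n = len(y)
    xm, ym = (n - 1) / 2, mean(y)
    sxx = sum((i - xm) ** 2 for i in range(n))
    b = sum((i - xm) * (v - ym) for i, v in enumerate(y)) / sxx if sxx else 0.0
    return ym - b * xm, b


def entropy(x: Sequence[float], bins: int = 10) -> float:
    """Shannon entropy (bits) of a histogram of x; the binning is an assumption."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0
    counts = Counter(min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in x)
    return -sum(c / len(x) * math.log2(c / len(x)) for c in counts.values())


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sx, sy = math.sqrt(central_moment(x, 2)), math.sqrt(central_moment(y, 2))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)


def descriptors(x: Sequence[float], prefix: str) -> Dict[str, float]:
    var = central_moment(x, 2)
    return {
        f"{prefix}_mean": mean(x),
        f"{prefix}_var": var,
        f"{prefix}_skew": central_moment(x, 3) / var ** 1.5 if var > EPS else 0.0,
        f"{prefix}_kurt": central_moment(x, 4) / var ** 2 if var > EPS else 0.0,
        f"{prefix}_nonzero": float(sum(1 for v in x if v != 0)),
        f"{prefix}_entropy": entropy(x),
        f"{prefix}_range": max(x) - min(x),
    }


def window_features(temp: Sequence[float], hum: Sequence[float],
                    prev_temp: Optional[Sequence[float]] = None,
                    prev_hum: Optional[Sequence[float]] = None) -> Dict[str, float]:
    feats = {"corr_temp_hum": correlation(temp, hum)}
    for name, series, prev in (("temp", temp, prev_temp), ("hum", hum, prev_hum)):
        a, b = linear_fit(series)
        residuals = [v - (a + b * i) for i, v in enumerate(series)]
        feats.update(descriptors(series, name))              # raw measurements
        feats.update(descriptors(residuals, name + "_res"))  # detrended residuals
        if prev is not None:
            # Mean and regression slope of the window preceding the current one.
            feats[name + "_prev_mean"] = mean(prev)
            feats[name + "_prev_slope"] = linear_fit(prev)[1]
    return feats
```

Feature selection would then operate on the resulting dictionary keys; the selection procedure itself is not shown here.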

As performance measures, we use accuracy (i.e., the number of correctly classified windows divided by the number of all windows), and per-class sensitivity (i.e., the number of correctly classified windows for a specific class divided by the number of all windows of this class). Classification accuracy was deemed statistically significant if it was significantly higher than random guessing, which is the best choice if the classifier could not learn any useful information during training. For each experiment, a binomial test with significance level $p < 0.01$ was carried out using the R software [40].
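The two performance measures and the significance test can be sketched in a few lines of Python. This is an illustration rather than the R code used in the study; in particular, the one-sided form of the exact binomial test is an assumption, and all function names are hypothetical.

```python
# Illustrative performance measures and a one-sided exact binomial test (pure Python).
from math import comb
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Correctly classified windows divided by all windows."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per class: correctly classified windows of the class divided by all windows of the class."""
    out = {}
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        out[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return out


def binomial_p_value(correct: int, total: int, p_guess: float) -> float:
    """P(X >= correct) for X ~ Binomial(total, p_guess), i.e., under pure guessing."""
    return sum(comb(total, k) * p_guess ** k * (1 - p_guess) ** (total - k)
               for k in range(correct, total + 1))
```

As a toy example, 8 correct decisions out of 10 at a guessing chance of $50 %$ give a p-value of about 0.055, which would not count as significant at the $0.01$ level; the number of windows therefore matters as much as the raw accuracy.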

Note that neither the features nor the rather simple Naïve Bayes classifier are particularly tailored to predicting privacy leaks. However, we show that even such an unoptimized system is able to correctly predict occupancy and action types and hence produce privacy leaks. Higher detection rates can be expected if more advanced classifiers are applied to this task.
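To illustrate how little machinery such a classifier requires, the following sketch implements a Gaussian Naïve Bayes classifier together with a leave-one-subject-out evaluation in plain Python. It is not the ECST/WEKA implementation used in the study: the Gaussian likelihood, the variance floor, the row layout, and all names are assumptions made for illustration.

```python
# Illustrative Gaussian Naive Bayes with leave-one-subject-out evaluation.
# This is NOT the ECST/WEKA implementation used in the study; all names are assumptions.
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, List[float], str]  # (subject ID, feature vector, class label)
Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit(rows: Sequence[Row]) -> Model:
    """Estimate a log prior plus per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for _, x, y in rows:
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        # A small variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(zip(*xs), means)]
        model[y] = (math.log(n / len(rows)), means, variances)
    return model


def predict(model: Model, x: Sequence[float]) -> str:
    """Return the class with the highest log posterior under feature independence."""
    def log_posterior(y: str) -> float:
        log_prior, means, variances = model[y]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, variances))
    return max(model, key=log_posterior)


def leave_one_subject_out(rows: Sequence[Row]) -> float:
    """Train on all subjects but one, test on the held-out subject, pool the results."""
    correct = 0
    for held_out in sorted({r[0] for r in rows}):
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        model = fit(train)
        correct += sum(predict(model, x) == y for _, x, y in test)
    return correct / len(rows)
```

Holding out all windows of one subject at a time prevents the classifier from being evaluated on a person whose data it has already seen during training.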

5. Results

In this section, we present the experimental results. First, a visual inspection of the collected data is presented, followed by the machine learning-aided occupancy detection and estimation, and activity recognition using one or two sensors.

Fig. 3.

Visualization of two examples of room climate measurements. The grey background indicates the presence of the occupant in the experimental space.

5.1. Visual inspection

We started our evaluation by analyzing the raw sensor data. To this end, we implemented a visualization script in MATLAB, which plots this data. The visualizations of two measurements are depicted as examples in Fig. 3.

The visualizations show an immediate rise of the temperature and humidity as soon as an occupant enters the room. Furthermore, variations in temperature and humidity increase rapidly and can be clearly seen. Thus, one can visually distinguish between phases of occupancy and non-occupancy. One can also notice different patterns during the performance of the tasks. As Fig. 3(i) shows, an occupant walking in the experimental space causes a constant increase of temperature and humidity with only small variations. In contrast, an occupant standing in the room causes the largest variations of humidity compared to the other defined tasks (cf. Fig. 3(ii)). The effects of the tasks reading and working on temperature and humidity in the depicted figures are very similar: both variables tend to increase showing medium variations. For further analysis of the data, we used machine learning as outlined in Section 4.5.

5.2. Occupancy detection

Occupancy detection describes the binary detection of occupants in the experimental space based on features from windows with length of 180 seconds (cf. Section 4.5). This is a two-class task, namely to distinguish whether an occupant is present (true) or not (false). We only considered training and testing data within the same room (but separated training and testing both by the days and participants of the acquisition). We randomly selected the same number of positive and negative cases from the data. Thus, simply guessing the state has a success probability of $50 %$ . However, our classification results are considerably higher than that. Table 5 shows that the highest accuracies per location were $93.5 %$ (Location A), $88.5 %$ (Location B), and $91.0 %$ (Location C). Considering all sensors of all three locations, detection accuracy ranges between $66.8 %$ (Sensor B3) and $93.5 %$ (Sensor A1) as shown in Fig. 4(i). All classification accuracies were statistically significantly different from random guessing. This indicates that an attacker can reveal the presence of occupants in a target location with a high probability.
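The balancing step, which fixes the guessing probability at $50 %$, can be sketched as random downsampling to the size of the smallest class. The function name, the item layout, and the fixed seed are assumptions for illustration only.

```python
# Illustrative class balancing by random downsampling (names are assumptions).
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Item = Tuple[object, Hashable]  # (feature vector or window, class label)


def balance_classes(items: Sequence[Item], seed: int = 0) -> List[Item]:
    """Randomly keep the same number of items for every class label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in items:
        by_label[label].append((features, label))
    # Every class is reduced to the size of the smallest class.
    n = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label, key=str):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced
```

With $k$ equally frequent classes, always predicting the same class or guessing uniformly at random is correct with probability $1/k$, which is the baseline that the reported accuracies are compared against.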

Table 5
Classification accuracy for occupancy detection. Notation: ‘Occup.’, sensitivity for class occupancy. ‘No Occup.’, sensitivity for class no occupancy. ‘Guess’, probability of correct guessing. ‘Acc.’, classification accuracy

Scenario	Sensor	Sensitivity Occup. [%]	Sensitivity No Occup. [%]	Guess [%]	Acc. [%]
Occupancy	A1	94.1	93.0	50.0	93.5
	A2	94.5	85.0	50.0	89.7
	A3	92.0	76.4	50.0	84.2
	A4	77.8	79.1	50.0	78.4
	B1	91.9	85.1	50.0	88.5
	B2	85.3	77.2	50.0	81.3
	B3	69.7	63.9	50.0	66.8
	C1	92.9	89.2	50.0	91.0
	C2	89.9	87.4	50.0	88.6
	C3	90.0	82.0	50.0	86.0
	C4	89.8	87.6	50.0	88.7
	C5	92.5	88.8	50.0	90.7

5.3. Occupancy estimation

Results for detecting multiple persons are shown in Table 6. Here, we first restate the results for deciding whether a single person is in a room or not, denoted as “0–1 person?”. Then, we report the results for deciding whether one or two persons are in a room, denoted as “1–2 persons?”. In the third column, we report results on the joint classification problem, denoted as “0–1–2 persons?”. While the first two columns cover two-class classification tasks (i.e., with a guessing chance of $50 %$ ), the third column covers a three-class classification task (with a guessing chance of $33.3 %$ ). Decisions whether a room is empty or occupied by a single person achieve relatively high accuracies, for some sensors close to or even above $90 %$ . However, the decision between one and two persons is much harder. At Location A, three out of four sensors perform around guessing chance, whereas at Location B, accuracies range between $60 %$ and $70 %$ . The results of the three-class task are consistent with these two findings: they lie between the well-discernible case of an empty room and the much harder case of determining the exact number of occupants. From an attacker’s perspective, deciding whether one or two persons are present in a target location therefore remains unreliable.

5.4. Activity recognition

Activity recognition reports the current activity of an occupant in the experimental space. The four activity tasks are described in Section 4.1. The recognition results for these tasks are shown in Fig. 4.

Activity4 classifies between the activities $Read$ , $Stand$ , $Walk$ , $Work$ . As shown in Fig. 4(ii), the accuracy of recognizing activities achieved by the machine learning pipeline ranged from $23.9 %$ (Sensor C1) to $56.8 %$ (Sensor A1). Overall, the accuracy of Activity4 was statistically significantly better than the probability of guessing the correct task ( $25 %$ ) for 8 out of 12 sensors. Thus, the distinction between multiple activities is possible, but depends on the target location and the position of the sensor.

Fig. 4.

Classification accuracy for occupancy detection and activity recognition. In each diagram, the guessing probability is plotted as a line. Each symbol represents the accuracy that we achieved with a single sensor. A circle marks a statistically significant result, while an ‘x’ represents a statistically insignificant result.

In the next step, we investigated whether an attacker can increase the recognition accuracies by distinguishing between a smaller set of activities. To this end, we combined two tasks into a meta task, e.g., the tasks Read and Work became Sit. The model Activity3 classifies between the tasks Sit, Stand, and Walk. The probability of correct guessing is thus $33.3 %$ . This model represents typical activities of an occupant in a private space or an office room. For Activity3, the achieved accuracy ranged from $31.8 %$ (Sensor C1) to $81.0 %$ (Sensor A1). Our results were statistically significant for 10 out of the 12 sensors deployed in the three locations. Assuming a known layout of the target location, the attacker might be able to determine the position of the occupant in the space and infer activities such as watching TV, exercising, cooking, or eating.

Table 6

Occupancy detection and estimation accuracies for two- and three-class tasks at Locations A and B

Sensor	0–1 person?	1–2 persons?	0–1–2 persons?
A1	93.5	47.7	59.7
A2	89.7	47.8	58.7
A3	84.2	68.0	66.4
A4	78.4	48.2	54.9
B1	88.5	67.6	72.2
B2	81.2	60.7	60.3
B3	66.8	67.3	56.6

The model Activity2 classifies between the tasks Sit and Upright, where Sit, as before, combines Read and Work, and Upright combines Stand and Walk. In this classification, the attacker distinguishes whether an occupant is in a certain posture. The model Activity2a classifies between the tasks Read and Work, and the model Activity2b classifies between the tasks Stand and Walk. Activity2a indicates that an attacker can even distinguish between sedentary activities, such as reading a book or working on a laptop. In contrast, Activity2b shows that an attacker can differentiate between standing and moving activities. Thus, an attacker can detect movements at the target location. For Activity2, Activity2a, and Activity2b, the probability of guessing the correct class is $50 %$ . Using these models, the attacker can infer various work and life habits.
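The relation between the four recorded tasks and the coarser models can be written down as a simple label mapping. The dictionary layout and function names below are hypothetical; the groupings themselves follow the model definitions in the text, and the guessing chance assumes that classes are re-balanced after merging.

```python
# Illustrative mapping from the four recorded tasks to the coarser models.
# A result of None means that the window is not used by the respective model.
from typing import Dict, Optional

MODELS: Dict[str, Dict[str, str]] = {
    "Activity4":  {"Read": "Read", "Work": "Work", "Stand": "Stand", "Walk": "Walk"},
    "Activity3":  {"Read": "Sit", "Work": "Sit", "Stand": "Stand", "Walk": "Walk"},
    "Activity2":  {"Read": "Sit", "Work": "Sit", "Stand": "Upright", "Walk": "Upright"},
    "Activity2a": {"Read": "Read", "Work": "Work"},
    "Activity2b": {"Stand": "Stand", "Walk": "Walk"},
}


def relabel(task: str, model: str) -> Optional[str]:
    """Map a recorded task to the class label used by the given model."""
    return MODELS[model].get(task)


def guessing_chance(model: str) -> float:
    """Probability of a correct guess, assuming equally frequent classes."""
    return 1 / len(set(MODELS[model].values()))
```

Under this mapping, Activity4 has a guessing chance of $25 %$, Activity3 of $33.3 %$, and the three two-class models of $50 %$, matching the baselines stated in the text.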

For Activity2, our accuracy varies between $54.6 %$ (Sensor C2) and $82.1 %$ (Sensor A1), and all accuracies are statistically significant. For Activity2a, the lowest and highest accuracies were $54.2 %$ (Sensor B3) and $76.6 %$ (Sensor C2), respectively, which resulted in statistically significant results for 11 out of 12 sensors. For Activity2b, the achieved accuracy ranged from $53.3 %$ (Sensor C4) to $95.1 %$ (Sensor A1) and the results for 10 out of 12 sensors were statistically significant.

5.5. Multi-sensor classification

Another interesting question is whether the availability of data from multiple sensors within a room can improve the prediction performance. To assess this question, we concatenated the feature vectors of two sensors prior to the classification. For consistency, the evaluation process was kept identical to the previous analyses.
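Concatenating the feature vectors of two sensors can be sketched as follows. The sketch assumes that both feature dictionaries were computed on the same, time-aligned window; the function name and the prefixing scheme are hypothetical.

```python
# Illustrative concatenation of per-sensor features for one time-aligned window.
from typing import Dict


def concat_features(first: Dict[str, float], second: Dict[str, float],
                    first_id: str, second_id: str) -> Dict[str, float]:
    """Join two per-sensor feature dicts into one vector with sensor-prefixed names."""
    joined = {f"{first_id}_{k}": v for k, v in first.items()}
    joined.update({f"{second_id}_{k}": v for k, v in second.items()})
    return joined


pair = concat_features({"temp_mean": 21.0}, {"temp_mean": 22.5}, "A1", "A2")
print(pair)
```

The combined vector has twice as many candidate features as a single-sensor vector, so the subsequent feature selection searches a larger space with the same amount of training data.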

Table 7
Occupancy detection and estimation accuracies for two- and three-class tasks at Locations A and B based on two sensors. The differences to the accuracies of the respective single sensors are given in parentheses (in percentage points). For example, the accuracy when measuring with sensors A1 and A2 increases to $96.3 %$ . When measuring with A1 only, the accuracy is $93.5 %$ (cf. Table 5), which yields a difference of $+ 2.8$ percentage points. The difference to measuring with A2 only is $+ 6.6$ percentage points.

Sensor Pair	0–1 person?	1–2 persons?	0–1–2 persons?
A1-A2	96.3 $(+ 2.8 / + 6.6)$	60.8 $(+ 13.1 / + 13.0)$	68.1 $(+ 8.4 / + 9.4)$
A1-A3	93.9 $(+ 0.4 / + 9.7)$	71.5 $(+ 23.8 / + 3.5)$	75.7 $(+ 16.0 / + 9.3)$
A1-A4	94.2 $(+ 0.7 / + 15.8)$	56.6 $(+ 8.9 / + 8.4)$	69.0 $(+ 9.3 / + 14.1)$
A2-A3	90.1 $(+ 0.4 / + 5.9)$	70.9 $(+ 23.1 / + 2.9)$	74.5 $(+ 15.8 / + 8.1)$
A2-A4	87.8 $(- 1.9 / + 9.4)$	53.7 $(+ 5.9 / + 5.5)$	61.2 $(+ 2.5 / + 6.3)$
A3-A4	84.8 $(+ 0.6 / + 6.4)$	69.8 $(+ 1.8 / + 21.6)$	70.6 $(+ 4.2 / + 15.7)$
B1-B2	91.7 $(+ 3.2 / + 10.5)$	72.0 $(+ 4.4 / + 11.3)$	75.1 $(+ 2.9 / + 14.8)$
B1-B3	89.6 $(+ 1.1 / + 22.8)$	67.9 $(+ 0.3 / + 0.6)$	71.7 $(- 0.5 / + 15.1)$
B2-B3	83.4 $(+ 2.2 / + 16.6)$	61.4 $(+ 0.7 / - 5.9)$	64.8 $(+ 4.5 / + 8.2)$

Table 7 shows the results for occupancy estimation using pairs of sensors. Overall, the detection accuracies improve by several percentage points when combining the feature vectors of two sensors. However, distinguishing between one and two persons in a room is still a hard task, with accuracies ranging approximately between $50 %$ and $70 %$ . The three-class problem also improves slightly, but by far the most reliable classification task is still to determine whether there is a person in the room at all, with most accuracies now close to or above $90 %$ .
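The differences in parentheses in Table 7 can be reproduced from the single-sensor accuracies, as the following small worked example shows for two sensor pairs of the “0–1 person?” task. The accuracies are copied from Tables 6 and 7; the variable and function names are hypothetical.

```python
# Worked example: gain of a sensor pair over each of its single sensors,
# in percentage points ("0-1 person?" accuracies taken from Tables 6 and 7).
SINGLE = {"A1": 93.5, "A2": 89.7, "A4": 78.4}
PAIR = {("A1", "A2"): 96.3, ("A2", "A4"): 87.8}


def gains(pair):
    """Difference between the pair accuracy and each single-sensor accuracy."""
    return tuple(round(PAIR[pair] - SINGLE[sensor], 1) for sensor in pair)


for pair in PAIR:
    print("-".join(pair), gains(pair))
```

A negative first entry, as for the pair A2-A4, means that the pair performs worse than the better of its two sensors alone.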

Table 8

Activity Recognition accuracies for all three locations based on a single sensor and on two sensors

One sensor						Two sensors

Sensor	Act.4	Act.3	Act.2	Act.2a	Act.2b	Sensor pair	Act.4	Act.3	Act.2	Act.2a	Act.2b
A1	56.8	81.0	82.6	57.5	96.3	A1-A2	50.2	75.4	83.1	74.8	91.0
A2	39.7	56.1	68.5	64.1	65.9	A1-A3	50.1	76.7	80.4	72.4	88.3
A3	32.7	47.2	66.7	60.3	68.7	A1-A4	54.5	79.0	83.0	67.3	89.4
A4	26.8	39.8	62.8	65.6	69.0	A2-A3	32.6	50.0	65.9	74.8	67.1
						A2-A4	41.1	55.1	71.0	75.5	69.1
						A3-A4	31.1	43.8	68.3	75.3	46.8
B1	55.1	76.0	75.6	63.4	71.5	B1-B2	52.7	77.0	80.0	70.1	86.0
B2	35.2	56.4	74.5	62.1	73.1	B1-B3	55.6	74.5	76.4	64.2	79.6
B3	33.3	39.5	63.9	50.9	62.2	B2-B3	39.3	54.5	72.1	59.8	62.7
C1	28.3	40.0	58.4	59.3	43.5	C1-C2	29.5	46.8	55.8	65.2	62.5
C2	25.8	41.7	60.0	71.6	55.3	C1-C3	38.8	52.5	60.2	63.8	76.9
C3	34.3	54.2	62.4	66.3	60.7	C1-C4	28.6	44.6	67.8	64.4	53.3
C4	25.1	36.8	62.6	66.1	53.3	C1-C5	36.6	52.4	68.5	64.1	65.4
C5	30.9	57.7	75.5	65.8	72.5	C2-C3	32.2	52.5	62.4	76.5	72.8
						C2-C4	29.0	48.9	64.5	77.3	59.3
						C2-C5	30.5	49.3	72.3	75.5	67.1
						C3-C4	40.2	54.7	67.9	72.4	69.7
						C3-C5	42.4	60.5	70.9	74.7	77.4
						C4-C5	35.3	57.1	72.6	72.4	64.5

Table 8 shows the results for activity recognition. Results for a single sensor are shown on the left side, results for pairs of sensors on the right side. Interestingly, classification of activities appears to be a similarly hard task for pairs of sensors as it is for single sensors: using the information from multiple sensors does not substantially improve the results for activity classification. One reason might be that the information from multiple sensors is correlated for this task, such that adding sensors only enlarges the feature space but does not add significant information. From a privacy perspective, however, this is somewhat reassuring: although the achieved accuracies are in almost all cases well above guessing chance, it is apparently not a straightforward task to accurately predict the activity of a person in the room.

6. Further observations

6.1. Length of measurement windows

The length of the measurement windows influences the accuracy of detection. We evaluated window sizes in the range between 60 and 180 seconds. As an example, we analyzed the average accuracy of occupancy detection depending on the window size for all three locations. As shown in Fig. 5, the accuracy increases with the window size. We achieved the best results with the longest window size of 180 seconds.

This indicates that the highest accuracies are possible if longer time periods are considered. From a practical perspective, it is not advisable to extend the window size to a much larger duration than a few minutes since we assume that the performed activity is consistent for the whole duration of the window.

Fig. 5.

Average accuracy over all sensors from each location for occupancy detection depending on the window size.

6.2. Selected features

To assess the feasibility of an attack by an adversary who has access to either temperature data or relative humidity data only, we evaluated whether it might be enough to collect just one type of room climate data. In the classification process, an attacker derives a set of features from temperature and relative humidity data and automatically selects the best-performing features for each sensor and classification goal (cf. Section 4.5). Our analysis shows that features computed from temperature and relative humidity are of similar importance. In our evaluation, $57.9 %$ of the selected features are derived from temperature measurements, and $52.3 %$ from relative humidity measurements.⁴

⁴ Note that some features are based on both temperature and relative humidity, which is why the sum of both numbers exceeds $100 %$ .
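The following toy example shows how the two shares can add up to more than $100 %$. The list of selected features is purely hypothetical and does not reflect the actual selection results; it merely contains one feature (the temperature-humidity correlation) that counts towards both shares.

```python
# Toy example with a HYPOTHETICAL selection result; feature names are illustrative only.
selected = ["temp_mean", "hum_var", "corr_temp_hum", "temp_range", "hum_entropy"]


def share(features, keyword):
    """Percentage of selected features whose name involves the given measurement."""
    return 100 * sum(keyword in f for f in features) / len(features)


temp_share = share(selected, "temp")  # features that use temperature
hum_share = share(selected, "hum")    # features that use relative humidity
print(temp_share, hum_share, temp_share + hum_share)
```

Because the correlation feature is counted once for temperature and once for relative humidity, the two shares overlap, just as in the percentages reported above.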

We also compared the features in terms of differences between the three locations as well as differences between occupancy detection and activity recognition. In all these cases, there are no significant differences between the importance of temperature and relative humidity. Consequently, an attacker restricted to either temperature or relative humidity data will perform worse than an attacker with access to both.

6.3. Size and layout of rooms

All our locations are office-like rooms, which have a similar (rectangular) layout but differ in size and furnishing. In our evaluation, the accuracy decreases with the size of the target location. As shown in Fig. 5, we had the highest average accuracy in occupancy detection at Location C, which also has the smallest floor area of 13.9 m². Location A has a floor area of 16.5 m² and a slightly lower average accuracy. Location B is almost twice as large (30.8 m²) and shows the worst average accuracy of all locations. Thus, our experiment indicates that an increasing room size leads to decreasing accuracy on average. An attacker achieves higher accuracies by monitoring small target locations than by monitoring larger ones.
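The ordering described above can be retraced from the single-sensor occupancy detection accuracies in Table 5, which were obtained with 180-second windows. The sketch below averages them per location and lists the locations by floor area; the variable names are hypothetical, and the averages may differ slightly from the values plotted in Fig. 5.

```python
# Worked example: mean occupancy detection accuracy per location (Table 5, in %)
# next to the floor area of each location (in m², from the text).
ACCURACY = {
    "A": [93.5, 89.7, 84.2, 78.4],
    "B": [88.5, 81.3, 66.8],
    "C": [91.0, 88.6, 86.0, 88.7, 90.7],
}
AREA_M2 = {"A": 16.5, "B": 30.8, "C": 13.9}


def mean(values):
    return sum(values) / len(values)


by_area = sorted(AREA_M2, key=AREA_M2.get)                        # smallest room first
by_accuracy = sorted(ACCURACY, key=lambda loc: -mean(ACCURACY[loc]))  # best location first

for loc in by_area:
    print(f"Location {loc}: {AREA_M2[loc]:.1f} m², mean accuracy {mean(ACCURACY[loc]):.1f} %")
```

Both orderings coincide (C, A, B), which is in line with the observation above. With only three rooms, this is an indication rather than a statistically reliable relationship.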

6.4. Position of sensors

According to our threat model in Section 3, the attacker controls the target location’s layout. Thus, we assume an attacker who can decide where to install room climate sensors in the target location. We consider how the position of a room climate sensor influences the accuracy of the derived information. For occupancy detection, we achieved the best accuracy with a sensor node located at the center of the ceiling of the target location (Sensors A1, B1, C1). In this position, the sensor has the largest gathering area to measure the climate of the room. Sensors mounted on the walls or on shelves perform differently in our experiments. For occupancy estimation, the best sensors differ per location, i.e., wall-mounted A3 outperforms all others at Location A, whereas B3 (similar to A3) and B1 are almost on par. For activity recognition, the central sensor nodes performed best at Locations A and B, but not at Location C.

From the attacker perspective, the best position to deploy a room climate sensor is at the ceiling in the center of the target location. In large rooms, multiple sensors at the ceiling could be installed, each covering a subsection of the room.

7. Discussion

As our experiments reveal, knowing the temperature and relative humidity of a room allows an attacker to detect the presence of people and to recognize certain activities with a significantly higher probability than guessing. By evaluating temperature and relative humidity curves of 180 seconds in length, we were able to detect the presence of an occupant in one of our experimental spaces with an accuracy of 93.5% using a single sensor. Occupancy estimation results show no definitive trend: at Location A, three out of four sensors perform around guessing chance when deciding between one and two persons, yet the best accuracies are around $68 %$ . The classification accuracies for the three-class problem, i.e., deciding between zero, one, and two persons being present, consequently lie between the results for occupancy detection and those for one vs. two persons, with $72.2 %$ as the best result. In terms of activity recognition, we distinguished between four activities with an accuracy of up to 56.8%, between three activities with up to 81.0%, and between two activities with up to 95.1%. Thus, an attacker focusing on the detection of a specific activity is more successful than an attacker who aims to classify a broader variety of activities. In the following, we discuss relevance, implications, and limitations of our results.

7.1. Privacy implications

We show that an attacker might be able to infer life and work habits of the occupants from the room climate data. Specifically, the attacker is able to distinguish between sitting, standing, and moving, which might already reveal the position and activities of the occupant in the room. Moreover, the attacker can distinguish between upright and sedentary activities, between moving and standing, and between working on a laptop and reading a book.

Given the limited amount of recorded sensor data, the achieved accuracies in occupancy detection and activity recognition give a clear indication that occupants are subject to privacy violations according to the threat model described in Section 3. However, occupancy estimation and activity recognition are not straightforward: the accuracies achieved for occupancy estimation are low, and those for activity recognition differ between sensor positions and locations.

On the bright side, it is also reassuring that simply increasing the number of sensors is not as ominous as one might fear since the relative accuracy increase is rather slim. In other words, the most restrictive scenario, i.e., one deployed sensor, is sufficient. Hence, for an attacker, the benefit of deploying and exploiting more than one sensor is at least questionable.

Further experiments are required for a better assessment of the privacy risks induced by the room climate data. Our work provides promising directions for these assessments. For example, we demonstrated the existence of the information leak with the Naïve Bayes classifier. Naïve Bayes is arguably one of the simplest machine learning classifiers. In future work, it would be interesting to explore upper bounds for the detection of presence/absence, occupancy estimation, and different activities by using more advanced classifiers such as the recently popular deep learning algorithms.

7.2. Policy implications

Smart heating is one of the most desirable smart home applications. For example, in a representative smart home survey of German consumers from 2015, $34 %$ of the participants stated that they are interested in technologies for intelligent heating or are planning to acquire such a system [8]. Another survey with 1,000 US and 600 Canadian consumers found that for $72 %$ of them, the most desired smart home device would be a self-adjusting thermostat, and $37 %$ reported that they were likely to purchase one in the next 12 months [38].

While consumers understand the benefits of smart heating systems, their privacy implications may be difficult to fathom. For example, in a recent representative survey with 461 American adults by Pew Research [41], the participants were presented with a scenario of installing a smart thermostat “in return for sharing data about some of the basic activities that take place in your house like when people are there and when they move from room to room”. Whereas $55 %$ of respondents said that this scenario was not acceptable for them, $27 %$ said that it was acceptable, with the remaining $17 %$ answering “it depends”. Furthermore, in a worldwide survey with 9,000 respondents from nine countries (Australia, Brazil, Canada, France, Germany, India, Mexico, the UK, and the US), $54 %$ of respondents said that “they might be willing to share their personal data collected from their smart home with companies in exchange for money” [26].⁵

⁵ Methodological details, such as representativeness, breakdown by country, and the exact formulation of the questions, are not known about this survey.

According to the results of the above surveys, quite a high percentage of consumers might be willing to share their room climate data. On the other hand, our experiments show that this sharing may have serious privacy implications. Moreover, our findings indicate that room climate data measurements should be classified as personal data that can be used to profile the user, as specified in Art. 4 of the General Data Protection Regulation (GDPR) [42]. For example, for single households, room climate data may leak personal data about when the person is at home, and whether this person is in the company of another person at a certain time. As a consequence, the collection and processing of climate data is subject to the GDPR. More precisely, it is our understanding that before climate data can be processed, a Data Protection Impact Assessment (DPIA) needs to be conducted. Citing from Art. 35 of the GDPR: “Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data.” Note that the EU Commission has already developed DPIAs for similar IoT applications: Smart Metering⁶ and RFID.⁷ These may serve as a starting point for smart home heating applications.

⁶ https://ec.europa.eu/energy/en/content/dpia-template-smart-grid-and-smart-metering-systems

⁷ https://ec.europa.eu/digital-single-market/en/news/privacy-and-data-protection-impact-assessment-framework-rfid-applications

Moreover, according to Art. 25 GDPR, data protection by design and by default should be applied to smart heating systems. Our experiments show that the amount of personal information that can be inferred from room climate data is highly dependent on the granularity of the measurements and on the placement of the sensors. In this way, our findings may provide a starting point for data minimization guidelines, as they indicate that privacy in smart heating systems can be protected by adjusting data collection granularity and sensor placement.

7.3. Location-independent classification

An important question is whether it is possible to perform location-independent classification, i.e., to train the classifier with sensor data of one location and then use it to classify sensor data at a target location that is not similar to the training location in size, layout, and sensor positions. If this were possible, the service providers of smart heating applications would be able to detect occupancy and to recognize activities without having access to the target locations.

According to their privacy statements, popular smart thermostats from Nest [37], Ecobee [13], and Honeywell [24] send measured climate data to the service providers’ databases. To evaluate these privacy threats, we used the room climate data of the best-performing sensor of a location as training data set for other locations. For example, to classify events of an arbitrary sensor of Location A, we trained the classifier with room climate data collected by Sensor B1 or Sensor C1. We obtained statistically significant results for a few combinations in occupancy detection, but the majority of our occupancy detection results were not significant. Since discriminating between one and two persons proved unreliable, occupancy estimation was consequently excluded. For activity recognition, we were not able to obtain statistically significant results.

However, the possibility of location-independent attacks cannot be excluded. The absence of significant results in our experiments may be merely due to the limited amount of data. Future studies should be conducted to gather data from various rooms up to a point where the combined results hold for arbitrary locations. Having more data from a multitude of rooms available would help the machine learning classifiers to recognize and ignore data characteristics that are specific to individual experimental rooms. Consequently, the algorithms could better identify the distinct data characteristics of the different classes in occupancy detection and activity recognition. This would enable location-independent classification of room climate data, in which the training location is not similar to the target location regarding size, layout, furnishing, and positions of the sensors.

7.4. Future work

We think that the idea of sharing smart home data for various benefits will continue to be intensively discussed in the future, and therefore, consumers and policy makers should be made aware of the level of detail inferable from smart home data. Which rewards are actually beneficial for consumers? Moreover, which kind of data sharing is ethically permissible? Only by answering these questions will it be possible to design fair policies and establish beneficial personal data markets [47]. In this work, we take the first step towards informing the policy for the smart heating scenario.

We further suggest investigating countermeasures against the revealed privacy threats. Apart from the data minimization techniques mentioned in Section 7.2, such as regulating the frequency of measurements and the placement of the sensors, other privacy-enhancing measures could be considered. Would the addition of noise to the measured sensor data decrease the information gain significantly? How does the clothing of occupants impact the classification results? Can the influences of human activities on the room climate be recreated by technical means to simulate occupancy of unoccupied spaces? Future work can address these and further conceivable countermeasures as well as their effort, costs, and impact.

8. Conclusions

We investigated the common belief that data collected by room climate sensors divulge private information about the occupants. To this end, we conducted experiments aiming to reflect realistic conditions, i.e., considering an attacker who has access to typical room climate data (temperature and relative humidity) only. Our experiments revealed that knowing a sequence of temperature and relative humidity measurements already allows an attacker to detect the presence of people and to recognize certain activities with high accuracy. By contrast, distinguishing between the presence of one and two persons is evidently harder. Using data from two different sensors slightly improves occupancy detection, but activity recognition remains a hard task. Nonetheless, our results confirm that the need for protection of room climate data is justified: the leakage of such ‘inconspicuous’ sensor data as temperature and relative humidity can seriously violate privacy in smart spaces. Future work is required to determine the level of privacy invasion in more depth and to develop appropriate countermeasures.

Acknowledgments

We would like to thank the anonymous reviewers for their invaluable comments. This work is supported by the German Research Foundation (DFG) under Grant AR 671/3-1: WSNSec – Developing and Applying a Comprehensive Security Framework for Sensor Networks.

References

1. Ai, Fan and R.X. Gao, Occupancy estimation for smart buildings by an auto-regressive hidden Markov model, in: American Control Conference, ACC 2014, Portland, OR, USA, June 4–6, 2014, IEEE, 2014, pp. 2234–2239. ISBN 978-1-4799-3272-6. doi:10.1109/ACC.2014.6859372.

2. BSI, Protection Profile for the Gateway of a Smart Metering System (Smart Meter Gateway PP), 2014, https://www.commoncriteriaportal.org/files/ppfiles/pp0073b_pdf.pdf.

3. L.M. Candanedo and Feldheim, Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models, Energy and Buildings 112 (2016), 28–39. ISSN 0378-7788. https://www.sciencedirect.com/science/article/pii/S0378778815304357. doi:10.1016/j.enbuild.2015.11.071.

4. Cavoukian, Polonetsky and Wolf, SmartPrivacy for the Smart Grid: Embedding privacy into the design of electricity conservation, Identity in the Information Society 3(2) (2010), 275–294. ISSN 1876-0678. doi:10.1007/s12394-010-0046-y.

5. Chaos Computer Club: Guidelines for Smart Home solutions, 2016, (in German).

6. Cicirelli, Fortino, Giordano, Guerrieri, Spezzano and Vinci, On the design of smart homes: A framework for activity recognition in home environment, Journal of Medical Systems 40(9) (2016), 200. ISSN 1573-689X. doi:10.1007/s10916-016-0549-7.

7. Moteiv Corporation, Tmote Sky Datasheet, Moteiv Corporation, 2006.

8. Deloitte, Ready for Takeoff? 2015, Consumer Survey, (in German), https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/smart-home-consumer-survey.html.

9. Dong, Andrews, K.P. Lam, Höynck, Zhang, Y.-S. Chiou and Benitez, An information technology enabled sustainability test-bed (ITEST) for occupancy detection through an environmental sensing network, Energy and Buildings 42(7) (2010), 1038–1046. ISSN 0378-7788. https://www.sciencedirect.com/science/article/pii/S037877881000023X. doi:10.1016/j.enbuild.2010.01.016.

10. Dunkels, Grönvall and Voigt, Contiki – a lightweight and flexible operating system for tiny networked sensors, in: 29th Annual IEEE International Conference on Local Computer Networks, 2004, IEEE, 2004, pp. 455–462. doi:10.1109/LCN.2004.38.

11. Ebadat, Bottegal, Varagnolo, Wahlberg and K.H. Johansson, Estimation of building occupancy levels through environmental signals deconvolution, in: BuildSys 2013, Proceedings of the 5th ACM Workshop on Embedded Systems for Energy-Efficient Buildings, Roma, Italy, November 13–14, 2013, 2013, pp. 8–188. doi:10.1145/2528282.2528290.

12. Ebadat, Bottegal, Varagnolo, Wahlberg and K.H. Johansson, Regularized deconvolution-based approaches for estimating room occupancies, IEEE Trans. Automation Science and Engineering 12(4) (2015), 1157–1168. doi:10.1109/TASE.2015.2471305.

13. Ecobee, Privacy Policy & Terms of Use, 2015.

14. Ekwevugbe, Brown, Pakka and Fan, Real-time building occupancy sensing using neural-network based sensor network, in: 7th IEEE International Conference on Digital Ecosystems and Technologies (DEST) 2013, 2013, pp. 114–119. ISSN 2150-4938. doi:10.1109/DEST.2013.6611339.

15. European Union Agency For Network And Information Security, Security and Resilience of Smart Home Environments – Good practices and recommendations, 2015.

16. Fan, Xie, Li, Huang, Wang, Chen, Xie and Chen, Activity recognition as a service for smart home: Ambient assisted living application via sensing home, in: 2017 IEEE International Conference on AI Mobile Services (AIMS), 2017, pp. 54–61. doi:10.1109/AIMS.2017.29.

17. Ghaffarzadegan, Reiss, Ruhs, Duerichen and Feng, Occupancy detection in commercial and residential environments using audio signal, in: Proc. Interspeech 2017, 2017, pp. 3802–3806. doi:10.21437/Interspeech.2017-524.

18. Greveler, Glösekötter, Justus and Loehr, Multimedia content identification through smart meter power usage profiles, in: Proceedings of the International Conference on Information and Knowledge Engineering (IKE), 2012.

19. Hailemariam, Goldstein, Attar and Khan, Real-time occupancy detection using decision trees with multiple sensor types, in: 2011 Spring Simulation Multi-Conference, SpringSim ’11, Boston, MA, USA, April 03–07, 2011, 2011, pp. 141–148. http://dl.acm.org/citation.cfm?id=2048555.

20. Hall, Frank, Holmes, Pfahringer, Reutemann and I.H. Witten, The WEKA data mining software: An update, SIGKDD Explor. Newsl. 11(1) (2009), 10–18. ISSN 1931-0145. doi:10.1145/1656274.1656278.

21. Han, R.X. Gao and Fan, Occupancy and indoor environment quality sensing for smart buildings, in: 2012 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 2012, pp. 882–887. ISSN 1091-5281. doi:10.1109/I2MTC.2012.6229557.

22. G.W. Hart, Residential energy monitoring and computerized surveillance via utility power flows, Technology and Society Magazine, IEEE 8(2) (1989), 12–16. doi:10.1109/44.31557.

23. Hastie, Tibshirani and J.H. Friedman, The Elements of Statistical Learning, 2nd edn, Springer, New York, NY, USA, 2009.

24. Honeywell, Honeywell connected home privacy statement, 2015.

25. Huppert, Paulus, Paulsen, Burkart, Wullich and Eskofier, Quantification of nighttime micturition with an ambulatory sensor-based system, IEEE Journal of Biomedical and Health Informatics 20(3) (2016), 865–872. doi:10.1109/JBHI.2015.2421487.

26. Intel Security: Intel Security’s International Internet of Things Smart Home Survey Shows Many Respondents Sharing Personal Data for Money, 2016, https://newsroom.intel.com/news-releases/intel-securitys-international-internet-of-things-smart-home-survey.

27. Jawurek, Johns and Kerschbaum, Plug-in privacy for smart metering billing, in: Privacy Enhancing Technologies – 11th International Symposium, PETS 2011, Proceedings, Waterloo, ON, Canada, July 27–29, 2011, Fischer-Hübner and Hopper, eds, Lecture Notes in Computer Science, Vol. 6794, Springer, 2011, pp. 192–210. ISBN 978-3-642-22262-7. doi:10.1007/978-3-642-22263-4_11.

28. Jawurek, Kerschbaum and Danezis, SoK: Privacy Technologies for Smart Grids – a Survey of Options, Microsoft Research, Cambridge, UK, 2012.

29. Jensen, Blank, Kugler and Eskofier, Unobtrusive and energy-efficient swimming exercise tracking using on-node processing, IEEE Sensors Journal 16(10) (2016), 3972–3980. doi:10.1109/JSEN.2016.2530019.

30. Kursawe, Danezis and Kohlweiss, Privacy-friendly aggregation for the smart-grid, in: Privacy Enhancing Technologies – 11th International Symposium, PETS 2011, Proceedings, Waterloo, ON, Canada, July 27–29, 2011, Fischer-Hübner and Hopper, eds, Lecture Notes in Computer Science, Vol. 6794, Springer, 2011, pp. 175–191. ISBN 978-3-642-22262-7. doi:10.1007/978-3-642-22263-4_10.

31. K.P. Lam, Höynck, Dong, Andrews, Y.-S. Chiou, Benitez and Choi, Occupancy detection through an extensive environmental sensor network in an open-plan office building, in: Proc. of Building Simulation 09, an IBPSA Conference, 2009.

32. Lu, Sookoor, Srinivasan, Gao, Holben, Stankovic, Field and Whitehouse, The smart thermostat: Using occupancy sensors to save energy in homes, in: Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, ACM, 2010, pp. 211–224.

33. M.K. Masood, Y.C. Soh and V.W. Chang, Real-time occupancy estimation using environmental parameters, in: 2015 International Joint Conference on Neural Networks, IJCNN 2015, Killarney, Ireland, July 12–17, 2015, IEEE, 2015, pp. 1–8. ISBN 978-1-4799-1960-4. doi:10.1109/IJCNN.2015.7280781.

34. H.D. Mehr, Polat and Cetin, Resident activity recognition in smart homes by using artificial neural networks, in: 2016 4th International Istanbul Smart Grid Congress and Fair (ICSG), 2016, pp. 1–5. doi:10.1109/SGCF.2016.7492428.

35. Molina-Markham, Shenoy, Fu, Cecchet and Irwin, Private memoirs of a smart meter, in: Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, BuildSys ’10, ACM, New York, NY, USA, 2010, pp. 61–66. ISBN 978-1-4503-0458-0. doi:10.1145/1878431.1878446.

36. Morgner, Müller, Ring, Eskofier, Riess, Armknecht and Benenson, Privacy implications of room climate data, in: Computer Security – ESORICS 2017 – 22nd European Symposium on Research in Computer Security, Proceedings, Part II, Oslo, Norway, September 11–15, 2017, 2017, pp. 324–343.

37. Nest, Privacy Statement for Nest Products and Services, 2016.

38. icontrol Networks: 2015 State of the Smart Home Report.

39. T.H. Pedersen, K.U. Nielsen and Petersen, Method for room occupancy detection based on trajectory of indoor climate sensor data, Building and Environment 115 (2017), 147–156. ISSN 0360-1323. https://www.sciencedirect.com/science/article/pii/S0360132317300367. doi:10.1016/j.buildenv.2017.01.023.

40. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2014. http://www.R-project.org.

41. Rainie and Duggan, Pew Research: Privacy and Information Sharing, 2016, http://www.pewinternet.org/2016/01/14/privacy-and-information-sharing.

42. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Official Journal of the European Union L 119 (2016), 1–88, http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L:2016:119:TOC.

43. Reinhardt, Englert and Christin, Averting the privacy risks of smart metering by local data preprocessing, Pervasive and Mobile Computing 16 (2015), 171–183. doi:10.1016/j.pmcj.2014.10.002.

44. Rial and Danezis, Privacy-preserving smart metering, in: Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society, WPES ’11, ACM, New York, NY, USA, 2011, pp. 49–60. ISBN 978-1-4503-1002-4. doi:10.1145/2046556.2046564.

45. Ring, Jensen, Kugler and Eskofier, Software-based performance and complexity analysis for the design of embedded classification systems, in: Proceedings of the 21st International Conference on Pattern Recognition, ICPR, Tsukuba, Japan, November 11–15, 2012, IEEE Computer Society, 2012, pp. 2266–2269. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=6460616. ISBN 978-1-4673-2216-4.

46. Selinger, Test: Smart Home Kits Leave the Door Wide Open – for Everyone, 2014, https://www.av-test.org/en/news/news-single-view/test-smart-home-kits-leave-the-door-wide-open-for-everyone/.

47. Spiekermann, Acquisti, Böhme and K.-L. Hui, The challenges of personal data markets and privacy, Electronic Markets 25(2) (2015), 161–167. doi:10.1007/s12525-015-0191-0.

48. Sprint, Cook, Fritz and Schmitter-Edgecombe, Detecting health and behavior change by analyzing smart home sensor data, in: 2016 IEEE International Conference on Smart Computing (SMARTCOMP), 2016, pp. 1–3. doi:10.1109/SMARTCOMP.2016.7501687.

49. van Kasteren, Noulas, Englebienne and Kröse, Accurate activity recognition in a home setting, in: Proceedings of the 10th International Conference on Ubiquitous Computing, ACM, 2008.

50. I.H. Witten, Frank and M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn, Morgan Kaufmann, Burlington, MA, USA, 2011.

51. Wörner, von Bomhard, Roeschlin and Wortmann, Look twice: Uncover hidden information in room climate sensor data, in: 4th International Conference on the Internet of Things, IoT 2014, Cambridge, MA, USA, October 6–8, 2014, IEEE, 2014, pp. 25–30. ISBN 978-1-4799-5154-3. doi:10.1109/IOT.2014.7030110.

52. Yang, Li, Qi, Qardaji, McLaughlin and McDaniel, Minimizing private data disclosures in the smart grid, in: Proceedings of the 2012 ACM Conference on Computer and Communications Security, ACM, 2012, pp. 415–427.

53. Yang, Li, Becerik-Gerber and M.D. Orosz, A systematic approach to occupancy modeling in ambient sensor-rich buildings, Simulation 90(8) (2014), 960–977. doi:10.1177/0037549713489918.

54. Zhang, K.P. Lam, Y.-S. Chiou and Dong, Information-theoretic environment features selection for occupancy detection in open office spaces, Building Simulation 5(2) (2012), 179–188. ISSN 1996-8744. doi:10.1007/s12273-012-0075-6.

55. Zimmermann, Weigel and Fischer, Fusion of non-intrusive environmental sensors for occupancy detection in smart homes, IEEE Internet of Things Journal 5(4) (2018), 2343–2352. ISSN 2327-4662. doi:10.1109/JIOT.2017.2752134.

Table 4
Demographic data of participants; μ denotes the average, σ denotes the standard deviation

| Characteristic | | Location A | Location B | Location C |
|---|---|---|---|---|
| Gender | f | 3 | 2 | 5 |
| | m | 11 | 10 | 5 |
| Weight [kg] | μ | 74.9 | 81.7 | 63.1 |
| | σ | 8.0 | 12.1 | 10.0 |
| Height [cm] | μ | 175.9 | 178.4 | 170.7 |
| | σ | 9.2 | 5.3 | 9.3 |
| Age | μ | 33.7 | 30.3 | 25.6 |
| | σ | 8.2 | 4.8 | 2.8 |


Table 1
Overview of the scenarios analyzed in [36] and this work

| | Occupancy detection | Activity recognition | Occupancy estimation |
|---|---|---|---|
| Single sensor | [36] and this work | [36] and this work | This work |
| Two sensors | This work | This work | This work |

² https://github.com/IoTsec/Room-Climate-Datasets

³ Ethical review boards at both locations only consider medical experiments.

⁴ Note that some features are based on both temperature and relative humidity, which is why the sum of both numbers exceeds $100 %$.

⁵ The methodological details of this survey, such as representativeness, breakdown by country, and the exact formulation of the questions, are not known.