Abstract
Smart heating applications promise to increase energy efficiency and comfort by collecting and processing room climate data. While it has been suspected that the sensed data may leak crucial personal information about the occupants, this belief has up until now not been supported by evidence.
In this work, we investigate privacy risks arising from the collection of room climate measurements. We assume that an attacker has access to the most basic measurements only: temperature and relative humidity. We train machine learning classifiers to predict the presence and number of room occupants and to discriminate between different types of activities. On data that was collected at three different locations, we show that occupancy can be detected from data measured by a single sensor with up to
Our results provide evidence that even the leakage of such ‘inconspicuous’ data as temperature and relative humidity can seriously violate privacy.
Introduction
The vision of the Internet of Things (IoT) is to enhance work processes, energy efficiency, and living comfort by interconnecting actuators, mobile devices and sensors. These networks of embedded technologies enable applications such as smart heating, home automation, and smart metering, among many others. Sensors are of crucial importance in these applications. Data gathered by sensors is used to represent the current state of the environment; in smart heating, for instance, sensors measure the room climate. Using this information and a user-defined configuration of the target room climate, the application regulates heating, ventilation, and air conditioning.
While the collection of room climate data is obviously essential to enable smart heating, it may at the same time pose a risk of privacy violations. Consequently, it is commonly believed among security experts that leaking room climate data may result in privacy violations and hence that the data needs to be cryptographically protected [5,15,46]. However, these claims have not been supported by scientific evidence so far. Thus, one could question whether the additional effort of protecting the data would be justified in practice.
The current situation with room climate data is comparable to the area of smart metering [22,27,30,52]. In 1989, Hart [22] was the first to draw attention to the fact that smart metering appliances can be exploited as surveillance devices. Since then, research has shown far-reaching privacy violations through fine-granular power consumption monitoring, ranging from occupancy and everyday activities detection [35] up to recognizing which program a TV was displaying [18].
Various techniques have been proposed over the years to mitigate privacy risks of smart metering [4,28,43,44,52]. This issue has become such a grave concern that the German Federal Office for Information Security published a protection profile for smart meters in 2014 [2]. By considering privacy implications of smart heating, we hope to initiate consumer protection research and policy debate in this area, analogous to the developments in smart metering described above.
Research questions
In this work, we are the first to investigate room climate data from the perspective of possible privacy violations. More precisely, we address the following research questions:
Occupancy detection: Can an attacker determine the presence of a person in a room using only room climate data, i.e., temperature and relative humidity?
Occupancy estimation: Can an attacker determine the number of persons present in a room using only this room climate data?
Activity recognition: Can an attacker recognize activities of the occupant in the room using only the temperature and relative humidity data?
Our threat scenario targets buildings with multiple rooms that are similar in size, layout, furnishing, and positions of the sensors. These properties are typical for office buildings, dormitories, cruise ships, and hotels, among others. Assuming that an attacker is able to train a classifier that recognizes pre-defined activities, possible privacy violations are, e.g., tracking the presence and working practices of employees in offices, or the disclosure of lifestyle and intimate activities in private spaces. All these situations constitute intrusions into the privacy of the occupants. In contrast to surveillance cameras and motion sensors, room climate sensors do not lead the occupant to expect monitoring. Also, legal restrictions regarding privacy might apply to surveillance cameras and motion sensors but not to room climate sensors.
Experiments
To evaluate these threats, we present experiments that consider occupancy detection, occupancy estimation, and activity recognition based on the analysis of room climate data from a privacy perspective. We measured room climate data in three office-like rooms and distinguished between the activities reading, standing, walking, and working on a laptop. The data was collected from sensors that measure temperature and relative humidity at a regular time interval of a few seconds.
Reflecting the most restrictive scenario, we analyzed how much information can be derived from the measurements of a single sensor only. Although we assume that in smart heating applications, only one sensor per room is most likely to be installed, each room was equipped with several sensors in order to evaluate any impact of the position of the sensor in the room. Additionally, we combined and evaluated data from two sensors simultaneously to investigate whether increasing the available data affects classification noticeably.
In our procedure, occupants performed a pre-defined sequence of tasks in the experimental space. In sum, we collected almost 115 hours of room climate sensor data from a total of 36 participants. The collected room climate data was analyzed using an off-the-shelf machine learning classification algorithm.
This work is an enhanced version of [36] and extends the analysis significantly as follows:
We investigate not only whether the presence of persons can be detected but also whether the number of persons can be estimated (occupancy estimation). The new results are displayed in Table 6 and are discussed in Section 5.3.
We analyze for all considered scenarios (including occupancy estimation) whether combining measurements taken from two sensors instead of one improves the overall results. The new results are displayed in Table 7 and Table 8 and are discussed in Section 5.5.
An overview of the additional contributions can be found in Table 1.
Overview of the scenarios analyzed in [36] and this work
Evaluating our collected room climate data, the attacker detects the presence of a person with detection rates of up to
Outline
The remainder of this paper is organized as follows. In Section 2, we give an overview of related work. Section 3 presents the threat model considered in this work. In Section 4, we introduce the experimental design and methods. The results and complementary observations of our experiments are presented and discussed in Sections 5 to 7. We draw conclusions in Section 8.
Related work
Over the last decade, several experiments have been conducted to detect occupancy in sensor-equipped spaces and to recognize people’s activities as summarized in Table 2. Activity recognition has been considered for basic activities, such as leaving or arriving at home, or sleeping [32], as well as for more detailed views, including toileting, showering, and eating [49].
Most of the previous research uses types of sensors that are different from temperature and relative humidity. For example,
Overview of previous experiments on occupancy detection (D), occupancy estimation (E), which aims at determining the number of people in a room, and activity recognition (A) with a focus on selected sensors; AML denotes acoustic, motion, and lighting sensors
Occupancy detection. The authors of [19] present a system aiming at occupancy detection. They conclude that using only PIR-based sensors delivers the most reliable results for their cubicle office setting. Using the information collected by a Netatmo weather station with a granularity of five minutes, [51] provides evidence that the station’s CO2 sensor data leaks information about a room’s occupancy. Detecting the presence of one or two persons in a two-person office is presented in [3]. The authors used temperature, humidity, light, and CO2 sensors and achieved the best results when combining data from only two of those; they conclude that any combination including the lighting sensor, as well as the time of day, helps overall performance. A single-person office and a three-room dormitory (shared by two persons), each equipped with temperature, humidity, CO2, volatile organic compound, PIR, and acoustics sensors, are used for occupancy detection in [39]. As in the studies above, CO2 combined with PIR proves to be a suitable solution for determining the state of a room.
Occupancy estimation. Detecting occupancy levels ranging from zero to three in an open-plan office setting is presented in [31]. To this end, the authors use several different sensor types, i.e., CO, CO2, total volatile organic compounds, small particulates, acoustics, illumination, motion, temperature, and relative humidity sensors, although the latter two do not contribute to their findings. The authors deem the CO2 and acoustics information the most valuable. This work is reaffirmed and extended by [9] with occupancy levels ranging from zero to four. In [54], the authors use the same workplace and sensor node settings. Here, relative humidity and temperature contribute to their results. In an office setting, [21] presents a system dedicated to detecting occupancy levels ranging from zero to six occupants while, at the same time, providing a measure of indoor air quality. They deploy sensor nodes comprising PIR, CO2, air velocity, temperature, and relative humidity sensors. Continuing this work, the authors of [1] re-used the previously developed system to increase its performance for occupancy estimation. In [14], a sensor node network is deployed in an open-plan office. The authors evaluated the usefulness of acoustics, (desktop PC case) temperature, CO2, volatile organic compound, PIR, temperature, and relative humidity sensors. For their scenario, estimating occupancy levels ranging from zero to five, they settled on acoustics, (desktop PC case) temperature, CO2, and PIR as the best combination. In [11], estimating occupancy levels from zero to four in an office setting is presented, using HVAC actuation, temperature, and CO2 sensors. This work is extended by [12], which additionally uses information from deployed door switches. The authors of [53] evaluate the usefulness of various sensor types in an office building. For occupancy detection and estimation, they find CO2, door status, and lighting levels the most valuable, thereby excluding the deployed temperature and relative humidity sensors. In [33], the authors aim at estimating occupancy levels of a classroom for 35 people, with at most 22 occupants being present, utilizing CO2, temperature, and barometric pressure sensors. The latter was deemed especially useful. However, they targeted five different levels of occupancy of five values each, i.e., 0–4, 5–9, and so forth. Another interesting approach is to derive (the level of) occupancy from acoustics sensors only in a residential and a commercial setting, as presented in [17]. They use audio data from a living room with levels of occupancy from zero to three and a single-person office with occupancies ranging from zero to two. Using an indoor air quality system including CO2, volatile organic compound, temperature, and relative humidity sensors to detect occupancy levels in four student dormitories is shown in [55]. The occupancy levels ranged from zero to three, and temperature and humidity proved far less valuable than CO2.
Activity recognition. The authors of [49] deployed 14 state-change nodes (or digital switches) that were installed in key positions in a residential apartment. For instance, one node was installed as a toilet flush sensor, which would notice and report toileting activity. Their goal was to automatically detect daily living activities via a strategically deployed sensor/switch network. Similarly, [34] presents a system to detect routine activities of daily living using data collected from 77 state-change sensors deployed in a residential apartment. In [32], the main goal is to achieve savings by autonomously regulating the heating of a residential HVAC system. By using information from deployed motion (PIR) sensors and door switches, a central unit makes a decision about the current occupancy state. They further use historical data to establish a baseline occupancy-sleep-absence pattern, which serves to heat the home before the occupants are expected to return. Another use case for activity recognition is ambient assisted living. In [48], the idea is to detect health or behavior changes in a residential apartment. For this, they use motion, lighting, and temperature sensors as well as door switches to detect activities of daily living. Likewise, the authors of [6] develop a framework for recognizing activities in a residential home. For better resolution, they complement PIR sensors and switches with wearables. Using temperature, humidity, small particulates, and PIR sensors, door switches, wearables, and HVAC events, [16] offers activity recognition for ambient assisted living as a service.
This work. In contrast to previous work, our results rely exclusively on temperature1
Note that in [3], additional classification results using temperature as the only predictor are reported in the range of
The overall aim of our work is to understand the potential privacy implications if room climate data is accessed by other parties. Consequently, we consider in general a threat model where the attacker is an outsider, i.e., a party that is not present in the room, who gets access to the room climate data. The goal of the attacker is to extract information from the data that otherwise would not be known to her. Obviously, the term “information” is subject to manifold interpretations. Moreover, the type and granularity of the measurements most likely affect what kind of information can be extracted. Therefore, we need to make more explicit what information an attacker is aiming for and what kind of measurements are at her disposal.
In our work, we focus on the question of whether basic information is extractable. That is, we consider the attack goal to be gaining information, without the occupants’ consent, about the state of occupancy, i.e., if and how many persons are present in the room, as well as about the activity of the occupants. With respect to the available data, it is obvious that the more information an attacker can gather, the more likely she can deduce privacy-harming information from the measurements. However, in practice only few sensors will be installed within one room, especially as one sensor node per room is sufficient to monitor the room climate, and these sensors will measure a very limited set of data only. Therefore, we base our analysis on an attacker model that considers a room climate system where only few sensor nodes are used and only basic data is measured.
That is, the main focus of our experiments is on the case that only a single sensor is present. If it turns out that a single sensor already provides sufficient data to derive privacy-harming information about the occupants, the situation may get even worse when more sensors are accessible. To examine this, we also consider the case that an attacker can make use of the measurements of two different sensors, and analyze whether and by how much the attacker’s performance improves compared to the single-sensor case.
In our threat model, we assume that this sensor node takes only the two most basic measurements, temperature and relative humidity. These are the fundamental properties that describe room climate. Note that our restricted data is in contrast to existing work (cf. Table 2 and Section 2) that based its experiments on more types of measurements or used data that is less commonly used to characterize room climate.
We consider a sensor system that measures the climate of a room, denoted as target location. At the target location, temperature and relative humidity sensors are installed that report the measured values in regular intervals to a central database. We consider an attacker model where the attacker has access to this database and aims to derive information about the occupants at the target location. Furthermore, we assume that the attacker has access to either the target location itself, or rooms similar in size, layout, sensor positions, and furniture. Such situations are given, for example, in office buildings, hotels, cruise ships, and student dormitories. Such a location, denoted as training location, is used to train the classifier, i.e., a machine learning algorithm that learns from input data labeled with the ground truth. As the attacker has full control over the training location, she can freely choose what actions take place during the measurements. For example, she could take measurements while no persons are present at the training location, or while some persons are present and execute predefined activities.
There are various scenarios in which an attacker has incentives to collect and analyze room climate data. For example, the management of a company may aim at observing the presence and working practices of employees in the offices. In another case, a provider of private spaces (hotels, dormitories, etc.) may want to disclose lifestyle and intimate activities in these spaces. This information may be utilized for targeted advertising or sold to insurance companies. In any case, the evaluation of room climate data provides the attacker with the possibility to undermine the privacy of the occupants.
The procedure of these attacks is as follows: First, the attacker collects training data at a training location, which might be the target location or other rooms similar in size, layout, sensor positions, and furniture. The attacker also records the ground truth for all events that shall be distinguished. Examples of events are the number of persons present in the room (including the case that the room is empty), or different activities such as working, walking, and sleeping. The training data is recorded with a sample rate of a few seconds and split into windows (i.e., a temperature curve and a relative humidity curve) of equal length, usually one to three minutes. Using the collected training data, the attacker trains a machine learning classifier. After the classifier is trained, it can be used to classify windows of climate data from the target location to determine the events. The classifier works on previously collected data, thus reconstructing past events, and also on live-recorded data, thus determining current events “on-the-fly” at the target location.
Note that we do not claim that analyzing room climate data is the most effective attack to gain the information specified above. It is our ambition to demonstrate that room climate data potentially reveals privacy-harming information and hence needs to be protected. In other words: the question is not “What is the best attack to determine occupancy and/or activities?” but “Should room climate data be considered private data?”. Note that our experiments show that occupancy and activities can be determined in almost real time, i.e., after measuring the room temperature and humidity for a small time window. Even if an attacker gets access to such data only at some later point in time, the risk of privacy violation remains, e.g., the measurements may allow the attacker to derive a user profile.
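The train-then-classify workflow described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes a Gaussian Naïve Bayes model, a small variance floor for numerical stability, and hypothetical per-window feature vectors (here: temperature range and humidity range of a window). The class name `GaussianNaiveBayes` and all numbers in the usage note are made up for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (illustrative sketch)."""

    def __init__(self, var_floor: float = 1e-9) -> None:
        # The variance floor is an assumption to avoid division by zero.
        self.var_floor = var_floor
        self.params: Dict[str, Tuple[float, List[Tuple[float, float]]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        """Estimate class priors and per-feature mean/variance from labeled windows."""
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        for label, rows in by_class.items():
            prior = len(rows) / len(X)
            stats = []
            for column in zip(*rows):  # iterate over features
                mu = sum(column) / len(column)
                var = sum((v - mu) ** 2 for v in column) / len(column)
                stats.append((mu, max(var, self.var_floor)))
            self.params[label] = (prior, stats)
        return self

    def predict(self, row: Sequence[float]) -> str:
        """Return the class with the highest log-posterior for one window."""
        best_label, best_score = "", -math.inf
        for label, (prior, stats) in self.params.items():
            score = math.log(prior)
            for value, (mu, var) in zip(row, stats):
                score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this sketch, the attacker would call `fit` on windows recorded at the training location together with their ground-truth labels (for example "empty" and "occupied") and then call `predict` on each window obtained from the target location.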
Experimental design and methods
We conducted a study to investigate the feasibility of detecting and estimating occupancy as well as inferring activities in an office environment from temperature and relative humidity. From March to April 2016, we performed experiments at two locations simultaneously, Location A and Location B, with a distance of approximately 200 km between them. In addition, from January to February 2017, we conducted further experiments at a third location, denoted as Location C, which is located in the same building as Location B.
Experimental setup and tasks
The experimental spaces at the three locations are different in size, layout, and positions of the sensors. Thus, each target location is also the training location in our study. At Location A, the room has a floor area of 16.5 m² and was equipped with room climate sensors at four positions as shown in Fig. 1(ii). At Location B, the room has a floor area of 30.8 m², i.e., roughly twice as large as at Location A, and had room climate sensors installed at three positions as illustrated in Fig. 1(i). Location C has a floor area of 13.9 m² and was equipped with room climate sensors at five positions as shown in Fig. 1(iii). In all locations, the room climate sensors measured temperature and relative humidity. The number of deployed sensors varied due to limitations of hardware availability.

Floor plans of the experiment spaces including sensor node locations; h indicates the node’s height.
Our goal was to determine to which extent the presence and activities of occupants influence the room climate data. Therefore, we measured temperature and relative humidity during phases of absence as well as phases of presence. If occupants were present, these persons had to perform one task or a sequence of tasks. We defined the following experimental tasks (see also Fig. 2):
Sit on an office chair next to a desk and read.
Stand in the middle of the room, try to avoid movements.
Walk slowly and randomly through the room.
Sit on an office chair next to a desk and use a laptop, which is located on the desk.

The defined tasks performed by participants at Location A.
To eliminate confounding factors, we defined location default settings applying to all locations. Essentially, all windows were required to remain closed and no person was allowed in the room when it was not in use for the experiment. The rooms have radiators for heating, which were adjusted to a constant level. At Locations A and B, we used shutters fixed in such positions that enough light was provided for reading and working.
We used a homogeneous hardware and software setup at all locations for data collection, which is described in the following.
Hardware
At each location, we set up a sensor network consisting of several Moteiv Tmote Sky sensor nodes with an integrated IEEE 802.15.4-compliant radio [7] as well as an integrated temperature and relative humidity sensor. The nodes have the Contiki operating system [10] version 2.7 installed. In addition, we deployed a webcam that took pictures in a 3-second interval at Location A. These were used for verification during the data collection phase only, and were not given to the classification algorithms.
Software
For sensor data collection, we customized the Collect-View application included in Contiki 2.7, which provides a graphical user interface to manage the sensor network. For our purposes, we implemented an additional control panel offering a customized logging system. The measurement settings of the Collect-View application were set to a report interval of 4 seconds with a variance of 1 second, i.e., each sensor node reported its current values in a time interval of
Collected data
We structured data collection in units and aimed for a good balance between presence and absence as well as the different tasks among all units, as this is needed for the later analysis using machine learning. Each unit has a fixed time duration, t, where exactly one or two persons were present (
Overall, we collected around 115 hours of sensor data, 66 hours with at least one person being present. A more extensive overview of the amount of measured sensor data is shown in Table 3. To encourage replication and further investigations, all collected sensor data is available as open data sets on GitHub.2
Measured sensor data of all locations (in hours)
The participants were assigned to at least one experimental unit with fixed presence times and tasks, and provided with a script for their actions (that is, for how long and in which order the tasks should be performed). Every participant performed each unit twice, with the same tasks, but possibly on different days and in a permuted chronological order. Tasks were performed in blocks of 10, 20, or 30 minutes. Thus, 10-minute units contained only one task of 10 minutes; 30-minute units consisted of either three tasks of 10 or one task of 10 plus one of 20 minutes; 60-minute units were composed of either two tasks of 20 plus two of 10, or one task of 10, 20, and 30 minutes each.
At the beginning of the presence time for each unit, i.e., the time period where one or two persons had to be present, the experimental supervisor unlocked the room door to let the participants in. The participants started with the first task and were instructed by phone (at Locations A and C) or through the glass pane (at Location B) when it was time to change activities or to leave the room.
Demographic data of participants, μ denotes the average, σ denotes the standard deviation
For participating in the experiment, 14 subjects volunteered at Location A, 12 subjects at Location B, and 10 subjects at Location C as shown in Table 4. Demographic data of participants was collected in order to facilitate replication and future experiments. All subjects gave their written informed consent after the study protocol was approved by the data protection office.3
Ethical review boards at both locations only consider medical experiments.
We used classification to predict occupancy and activities in the rooms. We adopt an approach that has successfully been used in several applications of biosignal processing, namely extraction of a number of statistical descriptors with subsequent feature selection [25,29].
The features use measurements from short time windows. We experimented with windows of different lengths, namely 60 s, 90 s, 120 s, 150 s, and 180 s. The offset between two consecutive windows was set to 30 s. We excluded all windows where only a part of the measurements belongs to the same activity.
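The windowing step can be illustrated with a short sketch. It assumes that each sample is a tuple of timestamp (in seconds), temperature, relative humidity, and ground-truth label; this tuple layout and the function name `sliding_windows` are hypothetical and not taken from the study's software. Windows whose samples carry more than one label are dropped, mirroring the exclusion rule above.

```python
from typing import List, Tuple

# (time in s, temperature, relative humidity, ground-truth label): assumed layout
Sample = Tuple[float, float, float, str]


def sliding_windows(samples: List[Sample], length_s: float = 180.0,
                    offset_s: float = 30.0) -> List[Tuple[str, List[Sample]]]:
    """Cut a time-sorted recording into overlapping windows.

    A window is kept only if all of its samples share the same label,
    so windows spanning a change of activity are excluded.
    """
    if not samples:
        return []
    windows = []
    start = samples[0][0]
    end_of_data = samples[-1][0]
    while start + length_s <= end_of_data:
        chunk = [s for s in samples if start <= s[0] < start + length_s]
        labels = {s[3] for s in chunk}
        if chunk and len(labels) == 1:
            windows.append((chunk[0][3], chunk))
        start += offset_s  # consecutive windows overlap when offset < length
    return windows
```

With a 180 s window and a 30 s offset, consecutive windows share 150 s of data, so neighboring windows are strongly correlated; this is one reason to separate training and test data by recording day and participant rather than by window.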
The feature set was composed of a number of statistical descriptors that were computed on the temperature and humidity measurements within these windows. These are mean value, variance, skewness, kurtosis, number of nonzero values, entropy, difference between maximum and minimum value of the window (i.e., value range), correlation between temperature and humidity, and mean and slope of the regression line for the measurement window before the current window. Additionally, we subtracted from the measurements their least-squares linear regression line, and computed all of the listed statistics on the subtraction residuals. Feature selection was performed using a sequential forward search [50, Ch. 7.1 & 11.8], with an inner leave-one-subject-out cross-validation [23, Ch. 7] to determine the performance of each feature set. For classification, we used the Naïve Bayes classifier. To avoid a bias in the results, we randomly selected identical numbers of windows per class for training, validation, and testing. For implementation, we used the ECST software [45], which wraps the WEKA library [20].
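To make the feature computation concrete, the following sketch derives a subset of the listed descriptors for one window. It is an illustration under stated assumptions rather than the ECST implementation: moments are population moments, kurtosis is not excess kurtosis, and the entropy, nonzero-count, and previous-window features are omitted for brevity. All function and key names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def central_moment(xs: Sequence[float], k: int) -> float:
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def linear_fit(ts: Sequence[float], xs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares regression line x = slope * t + intercept."""
    mt, mx = mean(ts), mean(xs)
    denom = sum((t - mt) ** 2 for t in ts)
    slope = sum((t - mt) * (x - mx) for t, x in zip(ts, xs)) / denom if denom else 0.0
    return slope, mx - slope * mt


def descriptors(xs: Sequence[float]) -> Dict[str, float]:
    """Mean, variance, skewness, kurtosis, and value range of one series."""
    var = central_moment(xs, 2)
    std = math.sqrt(var)
    skew = central_moment(xs, 3) / std ** 3 if std ** 3 > 0 else 0.0
    kurt = central_moment(xs, 4) / var ** 2 if var ** 2 > 0 else 0.0
    return {"mean": mean(xs), "variance": var, "skewness": skew,
            "kurtosis": kurt, "range": max(xs) - min(xs)}


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx, sy = math.sqrt(central_moment(xs, 2)), math.sqrt(central_moment(ys, 2))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def window_features(ts: List[float], temp: List[float], hum: List[float]) -> Dict[str, float]:
    """Descriptors on raw values and on residuals after removing the linear trend."""
    feats: Dict[str, float] = {}
    for name, series in (("temp", temp), ("hum", hum)):
        for key, value in descriptors(series).items():
            feats[f"{name}_{key}"] = value
        slope, intercept = linear_fit(ts, series)
        residuals = [x - (slope * t + intercept) for t, x in zip(ts, series)]
        for key, value in descriptors(residuals).items():
            feats[f"{name}_resid_{key}"] = value
    feats["temp_hum_corr"] = correlation(temp, hum)
    return feats
```

Computing the same statistics on the detrended residuals separates slow drifts, such as a room gradually warming up, from short-term fluctuations within the window.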
As performance measures, we use accuracy (i.e., the number of correctly classified windows divided by the number of all windows), and per-class sensitivity (i.e., the number of correctly classified windows for a specific class divided by the number of all windows of this class). Classification accuracy was deemed statistically significant if it was significantly higher than random guessing which is the best choice if the classifier could not learn any useful information during training. For each experiment, a binomial test with significance level
Note that neither the features nor the rather simple Naïve Bayes classifier are particularly tailored to predicting privacy leaks. Nevertheless, we show that even such an unoptimized system is able to correctly predict occupancy and action types and hence produce privacy leaks. Higher detection rates can be expected if more advanced classifiers are applied to this task.
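The performance measures defined in this section, together with a test against random guessing, can be sketched as follows. The one-sided formulation of the binomial test is an assumption of this sketch, and the function names are hypothetical; the significance threshold is left to the caller.

```python
from math import comb
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Correctly classified windows divided by all windows."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """Correctly classified windows of class `cls` divided by all windows of `cls`.

    Assumes that at least one window of class `cls` is present.
    """
    predictions_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in predictions_for_cls) / len(predictions_for_cls)


def binomial_p_value(n_correct: int, n_total: int, p_guess: float) -> float:
    """One-sided P(X >= n_correct) for X ~ Binomial(n_total, p_guess).

    A small value indicates that the observed number of correct
    classifications is unlikely under random guessing.
    """
    return sum(comb(n_total, k) * p_guess ** k * (1 - p_guess) ** (n_total - k)
               for k in range(n_correct, n_total + 1))
```

For a balanced two-class task, `p_guess` is 0.5; for a balanced task with k classes, it is 1/k. The resulting p-value is then compared against the chosen significance level.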
Results
In this section, we present the experimental results. First, a visual inspection of the collected data is presented, followed by the machine learning-aided occupancy detection and estimation, and activity recognition using one or two sensors.

Visualization of two examples of room climate measurements. The grey background indicates the presence of the occupant in the experimental space.
We started our evaluation by visually inspecting the raw sensor data. To this end, we implemented a visualization script in MATLAB that plots this data. Two example measurements are shown in Fig. 3.
The visualizations show an immediate rise of the temperature and humidity as soon as an occupant enters the room. Furthermore, variations in temperature and humidity increase rapidly and can be clearly seen. Thus, one can visually distinguish between phases of occupancy and non-occupancy. One can also notice different patterns during the performance of the tasks. As Fig. 3(i) shows, an occupant walking in the experimental space causes a constant increase of temperature and humidity with only small variations. In contrast, an occupant standing in the room causes the largest variations of humidity compared to the other defined tasks (cf. Fig. 3(ii)). The effects of the tasks reading and working on temperature and humidity in the depicted figures are very similar: both variables tend to increase showing medium variations. For further analysis of the data, we used machine learning as outlined in Section 4.5.
Occupancy detection
Occupancy detection describes the binary detection of occupants in the experimental space based on features from windows with length of 180 seconds (cf. Section 4.5). This is a two-class task, namely to distinguish whether an occupant is present (true) or not (false). We only considered training and testing data within the same room (but separated training and testing both by the days and participants of the acquisition). We randomly selected the same number of positive and negative cases from the data. Thus, simply guessing the state has a success probability of
Classification accuracy for occupancy detection. Notation: ‘Occup.’, sensitivity for class occupancy. ‘No Occup.’, sensitivity for class no occupancy. ‘Guess’, probability of correct guessing. ‘Acc.’, classification accuracy
Results for detecting multiple persons are shown in Table 6. Here, we first restate the results for deciding whether a single person is in a room or not, denoted as “0–1 person?”. Then, we report the results for deciding whether one or two persons are in a room, denoted as “1–2 persons?”. In the third column, we report results on the joint classification problem, denoted as “0–1–2 persons?”. While the first two columns cover two-class classification tasks (i.e., with guessing chance of
Activity recognition
Activity recognition reports the current activity of an occupant in the experimental space. The four activity tasks are described in Section 4.1. The recognition results for these tasks are shown in Fig. 4.
Activity4 classifies between the activities

Classification accuracy for occupancy detection and activity recognition. In each diagram, the guessing probability is plotted as a line. Each symbol represents the accuracy that we achieved with a single sensor. A circle marks a statistically significant result, while an ‘x’ represents a statistically insignificant result.
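One way to decide whether an accuracy is significantly above the guessing probability is sketched below. It assumes a one-sided binomial test and a significance level of 0.05; the test actually used for the figure is not stated in this section, and the function name and the example counts are hypothetical.

```python
from math import comb

def p_value_above_chance(correct: int, total: int, chance: float) -> float:
    """One-sided binomial p-value: P(X >= correct) if every prediction were a guess."""
    return sum(
        comb(total, k) * chance**k * (1.0 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )

# Hypothetical numbers: 70 of 100 balanced test windows classified correctly.
p = p_value_above_chance(correct=70, total=100, chance=0.5)
significant = p < 0.05  # assumed significance level
```

For a task with more classes, `chance` would be set to the corresponding guessing probability of the balanced data set.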
In the next step, we investigated whether an attacker can increase the recognition accuracies by distinguishing between a smaller set of activities. To this end, we combined two tasks to a meta task, e.g., the tasks
Occupancy detection and estimation accuracies for two- and three-class tasks at Locations A and B
The model Activity2 classifies between the tasks
For Activity2, our accuracy varies between
Another interesting question is whether the availability of data from multiple sensors within a room can improve the prediction performance. To assess this question, we concatenated the feature vectors of two sensors prior to classification. For consistency, the evaluation process was otherwise kept identical to that of the previous analyses.
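The concatenation step can be sketched as follows. The three features per signal (mean, standard deviation, net change) are placeholders chosen for illustration and are not the feature set of Section 4.5; the function names and the readings are likewise hypothetical.

```python
from statistics import mean, pstdev

def window_features(temperature, humidity):
    """Hypothetical per-window features: mean, spread, and net change of each signal."""
    features = []
    for signal in (temperature, humidity):
        features += [mean(signal), pstdev(signal), signal[-1] - signal[0]]
    return features

def fuse(features_a, features_b):
    """Concatenate the feature vectors of two sensors for the same time window."""
    return features_a + features_b

# Made-up readings from two sensors covering the same window.
sensor_1 = window_features([21.0, 21.1, 21.3], [40.0, 40.5, 41.0])
sensor_2 = window_features([20.8, 20.8, 20.9], [39.0, 39.2, 39.1])
fused = fuse(sensor_1, sensor_2)  # twice as many features, same number of windows
```

The classifier then sees a vector that is twice as long per window, while the number of training windows stays the same.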
Occupancy detection and estimation accuracies for two- and three-class tasks at Locations A and B based on two sensors. The differences to the accuracies of single sensors are given in parentheses. For example, the accuracy when measuring with sensors A1 and A2 increases to
. When measuring with A1 only, the accuracy is
(cf. Table 5), which constitutes the difference of
. The difference to measuring with A2 only is
Table 7 shows the results for occupancy estimation by using a pair of sensors. Overall, the detection accuracies improve by several percent when combining the feature vectors of two sensors. However, distinguishing between one and two persons in a room is still a hard task, with accuracies ranging approximately between
Activity Recognition accuracies for all three locations based on a single sensor and on two sensors
Table 8 shows the results for activity recognition. Results for a single sensor are shown on the left side, results for pairs of sensors on the right side. Interestingly, classifying activities appears to be about as hard for pairs of sensors as it is for single sensors: using the information from multiple sensors does not substantially improve the results. One reason might be that the information from multiple sensors is correlated for this task, such that adding sensors only enlarges the feature space but does not add significant information. From a privacy perspective, however, this is somewhat reassuring: although the achieved accuracies are in almost all cases well above guessing chance, it is apparently not straightforward to accurately predict the activity of a person in the room.
Length of measurement windows
The length of the measurement windows influences the accuracy of detection. We evaluated window sizes in the range between 60 and 180 seconds. As an example, we analyzed the average accuracy of occupancy detection as a function of the window size for all three locations. As shown in Fig. 5, the accuracy increases with the window size. We achieved the best results with the longest window size of 180 seconds.
This indicates that the highest accuracies are possible if longer time periods are considered. From a practical perspective, it is not advisable to extend the window size to a much larger duration than a few minutes since we assume that the performed activity is consistent for the whole duration of the window.
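The trade-off between window length and label consistency can be illustrated with a small sketch. It assumes one sample per second and non-overlapping windows, neither of which is stated in this section; the function name and the recording are hypothetical.

```python
def split_into_windows(samples, labels, window_len):
    """Cut a recording into non-overlapping windows; keep only windows with one label."""
    windows = []
    for start in range(0, len(samples) - window_len + 1, window_len):
        window_labels = set(labels[start:start + window_len])
        if len(window_labels) == 1:  # the activity must be constant within a window
            windows.append((samples[start:start + window_len], window_labels.pop()))
    return windows

# Hypothetical recording: 6 minutes of "reading" followed by 6 minutes of "walking".
samples = list(range(720))
labels = ["reading"] * 360 + ["walking"] * 360

short_windows = split_into_windows(samples, labels, 60)
long_windows = split_into_windows(samples, labels, 180)
too_long_windows = split_into_windows(samples, labels, 240)
```

In this toy recording, a 240-second window straddles the change of activity and has to be discarded, which is the practical limit on the window size described above.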

Average accuracy over all sensors from each location for occupancy detection depending on the window size.
To assess the feasibility of an attacker that has only access to either temperature data or relative humidity data, we evaluated whether it might be enough to solely collect one type of room climate data. In the classification process, an attacker derives a set of features from temperature and relative humidity data and selects the best-performing features for each sensor and classification goal automatically (cf. Section 4.5). Analysis shows that features computed from temperature and relative humidity are of similar importance. In our evaluation,
Note that some features are based on both temperature and relative humidity, which is why the sum of both numbers exceeds
We also compared the features in terms of differences between the three locations as well as differences between occupancy detection and activity recognition. In all these cases, there are no significant differences between the importance of temperature and relative humidity. An attacker restricted to either temperature or relative humidity data will perform worse than one with access to both.
All our locations are office-like rooms, which have a similar (rectangular) layout but differ in size and furnishing. In our evaluation, the accuracy correlates with the size of the target location. As shown in Fig. 5, we had the highest average accuracy in occupancy detection at Location C, which also has the smallest ground area of 13.9 m². Location A has a ground area of 16.5 m² and a slightly lower average accuracy. Location B is almost twice as large (30.8 m²) and shows the worst average accuracy of the three locations. Thus, our experiment indicates that an increasing room size leads to decreasing accuracy on average. An attacker achieves higher accuracies by monitoring small target locations than by monitoring larger ones.
Position of sensors
According to our threat model in Section 3, the attacker controls the target location’s layout. Thus, we assume an attacker who can decide where to install room climate sensors in the target location. We consider how the position of a room climate sensor influences the accuracy of derived information. For occupancy detection, we had the best accuracy with a sensor node that is located in the center point at the ceiling of the target location (Sensors A1, B1, C1). In this position, the sensor has the largest gathering area to measure the climate of the room. Sensors mounted to the walls or on shelves perform differently in our experiments. For occupancy estimation, the best sensors differ per location, i.e., wall-mounted A3 outperforms all others at Location A, whereas B3 (similar to A3) and B1 are almost on par. For activity recognition, the central sensor nodes performed best at Locations A and B, but not at Location C.
From the attacker perspective, the best position to deploy a room climate sensor is at the ceiling in the center of the target location. In large rooms, multiple sensors at the ceiling could be installed, each covering a subsection of the room.
Discussion
As our experiments reveal, knowing the temperature and relative humidity of a room allows an attacker to detect the presence of people and to recognize certain activities with a significantly higher probability than guessing. By evaluating temperature and relative humidity curves of 180 seconds in length, we were able to detect the presence of an occupant in one of our experimental spaces with an accuracy of 93.5% using a single sensor. Occupancy estimation results show no definitive trend, as three out of four sensors perform around guessing chance for deciding between one or two persons; yet the best accuracies range around
Privacy implications
We show that an attacker might be able to infer life and work habits of the occupants from the room climate data. For instance, the attacker is able to distinguish between sitting, standing, and moving, which might already reveal the position and activities of the occupant in the room. Moreover, the attacker can distinguish between upright and sedentary activities, between moving and standing, and between working on the laptop or reading a book.
Given the limited amount of recorded sensor data, the achieved accuracies in occupancy detection and activity recognition give a clear indication that occupants are subject to privacy violations according to the threat model described in Section 3. However, neither occupancy estimation nor activity recognition is straightforward: the accuracies achieved for occupancy estimation are low, and those for activity recognition differ between sensor positions and locations.
On the bright side, it is also reassuring that simply increasing the number of sensors is not as ominous as one might fear since the relative accuracy increase is rather slim. In other words, the most restrictive scenario, i.e., one deployed sensor, is sufficient. Hence, for an attacker, the benefit of deploying and exploiting more than one sensor is at least questionable.
Further experiments are required for a better assessment of the privacy risks induced by the room climate data. Our work provides promising directions for these assessments. For example, we demonstrated the existence of the information leak with the Naïve Bayes classifier, which is arguably one of the simplest machine learning classifiers. In future work, it would be interesting to explore upper bounds for the detection of presence/absence, occupancy estimation, and different activities by using more advanced classifiers such as the recently popular deep learning algorithms.
Policy implications
Smart heating is one of the most desirable smart home applications. For example, in a representative smart home survey of German consumers from 2015,
While consumers understand the benefits of smart heating systems, their privacy implications may be difficult to fathom. For example, in a recent representative survey with 461 American adults by Pew Research [41], the participants were presented with a scenario of installing a smart thermostat “in return for sharing data about some of the basic activities that take place in your house like when people are there and when they move from room to room”. Whereas
Methodological details, such as representativeness, breakdown by country and the exact formulation of the questions, are not known about this survey.
According to the results of the above surveys, a fairly high percentage of consumers might be willing to share their room climate data. On the other hand, our experiments show that this sharing may have serious privacy implications. Moreover, our findings indicate that room climate data measurements should be classified as personal data that can be used to profile the user, as specified in Art. 4 of the General Data Protection Regulation (GDPR) [42]. For example, for single households, room climate data may leak personal data about when the person is at home, and whether this person is in the company of some other person at a certain time. As a consequence, the collection and processing of climate data is subject to the GDPR. More precisely, it is our understanding that before climate data can be processed, a Data Protection Impact Assessment (DPIA) needs to be conducted. Citing from Art. 35 of the GDPR: “Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data.” Note that the EU Commission already developed DPIA for similar IoT applications: Smart Metering6
Moreover, according to Art. 25 GDPR, data protection by design and by default should be applied to smart heating systems. Our experiments show that the amount of personal information that can be inferred from room climate data is highly dependent on the granularity of the measurements and on the placement of the sensors. In this way, our findings may provide a starting point for data minimization guidelines, as they indicate that privacy in smart heating systems can be protected by adjusting data collection granularity and sensor placement.
An important question is whether it is possible to perform location-independent classification, i.e., to train the classifier with sensor data of one location and then use it to classify sensor data at a target location that is not similar to the training location in size, layout, and sensor positions. If this were possible, the service providers of smart heating applications would be able to detect occupancy and to recognize activities without having access to the target locations.
According to their privacy statements, popular smart thermostats from Nest [37], Ecobee [13], and Honeywell [24] send measured climate data to the service providers’ databases. To evaluate these privacy threats, we used the room climate data of the best-performing sensor of a location as training data set for other locations. For example, to classify events of an arbitrary sensor of Location A, we trained the classifier with room climate data collected by Sensor B1 or Sensor C1. We gained statistically significant results for a few combinations in occupancy detection but the majority of our occupancy detection results was not significant. Since discriminating between one and two persons proved unreliable, occupancy estimation was consequently excluded. For activity recognition, we were not able to gain statistically significant results.
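The cross-location procedure (train on one location, classify windows of another) can be sketched as follows. The sketch assumes a Gaussian Naïve Bayes variant with equal class priors, which this section does not specify, and it runs on synthetic data with two made-up features; it illustrates the evaluation setup only and says nothing about the accuracies reported here.

```python
import math
import random
from statistics import mean, pvariance

def fit_gaussian_nb(rows, labels):
    """Estimate per-class feature means and variances (equal class priors assumed)."""
    model = {}
    for cls in set(labels):
        class_rows = [row for row, label in zip(rows, labels) if label == cls]
        model[cls] = [(mean(col), pvariance(col) + 1e-9) for col in zip(*class_rows)]
    return model

def predict(model, row):
    """Return the class with the highest Gaussian log-likelihood."""
    def log_likelihood(params):
        return sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, (mu, var) in zip(row, params)
        )
    return max(model, key=lambda cls: log_likelihood(model[cls]))

def accuracy(model, rows, labels):
    """Fraction of windows whose predicted class matches the true label."""
    hits = sum(predict(model, row) == label for row, label in zip(rows, labels))
    return hits / len(rows)

def synthetic_location(rng, baseline, n=200):
    """Synthetic windows [mean temperature, temperature rise]; occupancy adds a rise."""
    rows, labels = [], []
    for i in range(n):
        occupied = i % 2 == 0  # balanced classes
        rise = rng.gauss(0.4 if occupied else 0.0, 0.1)
        rows.append([baseline + rng.gauss(0.0, 0.3), rise])
        labels.append(occupied)
    return rows, labels

rng = random.Random(0)
train_rows, train_labels = synthetic_location(rng, baseline=21.0)  # training location
test_rows, test_labels = synthetic_location(rng, baseline=24.0)    # target location
model = fit_gaussian_nb(train_rows, train_labels)
same_location_accuracy = accuracy(model, train_rows, train_labels)
cross_location_accuracy = accuracy(model, test_rows, test_labels)
```

In this toy setup, the mean temperature is a location-specific feature: its baseline differs between the two synthetic rooms, so a model fitted in one room can be misled by it in the other.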
However, the possibility of location-independent attackers cannot be excluded. Absence of significant results in our experiments may be merely due to the limited amount of data. Future studies should be conducted to gather data from various rooms up to a point where the combined results hold for arbitrary locations. Having more data from a multitude of rooms available would help the machine learning classifiers to recognize and ignore data characteristics that are specific to either of the experimental rooms. Consequently, the algorithms could better identify the distinct data characteristics of the different classes in occupancy detection and activity recognition. This would enable location-independent classification of room climate data, in which the training location is not similar to the target location regarding size, layout, furnishing, and positions of the sensors.
Future work
We think that the idea of sharing smart home data for various benefits will continue to be intensively discussed in the future, and therefore, consumers and policy makers should be made aware of the level of detail inferable from smart home data. Which rewards are actually beneficial for consumers? Moreover, which kind of data sharing is ethically permissible? Only by answering these questions will it be possible to design fair policies and establish beneficial personal data markets [47]. In this work, we take the first step towards informing the policy for the smart heating scenario.
We further suggest investigating countermeasures against the revealed privacy threats. Apart from data minimization techniques mentioned in Section 7.2, such as regulating the frequency of measurements and the placement of the sensors, other privacy-enhancing measures could be considered. Would the addition of noise to the measured sensor data decrease the information gain significantly? How does the clothing of occupants impact the classification results? Can the influences of human activities on the room climate be recreated by technical means to simulate occupancy of unoccupied spaces? Future work can address these and further conceivable countermeasures as well as their effort, costs, and impact.
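The noise-based countermeasure raised above could take a form like the following minimal sketch. It assumes zero-mean Gaussian noise added independently to each reading; the noise distribution, the value of `sigma`, and the readings are illustrative assumptions, and whether such noise actually reduces the information gain remains the open question.

```python
import random

def add_noise(readings, sigma, rng):
    """Return a copy of the readings with zero-mean Gaussian noise added to each value."""
    return [value + rng.gauss(0.0, sigma) for value in readings]

rng = random.Random(42)
temperature = [21.0, 21.1, 21.3, 21.6, 21.8]  # made-up readings in degrees Celsius
noisy = add_noise(temperature, sigma=0.5, rng=rng)
```

A larger `sigma` would presumably hide more of the occupancy-related variation, but it would also degrade the data that the heating control itself relies on, so any evaluation would have to consider both effects.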
Conclusions
We investigated the common belief that data collected by room climate sensors divulge private information about the occupants. To this end, we conducted experiments aiming to reflect realistic conditions, i.e., considering an attacker who has access to typical room climate data (temperature and relative humidity) only. Our experiments revealed that knowing a sequence of temperature and relative humidity measurements already allows an attacker to detect the presence of people and to recognize certain activities with high accuracy. In contrast, distinguishing between the presence of one or two persons is evidently harder. Using data from two different sensors slightly improves occupancy detection, but activity recognition remains a hard task. Nonetheless, our results confirm that the need for protection of room climate data is justified: the leakage of such ‘inconspicuous’ sensor data as temperature and relative humidity can seriously violate privacy in smart spaces. Future work is required to determine the level of privacy invasion in more depth and develop appropriate countermeasures.
Acknowledgments
We would like to thank all anonymous reviewers for their invaluable comments. The work is supported by the German Research Foundation (DFG) under Grant AR 671/3-1: WSNSec – Developing and Applying a Comprehensive Security Framework for Sensor Networks.
