Abstract
In cloning attacks, an adversary captures a sensor node, reprograms it, makes multiple copies, and inserts these copies, into the network. Cloned nodes subvert sensor network processing from within. In a companion paper [2], we show how to detect and remove clones from sensor networks using random key predistribution security measures. Keys that are present on the cloned nodes are detected by using authentication statistics based on key usage frequency. For consistency with existing random key predistribution literature, and ease of explanation, the network in that paper used an Erdos-Renyi topology. In the Erdos-Renyi topology, the probability of connection between any two nodes in the network is uniform. Since the communications ranges of sensor nodes are limited, this topology is flawed. This article applies the clone detection approach from [2] to more realistic network topologies. Grid and ad hoc topologies reflect the node connectivity patterns of networks of nodes with range limits. We provide analytical methods for choosing detection thresholds that accurately detect clones. We use simulations to verify our method. In particular we find the limitations of this approach, such as the number of nodes that can be inserted without being detected.
Keywords
Introduction
Sensor networks deployed in hostile environments are vulnerable to physical attacks; including node cloning. In a cloning attack, an adversary captures a sensor node and copies its state. She reverse engineers the node, reprograms it, makes multiple copies of it, and inserts these copies back into the network. Chan and Perrig [1] explain how clones can falsify sensor data; extract data from the network; and/or stage denial of service attacks. A typical sensor network will contain hundreds or thousands of nodes in a hostile environment. It will not be possible to constantly monitor nodes to detect potential tampering. Therefore it is necessary to develop a security approach for detecting and removing cloned nodes.
In [2] we present a statistical approach for detecting cloning attacks in networks whose security guarantees are based on random key predistribution. In random key predistribution, first introduced in [3], before deployment each node is equipped with a key-ring of k keys chosen at random from the same large pool of P keys. Given k and P, there is a well defined probability that any two nodes will share at least one cryptographic key and use symmetric key cryptography to communicate. After deployment, nodes find their common keys and can communicate securely. A full discussion of random key predistribution, cloning issues and motivation for this work is in [2] and references therein.
Each node uses its keys as an authentication token. We analyze statistics on how often keys are used for authentication to determine which keys are being used by cloned nodes. Note that node authentication occurs only during network initialization. In this paper key usage refers only to the node authentication process, it does not refer to the number of times a key is used to encrypt or decrypt individual packets in the sensor network.
Each node constructs a Bloom filter [4] of the keys it has in common with its neighbors (i.e., the neighbor's authentication credentials). This Bloom filter is securely transmitted to a server that records key usage data. The use of Bloom filters significantly reduces the traffic volume and provides greater security since keys are not sent over the network. The server creates a key usage histogram from the Bloom filters. An extension of this approach that does not require a centralized collection point is in [5]. We do not present that approach, since it contains complications that are not relevant to determining the clone detection thresholds discussed here. In [2], we derive the probability density function for key usage, and show how the false positive rate α determines the key usage threshold. It also determines at what point the network fractures and becomes unusable.
Keys whose use exceeds the threshold value are thought to be used by clones and are ostracized from the network. The key detection protocol succeeds in removing all cloned keys from the network when a high false positive rate α is allowed. In [2], we show that using large values for α does not fracture networks with Erdos-Renyi topologies. The security analysis in [2] shows how our protocol tolerates many common attacks.
Unfortunately the Erdos-Renyi model is not realistic for wireless sensor networks. It posits that a connection between any two nodes is equally likely, which does not match wireless communication with limited communications ranges. The Erdos-Renyi model is used in the literature [3, 6] because of its simplicity and because it has been widely studied in Mathematics literature [7].
This paper adapts the approach from [2] to grid and ad hoc network topologies [8] that better reflects the range constraints of wireless sensor networks. Grid networks are highly structured; each node can communicate only with a fixed number of immediate neighbors. Ad hoc networks are less structured; nodes are placed at random. They can only communicate with the other nodes within communication range r. Each node's neighborhood size is defined by a random process.
The remainder of this paper is organized as follows. Section 2 discusses the network models under consideration. Section 3 presents the clone detection protocol. Section 4 analyzes the proposed method and explains how we determine a threshold value for the number of times a key may be used. Simulation results verifying our analysis are presented in Section 5. Section 6 gives conclusions and possible future extensions to our research.
Sensor Network Topologies
This section presents two sensor network topologies, which we will use to analyze clone detection for realistic sensor networks using random key predistribution security.
Grid Network Topology
Grid network topologies are regular tessellations of the sensor field. The use of this topology for analyzing sensor networks originates in [8]. When users control node placement, nodes are often distributed in a regular pattern. Grid topology analysis relies on results from percolation theory [9].
In this paper we analyze a two-dimensional rectangular grid. The same approach can easily be adapted to other tessellations, like those in [9]. Every node in the network (ignoring edge effects) has exactly four nodes in its neighborhood. Communications between a node and its neighbors occur only when they share a common key.
Figure 1a shows a 2-dimensional grid of nodes in a 100 by 100 meter square region. The 100 sensor nodes are placed at regular intervals with 10 meters between neighboring nodes. Figure 1b shows the communications links that are established between nodes that use a random key predistribution scheme.

(a) A 2-dimensional grid of nodes and (b) communications links where neighbors share a key.
In ad hoc networks, there is no control over node placement. A model of their topologies was proposed by [10]: the sensor network has N nodes. These N nodes are uniformly randomly distributed in a square region (i.e., the x and y coordinates are random variables in the range [0‥max].) They can communicate only with nodes that are within their communications range r.
[10]'s ad hoc model assumes that two nodes within range of each other communicate with probability 1. We allow nodes within range of each other to communicate with probability p
c
[11]. For modeling random key predistribution (with key pool size P and k keys in the keyring), p
c
is the probability that two nodes share a key [3]:
Figure 2a shows an example distribution of nodes in a square region. Again, 100 sensor nodes are placed in a 100 by 100 meter region. Figure 1b shows the communication links enabled by random key predistribution.

(a) Random distribution of sensor nodes and (b) its associated ad hoc network.
Unlike the Erdos-Renyi topology, nodes can not communicate if they are separated by a distance greater than the radio range r. Unlike grid networks, little is assumed about node placement. The ad hoc model better captures the topology of networks deployed in hostile environments.
As shown in [2], cloning attacks can be detected by analyzing key usage statistics. We use the fact that each node randomly selects k keys from the pool without replacement to predict the distribution of the number of nodes possessing a given key. The distribution of the number of times a key is used is a direct consequence of the number of nodes possessing the key.
When clones are inserted into the network this key usage distribution is changed, since the cloned keys are present on a greater number of nodes than normal. The keys are therefore used more frequently than keys that have not been cloned. By collecting key usage statistics we can determine which keys have been cloned. Note that we consider only the use of keys to create a communications link between two nodes during network initialization.
The sensor network has a pool of P keys for a set of N authorized nodes. Each of the N nodes has a key ring of k keys taken randomly from the pool. C cloned nodes are added to the network. The total number of nodes is T = N + C. A base station B is outside the network with a copy of the entire key pool. Every node has B's public key that is used to communicate securely with the base station. B collects key usage statistics from the network and detects the keys that are used by cloned nodes.
We use counting Bloom filters to collect key usage statistics. Each node makes a counting Bloom filter of the keys it uses to communicate with neighboring nodes. It appends a random number (nonce) to the Bloom filter and encrypts the result using B's public key. This encrypted data structure is forwarded to B. B decrypts the Bloom filters it receives. The nonces are used to foil replay attacks; messages with duplicate nonces are discarded. B performs membership queries on the counting Bloom filters it receives and calculates how many times each of the keys are used in the network.
Keys used too often are considered cloned. B makes a Bloom filter of the cloned keys. For authentication purposes, the filter is encrypted using B's secret key. The Bloom filter is broadcast through the sensor network using a gossip protocol [12]. Each node checks the keys in its key ring for membership in the Bloom filter of cloned keys. Cloned keys are removed from the local key ring and all connections using compromised keys are terminated. More details are in [2].
Threshold Detection
A sensor network has N nodes active in a region. The key pool contains P keys. Each node has a key ring of k keys chosen at random from the pool. Also let n denote the neighborhood size, i.e., the maximum number of nodes that a given node can connect to. In [2] we derived the following theorems for the mean and variance of the number of times a key is used for making connections in the network,
Theorem
The expected number of times a key is used for making connections in the network is:
with variance:
(Note that M
j
is not canceled in (2), since it is required to compute the variance in (3).) Where M
j
is the probability a key is on exactly j nodes, and N
j
is the expected number of times a key on exactly j nodes is used. If P
d
(j) the probability a key is on exactly j nodes and C
j
denotes the number of times that key, k
j
on j nodes is used for making connections in the network, then
In the case of Erdos-Renyi networks, C
j
(the maximum number of times that key, k
j
on j nodes can be used for making connections in the network) is given by
Corollary 1. The maximum number of times that key k
j
on j nodes is used for making connections in the grid network, provided that no two of those j nodes have any other key in common is:
Proof. The number of times a key is used by a node for making connections is dependent on the number of nodes in the neighborhood with that key. For a key k
j
is located on exactly j nodes in the sensor network and a node with k
j
, the probability p
i
that the key is on i other nodes in the neighborhood is:
The expected number of times a key that is on exactly i other nodes in the neighborhood is used to make connections is:
Since the neighborhood size is n, i can vary from 1 to n; making the expected number of times that the key is used by the node for making connections in it's neighborhood:
Because key k
j
is on exactly j other nodes, the expected number of times it is used to make connections is:
For the grid network n = 4, making the number of times that a key on j nodes can be used to create connections in the network, provided that no two of those j nodes have any other key in common is:
Figure 3 shows simulation results and predicted values for mean key usage and variance as network size increases for grid networks.

Mean number of times a key is used to create communications links in a grid sensor network: simulation (a) and analytical prediction (b). The associated variance: simulation (c) and analytical prediction (d). For all the graphs, P = 1000, k = 50, number of nodes is on the x axis, 35 repetitions, and 95% confidence interval.
Corollary 2. Maximum number of times a key, k
j
on j nodes can be used to initialize communications links for an ad hoc network provided that no two of those j nodes have any other key in common is:
Where
Proof. The expected number of times that the key k
j
(on exactly j nodes) is used to initialize connections is given by equation (9). For an ad hoc network, there are N sensor nodes with communications range r randomly deployed in a square region of area A. For a node has range r, the area covered by its neighborhood is π.r2. Since A is the total area, the probability any node lies within the range of a given node is:
Since all nodes have the same probability pr of landing in the neighborhood, the number of nodes in the neighborhood will follow a binomial distribution. The probability that any node has i nodes in its neighborhood becomes:
The expected degree of a node in the network is therefore (where pc is given by (1)):
In practice, nodes at a distance of r or less from the edge of the surveillance domain have a smaller neighborhood. Since the effective area where nodes can be placed is less than π.r 2 . These edge effects are a simulation artifact. We do not consider them in our analytical estimations. Luckily, they do not lead to any appreciable divergence between our analytical predictions, and the simulation verification. Figure 4 shows simulation results and predicted values for the mean and variance of (11) as network size increases.

Mean number of times a key is used to create communications links in an ad hoc sensor network: simulation (a) and analytical prediction (b). The associated variance: simulation (c) and analytical prediction (d). As with Fig. 3, P = 1000, k = 50, x axis is the number of nodes, 35 repetitions, and 95% confidence intervals.
We determine whether or not a key is cloned by comparing U
i
(the number of times key i is used to establish connections indicated by the Bloom filter data) to μ
k
(from Equations (4), (5), and (11)). If U
i
is significantly higher than μ
k
, the key is probably being used by cloned nodes. A key is thought to be cloned if the key fits either
a normal curve or the distribution from Theorem with likelihood less than false positive rate α.
The value U i which matches α becomes the threshold T. Keys used more than T times are marked as clones. A detailed discussion of hypothesis testing for threshold determination is in [2].
For an ad hoc sensor network of 900 nodes, key pool size P 1000, key ring size k 50, and range r 10 m in a 100 m by 100 m region, Table 1 gives threshold values for various settings of α obtained analytically and using simulations. Normal distribution threshold values are obtained using statistical tables. Theorem 2 values are based on setting the α P most used keys to being clones. Simulation values are the mean of 35 repetitions. Note the similarity of the results.
Key Use Thresholds for Ad Hoc Networks
Figure 5 compares analytical predictions of the number of keys (out of 1000 keys) detected by the algorithm as clones for different values of false positive rate α and different number of clones inserted in to the system for ad hoc networks. Figure 6 includes both true positives and false positives.

Number of keys detected as cloned in an ad hoc network as the number of clones inserted into the network varies.
Table 2 gives threshold key use values for varying settings of the accepted false positive rate α, for a grid sensor network of 1600 nodes with key pool size P 1000 and key ring size k 50. The threshold values are obtained using the same prediction and simulation approaches as for ad hoc networks.
Figure 6 gives analytical predictions of the number of keys (out of 1000 keys) detected by the algorithm as clones for different values of false positive rate α and different number of clones inserted into the system for grid networks. Figure 6 includes both true positives and false positives.

Number of keys detected as cloned as the number of clones inserted into the grid network varies.
Key Use Thresholds for Grid Networks
This section gives simulation results to verify our clone detection approach. We present results for both grid and ad hoc network topologies. The sensor networks have a key pool size (P) 0f 1000 keys. Each node has 50 keys in its key ring (k). The ad hoc network has 900 nodes. The grid network has 1600 nodes.
Figure 7 plots the maximum component size of the network versus the false positive rate α for both grid and ad hoc networks. The figure shows an inflection point for component size when the value of α is somewhere between 0.3 and 0.5. In ad hoc networks, the inflection point occurs when α is approximately 0.8. False positive rates above those values will fracture the network and disable the sensor network. The prediction model in [2], predicts that the phase change in the ad hoc network occurs at α = 0.62. Also a known result from percolation theory is that the inflection point for grid networks of 4 neighbors occurs when the probability of connection between any two nodes is 0.5. The corresponding value of α for grid networks is 0.47. These predictions are consistent with our simulation results.

Maximum component size as the false positive rate α varies for (a) grid and (b) ad hoc topologies.
Figure 8 shows simulation results for the number of keys (out of 1000 keys) that are detected as clones for different values of α nd different number of clones inserted in to the system. The values presented include both true and false positives.

Number of keys detected as cloned as the number of clones inserted into the network varies. for (a) ad hoc and (b) grid networks
Figure 9 shows the number of cloned keys detected for different α settings and different numbers of clones for both ad hoc and grid networks. These values do not include false positives. The keys that are detected are the keys that are most highly used in the network. As can be seen from the figures, larger values of α result in a larger number of cloned keys being detected, but this comes at the cost of a higher number of legitimate keys being falsely detected as clones. This is an acceptable tradeoff, since this result in very few legitimate nodes being rejected from the network.

Number of cloned keys detected versus α. The key ring size k is 50. (a) Grid Networks and (b) ad hoc networks.
Figures 10 and 11 present the results of running our clone detection protocol for different values of the false positive rate α and for a different number of cloned nodes in the network for grid and ad hoc networks respectively. A value of α = 0.48 for Grid networks results in most cloned nodes being removed from the network. The value of α should be slightly less than 0.48, since 0.47 is the critical point where the network fractures.

Number of cloned nodes still connected to the grid network after running the protocol as α varies.

Number of cloned nodes still connected to the ad hoc network after running the protocol as α varies.
The protocol is more effective for the ad hoc topology, when α = 0.60 all clones are eliminated when 90 or more clones are introduced. Even when a smaller number of clones are inserted in to the system, very few clones remain connected to the network. The clones' ability to cause harm is severely restricted.
This paper focuses on sensor network cloning attacks where an adversary captures a sensor node, reprograms it, and adds multiple modified copies of the node to the network. We examine the feasibility of our clone detection protocol, first presented in [2] for ad hoc and grid network topologies. These topologies better represent realistic sensor network implementations, since they address communications range limitations.
We detect cloned keys by looking at how often they are used to establish connections between nodes; this is proportional to the number of nodes that possess the key. Inserting clones into the network skews this distribution. Cloned keys are present on more nodes and are used more often than keys that have not been cloned. The number of cloned keys detected increases with the false positive rate α. We quantify the maximum value of α, where the network will fracture for both grid and ad hoc networks. This is done using the method in [13].
Our ability to detect and remove clones increases with the number of clones in the network. Our approach does not require any cooperation from the cloned nodes. In [2], we show that the best strategy for the clones is to ignore the protocol. Active tampering with the protocol can only increase the likelihood that the clones will be detected.
We show how to quantify the number of clones that will not be detected for various number of clones inserted into the network. This is useful, in that it indicates a number of clones that may be present even if this protocol is used. Sensor network applications and security measures should be designed in a manner that can tolerate that many malicious participants.
One drawback of the presentation here and in [2] is the use of a centralized authentication mechanism. Centralized authentication using a base station is undesirable. It introduces a single point of failure into the network and can become a significant system bottleneck. It also violates random key predistribution's support of network self-organization, which is a very appealing aspect of the approach. Using multiple key servers in [5] eliminates the need for having a centralized authentication mechanism. That approach is orthogonal to the issues addressed in this paper.
