Abstract
Industrial revolutions and demand of novel applications drive the development of sensors which offer continuous monitoring of remote hostile areas by collecting accurate measurement of physical phenomena. Data aggregation is considered as one of the significant energy-saving mechanism of resource constraint Wireless Sensor Networks (WSNs) which reduces bandwidth consumption by eliminating redundant data. Novel applications demand WSN to provide information about the monitoring region in multiple aspects in large scale. To meet this requirement, different kinds of sensors of different parameters are deployed in the same region which in turn demands the aggregator node to integrate diverse data in a smooth and secure manner. Novelty in applications also requires Base station (BS) to apply multiple statistical functions. Hence, we propose to develop a novel secure cost-efficient data aggregation scheme based on asymmetric privacy homomorphism to aggregate data of multiple parameters and facilitate the BS to compute multiple functions in one round of data collection by providing elaborated view of monitoring region. To meet the claim of large scale WSN which requires dynamic change in size, vector-based data collection method is adopted in our proposed scheme. The security aspect is strengthened by allowing BS to verify the authenticity of source node and validity of data received. The performance of the system is analyzed in terms of computation and communication overhead using the mathematical model and simulation results.
Introduction
Wireless Sensor Network (WSN) is a network of miniature sensor devices mostly deployed in remote hostile areas to spy over the environment by measuring physical parameters such as temperature, pressure, sound, vibration, humidity level etc. The sensor devices are mainly equipped with sensing and processing units to capture the changes in the environment and produce measurable output about the parameter being monitored and it is communicated in collaboration with other connected nodes to the base station for analysis. Moreover, the monitoring process of sensor nodes can be configured either to be continuous or event driven. This enormous potential makes WSNs popular. It is increasingly gaining impact in our routine life and becoming ubiquitous. WSNs are successfully applied in widespread applications in various domains such as military, healthcare, agriculture, weather monitoring, industrial monitoring and automation etc. But the miniature characteristics left the sensor nodes with less computing power, less storage and battery power supply which make them suffer from resource starvation.
For the WSNs deployed in remote hostile area, it is very difficult to replace or recharge the drained battery. Hence increasing the life span of power constraint WSNs is one of the major problems. Research in [19, 20] shows that data transmission is the more energy consuming factor than computation. Hence Data aggregation (DA) becomes an important primitive [21] which helps to minimize the data transmission by eliminating redundancy. This is normally carried out by an intermediate node called aggregator which integrates the multiple data entities coming from sensor nodes in to a single entity and reduces the overall hops required for the data to reach the destination sink.
Because of the hostile and inaccessible nature of deployed region, the data being communicated in WSNs are prone to attack. The intruders can affect the confidentially and integrity of the data by illegally taping or making unauthorized changes in the data. Hence secure data communication is another important issue of WSNs. One of the popular mechanisms to achieve confidentiality is converting the data into non understandable format using encryption mechanism. To achieve bandwidth reduction with confidentiality, Data aggregation schemes are implemented either by hop-by-hop encryption or by end-to-end encryption [9]. In hop-by-hop based encryption schemes [22], the aggregator decrypts the cipher text and applies integration function over the plain text and then encrypts the aggregated data. This makes the scheme more vulnerable to attack when the aggregator node is compromised. This problem is addressed and end to end confidentiality is guaranteed by employing privacy preserving homomorphism (PH) based encryption mechanism which allows aggregator to apply aggregation function over the cipher text instead of plain text. Based on the kind of keys employed these PH based encryption schemes are classified in to Symmetric PH schemes and Asymmetric PH schemes.
In the region where the sensor nodes are deployed in dense and enrolled to measure different physical parameters of the same region, data aggregation has to be implemented parameter wise. This increases the bandwidth consumption. It can be minimized by aggregating data of different parameters together as single data unit as shown in Fig. 2 by reducing the overall hops needed for data communication. In Fig. 1, the data are aggregated parameter wise. Hence three aggregated data units are generated by aggregator with the total hops of 9. These 9 hops are reduced to 7 in Fig. 2 by using multi parameter aggregation. But the key challenge of multi parameter aggregation is to mix the data of multi parameters efficiently in such a way that the BS should be able to extract parameter wise data smoothly from the aggregate. In [1], Boneh’s et al proposed an asymmetric PH based encryption scheme with the property of bilinear map. This is extended in [2] referred to as CDAMA, by assigning separate key for each application and utilizing additive homomorphism of [1] to aggregate cipher text of different applications. The same concept is also applied in [4] referred to as IPHCDA, but for aggregating data from multiple regions by assigning different keys for each region. In both [2, 4], the BS is enabled to extract application wise and region wise data from the aggregate. The BS can get only the application specific or region wise summation of sensed values. No other statistical functions can be applied by the BS over the extracted data.

Single parameter aggregation.

Multi parameter aggregation.
In [5] referred to as ASAHS, the authors proposed a secure imprecise interval-based data aggregation scheme which allows BS to compute many statistical functions like minimum, maximum, average, etc by collecting number of occurrences based on value range. Another scheme proposed by P.Zhang et al [23] is based on the concept of collecting number of occurrences of specific value but not the range as in [5] and supports the BS to perform multi functions over the extracted data.
In this paper, we propose to develop a scheme called Integrity Assured Multi Functional Multi Application Secure Data Aggregation (IAMFMA-SDA) based on additive asymmetric homomorphic encryption for WSNs deployed in cluster topology. The scheme enables secure aggregation of multiple parameters (like temperature, humidity, pressure and etc.) measured from the same deployed region and extraction of parameter wise elaborated view from the aggregate to enable computation of multiple queries in one round of data collection. The main contributions of this paper are listed as follows. First, we give the system an ability to achieve end to end confidentiality by adapting public key based homomorphic encryption scheme. Also, we enable the cluster head to perform additive aggregation of the cipher text. The information about the plain text is semantically secured by keeping the probability of deciding correct plain text from the cipher text less than half. Also, the value of actual sensed value is hidden from the intruder as well as from the aggregator by assigning different benchmark for each physical parameter which is known only to the sensor node and BS. Second, we design the scheme in such a way that it is suitable for multi parameter - WSNs. The sensor nodes with the ability of sensing different physical parameters can be deployed in the same region and enjoy the benefit of data aggregation. Also, the scheme includes the flexibility of fixing different value range for each parameter and the occurrences of all parameters whose value fall within these specified range are securely aggregated Third, we enable the BS to obtain parameter wise comprehensive view and compute any statistical functions over the parameter specific data sent by sensor nodes in one round of data collection. Example of these functions includes average, minimum maximum, etc. Fourth, we enhance the security of our scheme by ensuring integrity of the data and authenticity of source nodes. Integrity is verified by two mechanisms. One by value-based hash function and another one by comparing active participating sensor nodes with the total occurrence count. Authenticity of source node is ensured both at CH and at BS. Then we minimize the communication cost between the sensor node and cluster head by keeping the vector size as two and security is ensured by introducing randomness in the selection of vector elements pairs. Finally, we present the performance analysis using simulated data set.
The remainder of this paper is organized as follows. Section 2 discusses the related work. Section 3 provides the system model and security model of the proposed scheme. Section 4 explains the preliminaries. Our IAMFMA-SDA scheme is described in Section 5. Security analysis is presented in Section 6. Section 7 gives the performance and comparative analysis of IAMFMA-SDA. Finally we conclude our IAMFMA-SDA in Section 8.
The authors in [5] proposed a scalable scheme for aggregating data of single physical parameter and ensured security through asymmetric PH scheme [13] which is probabilistic and additive homomorphic. The security is strengthened by storing two public key parameters in encrypted form generated using the node ID in each node. This also helps the BS to verify the consistency of the received data. It uses an array of cipher texts whose index represents an interval of values, and cipher text carries the number of occurrences of sensed values which falls within this interval. Instead of sending the actual sensed value each node sends this array by incrementing one of index to which the sensed value belongs to. Because of this concept, the scheme is able to achieve scalability and allows the BS to apply some set of mathematical functions over the data. Concepts of interval-based data collection makes this scheme suffers from loss of accuracy. Also, it requires the BS to keep track of active nodes to verify the integrity of the data. In [3] referred to as RCDA, data aggregation of single physical parameter scheme using EC-EG homomorphic encryption is proposed. Here aggregation is done by concatenating the cipher texts. This enables the BS to recover all sensed data and apply multiple statistical functions over the data. Also, it verifies the collective integrity of the received data by attaching a signature generated using [24] which supports aggregation of signature. But false data due to replay attack cannot be captured by this integrity verification. This is addressed in our scheme by adding value-based hash as well as by tracing the active nodes participating in data gathering. Also, aggregation by concatenation limits the message space by cluster size. To address the problem collective integrity verification which results in the rejection of whole received data slice based discrete integrity verification is proposed in [33]. In [34], the secure data aggregation scheme for the WSN using the hybrid topology is proposed. The deployed region is split into equal parts where all the nodes are structured into stars. Tree topology is established by attaching a parent with each node. Security is enforced using symmetric encryption. The authors of [4] proposed a data aggregation scheme which safely aggregates data from multiple regions and enable BS to extract region wise data from the aggregate. By assigning different parameters to different regions multi parameter aggregation can be achieved. Concealed data aggregation and multi parameter support is achieved by adapting the APH scheme which is the extension of [1]. But as discussed in Section 1, BS is not allowed to apply different query. It supports integrity verification only at region level. But impurity due to replay attack and active attack within a cluster cannot be verified at BS. In [6], the authors proposed a scheme referred to as SASPKE for secure aggregation of single parameter by employing StPKE [28] and homomorphic encryption scheme [29]. It enables the BS to apply many functions over the received data by applying encoding mechanism as in RCDA [3] and data integrity is verified by using hash based message authentication code. The same is also used for verifying the authentication of the node. The authors in [7] referred to as MRCDA, proposed an end-to-end secured data aggregation scheme using EC-EG encryption and integrity of the data is verified at CH as well as at BS by employing two different MACs(Message Authentication Code). One is generated using pair wise symmetric key and another one is by using homomorphism. Fully homomorphic encryption based aggregation scheme proposed in [30] referred to as FESA, ensures end to end confidentiality and allows the detection of false data en route by verifying the integrity during aggregation as well as forwarding towards BS. In [31] referred to as
In real time scenario, it is required to collect information about multiple physical parameters from the monitoring region and apply more statistical functions at the base station. Also, the nature of these deployed regions invites more security attacks and makes it necessary to ensure confidentiality and integrity of the received data. It is important to differentiate valid users from adversaries. Moreover in some real time applications, it becomes necessary to extend the size of the network dynamically after deployment. By considering all these requirements we propose to develop a cost efficient secure data aggregation scheme which can aggregate multiple parameters with an ability of ensuring confidentiality, integrity, scalability and support of multi functionality.
System model and security model
The major requirements of WSN are how efficiently and securely the data can be gathered and transmitted to Base station(BS). In our system for efficient transmission data aggregation of multiple parameters is considered. In WSN, it is important to ensure the confidentiality and integrity of the data. Hence our system focuses on privacy and integrity protection of data from multiple parameters. In this section we present the system model and security model.
System model
In our system, WSN is modeled as a group of clusters organized as tree (Fig. 3). Each cluster comprises of sensors and cluster head (CH). Senor node with the capability of high computation and communication range is designated as cluster head. The CH broadcast Build_Cluster message to all the nearby members. The members compute the distance with all the CH from which Build_Cluster message is received and send the Cluster_Join message to the CH which is close to it. Based on the number of Cluster_Join message received, the CH decides and give Build_Accept message to the cluster member and form the cluster. In our system, the CH shares unique symmetric key with each cluster member for secure communication. After receiving data from all the sensors, CH performs aggregation and forwards it to BS. The sensors may be deputed to measure different physical parameters. Each sensor is deployed with public key, encryption algorithm and parameter specific details known only to BS.

WSN for multiple parameters.
As most of the WSN applications are targeting remote hostile areas, they are prone to attack which mainly affects the confidentiality and integrity of the data. By considering this and the ability of adversary, we define the attack model as follows.
The ability of our proposed scheme to defend against the above-mentioned attacks are discussed in Section 6.
Preliminaries
Privacy homomorphic (PH) encryption schemes
Homomorphic encryption is the form of encryption with the capability of applying computation over the encrypted data. Effect of this computation is same as that of performing the same on plain text. This ensures privacy while outsourcing data and does analysis without affecting the confidentiality.
Because of homomorphic property the operation ° on the plain text is mapped with the operation on cipher text. This operation may be additive or multiplicative. The homomorphic property allows the intermediate node sitting between source and destination can perform aggregation of ciphered text without obtaining the plain text through decryption.In turn end to end confidentially is maintained. The key k used for encryption (Enc k ) is either symmetric or public key. The symmetric key based PH schemes such as Domingo-Ferrer scheme [10] and Castelluccia (CMT) et all’s scheme [11] have less computation and communication overhead. But the requirement of sharing key information between the sender and receiver makes these schemes comparatively less secure than asymmetric PH schemes. The Elliptic curve cryptography (ECC) which enjoys the same level of security as that of RSA cryptosystem with trapdoor function and smaller key size at high speed makes ECC based asymmetric PH schemes more popular. The security level of 256 bit ECC based cryptosystem is same that of 3072 bit RSA system [12]. The hardness of the ECC based PH schemes lies on Elliptic Curve Discrete Log Problem which is defined as finding the multiplicand is computationally difficult when the product and multiplier is given. Paillier in [14] proposed three probabilistic elliptic curve based crypto systems. The first cryptosystem called Elliptic curve Naccahe Stern (n = pq) is semantically se and its security is based on the intractable residuality problem. This is the variant of [15] and is defined over the elliptic curve ɛ n where n = pq and p and q are chosen as congruent to 2 mod 3. The second cryptosystem is called Elliptic curve Okamoto-Uchiyama (n = p2q). This is the extension of [13]. The third cryptosystem is designed by extending the previous two schemes. All these three schemes are probabilistic and based on recovering discrete logarithms othree different types of curves. Also, their defending power depends on residuosity problem. Mykletun et al [16] proposed an Elliptic Curve- ElGamal Cryptosystem by converting the cipher system proposed in [17] into additive group. Boneh’s et al. [1] proposed another elliptic curve based cryptosystem that allows computation over the data encrypted using different keys and this is adapted in our proposed scheme.
Boneh’s encryption scheme
Here we discuss the encryption scheme proposed by Boneh’s et al. in [1] to evaluate the disjunctive normal form of 2 literals given the cipher form of n variables. This is an asymmetric PH scheme with the property of doubly homomorpohic. That is the encryption scheme supports homomorphism of both addition and multiplication over the cipher text. It resembles the encryption schemes proposed in [18] and [13]. The additive homomorphism is obtained from [18]. By considering the expensive nature of multiplicative homomorphism, [2] and [4] have taken advantage of only additive homomorphic property of [1]. Both [2, 4] utilized this property of [1] which enables computation over the data encrypted using different keys. The same modification is adapted in our proposed scheme to support aggregation of data of multiple parameters as in [2].
Boneh’s et al. [1] described three algorithms for generating keys, cipher text and decryption.
Here the base b = g
p
1
and the discrete log can be solved in with
Proposed scheme IMFMA-SDA
In this section, we present our scheme for secure aggregation of multiple parameters which enables BS to extract parameter wise data and apply various statistical queries over the gathered data. Then we show how the BS can verify authentication of participating nodes and integrity of the received data. The major phases of our proposed scheme are setup phase, formation of secured vector, aggregation phase and operations at BS.
Setup
During setup procedure the wireless sensor network is organized as hierarchical clusters of nodes and encryption scheme is initialized. Our scheme assumes encryption based on asymmetric PH proposed by Boneh et al [1] and the updated version of the same as in [2].The BS calls an algorithm namely KG(τ) using the security parameter τ. KG generates separate public key for each parameter and private key. These parameter specific public keys are stored in sensor nodes before deployment. The number of public keys per sensor depends upon the number of parameters the sensor node is deputed for. Private key is kept secret in the base station for decrypting parameter specific data. The details of KG(τ) are given below. Along with the encryption details, in each sensor node the BS stores node ID, parameter specific benchmark (βa), range factor
Formation of secured vector
The operation in each sensor node to form the secured vector is designed in such a way that it enables the BS to attain different statistics of each parameter using one query. Each sensor node x
ia
creates a vector of encrypted values using the sensed raw data
When the sensed raw data falls outside the expected range
Aggregation
The CH of each cluster aggregates the secured vector elements received from all its cluster members and forward this vector aggregate to next CH enroute to the BS. The data in the vector ipositioned using the difference between the sensed value and benchmark. Hence vector elements coming from different nodes of various parameters can be aggregated based on their index position. By leveraging the additive homomorphism of encryption scheme [1, 2], the aggregator can sum up the encrypted values at each index and createaaggregated vector. The aggregated value in each index represents the number of sensors whose measured value has same positional difference in the scale with respect to benchmark independent of parameters.
Consider cluster ‘c’ of size ‘m’. The aggregator vector aggv
c
of cluster ‘c’ is created using the following steps For each vector index k ∈ { aggPos } do aggv
ck
= ∑vik where i ∈ { 1, 2, . . m } andk = D (Posi1) or D (Posi2)
The size of aggv c of cluster ‘c’ depends upon the number of positions in aggPos. This aggv c is forwarded to BS along with encrypted value of all the position in aggPos and aggHash. As already mentioned, the positions are communicated between nodes and CH and between CH and BS using any suitable light weight symmetric encryption scheme.
Operations at base station
The value of different applications are mixed by using points for eacapplication with different order [1] to extract the scalar of one parameter point from the aggregated cipher text. The aggregated vue in each index represents the number of sensors whose measured value has same positional difference in the scale with respect to benchmark independent of parameters. The ciphered aggregated value in aggregated vector at index ‘ is given as follows
From aggv
i
, (∑M
k
) can be extracted by finding discrete logarithm of
The structure of the Sum Matrix created using the algorithm 3 is given below. Column index is from
Integrity verification of aggregated data
The adversary enroute may manipulate the data and falsify the vector btting either between sensor and CH or CH tand CH or CH and BS. Hence, we intend to verify the integrity of the final extracted information from the vector aggregate and prevent BS from accepting any falsified data, as part of the security.
Whenever any sensor is participating in data gathering, the sensor generates a value by applying one way hash function over βa + k. These hashes are aggregated at CH as well as at BS. The BS computes new hash using Algorithm 4 with the help of MatCsum and benchmark of each parameter. Integrity of the extracted information is verified by comparing the new computed hash aggregate with sum of received hash aggregate. If both are same, verification test is succeeded. Step 5 in algorithm 4 updates the new hash aggregation by applying hash function
Authentication of nodes
In our IAMFMA-SDA the authentication is ensured in two levels. In the first level the authentication is ensured using symmetric key based transmission in all single hop communication between nodes inside the cluster and between CH. Because, in symmetric key based communication nodes without valid symmetric key cannot participate in the message transmission. In the second level, the BS verifies the identification of all participating nodes in collective manner using algorithm 4 with the help of value and ID based hash value. This ID based hash value allows only the nodes holding the valid ID to do the communication.
A descriptive example
This section provides an example to demonstrate the capability of our scheme to aggregate data of multiple parameters from different sensors in such a way that from the aggregated vector the BS can get the detailed view of all the occurrences of different parameter values within a specified value range. Also allows the BS to verify the integrity of the received details. We have taken the instance of WSN as shown in Fig. 4. The WSN has 12 sensor nodes grouped into three clusters. The cluster may include sensors of three different parameters. Sensors A,D,G,F belongs to parameter 1, B,H,J,L belongs to parameter 2 whereas C,E,I,K belongs to parameter 3.

Demonstrative example.
Sensors E, F, K act as aggregator for each cluster respectively. F forwards its aggregated vector to E.E, K to BS. The public key of parameter 1 is PB1 = (n, E, Q1, Q4, 4) and for parameter 2 is PB2 = (n, E, Q2, Q4, 4) and for parameter 3 is PB3 = (n, E, Q3, Q4, 4). To make the example simple, we assume small values to the order of Q1, Q2, Q3, Q4. That is order (Q1) = p1 = 11, order (Q2) = p2 = 13, order (Q3) = p3 = 17, order (Q4) = p4 = 19 and n = p1p2p3p4 = 46189. We set the benchmark of each parameter as β1 = 75, β2 = 63, β3 = 30. The range factor is set as g = 2 which is common for all parameters. The vectors coming out of each sensor is given in the Table 1. After receiving the vector aggregate from cluster head E and K, the BS creates MatC1 and MatC2 respectively. To show the extraction of parameter wise data, we have taken the value 0Q1 + 1Q2 + 2Q3 + 29763Q4 in AggC3 at index 0 as sample. To extract number of occurrence of parameter 3, compute
Vectors and Matrix computation
Because 19
In this section, we present the security analysis of our scheme and show its resistance against various attacks.
End to end confidentiality
The proposed scheme ensures the end to end confidentiality of the data by preventing the intruder from obtaining the actual measured value. The term end to end confidentiality actually implies that privacy is guaranteed along the path connecting the source node and BS which includes the hops between node and CH & between CH and BS
Confidentiality between node and CH
The data being transmitted by the node is not the actual measured value. It is actually the Boolean representation of occurrence of value that is either zero or one. Confidentiality of this Boolean value is ensured by enciphering it using elliptic curve cryptography based cipher system. The hardness of this cryptography scheme is based on the factorization of group order n. Let n=P1P2 where P1, P2 are large primes. The hardness says that given (n, E,
To prove that our encryption scheme is semantically secure, here we consider the scenario that the node is sending cipher text of either m0 or m1 and the computationally bounded aeary who is aware of this not able to identify that cipher text is whether the encryption of m0 or the encryption m1 wh the probability better than half. That is the cipher text m0 is indistinguishable from the cipher text of m1. The data atted fm the node to CH includes two cipher texts v
ik
1
, v
ik
2
and v
ihash
, Posi1, Posi2. Both v
ik
1
, v
ik
2
may contain either the encryption of m0 or m1. Here m0, m1 refers to zero and one. Probability that the inside attacker correctly identifies what has been encrypted in v
ik
1
, v
ik
2
is upper bounded by half plus negligible. This probability is over the randomness of index of vector element which is chosen to hide the position of vector element holding the encryption of 1. Hence
Here, the ‘n’ refers to the security parameter. The running time of computationally bounded adversary is upper bounded by polynomial function of this security parameter ‘n’. This makes our encryption process semantically secure in the Cipher text Only Attack model and Chosen Plain text Attack model.
Confidentiality between CH and BS
All the security mechanism discussed in Section 6.1.1 is also applicable for the communication taking place between cluster head and base station. The way the data related to multiple parameters are mixed smoothly and communicated to BS from CH strengthen the resistance of our scheme against attack. Resistance is provided in 3 levels. At level 1, the data is communicated in encrypted form whose security is based on subgroup decisional problem. At level 2, the data communicated is not the actual measured value. Instead, it is the count of instance of values whose positional difference from the benchmark is given by the index. This count is the mixture of different parameter instances. Also, the benchmark is different for each parameter. This makes it very difficult for the attacker to retrieve parameter specific measured value without knowing the benchmark, index and parameter specific count and factorization of group order.
Chosen Cipher Text Attack (CCA)
Homomorphic nature which allows computation over the encrypted text makes the schemes to suffer from chosen cipher text attack [2]. But it requires the adversary to have the capability to decrypt some cipher text and this makes it hard to launch this attack in WSN. The nature of our proposed scheme provides additional shield against these highly capable adversaries. Because the information obtained by the adversary through CCA is only the number of occurrences not the exact value measured by the sensor. Benchmark and positional value are also needed to obtain the exact measured value. But this benchmark is known only to sender and base station. Positional value is communicated in cipher form using symmetric key. Hence without compromising sender and receiver, it is difficult for the intruder to perform CCA.
Malicious attack
Malicious adversaries are more a concern in the public key crypto system compared to symmetric environment. In symmetric key cryptosystem, the receiver is intended to communicate only with a single sender using a specific secret key. Hence for a malicious attacker without the symmetric key shared between sender and receiver, it is very unlikely to come up with a valid cipher text and communicate with receiver on behalf of sender. In our proposed scheme, the index of the vector elements is communicated between sender and receiver in cipher form encrypted using symmetric key. This reduces the possibility of malicious intruder contaminating the aggregated value at cluster head by passing wrong data. The integrity verification at BS also prevents it from accepting data falsified by this attack.
Cluster head compromised attack
The adversary, who got the privilege to compromise cluster head, can only obtain the vector elements and their position in cipher form. Though the adversary can get the actual position value using the symmetric key shared between CH and cluster member, it is difficult to obtain the vector elements plain text whose defense is based on subgroup decision problem. Also, the plaintext enclosed in the ciphered vector is semantically secured by upper bounding the probability of deciding correct plain text to half plus negligible.
Also, for the intruder who is privileged to break the semantic security, cannot obtain the actual measured value without knowing the benchmark which is different for each parameter and available only with sensor and BS. And false aggregation introduced at CH can be detected by the BS by verifying the integrity as mentioned in Section 5.5. Thus the design of our proposed scheme provides protection against cluster head comised attack.
Replay attack
The aggregated value can be contaminated by resending the already transmitted value by the intruder. Our proposed scheme cannot prevent this malicious effect. But enables the BS to identify this effect and filter out this attack. This is because the sum of all the elements of MatC
sum
gives the total number of nodes participated in the current data acquiring round. If
Malleability attack
Homomorphic encryptions are more vulnerable for malleability attack as the attacker can do any unwanted modification in the cipher text with no knowledge of keys and plain text. IAMFMA-SDA sends v hash generated using one way hash function over the sensed value and ID along with the vector elements and get aggregated enroute. Integrity of the final received values is verified by BS using Algorithm 4 and failure of integrity verification enables BS to detect malleability attack.
Performance analysis and comparison
The performance of our proposed scheme is analyzed by measuring the communication cost and computation cost. By the term communication cost we refer both the number of hops required for the gathered information to reach the BS and message size. The computation cost is analyzed by measuring the various kind of computations required to perform encryption at sensor node level, aggregation at cluster head and decryption at Base station. The major goal of the proposed scheme is to aggregate the data of multiple parameters and enable the BS to extract parameter wise data from the aggregate, to do integrity verification and also to provide BS the ability to apply dynamic query over the received data. Hence we consider the asymmetric PH based data aggregation schemes
Simulated scenario
The simulated scenario is created in NS2 with a WSN of varying network size ‘N’. The ‘N’ is varied as 100,150,200,260, and 300. A bounded area of 2000×2000 sqm is utilised for positioning the nodes. Initial energy level of these nodes is set to 100J with transmission range of 50 m. The receiving and transmission power of the nodes are set to 0.01J and 0.02 J. The simulation is done to aggregate two different parameters. For simulated analysis, the size of vector is set as 8 for both the proposed scheme and ASAHS. For IPHCDA, the region size ‘a’ is kept to the minimum of two.
Communication cost
To analyze the communication overhead, we consider the number of hops occurred during the execution of schemes and the message size as two key factors.
Hop refers to the movement of a single packet from one node to another node. Since the major concern of data aggregation is energy saving through hop reduction, the number of hops required for the data to reach BS from the sensor nodes is analyzed. Inside each cluster, the data packets from cluster members reach the CH in single hop. So the total hops in each clusters with size ‘s’ is O (s). The hops required for the packet to reach BS from CH is one as packets from the CH get aggregated in every hop with that of other CH in the path towards the BS. Hence the total hops required for packets to reach from CH to BS are O (c) for the WSNs with c clusters. The final Communication cost based on hop count is O (c . s ) + O (c).
Now we analyze the communication cost based on the size of cipher text. In our scheme, cipher text is the result of 2 scalar multiplications and one scalar addition over elliptic curve points. Since multiplication of an integer with a point and addition of 2 points over the elliptic curve results another point on the same curve, cipher text in our scheme is defined as pair of elliptic curve affine points. A point on an elliptic curve is represented using pair of coordinates x and y. Also, the size of x and y equals to the modulus [16] that is the order of elliptic curve. Using point compression mechanism, the point can be represented using only by x coordinate plus one or more bits of y coordinate. Hence for an elliptic curve defined over the finite field
Secured data aggregation and region wise integrity protection is implemented in [4] using APH of [1]. Hence, the size of single cipher text is same as that of our scheme. The same methodology is also adopted in [2] but it supports aggregation of multiple application and smooth extraction of application wise data by BS. Since the focus is only on aggregation, the cipher text size remains same in all the communication between all nodes in [2] and [4]. In RCDA [3], the size of data packet from each node is size of one cipher text plus the size of signature. Aggregation by concatenating the message limits the message space by the cluster size. This is addressed in our scheme by using parameter wise benchmark and encapsulating only the occurrence in the cipher text. In ASAHS [5] the data packet moving between nodes deployed in tree topology carries q vector element enciphered using Okamoto-Uchiyama (OU) encryption scheme [13] whose cipher text size is equal to modulus used in encryption [16]. Hence, the size of data packets is proportional to the vector size and remains same between all nodes. It requires one hop for each sensor as the aggregation is taking place at every hop enroute to BS. Added to this cost, before the commencement of data collection, the BS has to transmit vector data packet to each sensor node which has same size as that from nodes. This additional communication overhead between nodes and CH is eliminated and size of vector is reduced to 2 in our proposed scheme. In MFMDSDA [36], the size is based on the Pailliers cryptosystem [14] which suffers from message expansion problem as it increases the communication overhead. Formulas used to calculate the communication cost based on number of hops and bit length is given Table 2.
Communication cost
Communication cost
The number of bits generated per source node including the 802.11 header (11 bytes) along with the total number of bits transmitted and received during one round of data collection is tabulated in the Table 3 and the comparative analysis is given in Figs. 5 and 6.
Communication cost
s: WSN Size C: Number of clusters

Number of bits transmitted.

Number of bits received.
The total energy for transmitting and receiving the bits is computed. Comparative analysis based on the simulation results are graphically illustrated in Fig. 7. The communication overhead is 5.62% less than that of ASAHS and 60.8%, and 58.2% more than that of RCDA and CDAMA respectively. The extra communication overhead is acceptable in terms of support of the multiple queries in one data acquiring round compared to CDAMA. To find min, max, avg occurrence of the sensed values, minimum 3 rounds of data collection is carried out and the total energy consumption is measured as 495.19mJ for network of size 150 nodes. But in the proposed scheme the energy consumption for one round of data gathering is 317.45mJ which is 35.89% less. In RCDA and MRCDA, three rounds of data gatherings are required to obtain the details of three different physical parameters with an energy consumption of 530.37 mJ which is 17.5% more than that of the proposed scheme. Also, RCDA and MRCDA schemes are suitable only for the network with a maximum of 2 hop distance from the BS. This issue is addressed in our proposed scheme by supporting the network to grow vertically. This is aided by the smooth aggregation of multiple parameters at every hop. In MFMDSDA, due to message expansion problem the communication cost is 5.03% more than that of the proposed scheme.

Energy consumption.
To perform computation analysis, the various operations done by sensor nodes during message generation and aggregation as well as the operations done by BS during data extraction and data validation process are considered.
The computation overhead of message generation process mainly depends on the encryption scheme. In our scheme, the cipher text is generated using
Table 4 gives the formula to calculate the computation overhead of candidate schemes for message generation at each sensor node, aggregation at each CH, message extraction and validation at BS. Computation overhead includes all operations done by the sensor node and CH to generate the complete data packet which contains cipher text, data validation details etc. Among the candidate schemes we consider MFMDSDA [36], ASAHS [5] and RCDA [3] as more suitable since they allow BS to extract elaborated view of all possible values sensed by sensors and allow it to perform multiple statistical functions over the extracted information which is not supported by the other schemes [2, 4]. The Comparison results are given below. Figure 8 shows the comparison based on the computation done by sensor nodes. It shows that the operational overhead increases with the network size in all the 3 schemes.
Computation cost
Computation cost

Computation cost at sensor nodes by varying WSN size.
Comparison of proposed scheme with all candidate schemes based on the overall computation cost is given in Fig. 10. Compare to ASAHS [5] the computation cost of our scheme is less 22.95%. But it is more than that of RCDA [3]. While analyzing the computation by keeping the WSN size fixed and varying the value range ie vector size q, the operations done by sensors increase proportionally with vector size in ASAHS, but variation in q does not have any impact in our scheme (refer Fig. 9). This is because in our scheme the number of cipher text processed in each sensor node is fixed to two independents of the value range. While analyzing the computation happening at CH, RCDA and our scheme have shown better performance compare to ASAHS. In both RCDA and our scheme, the number of cipher text processed by CH is less than that in ASAHS. While comparing the performance with RCDA, In RCDA, the overall computation required to generate the data packet is 25.07 % less than that of our proposed scheme. This difference is the result of additional computations required for generating two vector elements to support the elaborated aggregated view of data. But RCDA is not suitable for aggregating data of multiple parameters with a different value range. Compare to MFMDSDA which support multiple function and multiple parameter aggregation the computation cost at sensor node is 14.22% less due to the novelty adopted in the formation of secured vector in our proposed scheme.

Computation cost at CH by varying network size.

Computation cost at sensor nodes by varying q.
The comparison of existing Asymmetric PH based schemes with IAMFMA-SDA using the key Quality of Service(QoS) measures is given in Table 5. The entire candidate schemes compared in this section ensures end to end confidentiality using asymmetric PH based encryption by applying aggregation function over cipher text. The truthfulness of received data are verified based on the values in [3, 6, 7, 30, 31, 36]. But the schemes [5] and [4] verify the integrity of the data on the basis of number of active nodes participated in data gathering. IAMFMA-SDA supports both value based and active nodes based integrity verification. This provides protection against malleability and replay attack. No integrity verification is provided in [3]. But it offers the support of multiple parameter integration. The same is implemented for region wise data aggregation in [4]. But both supports only one kind of query at BS and fails to provide detailed view of the data gathered by all sensor nodes and hence does not support multi functionality as in IAMFMA-SDA and [3, 5, 6].
Comparison
Comparison
EM: Encryption method AU: Authentication IT: Integrity MP: Multi parameter support MF: Multi Functionality support SL: Scalability
The authentication of participating nodes is confirmed only at BS in [3, 5, 6]. The detection of unauthorized communication both at cluster level and at BS level is provided in IAMFMA-SDA and in [4, 7, 31]. Early detection at CH level prevents the propagation of attack. Also, the support for dynamic scaling of network size is given in IAMFMA-SDA and [5] by representing the sensed value as number of occurrences. While comparing defending mechanism against attack, malleability attack is detected through integrity verification by the schemes [3–7, 30, 31] and IAMFMA-SDA. Attack as a result of replaying the packets is detected by analyzing the number of participating nodes in [3, 5, 6] and IAMFMA-SDA. The same is prevented in [4, 31] by attaching the time stamp with all data packets. In [7] it is thwarted by maintaining a cipher text counter pair. Nature of homomorphism in all the discussed schemes makes them vulnerable to chosen cipher text attack [2] but it requires the attacker to have an ability to decrypt chosen cipher text. In our scheme IAMFMA-SDA the stability against this attack is strengthened. To conclude the comparison shows that IAMFMA-SDA is the scheme which provides support for all the key QoS measures like confidentiality, integrity, authentication, support of multi parameters, scalability and multi functionality with reduced computation cost.
In this paper, we present a scalable secure multi parameter aggregation scheme for wireless sensor network in cluster topology. The proposed scheme offers a special feature to aggregate data of multiple parameters and to enable secure recovery of elaborated information about sensing data parameter wise in round of data collection. The BS is allowed to compute multiple mathematical functions like sum, median, max, min over the received data of different parameters using a single query. Confidentiality between source end and destination end is provided using public key based homomorphic encryption. Integrity of the data received and authentication of source node is ensured at BS. Also, the proposed scheme is secure against eavesdropping attack, node compromising attack, chosen cipher text attack, replay attack, malleability. A comparison of communication overhead and computation cost with respect to existing schemes shows the practicality and efficiency of the proposed system while achieving all the considerable quality measures (support of multi parameter aggregation, multi functionality using single query, scalability, integrity and authentication) on resource constraint WSN. To the best of our knowledge, our proposed scheme is the first to meet all the above-mentioned quality measures with a feasible performance.
Footnotes
Declaration of interests
None.
