Abstract
It has been envisioned that vehicular ad hoc networks (VANETs) would play an important role in future traffic monitoring applications. This paper investigates how VANETs can overcome the strictly limited bandwidth and privacy issues. Firstly, an aggregation scheme is presented to minimize the required overall bandwidth. Subsequently, a privacy algorithm capable of ensuring user privacy in applications is proposed. This proposed mechanism can not only protect the private data of each individual against the adversaries with arbitrary auxiliary information, but also achieve the optimal accuracy of aggregates. Through simulations, the viability of this approach is confirmed.
Introduction
As an important component of the intelligent transportation system (ITS) and a novel form of mobile ad hoc network, vehicular ad hoc networks (VANETs) for ubiquitous vehicles to sense and collect an enormous amount of traffic related information has attracted much attention from the government, industries and academic institutions. The collected information can be utilized for road traffic report, accident warnings and parking recommendation to improve road safety, traffic efficiency, and convenience for drivers. In the information broadcast or multi-hop transmissions, vehicles shall not only broadcast their own collected information, but also deliver others’ messages. Therefore, local data aggregation is needed to reduce the communication cost and improve the efficiency [7].
In VANETs, data aggregation is accomplished in the following way. Locally, each car makes observations for some essential measured values (i.e., speed, location, mileage and timestamp). The newly observed values are then integrated into the current aggregate record through appropriate weighted averages for the respective parameters to generate the new aggregate.
In this way, the bandwidth for transmitting separate observations is saved for each vehicle. Though the aggregates without obvious information about individuals, an adversary can infer the location and speed of an individual by comparing the new aggregate with the old one [9]. A person’s whereabouts may imply sensitive private information, such as health problems, lifestyles and political associations [14].
By carefully adding the chosen random noises, such as differential privacy, individual observations are not revealed to the adversary by hiding the difference between two continuous aggregates. First proposed in statistical databases, differential privacy [9] adds noise to the answers of queries so that the querying system cannot detect the presence or absence of a single tuple or a set of tuples, which provides a provable foundation for measuring privacy loss regardless of the information collected by an adversary. However, introducing differential privacy into data aggregation in VANETs imposes the following challenges: The data are distributed across a potentially large number of vehicular devices and roadside units (RSUs), without a centralized database. Some differential privacy approaches are restricted to static scenarios, which may not be suitable for the dynamic vehicular environment.
In order to address those challenges, a differentially private aggregation mechanism is proposed for VANETs, which can protect the privacy of each individual’s data against the adversary even if there is any arbitrary auxiliary information, and achieve the optimal accuracy of aggregates.
The rest of this paper is structured as follows. In Section 2, the notions of data aggregation and differential privacy from the literature are discussed, and their weaknesses and strengths are pointed out. In Section 3, the scheme for aggregating travel time information in city environments is proposed and introduced. Besides, distributed differential privacy settings and data aggregation model for VANETs are explained. Then, a differential privacy aggregation mechanism is described in Section 4, and the results of a simulative evaluation are presented and discussed in Section 5. Finally, a conclusion of this paper is drawn in Section 6.
Related work
In this section, the state of the art of data aggregation for vehicular communication and recent applications of differential privacy are reviewed.
Data aggregation for vehicular communication
Because of the limited bandwidth in vehicular communications, the information of several vehicles is often aggregated into a single message or record. This process is called data aggregation. So far, many data aggregation schemes have been proposed, such as SOTIS [21], TrafficView [15], and CASCADE [13], for collecting and combining information about position and speed of vehicles. Scheuermann [19] have proved that any suitable aggregation scheme shall be able to reduce the bandwidth with the information about an area at a distance of d provided to the cars asymptotically faster than
Only recently, the security aspects of data aggregation have begun to attract attention. Picconi et al. [17] have developed a probabilistic solution for validating the aggregated data-probabilistically detecting malicious cars that generate false aggregated information. Dietzel et al. [12] have proposed a secure aggregation which is resilient against both malicious users of the system and wrong information as faulty sensors are taken into consideration.
Nevertheless, the privacy issues of aggregation mechanisms in vehicular networks still have not been researched. In [6], a new privacy metric is put forward to measure the privacy enhancements brought by in-network aggregation, which is considered as a promising technology to protect privacy, because the aggregates do not contain individual atomic information. In this paper, a strong attack model, with which an adversary can infer individual information based on the difference between two continuous aggregates, is considered.
Differential privacy
As a notion of privacy from the area of statistical databases, differential privacy [9] aims to protect an individual’s data while publishing aggregate information about the database. Among the existing privacy models, differential privacy provides one of the strongest privacy guarantees. A typical way to achieve this notion is to add controlled random noise to the query output, such as that drawn from a Laplace distribution. Although most works on differential privacy assume a centralized database, there is no centralized database existing in vehicular communication environment, where vehicles or RSUs maintain traffic-related data. Consequently, distributed differential privacy is considered in this paper.
A number of prior works have provided differential privacy in distributed settings. Shi et al. have designed distributed differential privacy mechanisms based on applied cryptographic techniques, as shown in [20]. However, their distributed key distribution protocols cannot work in a dynamic vehicular environment. In [4,5], Chen at al. have proposed some practical systems that not only provide distributed differential privacy but also limit the effect of the malicious users on query results. Unfortunately, none of them can be directly applied to VANETs due to the multi-dimensional spatial data in VANETs.
There have been limited works applicable to spatial data. Andres et al. [2] have introduced differential privacy into location-based systems that protect the users’ exact location information, allowing the approximate location to be released for obtaining services. Cormode et al. [18] have studied how to apply differential privacy to spatial decompositions. In consequence, how to apply distributed differential privacy to spatial data aggregation in VANETs is studied in this paper.
Preliminaries
In addition to the adversarial model and distributed differential privacy settings, the data aggregation model for VANETs based on the framework proposed in [8] is demonstrated in this section.

Network model in VANETs.
A scenario is assumed in VANETs, where cars need to send their observations (e.g. location, speed, and timestamp) to a server through Vehicle-to-Vehicle (V2V) or Vehicle-to-Infrastructure (V2I) communications. With the dynamic traffic information, the server can provide users with services such as road traffic report and navigation. For reducing the communication cost, vehicles combine the received messages with their own information. As shown in Fig. 1, Vehicle 2 aggregates the message from Vehicle 1, and Vehicle 4 collects the message from Vehicle 3, enabling RSU to receive a message that contains traffic information on the road interval.
Adversary model
In this paper, a global passive adversary is hypothesized, which is able to eavesdrop on all the vehicles. Its prior knowledge is represented as a tuple
Provided that
Distributed differential privacy
In VANETs, individual observations are made and stored in vehicles or RSUs. As there is no centralized database, distributed differential privacy is adopted, with which users control their released data by adding sufficient noise to ensure the differential privacy [1]. Formally, vehicles and RSUs evaluate a function
Given a subset K of participants, it is assumed that
It is required that the following notion of distributed differential privacy is applied to each time period
([16]).
Provided that
In the above definition, the two vectors
When K is the set of compromised nodes, the above definition requires the remaining honest participants’ randomness to be sufficient, so as to ensure differential privacy. As a result, the probability is conditioned on the randomness
Formally, a computation F gives
To put it another way, the probability that the computation F produces a given output is almost independent on whether this record is present in the dataset for any possible record.
Provided that
As shown in Fact 1, if the statistical Function
Differential privacy [19] is a rigorous privacy model that makes no assumption about an adversary’s background knowledge. A differentially-private mechanism ensures that the probability of any output (released data) is equally likely from all the nearly identical input data sets, which thus guarantees that all the outputs are insensitive to the data of any individual. In view of this, an individual’s privacy is not at risk owing to his/her participation in the data set.
An untrusted aggregator (RSU) which may have arbitrary auxiliary information is taken into consideration. For instance, the aggregator may collude with a set of vehicles, which may leak their data and noise to the aggregator in turn. Such information can be considered as a form of auxiliary information. In fact, auxiliary information about vehicles can also be obtained through other means (e.g. public datasets on the websites, or personal knowledge about a specific vehicle). The goal of this research is to ensure the privacy of each vehicle’s data, even when the aggregator has arbitrary auxiliary information. The formal privacy concept satisfies aggregator with inscient and distributed differential privacy.
To achieve this goal, a privacy model similar to the differential privacy notion is adopted. In this model, data aggregator and other vehicles are untrusted. This proposed privacy preserving mechanism is strong enough even if the aggregator has any prior knowledge or even some collusions.
Realization of aggregation security
How to make the vehicles and the aggregator achieve safety aggregation at the minimum cost of communication is one challenge. If one allows the vehicles and the aggregator to interact with each other in real time, then standard secure multi-party computation [11] techniques can be utilized to ensure that the data aggregator can only get the sum. Nonetheless, it is required that all the vehicles shall keep simultaneously online and interact with each other periodically, which is impractical. With this solution, no further interaction is needed except for uploading a noisy encryption to the data aggregator, after a trusted setup between all the vehicles and the data aggregator which may be performed by a trusted third-party or through a standard secure multi-party protocol.
Meanwhile, the intuition behind this construction is illustrated. It is assumed that the vehicles and the aggregator randomly share 0 during every time period
Specifically,
To encrypt the value
At time
Since
Realization of distributed differential privacy
The encryption operation ensures that the aggregator obtains nothing except the statistics with the noise, because the aggregator has no direct access to individual’s data. Nevertheless, individuals’ privacy may be leaked indirectly, as the aggregator can infer their information from continuous aggregated data. In this section, how to build a guarantee of
In previous literatures about differential privacy, it is always assumed that the aggregator is credible. Aiming to ensure differential privacy, a standard procedure is to add an appropriate magnitude of noise before publishing the desired statistic.
In VANET, the vehicles do not trust the aggregator. As a consequence, the real data shall not be sent to the aggregator. Instead, it shall be ensured that the aggregator only obtains the data added with noise. This approach would add noise to each vehicle’s data before encrypting them, which insures that each vehicle satisfies the differential privacy of its own data.
It is provided that
It is supposed that the original data of each vehicle fall within the domain
Performance analysis
In the proposed encryption structure, the encryption contains a hash operation, two modular operations, and a Diffie–Hellman multiplication operation. The operation time is mainly determined by two modular operations, while the time of the hash operation and Diffie–Hellman multiplication can be neglected. As shown in literature [3], the classical Diffie–Hellman group module operation 1024-bit utilized for modular operation takes about 3 ms, whereas the operation with high-speed elliptic curve costs only 0.3 ms. Hence, the encryption algorithm takes about 0.6 ms on the modern computer, which meets the need of practical application.

Empirical error of the proposed scheme and the naive scheme.
Then, the accuracy of the data is proved. Since
It is assumed that
According to Theorem 1, cumulative error will have a high probability to be in the range of
As shown in Fig. 2, a comparison on the simulation results between the proposed mechanism and the conventional mechanisms. In the simulation, it is assumed that the vehicles’ data are from
Conclusion
In this paper, the aggregation problem in VANETs is studied, and a differentially private aggregation mechanism is proposed, under which VANETs protect the privacy data of each individual against the adversary with arbitrary auxiliary information. The contribution of this study mainly focuses on proposing a distributed data randomization procedure that guarantees the differential privacy of the result, even when a subset of participants might be compromised. However, further efforts remain to be made to apply this idea in real practices.
Footnotes
Acknowledgement
The work was supported by the National Natural Science Foundation of China (61202099, U1304606), Plan for Youth Backbone Teacher of Colleges and Universities in Henan Province (2013GGJS-075), Plan for Scientific Innovation Talent of Henan University of Technology (2012CXRC05), and Plan for Youth Backbone Teacher of Henan University of Technology.
