How efficient are replay attacks against vote privacy? A formal quantitative analysis

Abstract

Replay attacks are among the most well-known attacks against vote privacy. Many e-voting systems have been proven vulnerable to replay attacks, including systems like Helios that are used in real practical elections.

Although replay attacks are widely known, they are commonly believed to be inefficient, and the actual threat that they pose to vote privacy has never been studied formally. In this paper, we therefore analyze precisely, for the first time, how efficient replay attacks really are.

We study this question from commonly used and complementary perspectives on vote privacy, showing as an independent contribution that a simple extension of a popular game-based privacy definition corresponds to a strong entropy-based notion.

Our results demonstrate that replay attacks can be devastating for a voter’s privacy even when an adversary’s resources are very limited. We illustrate our formal findings by applying them to a number of real-world elections, showing that a modest number of replays can result in significant privacy loss. Overall, our work reveals that, contrary to a common belief, replay attacks can be very efficient and must therefore be considered a serious threat.

Keywords

Electronic voting, privacy, information flow

1. Introduction

Electronic voting, or e-voting, is a reality. Systems for e-voting are nowadays used for political elections all over the world, for example, in Australia, Brazil, Estonia, India, Switzerland, or the US. Furthermore, in line with the general shift toward remote technologies, numerous institutions (e.g., academic organizations such as ACM, IACR, or SIAM) employ e-voting systems to mitigate physical barriers and increase voter turnout.

The two most fundamental properties for secure e-voting are (end-to-end) verifiability and (vote) privacy. Verifiability [17] enables external and internal observers to detect and reject falsely computed election results, even when the underlying cause is an unknown programming error or malicious behavior of some of the participants. Privacy [5] guarantees that all data published during an election (including data for proving the integrity of the final result) does not leak more information on the single voters’ choices than what can be derived from the public (unbiased) election result.

Designing secure e-voting systems is very challenging, with a long and rich history going back to the 1980s [4]. Since then, numerous e-voting systems have been proposed which aim to provide both verifiability and privacy (see, e.g., [1,13–16,34,39,40]), sometimes with additional security properties such as receipt-freeness [13] or coercion-resistance [16]. Some of these e-voting systems have been and are used in practice, for example for political elections in Australia [11], Estonia [41], Switzerland [43], and the US [12], or for non-political ones, such as IACR elections [25], to name just a few.

It is crucial to protect the voters’ privacy not only against passive observers but also against adversaries who control some of the protocol participants (e.g., voters or tallying authorities) and who let these corrupted participants actively deviate from their specified roles in order to undermine privacy of some or all (uncorrupted) voters. Guaranteeing privacy in the face of such active adversaries is a common standard which (most) modern e-voting systems aim to provide. However, it turned out that numerous e-voting systems fall short of this goal in their respective threat scenarios, including seminal systems like Helios (see [7,10,18]), Civitas (see [24,27]), or Prêt à Voter (see [10]).

One of the most prominent classes of attacks against privacy – if not the most prominent one – are replay attacks (see, e.g., [7,13,18,19,24]) to which many e-voting systems have been proven vulnerable (e.g., [1,8,16,34,39,40]). Roughly speaking, a replay attack works as follows. The adversary, who controls some corrupted voters, targets an uncorrupted voter whose privacy he wants to undermine. The adversary waits until the targeted voter has submitted her ballot and reads it from the public bulletin board. Then the adversary instructs (some of) the corrupted voters to submit (possibly a re-randomization of) the same ballot the targeted voter had submitted before. If, in a particular e-voting system, these replayed ballots are not discarded prior to tallying, then the targeted voter’s choice is amplified in the public election result. Because the adversary now obtains more information about the targeted voter’s choice than what he could derive from an unbiased election result, vote privacy is undermined.

Despite their popularity, the risk of replay attacks is often regarded as a “largely theoretical” [42] issue, even in the scientific community. In publications which unveil replay attacks against vote privacy of an e-voting system, the effect of replay attacks is typically illustrated for extreme cases only, e.g., elections with just two honest voters [10] or with many corrupted ones who all replay a single voter’s ballot [19]. While such completely artificial toy examples can be useful to explain why privacy is formally broken, they seemingly suggest that replay attacks do not pose a serious threat. It is therefore not surprising that, for instance, in response to the replay attack against Helios [1] discovered in [18], Helios Voting replied that “the risk of this attack being successfully carried out is low, as it requires “wasting” a number of votes to compromise the privacy of one voter”, concluding that, most likely, replay attacks would not matter [23]. This – as we shall see, fallacious – perspective may also explain why the latest version of Helios [1] (used, e.g., for IACR elections [25]) has not yet been patched to defend against the replay attacks discovered in [7].

Indeed, at first glance, it seems necessary to replay a targeted voter’s ballot many times in order to significantly amplify this voter’s choice in the final result. However, somewhat surprisingly, this common conjecture has never come under close scrutiny. The study by Cortier and Smyth [18] is the only previous work which attempted (in Section III.C) to analyze how replay attacks scale, but the authors considered a “definitive mathematical analysis” as future work because their underlying model was “rather naïve” [18].

In this work, we challenge the abovementioned conjecture for the first time, both rigorously and extensively, from two established and complementary perspectives on vote privacy. We precisely measure how efficient replay attacks really are, i.e., how much the affected voter’s privacy loss increases depending on the number of replays. In particular, we show that replay attacks can be devastating for a voter’s privacy even when an adversary’s resources are highly limited so that he can (or is willing to) replay a targeted voter’s ballot only very few times. This observation disproves a common conjecture that vote privacy would only be at risk if the number of replays was high. Our novel insights are immediately relevant for the security of real elections because e-voting systems vulnerable to replay attacks have been, are, and most likely will be used in practice (e.g., the latest version of Helios for IACR elections).

1.1. Our contributions

We provide the following contributions.²

² We presented the first four contributions in the conference paper [37], whereas our fifth contribution (efficiency analysis of replay attacks under the assumption of approximate prior knowledge) is novel.

Categorization of replay attacks (Section 2 ). We begin by reviewing the scientific literature to extract all replay attacks against vote privacy that have been published to date. We categorize these attacks into different classes, depending on their specific forms. Our extensive presentation highlights that replay attacks play a central role in modern secure e-voting, which demonstrates the importance of our subsequent analysis.

Efficiency analysis based on the KTV vote privacy definition (Section 4 ). We first formally analyse the efficiency of replay attacks using the vote privacy definition by Küsters, Truderung, and Vogt [30], hereafter called the KTV privacy definition (Section 3). The KTV privacy definition is not only established and widely used (see, e.g., [3,9,28,29,32]) but it proves particularly useful for our purposes because it allows us to measure the loss of vote privacy and thus the efficiency of replay attacks.

We first define an ideal functionality for an e-voting protocol which allows the adversary to replay a targeted voter’s ballot $n_{repl}$ times, and compute the KTV privacy loss of this protocol. We obtain a useful reduction from the privacy loss for a general election to that for an election with only three candidates.

This allows us to analyze how the ideal privacy loss is affected by the number of replays $n_{repl}$. As we shall see, even for small values of $n_{repl}$, the privacy loss can be devastating. We illustrate our abstract results with a number of realistic examples.

A new entropy-based vote privacy definition (Section 5 ). A limitation of the KTV privacy definition [30] (observed for instance in [5]) is that it only measures privacy with respect to a specific security game, namely the adversary’s ability to guess between two possible votes. In particular this means that for the ideal functionality (including replays) the privacy loss is (as we will see in Section 4) entirely determined by the two least popular candidates, with the other candidates having no effect whatsoever.

Entropy-based measures of vote privacy (e.g., [6,36]) provide a complementary view because they consider privacy with respect to a variety of goals for the adversary. Unfortunately, as we will explain in Section 5, they are limited in various ways which make them difficult to apply in practice to analyse concrete elections.

In Section 5, we propose a simple extension of the KTV vote privacy definition, which we show is equivalent to a computational version of a strong entropy-based notion. This is independent of the replay attack setting, and serves to somewhat unify the KTV and strong entropy-based approaches.

We show that our novel definition can be efficiently and accurately estimated for the ideal functionality using Monte Carlo methods, and so we are able to use it to study the efficiency of replay attacks from an entropy-based perspective complementary to the game-based perspective of the KTV definition.

Analysis of real-world elections (Section 6 ). In order to complement our formal analysis, we study how replay attacks would scale in practical elections. We therefore apply our formal results to publicly available data of political elections in Estonia, Germany, the UK, and the USA. In this way, we can realistically simulate to which degree vote privacy would decrease if in such elections replay attacks had been executed. Our “field test” confirms the gist of our abstract results: even if the number of replays is very low, vote privacy can be undermined significantly.

Approximate prior knowledge (Section 7 ). Throughout all of our analyses mentioned above, we assume that the adversary knows the exact voting probabilities of the honest part of the electorate. On the one hand, this assumption is not far-fetched since in reality an adversary can access multiple sources to infer what the election result would have been if he had not polluted it with a replay attack. However, even with the help of such side information, it is realistic that an adversary only gains approximate knowledge of the honest voters’ overall preferences.

In order to account for this restriction, we additionally analyse the efficiency of replay attacks under the assumption that the adversary has only approximate knowledge of the unbiased election result. We do this with a Bayesian approach, assuming that the adversary’s prior is given by the distribution that he would obtain via sampling, namely the Dirichlet distribution. We find that while, as one would expect, the efficiency of replay decreases when the adversary’s uncertainty increases, this decrease is relatively modest and the adversary can still significantly undermine vote privacy of the targeted voter, even with relatively few replays.

1.2. Structure of the paper

The structure of our paper essentially follows our contributions as presented above. In Section 2, we categorize all replay attacks described in the literature. In Section 3, we recall the KTV privacy definition as well as the ideal privacy loss a voting protocol can achieve w.r.t. the KTV definition. In Section 4, we study the efficiency of replay attacks based on the KTV privacy definition. In Section 5, we propose our new entropy-based vote privacy definition, describe its relationship to the KTV definition, and show that it can be efficiently estimated for the ideal functionality of Section 4. In Section 6, we illustrate our theoretical results using concrete election data from political elections and discuss the consequences of our insights. In Section 7, we analyse the efficiency of replay attacks under the assumption that the adversary has only approximate prior knowledge of the honest voters’ preferences.

2. Categorization of replay attacks

We provide the first comprehensive categorization of all replay attacks against vote privacy described in the literature. We identified three different variants of replay attacks: basic replay attacks, homomorphic replay attacks, and re-voting replay attacks. We summarize our insights at the end of the section.

2.1. Basic replay attacks

In its most basic form, a replay attack works as follows. Assume that we have $n_{V}$ voters and that the adversary aims to break privacy of some voter $V_{obs}$ , the voter under observation. We assume that the adversary controls a number $n_{V}^{d}$ of further voters. The adversary waits until $V_{obs}$ has submitted her ballot $b_{obs}$ , containing her secret choice $c_{obs}$ , to the bulletin board. The adversary reads $b_{obs}$ from the bulletin board and instructs all of his corrupted voters to submit $V_{obs}$ ’s ballot $b_{obs}$ as well. If, due to the specification of the e-voting scheme invoked, all $n_{V}^{d} + 1$ identical ballots $b_{obs}, \dots, b_{obs}$ are tallied, then the public election result contains $n_{V}^{d}$ additional votes for $V_{obs}$ ’s choice $c_{obs}$ . By this, $V_{obs}$ ’s choice is amplified in the final result and thus her vote privacy is undermined.
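The amplification effect described above can be illustrated with a minimal Python sketch. The function names `tally_with_replays` and `simulate`, as well as the way honest votes are sampled, are assumptions made for illustration and are not taken from any of the cited systems.

```python
import random
from collections import Counter


def tally_with_replays(honest_choices, observed_choice, n_replays):
    """Tally all ballots when the observed voter's ballot is replayed.

    The observed voter's choice is counted once for her own ballot and
    once more for every replayed copy submitted by a corrupted voter.
    """
    ballots = list(honest_choices) + [observed_choice] * (1 + n_replays)
    return Counter(ballots)


def simulate(n_honest, probs, observed_choice, n_replays, seed=0):
    """Sample honest votes from `probs` (a dict choice -> probability)
    and return the public tally after a basic replay attack."""
    rng = random.Random(seed)
    choices = list(probs)
    weights = [probs[c] for c in choices]
    honest = rng.choices(choices, weights=weights, k=n_honest)
    return tally_with_replays(honest, observed_choice, n_replays)


if __name__ == "__main__":
    # Three honest voters; the observed voter chose "B"; three replays.
    print(tally_with_replays(["A", "B", "A"], "B", 3))
```

In this toy run, the tally for "B" contains the honest vote, the observed voter's vote, and three replayed copies, so the observed voter's choice is overrepresented by exactly the number of replays.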

Numerous e-voting schemes have been proven vulnerable against this basic version of replay attacks. Cortier and Smyth [18] demonstrated that basic replay attacks are possible in Helios [1], in the voting scheme by Sako and Kilian [39], and in the one by Schoenmakers [40]. The basic replay attacks against these voting schemes can be prevented by rejecting (partially) duplicated ballots.

2.2. Homomorphic replay attacks

Even if duplicated ballots are rejected in order to protect against basic replay attacks (see above), it may be possible to exploit malleability of the underlying cryptographic primitives in order to execute (more subtle) replay attacks. In what follows, we explain the general idea of such homomorphic replay attacks.³

³ We restrict our attention to the ballots’ ciphertexts and put further primitives (signatures etc.) aside for simplicity.

E-voting schemes with homomorphic tallying assume that voters’ ciphertexts are re-randomizable, i.e., it is possible to transform ciphertext $e = Enc (pk, m; r)$ into ciphertext $e^{'} = Enc (pk, m; r^{'})$ without knowledge of the secret key $sk$, plaintext $m$, or randomness $r$. In a homomorphic replay attack, the adversary re-randomizes the observed voter’s ballot $b_{obs}$ into $n_{V}^{d}$ ballots $b_{1}, \dots, b_{n_{V}^{d}}$. Because the ballots $b_{obs}, b_{1}, \dots, b_{n_{V}^{d}}$ are mutually distinct (with overwhelming probability if the encryption scheme is semantically secure), they will all be tallied even if ballot duplicates are strictly removed. By this, analogously to the basic replay attack (see above), the observed voter’s privacy is undermined because all ballots $b_{obs}, b_{1}, \dots, b_{n_{V}^{d}}$ contain $V_{obs}$’s choice $c_{obs}$.

Several e-voting schemes are vulnerable to such homomorphic replay attacks, for example the one by Lee et al. [34] (pointed out by Dreier, Lafourcade and Lakhnech [19]), or the one by Blazy, Fuchsbauer, Pointcheval, and Vergnaud [8] (pointed out by Chaidos, Cortier, Fuchsbauer, and Galindo [13]) which is the predecessor of BeleniosRF [13].
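The re-randomization step underlying homomorphic replay attacks can be sketched with a toy exponential ElGamal scheme. This is only an illustration under strong assumptions: the group parameters `P`, `Q`, `G` are deliberately tiny and insecure, all function names are hypothetical, and signatures and proofs are omitted. The sketch does not reproduce the ballot format of any of the schemes cited above.

```python
# Toy group: P = 2*Q + 1 with P, Q prime; G generates the subgroup of order Q.
# These parameters are far too small for real use (illustration only).
P, Q, G = 467, 233, 4


def keygen(sk):
    """Return the public key pk = G^sk mod P for a secret exponent sk."""
    return pow(G, sk, P)


def enc(pk, m, r):
    """Exponential ElGamal: Enc(pk, m; r) = (G^r, G^m * pk^r)."""
    return (pow(G, r, P), (pow(G, m, P) * pow(pk, r, P)) % P)


def rerandomize(pk, ct, s):
    """Turn Enc(pk, m; r) into Enc(pk, m; r + s) without knowing sk, m, or r."""
    c1, c2 = ct
    return ((c1 * pow(G, s, P)) % P, (c2 * pow(pk, s, P)) % P)


def dec(sk, ct):
    """Recover m by removing the mask and searching the small message space."""
    c1, c2 = ct
    g_m = (c2 * pow(c1, P - 1 - sk, P)) % P  # c1^(-sk) via Fermat's little theorem
    for m in range(Q):
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("plaintext not found")


if __name__ == "__main__":
    sk = 57
    pk = keygen(sk)
    b_obs = enc(pk, 1, 101)               # observed voter's ciphertext
    b_replay = rerandomize(pk, b_obs, 77)  # adversary's re-randomized copy
    print(b_obs != b_replay, dec(sk, b_obs), dec(sk, b_replay))
```

The two ciphertexts differ as bit strings, so a duplicate filter that compares ballots for equality would keep both, yet both decrypt to the same vote.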

In order to protect against homomorphic replay attacks, many e-voting schemes employ zero-knowledge proofs (ZKPs) of knowledge which each voter uses to prove that she knows the plaintexts (and randomness) in the ciphertexts of her ballot. By this, a corrupted voter can no longer re-randomize the observed voter’s ballot because he is not able to come up with a (valid) proof of plaintext knowledge.

Typically, e-voting schemes employ ZKPs of knowledge which are non-interactive, i.e., where the voter does not communicate with the verifier while proving knowledge (and correctness) of her encrypted choice. To construct such non-interactive ZKPs, most (modern) e-voting schemes use the Fiat–Shamir transformation [20]. However, as we will recall in what follows, applying the Fiat–Shamir transformation correctly is non-trivial.

Bernhard, Pereira, and Warinschi [7] demonstrated that great care has to be taken when the Fiat–Shamir transformation is used. Bernhard et al. showed that the Fiat–Shamir transformation in the implementation of Helios [1] is too weak because the hash function does not take the statement to be proven as input. Therefore, a voter’s ZKP in Helios [1] is in fact not a proof of knowledge, enabling an adversary to still execute homomorphic replay attacks.

2.3. Re-voting replay attacks

Bursuc, Dragan, and Kremer [10] explained that, even if (partial) ballot duplicates are strictly removed and a (correct) ZKP of knowledge is used (see above), replay attacks against Helios [1] are still feasible if the ballot box is corrupted. We note that, in principle, this replay attack is not restricted to the case of Helios. In what follows, we describe the idea of this replay attack, which is due to P. B. Rønne originally (according to [10]).

If the adversary controls the ballot box (i.e., the server to which voters send their ballots), it can claim that the ballot casting of the voter under observation $V_{obs}$ was not successful. The voter under observation may then try a second attempt with the same vote $c_{obs}$ . This way, the adversary obtains two different ballots $b_{obs}$ , $b_{obs}^{'}$ , both containing the observed voter’s vote $c_{obs}$ . Now, the adversary can submit $b_{obs}$ on behalf of one of the corrupted voters, whereas the voter under observation $V_{obs}$ submits $b_{obs}^{'}$ . Because $b_{obs}$ and $b_{obs}^{'}$ do not contain identical entries (with overwhelming probability due to the semantic security of the underlying cryptographic primitives), they will both be in the input of the tallying phase. The attack can be repeated several times to obtain more ballots of $V_{obs}$ ’s vote $c_{obs}$ . By this, analogously to the basic replay attack (see above), the observed voter’s privacy is undermined. The attack could, for example, be prevented by including each voter’s ID in the statement to be proven, in particular in the hash of the Fiat–Shamir transformation.

2.4. Summary

Our comprehensive presentation demonstrates that replay attacks are a recurrent and often subtle issue in the construction and employment of secure e-voting systems, even when deliberately designed to protect against them. While some pitfalls making replay attacks possible are straightforward to solve (e.g., removing duplicates), others are more subtle and require very close attention (e.g., using strong Fiat–Shamir transformations). Based on our systematic literature review, we conjecture that, despite its popularity, the threat of replay attacks is a recurrent issue of e-voting. It is therefore important to precisely understand the risk that replay attacks pose to the crucial property of vote privacy. In the remainder of this paper, we provide the first formal analysis of this fundamental threat.

3. KTV vote privacy definition

The first part of our formal analysis of replay attacks (Section 4) is based on the vote privacy definition proposed by Küsters, Truderung, and Vogt [30], hereafter called the KTV (privacy) definition. We explain the motivation for this privacy definition and the formal definition itself in Section 3.2, after first recalling the underlying computational model in Section 3.1. In Section 3.3, we recall the best possible privacy loss an arbitrary voting protocol can achieve according to the KTV privacy definition; this ideal privacy loss is expressed as a parameterized formula that we will use to precisely measure the efficiency of replay attacks in Section 4.⁴

⁴ What we call privacy loss in this work was in the original paper [30] called privacy level. Because the privacy bound δ is higher when more private information is leaked, we prefer to use the term privacy loss for δ.

3.1. Computational model

We briefly recall the computational model of the KTV privacy definition, in particular the notions of processes, protocols, instances, and properties. We refer to [30] for full technical details.

Process. A process is a set of probabilistic polynomial-time interactive Turing machines (ITMs, also called programs), which are connected via named tapes (also called channels). We write a process π as $π = p_{1} ‖ \dots ‖ p_{l}$ , where $p_{1}, \dots, p_{l}$ are programs. If $π_{1}$ and $π_{2}$ are processes, then $π_{1} ‖ π_{2}$ is a process, provided that the processes are connectible: two processes are connectible if common external channels have opposite directions (input/output). A process π where all programs are given the security parameter ℓ is denoted by $π^{(ℓ)}$ . The processes we consider are such that the length of a run is always polynomially bounded in ℓ. A run is uniquely determined by the random coins used by the programs in π.

Protocol. A protocol $P$ specifies a set of agents (also called parties or protocol participants) and a set of channels these agents can communicate over. Moreover, $P$ specifies, for every agent a, a set $Π_{a}$ of all programs the agent a may run and a program ${\hat{π}}_{a} \in Π_{a}$ , the honest program of a, i.e., the program that a runs if a is honest, and hence, follows the protocol.

Instance. Let $P$ be a protocol with agents $a_{1}, \dots, a_{n}$ . An instance of $P$ is a process of the form $π = (π_{a_{1}} ‖ \dots ‖ π_{a_{n}})$ with $π_{a_{i}} \in Π_{a_{i}}$ . An agent $a_{i}$ is called honest in the instance π if and only if $π_{a_{i}} = {\hat{π}}_{a_{i}}$ . A run of $P$ (with security parameter ℓ) is a run of some instance of $P$ (with security parameter ℓ); we consider the instance to be part of the description of the run. An agent $a_{i}$ is honest in a run r, if r is a run of an instance of $P$ with honest $a_{i}$ .

Property. A property γ of $P$ is a subset of the set of all runs of $P$ . By $\neg γ$ we denote the complement of γ.

Negligible, overwhelming, δ-bounded. As usual, a function f from the natural numbers to the interval $[0, 1]$ is negligible if, for every $c > 0$ , there exists $ℓ_{0}$ such that $f (ℓ) ⩽ \frac{1}{ℓ^{c}}$ for all $ℓ > ℓ_{0}$ . The function f is overwhelming if the function $1 - f$ is negligible. A function f is δ-bounded if, for every $c > 0$ there exists $ℓ_{0}$ such that $f (ℓ) ⩽ δ + \frac{1}{ℓ^{c}}$ for all $ℓ > ℓ_{0}$ .

3.2. Privacy definition

The KTV privacy definition [30] formalizes privacy of an e-voting protocol as the inability of an adversary $π_{A}$ to distinguish whether some voter $V_{obs}$ , the voter under observation who runs her honest program, voted for choice $c_{j}$ or choice $c_{j^{'}}$ . Unlike binary privacy notions according to which a voting protocol either does or does not protect privacy (see [5]), the KTV privacy definition measures the privacy loss a voting protocol provides. Being able to measure vote privacy, in particular to measure the loss of vote privacy due to attacks, is crucial for the purposes of our paper (see Section 4).

To be more precise, according to [30], a voting protocol provides δ-privacy if any adversary $π_{A}$ is able to distinguish whether $V_{obs}$ voted for $c_{j}$ or $c_{j^{'}}$ with probability at most δ; or, to phrase it differently, if any adversary’s advantage is δ-bounded. To define the KTV privacy notion formally, we first introduce the following notation for an arbitrary e-voting protocol $P$ . Given a voter $V_{obs}$ and choice c, we consider instances of $P$ that induce a set of processes of the form $({\hat{π}}_{V_{obs}} (c) ‖ π^{*} ‖ π_{A})$ where ${\hat{π}}_{V_{obs}} (c)$ is the honest program of the voter $V_{obs}$ under observation who takes c as her choice, $π^{*}$ is the composition of programs of the remaining parties in $P$ , and $π_{A}$ is the program of the adversary. Let $Pr [{({\hat{π}}_{V_{obs}} (c) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto 1]$ denote the probability that the adversary writes the output 1 on some dedicated tape in a run of $({\hat{π}}_{V_{obs}} (c) ‖ π^{*} ‖ π_{A})$ with security parameter ℓ and some choice c, where the probability is taken over the random coins used by the parties in $({\hat{π}}_{V_{obs}} (c) ‖ π^{*} ‖ π_{A})$ .

Now, the intuition described above is formally defined as follows.

Definition 1 (Vote Privacy [30]).

Let $P$ be a voting protocol, $V_{obs}$ be the voter under observation, and $δ \in [0, 1]$. Then, $P$ achieves δ-privacy, if for all possible choices $c_{j}$, $c_{j^{'}}$ and all adversaries $π_{A}$ (implicitly on input $(c_{j}, c_{j^{'}})$) the difference $| Pr [{({\hat{π}}_{V_{obs}} (c_{j}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto 1] - Pr [{({\hat{π}}_{V_{obs}} (c_{j^{'}}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto 1] |$ is δ-bounded as a function of the security parameter $1^{ℓ}$.

In other words, the privacy loss δ is an upper bound of an arbitrary adversary’s advantage to distinguish whether $V_{obs}$ voted for $c_{j}$ or $c_{j^{'}}$ . Clearly, $δ = 0$ would be desirable but typically we have $δ > 0$ , even for an ideal e-voting protocol with a completely passive adversary. The reason is that in many real-world elections, there exist choices which are picked only with low probability, for example unpopular candidates or unreasonable rankings (e.g., the green party is ranked first and the coal mining party next to it). Now, in most e-voting systems, including all systems mentioned in Section 2, the final election result consists of the number of votes for each choice. Therefore, if the voter under observation $V_{obs}$ chooses $c_{j}$ or $c_{j^{'}}$ but all other voters vote for one (or both) of these choices with low probability only, then $V_{obs}$ ’s choice is not hidden sufficiently well – in the worst case, $V_{obs}$ ’s choice is completely revealed.

3.3. Ideal privacy

Since we have seen that the privacy loss δ is typically not perfect, the following questions are obvious: What is the best possible privacy loss that can be achieved in a given election? How does this ideal privacy loss depend on basic parameters, such as the number of voters or the voters’ preferences? These questions have been answered precisely in [30] and we will recall the results in what follows. These results will be the foundation of our formal analysis of replay attacks in Section 4.

Ideal voting protocol. In order to have a lower bound on the privacy loss for all voting protocols, Küsters et al. [30] derived a formula for the privacy loss an ideal voting protocol provides.⁵

⁵ The ideal privacy loss derived in [30] is formulated for result functions that reveal the complete tally, i.e., number of votes for each choice. Subsequently, a more general ideal privacy loss was derived in [28] which is formulated for arbitrary result functions, including tally-hiding result functions that may, for instance, only reveal the winner but nothing else. Because all e-voting systems mentioned in Section 2 employ a result function which returns the complete tally, we restrict our attention to the ideal privacy loss derived in [30].

Let us describe this ideal voting protocol, denoted by $I_{priv}$, starting with the parameters it depends on:

Number of choices $n_{C}$: The set $C = \{c_{1}, \dots, c_{n_{C}}\}$ consists of all possible choices $c_{j}$ that a voter can choose.

Number of honest voters $n_{V}^{h}$: We denote the number of voters which cannot be corrupted, the honest voters, by $n_{V}^{h}$.⁶

⁶ The number of dishonest voters is not relevant for result functions that reveal the full tally because an adversary can derive the “honest” election result by subtracting the dishonest voters’ choices from the final election result.

Voting distribution $\vec{p}$: Each honest voter $V_{i}$ picks her choice according to the distribution $\vec{p}$ over $C$, i.e., $\vec{p} [l]$ is the probability that an honest voter chooses $c_{l}$.⁷

⁷ In slight abuse of notation we identify $\vec{p}$ and its probability mass function.

The ideal voting protocol $I_{priv} (C, n_{V}^{h}, \vec{p})$ works as follows. For each of the $n_{V}^{h}$ honest voters $V_{i}$, the ideal voting protocol $I_{priv}$ chooses $V_{i}$’s choice according to $\vec{p}$. For the voter under observation $V_{obs}$, the ideal voting protocol expects as input a tuple of choices $(c_{j}, c_{j^{'}})$ from the adversary, and then picks one of them uniformly at random. Eventually, $I_{priv}$ returns the result $res \in \mathbb{N}^{| C |}$ which contains the number of votes for each choice made by all honest voters and by the voter under observation. The protocol $I_{priv} (C, n_{V}^{h}, \vec{p})$ is formally defined in Fig. 1.

Fig. 1. Protocol of ideal voting functionality.
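The behaviour of the ideal voting protocol described above can be sketched in Python as follows. This is a minimal illustration, not a transcription of Fig. 1: the function name `ideal_protocol`, its signature, and the decision to also return the hidden choice (for testing only) are assumptions.

```python
import random


def ideal_protocol(choices, n_honest, p, pair, rng=None):
    """Sketch of I_priv(C, n_V^h, p).

    choices  -- list of all possible choices C
    n_honest -- number of honest voters n_V^h
    p        -- list of probabilities, p[l] for choices[l]
    pair     -- tuple (c_j, c_j') supplied by the adversary
    Returns (res, hidden), where res[l] is the number of votes for
    choices[l] and hidden is the observed voter's choice. In the actual
    game only res is revealed; hidden is returned here for inspection.
    """
    rng = rng or random.Random()
    res = [0] * len(choices)
    index = {c: l for l, c in enumerate(choices)}

    # Each honest voter picks her choice according to p.
    for c in rng.choices(choices, weights=p, k=n_honest):
        res[index[c]] += 1

    # The observed voter's choice is one of the adversary's two
    # candidates, picked uniformly at random.
    hidden = rng.choice(pair)
    res[index[hidden]] += 1
    return res, hidden


if __name__ == "__main__":
    res, hidden = ideal_protocol(["a", "b", "c"], 10, [0.5, 0.3, 0.2], ("a", "b"))
    print(res, hidden)
```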

Ideal privacy loss. We now recall from [30] how the privacy loss $δ^{ideal}$ of the ideal voting protocol $I_{priv} (C, n_{V}^{h}, \vec{p})$ can be expressed as a parameterized formula $δ_{C, n_{V}^{h}, \vec{p}}^{ideal}$ . Recall that we defined the privacy loss of a voting protocol by the (level of) inability to distinguish whether the voter under observation $V_{obs}$ voted for choice $c_{j}$ or choice $c_{j^{'}}$ (Definition 1). Now, the intuition behind the definition of $δ_{C, n_{V}^{h}, \vec{p}}^{ideal}$ is as follows. If the adversary, given a final election result $res$ , wants to decide whether the observed voter voted for choice $c_{j}$ or $c_{j^{'}}$ , then the best strategy of the adversary is to opt for $c_{j^{'}}$ if and only if the output $res$ is more likely if the voter voted for choice $c_{j^{'}}$ . In order to capture this intuition formally, we introduce the following terminology.

Let $A_{res}^{l}$ denote the conditional probability that the choices made by the honest voters and by the voter under observation yield the final result $res$, under the condition that the voter under observation $V_{obs}$ chooses $c_{l}$. The probability $A_{res}^{l}$ can be expressed as follows using the multinomial distribution:

$$A_{res}^{l} = \frac{n_{V}^{h}!}{\prod_{j = 1}^{n_{C}} res [j]!} \cdot \left(\prod_{j = 1}^{n_{C}} \vec{p} {[j]}^{res [j]}\right) \cdot \frac{res [l]}{\vec{p} [l]} \tag{1}$$

where $res [j]$ is the number of votes for choice $c_{j}$. We can now define the set of outputs $res$ for which it is more likely that the voter voted for choice $c_{j^{'}}$ as follows:

$$M_{j, j^{'}}^{*} = \{res : A_{res}^{j} ⩽ A_{res}^{j^{'}}\} . \tag{2}$$

The intuition of the ideal privacy loss described above is formally captured by the following definition:

$$δ_{C, n_{V}^{h}, \vec{p}}^{ideal} = \max_{j, j^{'} \in \{1, \dots, n_{C}\}} \sum_{res \in M_{j, j^{'}}^{*}} (A_{res}^{j^{'}} - A_{res}^{j}) . \tag{3}$$
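For very small elections, Equations (1) to (3) can be evaluated directly by enumerating all possible results. The following Python sketch does this by brute force. The helper names `compositions`, `a_res`, and `delta_ideal` are hypothetical, all probabilities in `p` are assumed to be strictly positive, and the enumeration grows quickly with the number of voters and choices, so this is meant for illustration only.

```python
from math import factorial, prod


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def a_res(res, l, n_honest, p):
    """Equation (1): probability of result `res` given the observed voter chose c_l."""
    coeff = factorial(n_honest) / prod(factorial(x) for x in res)
    return coeff * prod(pj ** rj for pj, rj in zip(p, res)) * res[l] / p[l]


def delta_ideal(n_honest, p):
    """Equation (3): maximise over (j, j') the sum over the set M* of Equation (2)."""
    n_c = len(p)
    results = list(compositions(n_honest + 1, n_c))  # honest votes plus observed vote
    best = 0.0
    for j in range(n_c):
        for j_prime in range(n_c):
            if j == j_prime:
                continue
            total = 0.0
            for res in results:
                diff = a_res(res, j_prime, n_honest, p) - a_res(res, j, n_honest, p)
                if diff >= 0:  # res belongs to M*_{j, j'}
                    total += diff
            best = max(best, total)
    return best


if __name__ == "__main__":
    print(delta_ideal(5, [0.5, 0.5]))
```

With two equally likely choices, the sketch gives a loss of 1 when there are no honest voters (the result reveals the vote) and a smaller loss as honest voters are added.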

The following theorem (Theorem 3 of [30], proved in Appendix C of the eprint version [31]) states that the loss $δ_{C, n_{V}^{h}, \vec{p}}^{ideal}$ is indeed optimal for the ideal voting protocol $I_{priv} (C, n_{V}^{h}, \vec{p})$ . As a consequence, no voting protocol can achieve a better privacy loss than $δ_{C, n_{V}^{h}, \vec{p}}^{ideal}$ .

Theorem 1 ([30]).

The ideal protocol $I_{priv} (C, n_{V}^{h}, \vec{p})$ achieves a privacy loss of $δ_{C, n_{V}^{h}, \vec{p}}^{ideal}$ . Moreover, it does not achieve $δ^{'}$ -privacy for any $δ^{'} < δ_{C, n_{V}^{h}, \vec{p}}^{ideal}$ .

4. Efficiency analysis based on the KTV vote privacy definition

In this section, we formally study the efficiency of replay attacks using the KTV privacy definition. First, in Section 4.1 we focus on capturing the effect of replay attacks. We define a suitable ideal functionality for a voting protocol whose only flaw is that it allows the adversary to execute a replay attack. We characterise its KTV privacy loss analogously to the characterisation for the truly ideal protocol (without replays) in [30] (Theorem 2). Because this characterisation is not computationally tractable, we then show a reduction to an election with only three candidates (Theorem 3), and obtain a tractable formula which we use to demonstrate the devastating effect of even a small number of replays on realistically-sized example elections.

Based on these insights, in Section 4.2 we then study the efficiency of replay attacks in general: we analyse (Theorem 4) how the ideal privacy loss behaves asymptotically in the number of replayed ballots $n_{repl}$ and the number of honest voters $n_{V}^{h}$ , in particular for fairly small values of $n_{repl}$ .

4.1. Ideal privacy loss

We analyse the ideal privacy loss if the adversary can replay the observed voter’s choice $n_{repl}$ times. To this end, we modify the ideal voting functionality $I_{priv} (C, n_{V}^{h}, \vec{p})$ (Fig. 1) so that it adds the observed voter’s choice $(1 + n_{repl})$ times to the final result, instead of only once. The resulting ideal voting functionality with $n_{repl}$ replays $I_{priv} (C, n_{V}^{h}, \vec{p}, n_{repl})$ is defined in Fig. 2; note that $I_{priv} (C, n_{V}^{h}, \vec{p}) = I_{priv} (C, n_{V}^{h}, \vec{p}, 0)$ . By using the ideal voting functionality $I_{priv} (C, n_{V}^{h}, \vec{p}, n_{repl})$ , we can model exactly that the adversary is able to execute only a replay attack with $n_{repl}$ replays but no other kind of privacy attack. This means that the privacy loss $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ provided by $I_{priv} (C, n_{V}^{h}, \vec{p}, n_{repl})$ is a lower bound for the privacy loss of any voting protocol in which the adversary can replay the observed voter’s choice $n_{repl}$ times.

Fig. 2.

Protocol of ideal voting functionality which allows for replaying the observed voter’s choice $n_{repl}$ times. The differences between the original ideal voting protocol (Fig. 1) and the one presented here are highlighted in red.

In what follows, we first derive a representation of the ideal privacy loss $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ which is conceptually similar to the ideal privacy loss without replays $δ_{C, n_{V}^{h}, \vec{p}}^{ideal}$ , as defined in Eq. (3) (Section 3.3). We observe that $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ is indeed the privacy loss of the ideal voting functionality with replays $I_{priv} (C, n_{V}^{h}, \vec{p}, n_{repl})$ (Theorem 2). We then derive an alternative representation of $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ which reduces vote privacy (under the KTV definition) from dependence on all $n_{C}$ possible choices to dependence only on the two most unpopular choices (Theorem 3).

First representation. Analogously to $A_{res}^{l}$ (Section 3.3), let $A_{res}^{l, n_{repl}}$ be the probability of obtaining result $res$ under the condition that the voter under observation voted for choice $c_{l}$ , where now her ballot is replayed $n_{repl}$ times. Note that $A_{res}^{l} = A_{res}^{l, 0}$ . It is easy to see that we have $\begin{aligned} A_{res}^{l, n_{repl}} & = \frac{n_{V}^{h}! \cdot \vec{p} {[1]}^{res [1]} \cdot \dots \cdot \vec{p} {[l]}^{res [l] - n_{repl} - 1} \cdot \dots \cdot \vec{p} {[n_{C}]}^{res [n_{C}]}}{res [1]! \cdot \dots \cdot (res [l] - n_{repl} - 1)! \cdot \dots \cdot res [n_{C}]!} \\ & = \frac{n_{V}^{h}!}{\prod_{j = 1}^{n_{C}} res [j]!} \cdot (\prod_{j = 1}^{n_{C}} \vec{p} {[j]}^{res [j]}) \cdot \frac{\prod_{ν = 0}^{n_{repl}} (res [l] - ν)}{\vec{p} {[l]}^{n_{repl} + 1}} \\ & = \frac{\prod_{ν = 1}^{n_{repl}} (res [l] - ν)}{\vec{p} {[l]}^{n_{repl}}} \cdot A_{res}^{l} \end{aligned}$

Now, analogously to $M_{j, j^{'}}^{*}$ in Section 3.3, we define $\begin{array}{r} M_{j, j^{'}}^{*, n_{repl}} = \{res : A_{res}^{j, n_{repl}} ⩽ A_{res}^{j^{'}, n_{repl}}\} \end{array}$ to be the set of all results that are at least as likely under the condition that the observed voter chose $c_{j^{'}}$ as under the condition that she chose $c_{j}$ .

We thus obtain our first representation of the ideal privacy loss with $n_{repl}$ replays, as follows: $\begin{aligned} (4) & δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal} = \max_{j, j^{'} \in \{1, \dots, n_{C}\}} \sum_{res \in M_{j, j^{'}}^{*, n_{repl}}} (A_{res}^{j^{'}, n_{repl}} - A_{res}^{j, n_{repl}}), \end{aligned}$ namely as the largest total variation distance between any two conditional distributions corresponding to choices of the voter under observation.

The following theorem states that the loss $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ is indeed optimal for the ideal voting protocol with $n_{repl}$ replays. This means that no voting protocol which is subject to a replay attack with $n_{repl}$ replays can achieve a better privacy loss than $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ .

Theorem 2.

The ideal protocol $I_{priv} (C, n_{V}^{h}, \vec{p}, n_{repl})$ achieves a privacy loss of $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ . Moreover, it does not achieve $δ^{'}$ -privacy for any $δ^{'} < δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ .

The proof of Theorem 2 is exactly the same as the proof of Theorem 1, with $A_{res}^{j}$ replaced by $A_{res}^{j, n_{repl}}$ throughout.
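For very small elections, Eq. (4) can be evaluated directly by enumerating all results. The following is a minimal sketch under that assumption, not an implementation from the paper; the function names (`compositions`, `honest_prob`, `a_res`, `ideal_privacy_loss`) are hypothetical and choices are indexed from 0.

```python
from itertools import permutations
from math import factorial, prod


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def honest_prob(counts, p):
    """Multinomial probability that the honest voters produce `counts`."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # exact: partial quotients stay integral
    return coeff * prod(pi ** c for pi, c in zip(p, counts))


def a_res(res, l, p, n_repl):
    """A_res^{l, n_repl}: probability of `res` if the observed voter chose l
    and that choice is counted 1 + n_repl times."""
    honest = list(res)
    honest[l] -= n_repl + 1
    if honest[l] < 0:
        return 0.0
    return honest_prob(honest, p)


def ideal_privacy_loss(p, n_honest, n_repl=0):
    """Brute-force evaluation of Eq. (4); only feasible for tiny elections."""
    results = list(compositions(n_honest + n_repl + 1, len(p)))
    best = 0.0
    for j, jp in permutations(range(len(p)), 2):
        # Summing only positive differences is the sum over M_{j,j'}.
        loss = sum(
            max(0.0, a_res(r, jp, p, n_repl) - a_res(r, j, p, n_repl))
            for r in results
        )
        best = max(best, loss)
    return best
```

For two equally likely choices and two honest voters, `ideal_privacy_loss((0.5, 0.5), 2)` evaluates to 0.5, and a single replay (`n_repl=1`) raises this to 0.75.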

Note that the formula in (4) for $I_{priv} (C, n_{V}^{h}, \vec{p}, n_{repl})$ involves maximising over $O (n_{C}^{2})$ terms, each of which is a sum consisting of $| M_{j, j^{'}}^{*, n_{repl}} |$ summands. In general $| M_{j, j^{'}}^{*, n_{repl}} |$ will have size comparable to the number of possible results, which is a multinomial coefficient of order ${(n_{V}^{h})}^{n_{C} - 1}$ . This is clearly intractable for large elections, so before we can analyse the efficiency of replay attacks for real-world elections, we will need to do some work to put (4) into a more tractable form.
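To get a feeling for this blow-up, one can count the possible results with the standard stars-and-bars formula. The snippet below is an illustrative sketch; `number_of_results` is a hypothetical helper name, and the count is an upper bound on the number of summands for each pair $j, j^{'}$ .

```python
from math import comb


def number_of_results(n_honest, n_repl, n_choices):
    """Number of tallies res with sum(res) = n_honest + n_repl + 1."""
    total_votes = n_honest + n_repl + 1
    # Stars and bars: distribute total_votes among n_choices choices.
    return comb(total_votes + n_choices - 1, n_choices - 1)


# Example: 100 honest voters, no replays, 10 choices.
print(number_of_results(100, 0, 10))
```

Already for 100 honest voters and 10 choices the count exceeds $10^{12}$ , which rules out direct enumeration.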

Second representation. We now show that for the ideal functionality of Fig. 2, the definition of the KTV privacy loss can be greatly simplified. Recall that in Definition 1, we measure privacy as the adversary’s maximum advantage over all possible choices $c_{j}$ , $c_{j^{'}}$ to successfully distinguish whether the voter under observation voted for $c_{j}$ or $c_{j^{'}}$ . Our reduction states that in fact this is equal to the adversary’s advantage in distinguishing only between a vote for the least popular choice $c_{j}$ and a vote for the second least popular choice $c_{j^{'}}$ , i.e., those choices for which $\vec{p} [j]$ and $\vec{p} [j^{'}]$ are the two lowest probabilities. This holds both for the cases with and without replay attacks (since the latter is a special case of the former with $n_{repl} = 0$ ).

Theorem 3.

Let $j$ , $j^{'}$ be such that $\vec{p} [j] ⩽ \vec{p} [j^{'}] ⩽ \vec{p} [l]$ for all $l \neq j$ . Then, the ideal privacy loss is given by the total variation distance between $A_{res}^{j, n_{repl}}$ and $A_{res}^{j^{'}, n_{repl}}$ , i.e. by $\begin{array}{r} δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal} = \sum_{res \in M_{j, j^{'}}^{*, n_{repl}}} (A_{res}^{j^{'}, n_{repl}} - A_{res}^{j, n_{repl}}) . \end{array}$

In order to prove Theorem 3 we will first show the technical Lemma 1. Lemma 1 states that for each choice of j and $j^{'}$ , the corresponding term in the max of equation (4) only depends on three probabilities – $\vec{p} [j]$ , $\vec{p} [j^{'}]$ and one “dummy” probability $\vec{p} [j, j^{'}] = 1 - \vec{p} [j] - \vec{p} [j^{'}]$ that collects the probabilities of all the other choices.

The formula (5) of Lemma 1, combined with Theorem 3, gives an explicit expression for the ideal privacy loss as a sum of at most $(n_{V}^{h} + 1) (n_{repl} + 1)$ terms. This means that we are now comfortably able to analyse real-world-sized elections, as we do later in this section and in Section 6.

We now introduce the quantity $T_{t, j, j^{'}}$ , which is used to state Lemma 1 below. Let $j, j^{'} \in C$ . For each $t \in \mathbb{N}$ , let $T_{t, j, j^{'}}$ be a natural number that satisfies $\begin{array}{r} \frac{\prod_{ν = 0}^{n_{repl}} (r - ν)}{\vec{p} {[j]}^{n_{repl} + 1}} ⩽ \frac{\prod_{ν = 0}^{n_{repl}} (t - r - ν)}{\vec{p} {[j^{'}]}^{n_{repl} + 1}} \end{array}$ for all $r ⩽ T_{t, j, j^{'}}$ and $\begin{array}{r} \frac{\prod_{ν = 0}^{n_{repl}} (r - ν)}{\vec{p} {[j]}^{n_{repl} + 1}} ⩾ \frac{\prod_{ν = 0}^{n_{repl}} (t - r - ν)}{\vec{p} {[j^{'}]}^{n_{repl} + 1}} \end{array}$ for $r > T_{t, j, j^{'}}$ . Note that such a $T_{t, j, j^{'}}$ with $0 ⩽ T_{t, j, j^{'}} ⩽ t$ certainly exists since $\frac{\prod_{ν = 0}^{n_{repl}} (r - ν)}{\vec{p} {[j]}^{n_{repl} + 1}}$ is monotonic in $r$ . To simplify the notation we will sometimes omit the $j$ , $j^{'}$ and just write $T_{t}$ .

$T_{t, j, j^{'}}$ describes the point at which the sign of $A_{res}^{j^{'}, n_{repl}} - A_{res}^{j, n_{repl}}$ switches. It allows us to replace the summation over $M_{j, j^{'}}^{*, n_{repl}}$ by a summation over all possible outcomes. We can then use standard properties of probability distributions to remove all but two probabilities.

Lemma 1.

Let $j, j^{'} \in C$ and $T_{t}$ as before. Then for $\vec{p} [j, j^{'}] : = 1 - \vec{p} [j] - \vec{p} [j^{'}]$ $\begin{aligned} (5) & \sum_{res \in M_{j, j^{'}}^{*, n_{repl}}} (A_{res}^{j^{'}, n_{repl}} - A_{res}^{j, n_{repl}}) = \sum_{t = 0}^{n_{V}^{h}} \vec{p} {[j, j^{'}]}^{n_{V}^{h} - t} \binom{n_{V}^{h}}{t} \sum_{r = \max \{T_{t + n_{repl} + 1} - n_{repl}, 0\}}^{\min \{T_{t + n_{repl} + 1}, t\}} \binom{t}{r} \vec{p} {[j]}^{r} \vec{p} {[j^{'}]}^{t - r} . \end{aligned}$

Proof.

We give only a short sketch, with many details relegated to the Appendix in Lemma 3. To simplify notation we write $δ_{j j^{'}}$ for $\sum_{res \in M_{j, j^{'}}^{*, n_{repl}}} (A_{res}^{j^{'}, n_{repl}} - A_{res}^{j, n_{repl}})$ . First observe that $δ_{j j^{'}} = δ_{j^{'} j}$ since $M_{j, j^{'}}^{*, n_{repl}}$ and $M_{j^{'}, j}^{*, n_{repl}}$ are complementary up to a trivial intersection that does not contribute to $δ_{j j^{'}}$ or $δ_{j^{'} j}$ . More precisely, writing $n_{V}^{t} = n_{V}^{h} + n_{repl} + 1$ for the total number of counted votes and ${MN}_{n_{V}^{t}}^{res}$ for the multinomial probability mass function $\begin{aligned} (6) & {MN}_{n_{V}^{t}}^{res} = \frac{n_{V}^{t}!}{\prod_{j = 1}^{n_{C}} res [j]!} (\prod_{j = 1}^{n_{C}} \vec{p} {[j]}^{res [j]}), \end{aligned}$ we have that $\begin{aligned} \frac{n_{V}^{t}! δ_{j j^{'}}}{n_{V}^{h}!} & = \sum_{res \in M_{j, j^{'}}^{*, n_{repl}}} (\frac{\prod_{ν = 0}^{n_{repl}} (res [j^{'}] - ν)}{\vec{p} {[j^{'}]}^{n_{repl} + 1}} - \frac{\prod_{ν = 0}^{n_{repl}} (res [j] - ν)}{\vec{p} {[j]}^{n_{repl} + 1}}) {MN}_{n_{V}^{t}}^{res} \end{aligned}$ implies that $\frac{n_{V}^{t}!}{n_{V}^{h}!} (δ_{j j^{'}} - δ_{j^{'} j})$ is equal to $\begin{aligned} \sum_{res} (\frac{\prod_{ν = 0}^{n_{repl}} (res [j^{'}] - ν)}{\vec{p} {[j^{'}]}^{n_{repl} + 1}} - \frac{\prod_{ν = 0}^{n_{repl}} (res [j] - ν)}{\vec{p} {[j]}^{n_{repl} + 1}}) {MN}_{n_{V}^{t}}^{res} = 0 \end{aligned}$ where the sum is over all possible results (without abstention), i.e. $\sum_{l = 1}^{n_{C}} res [l] = n_{V}^{h} + n_{repl} + 1 = n_{V}^{t}$ . We also used that the two conditional probability distributions (one w.r.t. the choice $j$ , one w.r.t. the choice $j^{'}$ ) are each normalised. Note that this is just a special case of a more general theorem for the total variation distance. For details see e.g. Proposition 4.2 in [35].

Now $δ_{j j^{'}} = δ_{j^{'} j}$ and $δ_{j j^{'}} = \frac{δ_{j j^{'}} + δ_{j^{'} j}}{2}$ lead to the more symmetric representation $\begin{aligned} δ_{j j^{'}} = & \frac{n_{V}^{h}!}{2 \cdot n_{V}^{t}!} \sum_{res} | Δ_{j, j^{'}}^{res, n_{repl}} | \cdot {MN}_{n_{V}^{t}}^{res} \end{aligned}$ with $Δ_{j, j^{'}}^{res, n_{repl}} = \frac{\prod_{ν = 0}^{n_{repl}} (res [j] - ν)}{\vec{p} {[j]}^{n_{repl} + 1}} - \frac{\prod_{ν = 0}^{n_{repl}} (res [j^{'}] - ν)}{\vec{p} {[j^{'}]}^{n_{repl} + 1}}$ . By moving the $\vec{p} [j]$ , $\vec{p} [j^{'}]$ out of the product we can sum over all $res [l]$ for $j \neq l \neq j^{'}$ under the restriction that $\sum_{j \neq l \neq j^{'}} res [l] = n_{V}^{t} - t$ for $t = res [j] + res [j^{'}]$ . Furthermore, we can then consider the multinomial distribution with $n_{V}^{t} - t$ trials and $(n_{C} - 2)$ probabilities $\vec{q} : = {(\vec{p} {[j, j^{'}]}^{- 1} \vec{p} [l])}_{j \neq l \neq j^{'}}$ . Using the norm 1 property of this probability distribution, we get a term that depends only on $j$ and $j^{'}$ : $\begin{array}{rcl} δ_{j j^{'}} & = & \sum_{t = 0}^{n_{V}^{t}} \sum_{res [j] + res [j^{'}] = t} | Δ_{j, j^{'}}^{res, n_{repl}} | \cdot \frac{n_{V}^{h}! \vec{p} {[j]}^{res [j]} \vec{p} {[j^{'}]}^{res [j^{'}]}}{2 \cdot (n_{V}^{t} - t)! res [j]! res [j^{'}]!} \\ & & \cdot \sum_{\sum_{j \neq l \neq j^{'}} res [l] = n_{V}^{t} - t} \frac{\vec{p} {[j, j^{'}]}^{n_{V}^{t} - t} (n_{V}^{t} - t)!}{\prod_{j \neq l \neq j^{'}} res [l]!} \prod_{j \neq l \neq j^{'}} \vec{q} {[l]}^{res [l]} \\ (7) & = & \sum_{t = 0}^{n_{V}^{t}} \sum_{r = 0}^{t} | Δ_{j, j^{'}}^{res, n_{repl}} | \cdot \frac{n_{V}^{h}! \vec{p} {[j, j^{'}]}^{n_{V}^{t} - t} \vec{p} {[j]}^{r} \vec{p} {[j^{'}]}^{t - r}}{2 \cdot (n_{V}^{t} - t)! r! (t - r)!} \end{array}$ where $Δ_{j, j^{'}}^{res, n_{repl}}$ is defined as before with $res [j] = r$ , $res [j^{'}] = t - r$ . Finally the definition of $T_{t}$ allows us to replace the absolute value to retrieve (5). For more details see Lemma 3. □

Proof of Theorem 3.

By definition, $δ_{j j^{'}}$ is continuous and piecewise differentiable in $\vec{p} [j] + \vec{p} [j^{'}]$ . Hence its representation in (5) has the same properties, and it is enough to compute the derivative on intervals where $T_{t, j, j^{'}}$ is constant. Differentiating (5) w.r.t. $\vec{p} [j] + \vec{p} [j^{'}]$ shows that $δ_{j j^{'}}$ is decreasing in this sum, i.e. it becomes maximal when $\vec{p} [j] + \vec{p} [j^{'}]$ is minimal. For a more detailed version, see Appendix A. □
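The reduction to three probabilities makes the privacy loss computable for realistic sizes. The sketch below is an illustration and not the authors' implementation: instead of the shorter sum (5), it evaluates the total variation distance directly on the merged three-bucket tally (in the spirit of Eq. (7)), which costs on the order of $(n_{V}^{h})^{2}$ terms. The function name `pair_privacy_loss` and the use of log-gamma for numerical stability are our own choices, and both probabilities are assumed to be strictly positive.

```python
from math import exp, lgamma, log


def pair_privacy_loss(n_honest, p_j, p_jp, n_repl=0):
    """Total variation distance between the tally distributions for an observed
    vote for c_j versus c_j', with all other choices merged into one bucket."""
    k = n_repl + 1  # the observed choice is counted k times
    p_rest = max(0.0, 1.0 - p_j - p_jp)  # the "dummy" probability p[j, j']

    def honest_pmf(a, b):
        # Probability that honest voters cast a votes for c_j, b for c_j'.
        c = n_honest - a - b
        if a < 0 or b < 0 or c < 0 or (c > 0 and p_rest == 0.0):
            return 0.0
        log_prob = (
            lgamma(n_honest + 1) - lgamma(a + 1) - lgamma(b + 1) - lgamma(c + 1)
            + a * log(p_j) + b * log(p_jp)
            + (c * log(p_rest) if c > 0 else 0.0)
        )
        return exp(log_prob)

    loss = 0.0
    for x in range(n_honest + k + 1):          # final count for c_j
        for y in range(n_honest + k + 1 - x):  # final count for c_j'
            diff = honest_pmf(x, y - k) - honest_pmf(x - k, y)
            if diff > 0:  # keep only results that favour c_j'
                loss += diff
    return loss
```

Following Theorem 3, one would call this with the two smallest entries of $\vec{p}$ , for example `pair_privacy_loss(100, 0.1, 0.1, n_repl=3)` for 10 equally likely choices.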

Fig. 3.

KTV privacy loss δ for the ideal protocol with 10 candidates and the uniform vote distribution. Note that the y-axis is on a log scale.

In Fig. 3, we give some concrete values of the ideal privacy loss with replays $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ for different numbers of honest voters $n_{V}^{h}$ and replays $n_{repl}$ . We model an election in which the votes of the honest voters are uniformly distributed; this is the distribution that minimises δ, and for any other distribution the privacy loss would be even greater. Observe from Fig. 3 that even if the adversary replays the observed voter’s choice only very few times in relation to the total number of voters, the observed voter’s privacy can be reduced dramatically (corresponding to a dramatically higher value of δ). As we will prove in the remainder of this section, this is no coincidence. In fact, we will show that replay attacks are very efficient in general.

4.2. Asymptotics

In the first part of our analysis, we focused on the effect of replay attacks: our results in Section 4.1 state which privacy loss can be achieved ideally if an adversary replays the observed voter’s choice $n_{repl}$ times. Based on these results, we now precisely analyze the efficiency of replay attacks, i.e., how vote privacy decreases asymptotically depending on the number of replays $n_{repl}$ .

Our main result on the efficiency of replay attacks is Theorem 4. Its proof is based on the explicit representation of the ideal privacy loss from Lemma 1 and Theorem 3. In what follows, we will first deduce Theorem 4 and then illustrate it for specific settings.

We use the terminology introduced in Section 4.1. We first remark that one can choose $\begin{matrix} ⌊ \frac{t \vec{p} [j]}{\vec{p} [j] + \vec{p} [j^{'}]} ⌋ ⩽ T_{t} ⩽ ⌊ \frac{t \vec{p} [j]}{\vec{p} [j] + \vec{p} [j^{'}]} ⌋ + n_{repl} + 1 \end{matrix}$ for $\vec{p} [j] ⩽ \vec{p} [j^{'}]$ . In particular, the coefficients in the inner sum surround the expected value $E (X_{j}) : = \frac{t \vec{p} [j]}{\vec{p} [j] + \vec{p} [j^{'}]}$ . Now we can use the integral de Moivre–Laplace limit theorem for multinomial distributions (see, e.g., [22]) to represent the asymptotic behaviour in terms of the multivariate Gaussian over the hypersurface given by the condition $\sum_{i = 1}^{3} r_{i} = n_{V}^{h}$ for $r_{1} = r$ , $r_{2} = t - r$ , $r_{3} = n_{V}^{h} - t$ as follows: $\begin{aligned} & \sum_{t = 0}^{n_{V}^{h}} \vec{p} {[j, j^{'}]}^{n_{V}^{h} - t} \binom{n_{V}^{h}}{t} \sum_{r = \max \{T_{t + n_{repl} + 1} - n_{repl}, 0\}}^{\min \{T_{t + n_{repl} + 1}, t\}} {MN}_{t}^{r, t - r} \\ & \approx \frac{\sqrt{\vec{q} [j] \vec{q} [j^{'}] \vec{q} [j, j^{'}]}}{2 π n_{V}^{h} \sqrt{\vec{p} [j] \vec{p} [j^{'}] \vec{p} [j, j^{'}]}} \int_{0}^{n_{V}^{h}} \int_{E (X_{j}) - \frac{n_{repl} + 1}{2}}^{E (X_{j}) + \frac{n_{repl} + 1}{2}} e^{- \sum_{i = 1}^{3} x_{i}^{2} (r, t)} d r d t \end{aligned}$ where we used $\begin{array}{c} x_{1} (r, t) = \frac{r - n_{V}^{h} \vec{p} [j]}{\sqrt{n_{V}^{h} \vec{p} [j] \vec{q} [j]}}, x_{2} (r, t) = \frac{t - r - n_{V}^{h} \vec{p} [j^{'}]}{\sqrt{n_{V}^{h} \vec{p} [j^{'}] \vec{q} [j^{'}]}}, \\ x_{3} (r, t) = \frac{n_{V}^{h} - t - n_{V}^{h} \vec{p} [j, j^{'}]}{\sqrt{n_{V}^{h} \vec{p} [j, j^{'}] \vec{q} [j, j^{'}]}}, \vec{q} = 1 - \vec{p} . \end{array}$

We can isolate the terms for r and t to get $\begin{array}{r} \sqrt{\frac{\vec{q} [j, j^{'}]}{2 π n_{V}^{h} \vec{p} [j] \vec{p} [j^{'}]}} \int_{- \frac{n_{repl} + 1}{2}}^{\frac{n_{repl} + 1}{2}} e^{- r^{2} \frac{\vec{q} [j, j^{'}]}{2 π n_{V}^{h} \vec{p} [j] \vec{p} [j^{'}]}} d r \approx (n_{repl} + 1) \sqrt{\frac{\vec{q} [j, j^{'}]}{2 π n_{V}^{h} \vec{p} [j] \vec{p} [j^{'}]}} + O ({(\frac{n_{repl} + 1}{\sqrt{n_{V}^{h}}})}^{3}) \end{array}$ Since the remaining term $\begin{array}{r} \frac{1}{\sqrt{2 π n_{V}^{h} \vec{p} [j, j^{'}] \vec{q} [j, j^{'}]}} \int_{- n_{V}^{h} \vec{q} [j, j^{'}]}^{n_{V}^{h} \vec{p} [j, j^{'}]} e^{- t^{2} / (2 π n_{V}^{h} \vec{p} [j, j^{'}] \vec{q} [j, j^{'}])} d t \end{array}$ converges to 1 for $n_{V}^{h} \to \infty$ , our approximation (8) describes the asymptotics of the whole term.⁸

⁸
Our series approximation becomes weak for $n_{repl}^{2} ⩾ n_{V}^{h}$ . Obviously $δ_{j j^{'}}$ is bounded by 1.

From what we have shown above, we obtain the following central result.

Theorem 4 (Asymptotics).

Let $C$ , $n_{V}^{h}$ , $\vec{p}$ , $n_{repl}$ be as above. Let $n_{repl} = o (\sqrt{n_{V}^{h}})$ .⁹

⁹
Observe that $n_{repl} ≪ \sqrt{n_{V}^{h}}$ covers the interesting cases because the privacy loss is obviously close to 1 for large $n_{repl}$ .

Then, we have that

$\begin{aligned} (8) & δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal} \sim \frac{n_{repl} + 1}{\sqrt{n_{V}^{h}}} . \end{aligned}$

Intuitively, Theorem 4 states that the KTV privacy loss of an election with $n_{V}^{h}$ honest voters and $n_{repl}$ replays is essentially the same as that of an election with $\frac{n_{V}^{h}}{{(n_{repl} + 1)}^{2}}$ voters but no replays. Loosely speaking, by replaying the targeted voter’s choice $n_{repl}$ times, it is as though the adversary could “reduce” the number of honest voters from $n_{V}^{h}$ down to $\frac{n_{V}^{h}}{{(n_{repl} + 1)}^{2}}$ . This is perhaps a more intuitive way to evaluate privacy loss than a change in a numerical measure which may be difficult to interpret without context (although we should emphasise that it is equally dependent on the threat model embedded in the KTV definition).

To give some concrete examples, again with 10 candidates and a uniform distribution: in an election with 10 honest and no corrupted voters (and hence no replays possible) the adversary’s advantage in the privacy game is $δ = 0.533$ . In an election with 100 voters, the adversary needs to control as few as 3 out of 100 voters (and submit replays on their behalf) in order to have a similar advantage $> \frac{1}{2}$ in the privacy game. In an election with 1000 voters, as few as 9 out of 1000 voters suffice for the same purpose. At the same time, in the last two elections with 100 and 1000 voters respectively, vote privacy is mostly preserved without replays ( $δ = 0.177$ and $δ = 0.056$ ). Altogether, we can conclude that replay attacks can be devastating even if the adversary controls only a tiny fraction of all voters.
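The leading-order term derived before Theorem 4 and the "effective number of voters" reading can be turned into a quick back-of-the-envelope calculator. This is only a sketch: the function names are hypothetical, the cap at 1 is our addition (the privacy loss is bounded by 1), and the estimate is meaningful only for $n_{repl} ≪ \sqrt{n_{V}^{h}}$ .

```python
from math import pi, sqrt


def effective_voters(n_honest, n_repl):
    """Honest-voter count of a replay-free election with comparable loss."""
    return n_honest / (n_repl + 1) ** 2


def leading_order_loss(n_honest, n_repl, p_j, p_jp):
    """Leading-order estimate (n_repl + 1) * sqrt(q[j,j'] / (2 pi n p[j] p[j']))."""
    q = p_j + p_jp  # q[j, j'] = 1 - p[j, j'] = p[j] + p[j']
    estimate = (n_repl + 1) * sqrt(q / (2 * pi * n_honest * p_j * p_jp))
    return min(1.0, estimate)  # assumption: cap, since the loss cannot exceed 1


# 10 equally likely choices, 100 honest voters, no replays vs. 3 replays.
print(leading_order_loss(100, 0, 0.1, 0.1))
print(leading_order_loss(100, 3, 0.1, 0.1))
print(effective_voters(100, 3))
```

For 100 and 1000 honest voters without replays the estimate is close to the exact values $δ = 0.177$ and $δ = 0.056$ quoted above; with replays it should be read as a rough indication only.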

5. Strong vote privacy

An important limitation of the KTV privacy definition is that it only considers vulnerability with respect to a very specific goal, namely for the adversary to guess between two possible votes (which earns it a rating of ‘too limited’ in the survey article [5]), and a similar limitation applies to most game-based definitions. Various works have sought to address this by considering entropy-based privacy definitions, most notably [6] and [36] (we note that the game-based vote privacy definitions in the line of [5], which are often used to formally analyse vote privacy – see, e.g., [5,10,13,26] – reduce to the entropy-based approach, as proven in [5]). In this section we will show that a simple extension of the KTV definition, which we term ‘strong vote privacy’, is equivalent to a computational version of a strong entropy-based definition.

5.1. Related work

In [6] at CCS 2012, the authors propose a family of entropy-based privacy definitions, parametrised by a number of modeling choices that must be made: firstly, we fix a distribution on the votes, both of the observed voter and the innocent third parties; secondly, we fix a ‘target function’ that the adversary is interested in learning, for example the observed voter’s vote; thirdly, we must choose an ‘entropy notion’ to measure the adversary’s success in learning the target information, for example the ‘average min-entropy’, which measures the adversary’s ability to guess the value of the target function with a single guess. The privacy measure is then the posterior vulnerability of the target function with respect to the chosen entropy notion (for the examples just mentioned, that is the probability that the adversary will correctly guess the observed voter’s vote after seeing the election). The setting of a computationally bounded adversary is dealt with indirectly, by saying that the vulnerability of an output distribution is the minimum vulnerability among distributions computationally indistinguishable from the true distribution.

The assumption that we know in advance the voting behaviour of the innocent third-party voters may well be reasonable – we can estimate this based on opinion polls and the results of previous elections – and the same assumption is made both in the KTV definition and in our definition below. The same assumption for the observed voter is more problematic, since the adversary may well choose to attack a highly atypical voter about whom they possess side-information (for example, someone who is known or suspected to belong to an opposition group). Note, however, that some such assumption seems inevitable in any measure focusing exclusively on posterior vulnerability, since in the case that the observed voter follows a point distribution (i.e. the adversary has total knowledge of her vote) any target function will be totally compromised.

The choice of target function may also be far from straightforward: while the observed voter’s vote is certainly very reasonable, it may not be the only thing the adversary could care about. For example, if candidates are grouped into parties the adversary may only care about which party a voter voted for, or in the case of ranked choice voting the adversary may care about who the voter ranked first. Again some assumption of this kind seems necessary in any posterior vulnerability measure, since for instance a target which is a constant function will trivially be compromised.

Finally, the choice of entropy notion may not be entirely clear, but we agree with the use of average min-entropy because of its clear operational interpretation as the adversary’s best single guess for the target.

The idea of the adversary’s target function is further developed in [36], which considers the voting system as a communication channel from the voters to the result, and applies ideas from the theory of quantitative information flow (QIF), in particular the notion of g-leakage. The idea is to define a set $W$ of possible guesses for the adversary, and then a gain function $g : W \times C \to [0, 1]$ quantifying the adversary’s reward for making guess $w \in W$ where the true choice was $c \in C$ . We can then define vulnerability with respect to g as the maximum possible expected payoff for a single guess by the adversary, and the g-leakage as the ratio of the vulnerabilities before and after seeing the tally. The authors show how to represent natural targets for the adversary (such as a specific voter’s vote, or the number of voters whose vote the adversary can guess) by suitable gain functions and illustrate this for small toy example elections.

However, the framework of [36] also has a number of important limitations. It does not attempt at all to consider either adversaries who may interact during the protocol, or computationally bounded adversaries. It is still parametrised by choices of both gain function and vote distribution. Indeed, this is in some sense inevitable in that we cannot hope to quantify loss of privacy by the overall capacity of the channel (which would be the supremum over possible gain functions and vote distributions), because the tally does reveal substantial information about the joint distribution of the voters’ votes – this is the whole point of running an election!

5.2. Strong vote privacy

The key idea for our definition is to think of the voting system not as a channel from the votes of all the voters, but rather as a noisy channel from the vote of the observed voter, with noise coming from the random votes of the innocent third-party voters. This means that we can measure the loss of vote privacy as the min-entropy capacity of this channel, and so we can consider the maximum over all possible gain functions and all possible priors on the observed voter.

In order to allow for interactive and computationally bounded adversaries, we will phrase our definition not in the language of channels and information flow, but directly using the operational interpretation of min-entropy, in terms of the maximum advantage that can be gained by a computationally bounded interactive adversary – essentially an interactive and computational version of g-leakage as discussed above.

We consider all possible gain functions g and all possible priors π the adversary may have for the voter’s actions. We then define the privacy loss to be the maximum possible (multiplicative) increase in expected payoff for an adversary who is permitted to mount an attack on the system, compared to one who is not and just makes a guess based on his prior.

Definition 2 (Strong vote privacy).

Let $P$ be a voting protocol, $V_{obs}$ be the voter under observation, and $δ \in [0, 1]$ . Then, $P$ achieves strong δ-privacy, if for all possible probability distributions π over the set of choices $C$ , all finite sets $W$ and ‘gain functions’ $g : C \times W \to [0, 1]$ and all adversaries $π_{A}$ the ratio $\begin{array}{r} \frac{\sum_{i, w} π (c_{i}) Pr [{({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto w] g (c_{i}, w)}{{max}_{w} \sum_{i} π (c_{i}) g (c_{i}, w)} \end{array}$ is $(1 + δ)$ -bounded as a function of the security parameter $1^{ℓ}$ .

Remarkably, it turns out that this is equivalent to a simple strengthening of the KTV privacy definition, which may therefore (depending on the reader’s tastes) alternatively be seen as the primary definition:

Definition 3 (Strong vote privacy, II).

Let $P$ be a voting protocol, $V_{obs}$ be the voter under observation, and $δ \in [0, 1]$ . Then, $P$ achieves strong δ-privacy, if for all adversaries $π_{A}$ the sum $\begin{array}{r} \sum_{i} Pr [{({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{i}] \end{array}$ is $(1 + δ)$ -bounded as a function of the security parameter $1^{ℓ}$ .

The proof of this equivalence, which will be the main theorem of this section, is essentially a computational and interactive version of the ‘miracle’ theorem of QIF, Theorem 5.1 of [2]. We first note that Definition 3 is indeed an extension of the KTV definition:

Proposition 1.
KTV vote privacy (Definition 1) is equivalent to Definition 3 with the adversary $π_{A}$ restricted to two possible outputs.
Proof.
Let $π_{A}$ have outputs $\{c_{j}, c_{j^{'}}\}$ . Then $\begin{aligned} & \sum_{i} Pr [{({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{i}] \\ & = Pr [{({\hat{π}}_{V_{obs}} (c_{j}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{j}] + Pr [{({\hat{π}}_{V_{obs}} (c_{j^{'}}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{j^{'}}] \\ & = Pr [{({\hat{π}}_{V_{obs}} (c_{j}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{j}] + (1 - Pr [{({\hat{π}}_{V_{obs}} (c_{j^{'}}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{j}]) \\ & = Pr [{({\hat{π}}_{V_{obs}} (c_{j}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{j}] - Pr [{({\hat{π}}_{V_{obs}} (c_{j^{'}}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto c_{j}] + 1, \end{aligned}$ as required. □

It trivially follows that the KTV privacy loss is a lower bound for the strong privacy loss, and that for two-candidate elections the definitions are equivalent.

We now establish the main theorem:

Theorem 5.
Definitions 2 and 3 are equivalent.
Proof.
Trivially Definition 2 implies Definition 3 (take $W = C$ , π the uniform distribution and $g (c, c^{'}) = 1$ if $c^{'} = c$ and 0 otherwise).

To prove the converse implication, let $W$ and g be fixed, and let $π_{A}$ be an adversary for which the quantity in Definition 2 exceeds $1 + δ^{'} > 1 + δ$ infinitely often. Write $A^{(ℓ)} (c_{i}, w) = Pr [{({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{*} ‖ π_{A})}^{(ℓ)} \mapsto w]$ .

Our task is now to construct an adversary $π_{A^{'}}$ for which the quantity in Definition 3 exceeds $1 + δ^{″}$ infinitely often, for some $δ^{″} > δ$ . The general idea is for $π_{A^{'}}$ to imitate $π_{A}$ , except that at the end when $π_{A}$ would output $w \in W$ , we will have $π_{A^{'}}$ output the $c_{i}$ which maximises $A^{(ℓ)} (c_{i}, w)$ .

Note, however, that the values of $A^{(ℓ)} (c_{i}, w)$ are an infinite family of data (parametrised by ℓ), and so it is not possible to ‘hard-code’ them into the finite specification of the adversary $π_{A^{'}}$ . It is therefore necessary for $π_{A^{'}}$ to estimate them on-the-fly, by simulating the behaviour of ${({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{*} ‖ π_{A})}^{(ℓ)}$ for each $c_{i}$ .

Define the adversary $π_{A^{'}}$ as follows: first simulate ${({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{*} ‖ π_{A})}^{(ℓ)}$ with $ℓ^{2} | W |$ trials for each $c_{i}$ to obtain estimates $\tilde{A} (c_{i}, w)$ for $A^{(ℓ)} (c_{i}, w)$ . By the Chernoff bound on the sample mean we have that $| \tilde{A} (c_{i}, w) - A^{(ℓ)} (c_{i}, w) | < \frac{1}{| W | \sqrt{ℓ}}$ with probability at least $1 - 2^{- ℓ}$ . Then $π_{A^{'}}$ behaves as $π_{A}$ (in the real run of the protocol), and when $π_{A}$ would output w, $π_{A^{'}}$ outputs the $c_{i}$ which maximises $\tilde{A} (c_{i}, w)$ .

By the definition of $π_{A}$ we have that infinitely often $\begin{aligned} (1 + δ^{'}) \max_{w} \sum_{i} π (c_{i}) g (c_{i}, w) & ⩽ \sum_{i, w} π (c_{i}) A^{(ℓ)} (c_{i}, w) g (c_{i}, w) \\ & ⩽ \sum_{w} [(\max_{i} A^{(ℓ)} (c_{i}, w)) \sum_{i} π (c_{i}) g (c_{i}, w)] \\ & ⩽ \sum_{w} [(\max_{i} A^{(ℓ)} (c_{i}, w)) (\max_{w} \sum_{i} π (c_{i}) g (c_{i}, w))] \\ & = (\sum_{w} \max_{i} A^{(ℓ)} (c_{i}, w)) (\max_{w} \sum_{i} π (c_{i}) g (c_{i}, w)), \end{aligned}$ and hence we have that $\sum_{w} \max_{i} A^{(ℓ)} (c_{i}, w) ⩾ 1 + δ^{'}$ for infinitely many ℓ.

Writing $ϕ (w)$ for the $c_{i}$ which maximises $\tilde{A} (c_{i}, w)$ , we have for these ℓ $\begin{aligned} \sum_{i} Pr [{({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{} ‖ π_{A^{'}})}^{(ℓ)} \mapsto c_{i}] & = \sum_{i} \sum_{w} A^{(ℓ)} (c_{i}, w) 1_{c_{i} = ϕ (w)} \\ = \sum_{w} A^{(ℓ)} (ϕ (w), w) \\ ⩾ \sum_{w} (\tilde{A} (ϕ (w), w) - 2^{- ℓ} - 1 / | W | \sqrt{ℓ}) \\ = \sum_{w} max_{i} \tilde{A} (c_{i}, w) - (| W | 2^{- ℓ} + 1 / \sqrt{ℓ}) \\ ⩾ \sum_{w} max_{i} A^{(ℓ)} (c_{i}, w) - 2 (| W | 2^{- ℓ} + 1 / \sqrt{ℓ}) \\ ⩾ 1 + δ^{'} - 2 (| W | 2^{- ℓ} + 1 / \sqrt{ℓ}) \overset{ℓ \to \infty}{\to} 1 + δ^{'} . \end{aligned}$ □

5.3. Monte Carlo estimation

Precise analysis of strong vote privacy for the ideal functionality discussed above is considerably less straightforward than for the KTV definition (partly because, unlike the latter, it depends on the entire distribution of the honest voters' votes rather than only on the probabilities of the two least popular candidates). However, for the ideal replay attack functionality, in which the adversary’s only action is to make a guess based on the output tally, it is possible to obtain fairly accurate estimates by Monte Carlo methods, as we now show. The first observation is that the optimal adversary is one which can be easily simulated.

Proposition 2.
For a protocol for which the adversary’s output is a function on some finite set of tallies T, the sum in Definition 3 is maximised by the ‘maximum likelihood adversary’, which on tally t outputs the vote $c_{i}$ which maximises $p_{T | C} (t | c_{i})$ (breaking ties arbitrarily).
Proof.
We have $\begin{aligned} \sum_{i} Pr [{({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{} ‖ π_{A})}^{(ℓ)} \mapsto c_{i}] & = \sum_{i} \sum_{t} p (t | c_{i}) 1_{π_{A} (t) = c_{i}} \\ = \sum_{t} p (t | π_{A} (t)) ⩽ \sum_{t} max_{c} p (t | c) \\ = \sum_{i} Pr [{({\hat{π}}_{V_{obs}} (c_{i}) ‖ π^{} ‖ π_{\tilde{A}})}^{(ℓ)} \mapsto c_{i}] \end{aligned}$ as required, where $π_{\tilde{A}}$ is the maximum likelihood adversary.

Note that the maximum likelihood adversary can be easily implemented for the ideal protocol of the previous section: for an election with k candidates, $n_{repl}$ replay voters, and honest voters who cast their votes with probabilities $(p_{1}, \dots, p_{k})$ , for a tally $t = (m_{1}, \dots, m_{k})$ we have $\begin{aligned} p (t | c_{i}) & = (\binom{m_{1} + \dots + m_{k} - (n_{repl} + 1)}{m_{1}, \dots, m_{i - 1}, m_{i} - (n_{repl} + 1), m_{i + 1}, \dots, m_{k}}) \\ \cdot p_{1}^{m_{1}} \dots p_{i - 1}^{m_{i - 1}} p_{i}^{m_{i} - (n_{repl} + 1)} p_{i + 1}^{m_{i + 1}} \dots p_{k}^{m_{k}} . \end{aligned}$ By computing this for each $c_{i}$ , we can find $π_{\tilde{A}} (t)$ for given t.
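The likelihood above can be evaluated directly in log-space. The following is a minimal sketch, assuming the tally is given as a tuple of counts and that all honest-vote probabilities are known; the function names `log_likelihood` and `ml_adversary` are illustrative and not taken from the published implementation.

```python
import math

def log_likelihood(tally, i, probs, n_repl):
    """log p(t | c_i): remove the observed vote and its n_repl replays from
    candidate i, then score the remaining counts as a multinomial sample."""
    honest = list(tally)
    honest[i] -= n_repl + 1
    if honest[i] < 0:
        return -math.inf  # tally is impossible if the observed voter chose c_i
    ll = math.lgamma(sum(honest) + 1)  # log of (number of honest votes)!
    for m, p in zip(honest, probs):
        if p <= 0.0:
            if m > 0:
                return -math.inf  # a zero-probability candidate received votes
            continue
        ll += m * math.log(p) - math.lgamma(m + 1)
    return ll

def ml_adversary(tally, probs, n_repl):
    """Return the index i maximising p(t | c_i); ties go to the lowest index."""
    return max(range(len(tally)),
               key=lambda i: log_likelihood(tally, i, probs, n_repl))
```

Working with `math.lgamma` rather than explicit factorials keeps the computation stable for tallies with thousands of votes.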

Now observe that if $c_{i}$ is uniformly distributed, and t is drawn according to $p (\cdot | c_{i})$ then we have that $1_{π_{\tilde{A}} (t) = c_{i}} \sim Bernoulli ((1 + δ) / k)$ , where δ is the privacy loss of Def. 3.

Then to estimate δ we repeatedly sample $c_{i}$ uniformly at random and simulate a tally t with the observed voter (hence also the replay voters) voting for $c_{i}$ , and then check whether $π_{\tilde{A}} (t) = c_{i}$ . If this occurs with frequency ρ then $k ρ - 1$ is an unbiased estimator for δ with standard error at most $(1 + δ) / \sqrt{n}$ (where n is the number of trials). □
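The sampling procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the names `ml_guess` and `estimate_delta` are ours, all vote probabilities are assumed strictly positive, and the reported standard error is the usual plug-in estimate from the observed frequency. The guess uses the fact that, for a fixed tally, $p (t | c_{i})$ is proportional to $m_{i} (m_{i} - 1) \cdots (m_{i} - n_{repl}) / p_{i}^{n_{repl} + 1}$, since all other factors of the multinomial likelihood are common to every $c_{i}$.

```python
import math
import random

def ml_guess(tally, probs, n_repl):
    """Maximum likelihood guess via the falling-factorial score described above."""
    r = n_repl + 1
    def score(i):
        m = tally[i]
        if m < r:
            return -math.inf  # candidate i cannot hold the r identical votes
        return math.lgamma(m + 1) - math.lgamma(m - r + 1) - r * math.log(probs[i])
    return max(range(len(tally)), key=score)

def estimate_delta(probs, n_honest, n_repl, trials, seed=0):
    """Monte Carlo estimate k*rho - 1 of the strong privacy loss, with a
    plug-in standard error (an assumption of this sketch)."""
    rng = random.Random(seed)
    k = len(probs)
    hits = 0
    for _ in range(trials):
        c = rng.randrange(k)  # observed voter's choice, uniform
        tally = [0] * k
        for v in rng.choices(range(k), weights=probs, k=n_honest):
            tally[v] += 1     # honest votes
        tally[c] += n_repl + 1  # observed vote plus replays
        hits += ml_guess(tally, probs, n_repl) == c
    rho = hits / trials
    return k * rho - 1, k * math.sqrt(rho * (1 - rho) / trials)
```

For example, `estimate_delta([0.1] * 10, 1000, 5, 20000)` gives a rough estimate for a 10-candidate election with a uniform honest-vote distribution; far more trials are needed to reach the precision quoted for the figures.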

Figure 4 shows the privacy loss of the ideal functionality for an election with 10 candidates and various numbers of honest and replay voters (with the honest voters voting according to the uniform distribution). Figure 5 shows a direct comparison of our definition with the KTV definition, for elections with between 2 and 10 candidates. All estimates in this section and in Section 6 are with 10,000,000 trials, so standard error $< 0.0005$ .

Fig. 4.
Strong privacy loss δ for the ideal protocol with 10 candidates and the uniform vote distribution.

Fig. 5.
Comparison of privacy losses for strong privacy and KTV privacy, for an election with 10000 honest voters, 10 replays and 2–10 candidates, uniform distribution.
6. Analysis of real-world elections

In order to complement our formal analysis, we study how replay attacks would scale in practical elections. We therefore apply our formal results to publicly available data of political elections in Estonia, Germany, the UK, and the USA. In this way, we can realistically simulate to which degree vote privacy would decrease if in such elections replay attacks had been executed. Our “field test” confirms the gist of our abstract results: even if the number of replays is very low, vote privacy can be undermined significantly.

In the remainder of this section, we first discuss the modeling assumptions used for our analysis (Section 6.1), then describe our real-world examples and the results of our simulations (Section 6.2). Finally we discuss these results, explain why they confirm our theoretical analysis, and elaborate on the consequences of our new insights (Section 6.3).

6.1. Modeling assumptions

Throughout all of our analysis we, like Küsters, Truderung and Vogt in [30], assume that the adversary’s knowledge about the actions of the honest voters is represented by a vector of vote probabilities $\vec{p}$ , with the individual voters’ votes drawn independently according to this distribution (other prior works in the literature [6,36] base their examples on the simple but unrealistic assumption of a uniform prior). In applying this analysis to real-world elections, there are two similar-seeming but conceptually distinct issues to address: whether it is reasonable to think that the adversary knows this probability distribution, and how we as analysts estimate this distribution in order to perform the privacy analysis.

An important feature of all the elections we consider is that they are national elections for which results are published at the constituency or even per-polling station level. This means that if voters were nationally homogeneous then the adversary could easily discover the vote distribution $\vec{p}$ by just averaging the results of the constituencies other than the one in which he executed the replay attack.

Of course, in reality electorates are almost never nationally homogeneous, and there will be systematic variation between North and South, East and West, rich and poor regions, and so on. The adversary’s goal, therefore, is to estimate the distribution $\vec{p}$ for the specific constituency (or polling district) he has attacked, using only the results from other constituencies (and perhaps also results from previous elections). Fortunately for him, there is a well-established technique in political science, called Multilevel Regression and Poststratification (MRP) [21], to predict the local result based on a combination of national polls, local demographic factors and previous results. It is difficult for us to make quantitative statements about the accuracy that could be achieved, since political scientists are generally interested in making predictions before the election using opinion polls rather than with access to actual results outside the target constituency (and a full implementation would be far beyond the scope of this paper). However, recent examples (e.g. [33]) are able to obtain local-level predictions with typical error comparable to the sampling uncertainty of national opinion polls, so it seems reasonable to expect (or at least fear) that with access to the actual national results (rather than only polls) fairly precise estimates could be obtained of the local underlying vote distribution.

The second task, for us as analysts to estimate the vote probabilities in order to perform the analysis, is considerably simpler. Unlike the adversary we have access to the actual results in the relevant area (unpolluted by replay attacks), and so we can use the proportion of votes cast for each candidate as an unbiased estimator of the underlying vote probabilities.

6.2. Examples

We use public data from political elections in Estonia, Germany, the UK, and the USA, to simulate the potential privacy loss if these elections had been conducted using an e-voting scheme vulnerable to replay attacks.¹⁰

¹⁰
We published our implementation at http://hdl.handle.net/10993/51209.

In each of these elections, the partial election result of each polling station/area was published. We use these partial results to analyse the efficiency of replay attacks because it is reasonable to assume that an adversary knows in which partial result a targeted voter’s choice is included. For each election, we chose a polling station/area where the number of votes was close to the overall average of votes per polling station/area. Our results are summarized in Fig. 6.

Fig. 6.

Ideal privacy losses with $n_{repl} = 0, 1, 5, 10$ replays based on real election data from Examples 1–4. “KTV” denotes the KTV privacy definition and “SP” denotes the strong privacy definition.

Example 1 (Estonia, Riigikogu Election 2019).

In the Riigikogu (parliamentary) elections in 2019, 561,141 votes were cast in total. The number of polling stations was 451, which results in 1,244 votes per polling station on average. In this example, we choose polling station S53P in Mustamäe linnaosa where 1,404 valid votes were cast.¹¹

¹¹
See https://rk2019.valimised.ee/en/voting-result/local-municipality-0482-voting-result.html (accessed 11.04.2022).

The public partial election result at this polling station was

(14, 233, 22, 82, 9, 210, 31, 702, 5, 92, 4).¹²

¹²

The public election result is even more fine-grained because the number of votes per candidate on each party list is revealed. We aggregated the number of votes for each party list and consider adversaries who merely know the aggregated result; this makes our overall argument only stronger.

Example 2 (Germany, Landtag Election 2021).

In the Landtag (parliamentary) election in the state of Rhineland-Palatinate in 2021, 1,922,579 votes were cast in total. The number of polling stations was (roughly) 2,300, which results in $< 836$ votes per polling station on average. In this example, we choose polling station Pluwig where 855 votes were cast.¹³

¹³
See https://www.wahlen.rlp.de/de/ltw/wahlen/2021/ergebnisse/2242350410700.html (accessed 11.04.2022).

The public partial election result at this polling station was

(291, 253, 34, 35, 141, 27, 74)

Example 3 (UK, EU Referendum 2016).

In the 2016 EU referendum in the UK, 33,551,983 votes were cast in total. The number of areas was 382, which results in 87,832 votes per area on average. In this example, we choose the area of Kingston-upon-Thames (London) where 85,270 votes were cast.¹⁴

¹⁴
See https://www.electoralcommission.org.uk/who-we-are-and-what-we-do/elections-and-referendums/past-elections-and-referendums/eu-referendum (accessed 11.04.2022).

The public partial election result at this polling station was

(52533, 32737)

Example 4 (USA, Presidential Election 2020).

For the US presidential election in 2020, we were not able to determine the number of polling stations nationwide, so we focused on the results of one state, namely Massachusetts. Here, the average number of votes per polling station was (roughly) 1,500. In this example, we choose the polling station for Precinct 13 of Ward 1 in Boston, where 1,430 votes were cast.¹⁵

¹⁵
See https://electionstats.state.ma.us/elections/view/140751/filter_by_county:Suffolk (accessed 11.04.2022).

The public partial election result at this polling station was

(995, 404, 12, 5, 9)
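As described in Section 6.1, we use the observed vote shares as estimates of the underlying vote probabilities. A minimal sketch of that step for the four tallies of Examples 1–4 is given below; the dictionary keys and the helper name `vote_shares` are our own labels, chosen for illustration.

```python
# Partial results as listed in Examples 1-4.
tallies = {
    "Estonia 2019 (S53P)": (14, 233, 22, 82, 9, 210, 31, 702, 5, 92, 4),
    "Germany 2021 (Pluwig)": (291, 253, 34, 35, 141, 27, 74),
    "UK 2016 (Kingston-upon-Thames)": (52533, 32737),
    "USA 2020 (Boston W1 P13)": (995, 404, 12, 5, 9),
}

def vote_shares(tally):
    """Proportion of votes per option, used as the estimate of the vote probabilities."""
    total = sum(tally)
    return [m / total for m in tally]

for name, tally in tallies.items():
    shares = vote_shares(tally)
    # The smallest shares matter most for both privacy definitions.
    print(f"{name}: {sum(tally)} listed votes, smallest share {min(shares):.4f}")
```

The resulting probability vectors can then be fed, together with the number of honest voters and $n_{repl}$, into whichever privacy-loss computation is being used.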

6.3. Discussion

Observe that for the Estonian, German and US elections, which have many candidates, even a very small number of replays would have a devastating impact on vote privacy. For example, in the Estonian election even a single replay already has a substantial effect, and with only 5 replays privacy is almost completely lost (similarly for Germany and the US 1–5 replays compromise privacy, and 5–10 destroy it completely).

On the other hand, in the UK Brexit referendum, which has only two ‘candidates’ and far more votes at the most granular reporting level, we see that the effect of up to 10 replays is far less. This example also illustrates most clearly the result of Theorem 7 that the KTV privacy loss scales approximately proportionately to $n_{repl} + 1$ , which we also see in the small-δ regions of the other examples (we also see the consequence of Proposition 1 that for a two-candidate election the KTV and strong privacy losses agree, apart from stochastic sampling error).

The referendum results should not make us too complacent, however, because in fact the number of replays required to obtain a KTV privacy loss $δ > 1 / 2$ is just 196 – equivalent to just $0.2 %$ of the total number of votes.

It is interesting to compare the results for the US election with the UK results, because these are both elections for which the vast majority of votes went to just two candidates, but the privacy loss for the US is much greater than for the UK (by a factor of approximately 70). The number of votes in the UK example exceeds that in the US by a factor of approximately 60; Theorem 7 tells us that the KTV privacy loss δ scales as $1 / \sqrt{n_{V}^{h}}$ , and so the discrepancy in total votes predicts a ratio of approximately only $\sqrt{60} \approx 8$ , leaving around a factor of 10 still to explain. The reason for this remaining difference is that both the KTV and strong privacy definitions are heavily (in the case of strong privacy) or entirely (in the case of KTV privacy) influenced by the least popular candidates, and so the fact that the US election has a few very unpopular candidates has a large effect.
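For concreteness, the arithmetic behind these factors can be spelled out using the vote counts of Examples 3 and 4 as a stand-in for the numbers of honest voters (the factor of 70 is the observed ratio quoted above, not recomputed here): $\begin{aligned} \frac{85270}{1430} \approx 59.6, \qquad \sqrt{59.6} \approx 7.7, \qquad \frac{70}{7.7} \approx 9.1 . \end{aligned}$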

It may seem odd or even undesirable that the measured loss of privacy should be so heavily influenced by the small number of voters who support minority parties. However, we would argue that ballot privacy must mean privacy for all voters. Indeed, it may well be supporters of unpopular candidates who are at most risk of stigmatisation or reprisals; note that the adversary does not have to choose the targeted voter at random, but can choose to target someone they already suspect of supporting a minority position.

7. Approximate prior knowledge

In the foregoing, we have assumed that the behaviour of the honest voters is modelled by the voters independently selecting their votes according to a probability distribution which is known to the adversary; this means that the total of the honest voters’ votes is a multinomially-distributed random variable with parameters known to the adversary. In reality, however, the adversary is unlikely to have precise knowledge of the ‘true’ underlying vote distribution, but instead will have to estimate it in some way, such as via opinion polls or extrapolating from the known results from other voting districts (as discussed above in Section 6.1).

In this section, we will model the effect of the adversary having only partial information about the honest voters’ preferences. In Section 7.1 we introduce the model in more detail. We then in Sections 7.2 and 7.3 adapt our results from Section 4 above on KTV privacy to this setting, and finally in Section 7.4 we give some data to show the effect of this additional uncertainty on the efficiency of replay attacks, as measured by both KTV and strong privacy.

7.1. The Dirichlet prior

We will take a Bayesian approach and assume that the adversary’s uncertainty is represented by a prior distribution on the honest vote probabilities. In particular, we will assume that the adversary’s prior is given by a Dirichlet distribution, because this is the conjugate prior to the multinomial distribution and so is the class of prior that the adversary would obtain via sampling (such as opinion polling) [38].

The model is thus that the honest vote probabilities are drawn according to the Dirichlet distribution, and then the votes themselves are drawn as a multinomial distribution with these probabilities; this means that the vote totals are given by a Dirichlet-multinomial distribution (note that the reader may be more familiar with the two-outcome case of these distributions, which are known as the Beta and Beta-binomial distributions respectively). The adversary has access to the parameters of the Dirichlet prior (which he can use to compute likelihoods of various honest vote totals), but not the actual sampled probabilities.
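The two-stage sampling model can be sketched with the standard library alone. This is an illustration, not the published implementation: the helper names are ours, and the Dirichlet draw uses the standard construction of normalising independent Gamma variates.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw vote probabilities from Dirichlet(alpha) by normalising Gamma(alpha_j, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_honest_tally(alpha, n_honest, rng):
    """Dirichlet-multinomial sample: first the probabilities, then the votes."""
    probs = sample_dirichlet(alpha, rng)  # hidden from the adversary
    tally = [0] * len(alpha)
    for v in rng.choices(range(len(alpha)), weights=probs, k=n_honest):
        tally[v] += 1
    return tally

rng = random.Random(0)
print(sample_honest_tally([1250.0, 1000.0, 250.0], 1000, rng))
```

Only `alpha` is known to the adversary in this model; the intermediate `probs` never leaves `sample_honest_tally`.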

This requires a slight adjustment to the ideal functionality as presented in Fig. 2 above, which is now parameterised not by a probability distribution for each individual voter (since they are no longer assumed to be i.i.d.), but rather by a joint probability distribution on the overall tally of the honest voters. This modified ideal functionality is shown in Fig. 7. As discussed above, we will take the joint distribution $\vec{p}$ to be the Dirichlet-multinomial distribution.

Fig. 7.

Protocol of ideal replay voting functionality with the honest voters not assumed to be i.i.d. The differences between the protocol in Fig. 2 and the one presented here are highlighted in red.

The Dirichlet distribution is parameterised by ‘concentration parameters’ $\vec{α} [j]$ for each $j \in C$ , and we write $α_{0} = \sum_{j \in C} \vec{α} [j]$ . The random variable $X_{j}$ for each probability has mean $μ_{j} = \vec{α} [j] / α_{0}$ and variance $\frac{μ_{j} (1 - μ_{j})}{α_{0} + 1}$ . We can therefore think of the distribution as specified by the estimates $μ_{j}$ of the vote probabilities together with a single parameter $α_{0}$ capturing the adversary’s uncertainty (with a larger value of $α_{0}$ corresponding to a lower level of uncertainty).

So what should $α_{0}$ be? Obviously this will depend on the type and quality of data available to the adversary for the purpose of estimating the vote probabilities, but an approximation can be gained from the formula for the variance of each vote probability. Suppose that the standard deviation of the adversary’s estimate of $p_{i} \approx 50 %$ is approximately one percentage point (corresponding to a 95% confidence interval of ± two percentage points), which is comparable to that obtained by pre-election opinion polling. Then the variance is approximately $1 / 10000$ and so $\begin{matrix} {(\frac{1}{2})}^{2} \frac{1}{α_{0} + 1} \approx \frac{1}{10000}, \end{matrix}$ so $α_{0} \approx 2500$ . In Section 7.4 below, we will consider examples with $α_{0}$ over a range of orders of magnitude to show the sensitivity of privacy loss to this parameter.
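The same back-of-the-envelope calculation can be repeated for other levels of uncertainty by inverting the variance formula $μ (1 - μ) / (α_{0} + 1)$ of Section 7.1. The helper name below is ours, and the listed standard deviations are arbitrary illustrative values.

```python
def alpha0_from_std(mu, std):
    """Solve mu * (1 - mu) / (alpha0 + 1) = std**2 for alpha0."""
    return mu * (1.0 - mu) / std ** 2 - 1.0

# Standard deviation of the adversary's estimate, as a fraction (0.01 = one percentage point).
for std in (0.05, 0.01, 0.002):
    print(f"std = {std:.3f}  ->  alpha0 ~ {alpha0_from_std(0.5, std):.0f}")
```

With a mean of one half and a standard deviation of one percentage point this gives 2499, i.e. $α_{0} \approx 2500$ as in the text.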

Note that none of the work of Section 5 on strong privacy depended on the actual distribution of the honest vote tally and so everything carries over immediately to this setting, except that in the Monte Carlo algorithm described in Section 5.3 the likelihood of obtaining tally $t = (m_{1}, \dots, m_{k})$ given choice $c_{i}$ by the honest voter is no longer that coming from the multinomial distribution but is instead given by $\begin{matrix} p (t | c_{i}) = D‐M (m_{1}, \dots, m_{i - 1}, m_{i} - (n_{repl} + 1), m_{i + 1}, \dots, m_{k}; \sum_{i} m_{i} - (n_{repl} + 1), \vec{α}) \end{matrix}$ where D-M is the probability mass function of the Dirichlet-multinomial distribution, $\begin{matrix} D‐M (\vec{n}; N, \vec{α}) = (\binom{N}{\vec{n}}) \frac{B (\vec{n} + \vec{α})}{B (\vec{α})} \end{matrix}$ if $N = \sum_{i} n_{i}$ and 0 otherwise, where B is the multinomial Beta function.
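A minimal sketch of this likelihood in log-space is given below, assuming the concentration parameters are available as a sequence of positive floats; the function names are illustrative and not taken from the published implementation.

```python
import math

def log_multi_beta(alpha):
    """log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def log_dm_pmf(counts, alpha):
    """log D-M(counts; N, alpha) with N = sum(counts)."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    posterior = [c + a for c, a in zip(counts, alpha)]
    return log_coeff + log_multi_beta(posterior) - log_multi_beta(alpha)

def log_likelihood_dm(tally, i, alpha, n_repl):
    """log p(t | c_i): strip the observed vote and its replays from candidate i,
    then score the remaining honest tally under the Dirichlet-multinomial."""
    honest = list(tally)
    honest[i] -= n_repl + 1
    if honest[i] < 0:
        return -math.inf  # tally impossible under choice c_i
    return log_dm_pmf(honest, alpha)
```

Replacing the multinomial likelihood by `log_likelihood_dm` inside a maximum likelihood guesser is then the only change needed to the Monte Carlo procedure of Section 5.3.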

On the other hand, the proofs of the main theorems in Section 4 did depend on the vote distribution being multinomial, and so some further work is required to show that they still hold in the Dirichlet-multinomial setting, which we undertake in the next section.

7.2. KTV privacy with Dirichlet prior

In this section we discuss how privacy is affected by an adversary who has access to the parameters $\vec{α} [j], j \in C$ and $α_{0} = \sum_{j \in C} \vec{α} [j]$ of the Dirichlet-multinomial prior. We will follow the approach in Section 4 to first reduce the $n_{C}$ -setup down to 3 choices, i.e. the two choices j, $j^{'}$ under observation and a dummy choice, which collects all other probabilities.

The definition of δ-privacy directly transfers to the new setups, namely we define $\begin{array}{c} (9) & A_{res, \vec{α}}^{j, n_{repl}} = D‐M (res | j, n_{V}^{t}, \vec{α}) \\ (10) & M_{j, j^{'}, \vec{α}}^{*, n_{repl}} = {res : A_{res}^{j, n_{repl}} (\vec{α}) < A_{res}^{j^{'}, n_{repl}} (\vec{α})} \\ (11) & δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal} = max_{j, j^{'} \in {1, \dots, n_{C}}} \sum_{res \in M_{j, j^{'}}^{*, n_{repl}}} (A_{res}^{j^{'}, n_{repl}} - A_{res}^{j, n_{repl}}) \end{array}$ as the maximal total variation distance between any two conditional distributions $A_{res}^{j, n_{repl}}$ and $A_{res}^{j^{'}, n_{repl}}$ . As before $A_{res}^{j, n_{repl}}$ is the probability distribution we get if the voter under observation chooses $j \in C$ , her choice is replayed $n_{repl}$ times and all honest voters voted according to $D‐M (\cdot, n_{V}^{h}, \vec{α})$ .
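For intuition, the total variation distance can be computed by direct enumeration in the simplest special case of two candidates, where the Dirichlet-multinomial reduces to the beta-binomial. The sketch below makes exactly that assumption (it does not implement the general $n_{C}$ -candidate reduction), and its function names are ours.

```python
import math

def log_beta_binomial(h, n, a, b):
    """log-probability that h of n honest votes go to candidate 1 under Beta(a, b)."""
    return (math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
            + math.lgamma(h + a) + math.lgamma(n - h + b) - math.lgamma(n + a + b)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def ktv_loss_two_candidates(n_honest, n_repl, a, b):
    """Total variation distance between the tally distributions obtained when the
    observed voter (plus n_repl replays) votes for candidate 1 versus candidate 2."""
    r = n_repl + 1
    pmf = [math.exp(log_beta_binomial(h, n_honest, a, b)) for h in range(n_honest + 1)]

    def honest(h):
        return pmf[h] if 0 <= h <= n_honest else 0.0

    # m is the published count for candidate 1; the total n_honest + r is fixed.
    return 0.5 * sum(abs(honest(m - r) - honest(m)) for m in range(n_honest + r + 1))

for n_repl in (0, 1, 5, 10):
    print(n_repl, round(ktv_loss_two_candidates(1000, n_repl, 1250.0, 1250.0), 4))
```

The example parameters correspond to an evenly split prior with $α_{0} = 2500$ ; they are illustrative and not taken from the data in Section 7.4.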

In order to find the reduction in the case of the Dirichlet-multinomial distribution, we have to simplify the condition in Equation (11). Unfortunately, this is technically much more involved than in the multinomial case. The techniques used are however typical when working with Γ- and beta functions. We therefore present this part in more detail.

We start with the integral representation of the Dirichlet-multinomial distribution. Recall that ${MN}_{n_{V}^{t}}^{res}$ is the multinomial probability density function from (6). Moreover, let ${Dir}_{\vec{α}}$ be the Dirichlet density $\begin{aligned} {Dir}_{\vec{α}} (\vec{p}) & = \frac{1}{B (\vec{α})} (\prod_{j = 1}^{n_{C}} \vec{p} {[j]}^{\vec{α} [j] - 1}), \end{aligned}$ for $\vec{α} \in R_{> 0}^{n_{C}}$ and $\begin{aligned} B (\vec{α}) & = \frac{\prod_{j = 1}^{n_{C}} Γ (\vec{α} [j])}{Γ (\sum_{j = 1}^{n_{C}} \vec{α} [j])} = \frac{\prod_{j = 1}^{n_{C}} Γ (\vec{α} [j])}{Γ (α_{0})}, \end{aligned}$ the (multivariate) Beta function. Then the Dirichlet-multinomial distribution satisfies $\begin{aligned} D‐M (res, n_{V}^{t}, \vec{α}) & = \int_{Δ} {MN}_{n_{V}^{t}}^{res} (\vec{p}) {Dir}_{\vec{α}} (\vec{p}) d \vec{p} \end{aligned}$ with integration with respect to the standard (Lebesgue) measure on the standard simplex $Δ = {\vec{p} \in R_{⩾ 0}^{n_{C}} | \sum_{j = 1}^{n_{C}} \vec{p} [j] = 1}$ . Note that the formula just reflects sampling probabilities $\vec{p}$ from the Dirichlet distribution and then sampling from a multinomial distribution ${MN}_{n_{V}^{t}}^{\cdot} (\vec{p})$ for $\vec{p}$ .

Similar to Section 4, we set $δ_{j j^{'}} = \sum_{res \in M_{j, j^{'}}^{*, n_{repl}}} (A_{res}^{j^{'}, n_{repl}} - A_{res}^{j, n_{repl}})$ , the total variation distance for $j, j^{'} \in C$ . Note that we still get $δ_{j j^{'}} - δ_{j^{'} j} = 0$ and hence $δ_{j j^{'}} = \frac{δ_{j j^{'}} + δ_{j^{'} j}}{2}$ (cf. [35]). We now want to characterize the condition $M_{j, j^{'}, \vec{α}}^{*, n_{repl}}$ in terms of the j and $j^{'}$ terms only to allow a reduction to these two cases later. The condition in Equation (11) can be written as $\begin{aligned} 0 < & \int_{Δ} \frac{n_{V}^{h}}{2 \cdot n_{V}^{t}!} (Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}}) \cdot {MN}_{n_{V}^{t}}^{res} (\vec{p}) {Dir}_{\vec{α}} (\vec{p}) d \vec{p} \end{aligned}$ with $Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}} = \frac{\prod_{ν = 0}^{n_{repl}} (res [j^{'}] - ν)}{\vec{p} {[j^{'}]}^{n_{repl} + 1}} - \frac{\prod_{ν = 0}^{n_{repl}} (res [j] - ν)}{\vec{p} {[j]}^{n_{repl} + 1}}$ .

In order to isolate the variables j, $j^{'}$ from all other $l \in C$ , we proceed as before, although on a higher technical level. Namely, we have to integrate over the (standard) simplex $Δ = {(\vec{p}, s) \in R_{⩾ 0}^{n_{C}} \times [0, 1] : (\vec{p} [j], \vec{p} [j^{'}]) \in Δ_{s}, {(\vec{p} [l])}_{l : j \neq l \neq j^{'}} \in Δ_{s}^{c}}$ where $Δ_{s} = {(\vec{p} [j], \vec{p} [j^{'}]) \in R_{⩾ 0}^{2} : \vec{p} [j] + \vec{p} [j^{'}] = s}$ and $Δ_{s}^{c} = {{(\vec{p} [l])}_{l : j \neq l \neq j^{'}} \in R_{⩾ 0}^{n_{C} - 2} : \sum_{l : j \neq l \neq j^{'}} \vec{p} [l] = 1 - s}$ are lower-dimensional simplices, i.e. we can split up the integration over Δ into an integration over $Δ_{s}$ and $Δ_{s}^{c}$ . To do so, we define new variables $\vec{q} : = {(\vec{p} {[j, j^{'}]}^{- 1} \vec{p} [l])}_{j \neq l \neq j^{'}}$ and $\vec{p} [j, j^{'}] = 1 - \vec{p} [j] - \vec{p} [j^{'}]$ . In particular, a substitution $\vec{p} [l] \mapsto \vec{q} [l]$ for $j \neq l \neq j^{'}$ maps $Δ_{s}^{c}$ to $Δ_{0}^{c}$ , the $n_{C} - 2$ -dimensional standard simplex. By the normalisation of a probability distribution, the integral over the whole of $Δ_{0}^{c}$ will then be 1, independent of the $l \neq j, j^{'}$ . This enables our reduction. We can then also collect all $l \neq j, j^{'}$ terms into one dummy parameter $\vec{α} [j, j^{'}] = α_{0} - \vec{α} [j] - \vec{α} [j^{'}]$ .

For $t = res [j] + res [j^{'}]$ the actual computation looks like this: $\begin{aligned} 0 & < \int_{Δ} \frac{n_{V}^{h}}{2 \cdot n_{V}^{t}!} (Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}} (\vec{p})) \cdot {MN}_{n_{V}^{t}}^{res} (\vec{p}) {Dir}_{\vec{α}} (\vec{p}) d \vec{p} \\ = \int_{Δ} (Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}} \cdot \frac{n_{V}^{h}! \vec{p} {[j]}^{res [j]} \vec{p} {[j^{'}]}^{res [j^{'}]}}{2 \cdot (n_{V}^{t} - t)! res [j]! res [j^{'}]!} \cdot \frac{(n_{V}^{t} - t)!}{\prod_{j \neq l \neq j^{'}} res [l]} \prod_{j \neq l \neq j^{'}} \vec{p} {[l]}^{res [l]}) {Dir}_{\vec{α}} (\vec{p}) d \vec{p} \\ \cdot (\int_{Δ_{s}^{c}} \frac{(n_{V}^{t} - t)!}{\prod_{j \neq l \neq j^{'}} res [l]} \prod_{j \neq l \neq j^{'}} \vec{p} {[l]}^{res [l] + \vec{α} [l] - 1} d (\vec{p} [l] : j \neq l \neq j^{'})) d (\vec{p} [j], \vec{p} [j^{'}]) d s \\ = \int_{0}^{1} \int_{Δ_{s}} Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}} \cdot \frac{n_{V}^{h}! \vec{p} {[j]}^{res [j] + \vec{α} [j] - 1} \vec{p} {[j^{'}]}^{res [j^{'}] + \vec{α} [j^{'}] - 1}}{2 \cdot (n_{V}^{t} - t)! res [j]! res [j^{'}]!} \frac{1}{B (\vec{α})} \\ \cdot (\int_{Δ_{0}^{c}} \vec{p} {[j, j^{'}]}^{n_{C} - 3} \vec{p} {[j, j^{'}]}^{n_{V}^{t} - t - n_{C} + 2 + \sum_{l : j \neq l \neq j^{'}} \vec{α} [l]} \\ \cdot \frac{(n_{V}^{t} - t)!}{\prod_{j \neq l \neq j^{'}} res [l]} \prod_{j \neq l \neq j^{'}} \vec{q} {[l]}^{res [l] + \vec{α} [l] - 1} d (\vec{q} [l] : j \neq l \neq j^{'})) d (\vec{p} [j], \vec{p} [j^{'}]) d s \\ = \frac{1}{B (\vec{α})} (\int_{Δ_{0}^{c}} \frac{(n_{V}^{t} - t)!}{\prod_{j \neq l \neq j^{'}} res [l]} \prod_{j \neq l \neq j^{'}} \vec{q} {[l]}^{res [l] + \vec{α} [l] - 1} d (\vec{q} [l] : j \neq l \neq j^{'})) \\ \cdot \int_{0}^{1} \int_{Δ_{s}} Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}} \cdot \frac{n_{V}^{h}! \vec{p} {[j]}^{res [j] + \vec{α} [j] - 1} \vec{p} {[j^{'}]}^{res [j^{'}] + \vec{α} [j^{'}] - 1} \vec{p} {[j, j^{'}]}^{n_{V}^{t} - t + \vec{α} [j, j^{'}] - 1}}{2 \cdot (n_{V}^{t} - t)! res [j]! 
res [j^{'}]!} d (\vec{p} [j], \vec{p} [j^{'}]) d s \end{aligned}$ where we also used that the volume form of the $(n_{C} - 3)$ -dimensional simplex $Δ_{s}^{c}$ changes by a factor $\vec{p} {[j, j^{'}]}^{n_{C} - 3}$ under scaling by $\vec{p} [j, j^{'}] = 1 - s$ .

We then have to check when the inequality is satisfied, i.e. we have to determine the sign of the previous term. The sign of the whole term is in fact determined by $\begin{array}{r} \int_{0}^{1} \int_{Δ_{s}} Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}} \cdot \vec{p} {[j]}^{res [j] + \vec{α} [j] - 1} \vec{p} {[j^{'}]}^{res [j^{'}] + \vec{α} [j^{'}] - 1} {(1 - s)}^{n_{V}^{t} - t + \vec{α} [j, j^{'}] - 1} d (\vec{p} [j], \vec{p} [j^{'}]) d s . \end{array}$

We can further simplify by substituting $(\vec{p} [j], \vec{p} [j^{'}]) \to s (\vec{p} [j], \vec{p} [j^{'}])$ , i.e. let $(\vec{q} [j], \vec{q} [j^{'}]) = s^{- 1} (\vec{p} [j], \vec{p} [j^{'}])$ for $s > 0$ to get¹⁶

¹⁶
Since the case $s = 0$ has zero measure, we can ignore it.

$\begin{aligned} \int_{0}^{1} \int_{Δ_{1}} (\frac{\prod_{ν = 0}^{n_{repl}} (res [j^{'}] - ν)}{s^{n_{repl} + 1} \vec{q} {[j^{'}]}^{n_{repl} + 1}} - \frac{\prod_{ν = 0}^{n_{repl}} (res [j] - ν)}{s^{n_{repl} + 1} \vec{q} {[j]}^{n_{repl} + 1}}) \\ \cdot s^{n_{V}^{t} + α_{0} - 1} \vec{q} {[j]}^{res [j] + \vec{α} [j] - 1} \vec{q} {[j^{'}]}^{res [j^{'}] + \vec{α} [j^{'}] - 1} {(s^{- 1} - 1)}^{n_{V}^{t} - t + \vec{α} [j, j^{'}] - 1} d (\vec{q} [j], \vec{q} [j^{'}]) d s \\ = \int_{Δ_{1}} Δ_{j, j^{'}, \vec{q}}^{res, n_{repl}} \vec{q} {[j]}^{res [j] + \vec{α} [j] - 1} \vec{q} {[j^{'}]}^{res [j^{'}] + \vec{α} [j^{'}] - 1} \\ \cdot \int_{0}^{1} s^{n_{V}^{t} + α_{0} - n_{repl} - 1} {(s^{- 1} - 1)}^{n_{V}^{t} - t + \vec{α} [j, j^{'}] - 1} d s d (\vec{q} [j], \vec{q} [j^{'}]) \end{aligned}$

where we again used the scaling property of the volume form on our 1-dimensional surface $Δ_{s}$ .

We hence only have to check $\begin{array}{r} \int_{0}^{1} (\frac{\prod_{ν = 0}^{n_{repl}} (t - r - ν)}{{(1 - p)}^{n_{repl} + 1}} - \frac{\prod_{ν = 0}^{n_{repl}} (r - ν)}{p^{n_{repl} + 1}}) p^{r + \vec{α} [j] - 1} {(1 - p)}^{t - r + \vec{α} [j^{'}] - 1} d p > 0 \end{array}$ where we have set $r = res [j]$ , $t - r = res [j^{'}]$ . This term already looks significantly simpler. Still, the integration is not ideal to work with. However, the integrals are just (conditional) Dirichlet-multinomial probabilities (indeed beta-binomial given that we are in the 2-dimensional case) up to scaling. In particular, we can use the representation of the beta function as a product of Γ-functions, i.e. $B (\vec{α} [j], \vec{α} [j^{'}], \vec{α} [j, j^{'}]) = \frac{Γ (\vec{α} [j]) Γ (\vec{α} [j^{'}]) Γ (\vec{α} [j, j^{'}])}{Γ (α_{0})}$ as well as the identity $\vec{α} [j] Γ (\vec{α} [j]) = Γ (\vec{α} [j] + 1)$ to further simplify: $\begin{aligned} \prod_{ν = 0}^{n_{repl}} (t - r - ν) \int_{0}^{1} p^{r + \vec{α} [j] - 1} {(1 - p)}^{t - r + \vec{α} [j^{'}] - n_{repl} - 2} d p \\ - \prod_{ν = 0}^{n_{repl}} (r - ν) \int_{0}^{1} p^{r + \vec{α} [j] - n_{repl} - 2} {(1 - p)}^{t - r + \vec{α} [j^{'}] - 1} d p > 0 \\ \Rightarrow \prod_{ν = 0}^{n_{repl}} (t - r - ν) B (β_{1}) > \prod_{ν = 0}^{n_{repl}} (r - ν) B (β_{2}) \\ \Rightarrow \prod_{ν = 0}^{n_{repl}} (t - r - ν) Γ (r + \vec{α} [j]) Γ (t - r + \vec{α} [j^{'}] - n_{repl} - 1) \\ > \prod_{ν = 0}^{n_{repl}} (r - ν) Γ (r + \vec{α} [j] - n_{repl} - 1) Γ (t - r + \vec{α} [j^{'}]) \\ (12) & \Rightarrow \prod_{ν = 0}^{n_{repl}} (t - r - ν) (r + \vec{α} [j] - 1 - ν) > \prod_{ν = 0}^{n_{repl}} (r - ν) (t - r + \vec{α} [j^{'}] - 1 - ν) \end{aligned}$ for $β_{1} = (r + \vec{α} [j], t - r + \vec{α} [j^{'}] - n_{repl} - 1)$ and $β_{2} = (r + \vec{α} [j] - n_{repl} - 1, t - r + \vec{α} [j^{'}])$ . Note that we have $‖ β_{1} ‖_{1} = ‖ β_{2} ‖_{1} = t + \vec{α} [j] + \vec{α} [j^{'}] - n_{repl} - 1$ .
The integrals only have to be considered where they are well-defined (due to the prefactors), i.e. where the exponents are larger than $- 1$ .¹⁷

¹⁷

To simplify the notation, we will assume that products with 0 are 0, even if the second factor is not well-defined, i.e. we extend $0 \cdot B (β_{1})$ trivially outside the area where it is well-defined.

Overall, we have $\begin{array}{r} M_{j, j^{'}, \vec{α}}^{*, n_{repl}} = \{ res \mid \text{(12) holds with } r = res [j], \; t = res [j] + res [j^{'}] \} . \end{array}$

Now we return to the computation of the privacy $δ_{j j^{'}}$ , i.e. $\begin{aligned} (13) & δ_{j j^{'}} = & \frac{1}{2} \frac{n_{V}^{h}!}{n_{V}^{t}!} \int_{Δ} (\sum_{res} Δ_{j, j^{'}, \vec{p}}^{res, n_{repl}} {MN}_{n_{V}^{t}}^{res} (\vec{p}) χ_{\vec{α}}^{n_{repl}} (res [j], res [j^{'}])) {Dir}_{\vec{α}} (\vec{p}) d \vec{p} \end{aligned}$ for $\begin{array}{r} χ_{\vec{α}}^{n_{repl}} (res [j], res [j^{'}]) = 1_{M_{j, j^{'}, \vec{α}}^{*, n_{repl}}} - 1_{M_{j^{'}, j, \vec{α}}^{*, n_{repl}}} . \end{array}$ We can reduce this term to two parameters as before and get for $δ_{j j^{'}}$ : $\begin{aligned} (14) & δ_{j j^{'}} = 2 \sum_{t = 0}^{n_{V}^{t}} \sum_{r = 0}^{t} | Δ_{j, j^{'}, DMN}^{res, n_{repl}} | g_{j, j^{'}, DMN}^{res, n_{repl}} h_{j, j^{'}, DMN}^{res, n_{repl}} \end{aligned}$ for $\begin{array}{c} \begin{aligned} Δ_{j, j^{'}, DMN}^{res, n_{repl}} & = 1 \prod_{ν = 0}^{n_{repl}} (t - r - ν) \prod_{ν = 0}^{min {n_{repl}, r - 1}} (r + \vec{α} [j] - 1 - ν) \\ - \prod_{ν = 0}^{n_{repl}} (r - ν) \prod_{ν = 0}^{min {n_{repl}, t - r - 1}} (t - r + \vec{α} [j^{'}] - 1 - ν) \end{aligned} \\ g_{j, j^{'}, DMN}^{res, n_{repl}} : = \frac{\prod_{ν = 1}^{r - n_{repl} - 1} (r + \vec{α} [j] - n_{repl} - 1 - ν) \prod_{ν = 1}^{t - r - n_{repl} - 1} (t - r + \vec{α} [j^{'}] - n_{repl} - 1 - ν)}{r! (t - r)!} \\ h_{j, j^{'}, DMN}^{res, n_{repl}} : = \frac{n_{V}^{h}! \prod_{ν = 1}^{n_{V}^{t} - t} (n_{V}^{t} - t + \vec{α} [j, j^{'}] - ν)}{(n_{V}^{t} - t)! \prod_{ν = 1}^{n_{V}^{t} - n_{repl} - 1} (n_{V}^{t} + α_{0} - n_{repl} - 1 - ν)} . \end{array}$ The computation that leads to this reduction is very similar to the computations done above, i.e. we use integration over simplices, suitable substitutions and the properties of the beta function as before. We therefore avoid a detailed treatment here; the interested reader may consult Lemma 4 in Appendix A.2 for more details.

Remark.

Let $\vec{α} = c \vec{p}$ for a probability distribution $\vec{p}$ . Then in the limit $c \to \infty$ we recover our original result from eq. (7):

$\begin{aligned} & \lim_{c \to \infty} \frac{1}{2} \sum_{t = 0}^{n_{V}^{t}} \sum_{r = 0}^{t} | \prod_{ν = 0}^{n_{repl}} (t - r - ν) \vec{α} {[j]}^{n_{repl} + 1} - \prod_{ν = 0}^{n_{repl}} (r - ν) \vec{α} {[j^{'}]}^{n_{repl} + 1} | \cdot \frac{n_{V}^{h}!}{(n_{V}^{t} - t)! r! (t - r)!} \cdot \vec{α} {[j]}^{r - n_{repl} - 1} Γ (\vec{α} [j]) \\ & \quad \cdot \vec{α} {[j^{'}]}^{t - r - n_{repl} - 1} Γ (\vec{α} [j^{'}]) \vec{α} {[j, j^{'}]}^{n_{V}^{t} - t} Γ (\vec{α} [j, j^{'}]) \\ & \quad \cdot \frac{Γ (α_{0})}{α_{0}^{n_{V}^{t} - n_{repl} - 1} Γ (α_{0}) Γ (\vec{α} [j]) Γ (\vec{α} [j^{'}]) Γ (\vec{α} [j, j^{'}])} \\ & = \lim_{c \to \infty} \frac{1}{2} \sum_{t = 0}^{n_{V}^{t}} \sum_{r = 0}^{t} | \vec{p} {[j]}^{n_{repl} + 1} \prod_{ν = 0}^{n_{repl}} (t - r - ν) - \vec{p} {[j^{'}]}^{n_{repl} + 1} \prod_{ν = 0}^{n_{repl}} (r - ν) | c^{n_{repl} + 1} \\ & \quad \cdot \frac{n_{V}^{h}!}{(n_{V}^{t} - t)! r! (t - r)!} \cdot \frac{c^{r - n_{repl} - 1} \vec{p} {[j]}^{r - n_{repl} - 1} c^{t - r - n_{repl} - 1} \vec{p} {[j^{'}]}^{t - r - n_{repl} - 1} c^{n_{V}^{t} - t} \vec{p} {[j, j^{'}]}^{n_{V}^{t} - t}}{c^{n_{V}^{t} - n_{repl} - 1}} \\ & = \frac{1}{2} \sum_{t = 0}^{n_{V}^{t}} \sum_{r = 0}^{t} | \vec{p} {[j]}^{n_{repl} + 1} \prod_{ν = 0}^{n_{repl}} (t - r - ν) - \vec{p} {[j^{'}]}^{n_{repl} + 1} \prod_{ν = 0}^{n_{repl}} (r - ν) | \\ & \quad \cdot \frac{n_{V}^{h}!}{(n_{V}^{t} - t)! r! (t - r)!} \cdot \vec{p} {[j]}^{r - n_{repl} - 1} \vec{p} {[j^{'}]}^{t - r - n_{repl} - 1} \vec{p} {[j, j^{'}]}^{n_{V}^{t} - t} \end{aligned}$

where we used that asymptotically in x: $Γ (x + y) \sim x^{y} Γ (x)$ .
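The asymptotic relation $Γ (x + y) \sim x^{y} Γ (x)$ used in this remark can be checked numerically. The following is a minimal sketch, not part of the published implementation; the function name `gamma_ratio` is an assumption introduced here for illustration. It evaluates the quotient $Γ (x + y) / (x^{y} Γ (x))$ in log space, which should approach 1 when x is much larger than y and deviate noticeably otherwise.

```python
from math import exp, lgamma, log


def gamma_ratio(x: float, y: float) -> float:
    """Return Gamma(x + y) / (x**y * Gamma(x)), evaluated in log space.

    Hypothetical helper for illustration: values close to 1 indicate that
    the approximation Gamma(x + y) ~ x**y * Gamma(x) is accurate.
    """
    return exp(lgamma(x + y) - lgamma(x) - y * log(x))


if __name__ == "__main__":
    # x much larger than y: the ratio is very close to 1.
    print(gamma_ratio(1e6, 3.5))
    # x comparable to y: the approximation is visibly off.
    print(gamma_ratio(10.0, 3.5))
```

The log-space evaluation avoids overflow of the gamma function for large arguments, which matters for parameter sizes such as $α_{0} \approx 2500$ .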

After the reduction, we also want to remove the remaining absolute value from Formula (14). Again this works similarly to our construction in Section 4 although it is technically more challenging.

We first define $T (r) = \prod_{ν = 0}^{n_{repl}} (t - r - ν) \prod_{ν = 0}^{\min \{n_{repl}, r - 1\}} (r + \vec{α} [j] - 1 - ν) - \prod_{ν = 0}^{n_{repl}} (r - ν) \prod_{ν = 0}^{\min \{n_{repl}, t - r - 1\}} (t - r + \vec{α} [j^{'}] - 1 - ν)$ and let $T_{t, j, j^{'}}^{\vec{α}}$ be the largest natural number $⩽ t - n_{repl}$ such that $T (r) ⩾ 0$ for all $r ⩽ T_{t, j, j^{'}}^{\vec{α}}$ . Note that $T_{t, j, j^{'}}^{\vec{α}} ⩾ n_{repl}$ . Furthermore, all factors in a non-trivial product are positive. Moreover, for $\vec{α} [j], \vec{α} [j^{'}] ⩾ 1$ , $\frac{t - r - ν}{t - r + \vec{α} [j^{'}] - 1 - ν}$ (or $\frac{t - r - ν}{1}$ ) decreases monotonically and $\frac{r - ν}{r + \vec{α} [j] - 1 - ν}$ (or $\frac{r - ν}{1}$ ) increases monotonically in r (where we exclude the cases with denominator 0). Hence the product $\frac{\prod_{ν = 0}^{n_{repl}} (t - r - ν)}{\prod_{ν = 0}^{\min \{n_{repl}, t - r - 1\}} (t - r + \vec{α} [j^{'}] - 1 - ν)}$ decreases and $\frac{\prod_{ν = 0}^{n_{repl}} (r - ν)}{\prod_{ν = 0}^{\min \{n_{repl}, r - 1\}} (r + \vec{α} [j] - 1 - ν)}$ increases in r, so the sign of $T (r)$ changes at most once and we find a unique $T_{t, j, j^{'}}^{\vec{α}}$ . For $\vec{α} [j], \vec{α} [j^{'}] < 1$ the behaviour is reversed. We note that realistic values for $\vec{α}$ are usually much larger than 1, since the case $\vec{α} [j] = 1$ corresponds to an adversary that has no information at all on the possible vote distribution. This situation is not realistic for e-voting and hence we will assume $\vec{α} [j] > 1$ for all $j \in C$ from now on (in fact, as discussed in Section 7.1, realistic values for $\vec{α}$ are e.g. those with $α_{0} \approx 2500$ ). We get the following explicit formula for the privacy level:

Lemma 2.

Let $j, j^{'} \in C$ . For each $t \in \{0, \dots, n_{V}^{h} + n_{repl} + 1\}$ let $T_{t, j, j^{'}}^{\vec{α}} \in \mathbb{Z}$ be defined as above. Then for $\vec{α} [j, j^{'}] : = α_{0} - \vec{α} [j] - \vec{α} [j^{'}]$ :

$\begin{aligned} (15) \quad δ_{j j^{'}} = & \sum_{t = 0}^{n_{V}^{h}} \frac{n_{V}^{h}! \prod_{ν = 1}^{n_{V}^{h} - t} (n_{V}^{h} - t + \vec{α} [j, j^{'}] - ν)}{(n_{V}^{h} - t)! \prod_{ν = 1}^{n_{V}^{h}} (n_{V}^{h} + α_{0} - ν)} \\ & \cdot \sum_{r = \max \{T_{t + n_{repl} + 1} - n_{repl}, 0\}}^{\min \{T_{t + n_{repl} + 1}, t\}} \frac{\prod_{ν = 1}^{r} (r + \vec{α} [j] - ν) \prod_{ν = 1}^{t - r} (t - r + \vec{α} [j^{'}] - ν)}{r! (t - r)!} . \end{aligned}$

If $\vec{α} [j, j^{'}] = 0$ , i.e. the case of exactly two choices, only the $t = n_{V}^{h}$ summand is non-trivial.

The proof of this lemma is included in Appendix A.2.
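To make Equation (15) more concrete, the following sketch transcribes it literally into Python with exact rational arithmetic. It is an illustration under stated assumptions, not the published implementation: all function names are hypothetical, $T_{t + n_{repl} + 1}$ is read as the threshold $T_{t, j, j^{'}}^{\vec{α}}$ evaluated at $t + n_{repl} + 1$ , and the search for the threshold simply stops at the first r with $T (r) < 0$ . The sketch relies on the observation that $\prod_{ν = 1}^{k} (k + a - ν) = a (a + 1) \cdots (a + k - 1)$ , so each summand of (15) is a Dirichlet-multinomial probability for the counts $(r, t - r, n_{V}^{h} - t)$ .

```python
from fractions import Fraction
from math import comb, prod


def rising(a, k):
    """Rising factorial a (a+1) ... (a+k-1) = prod_{nu=1}^{k} (k + a - nu)."""
    return prod((a + i for i in range(k)), start=Fraction(1))


def dmn_summand(n, t, r, a_j, a_jp, a_rest):
    """Summand of Equation (15) for the counts (r, t - r, n - t)."""
    coeff = comb(n, t) * comb(t, r)  # n! / (r! (t-r)! (n-t)!)
    a_0 = a_j + a_jp + a_rest
    return (coeff * rising(a_j, r) * rising(a_jp, t - r)
            * rising(a_rest, n - t) / rising(a_0, n))


def t_sign(r, t, n_repl, a_j, a_jp):
    """The quantity T(r) defined before Lemma 2 (empty products equal 1)."""
    first = prod(t - r - nu for nu in range(n_repl + 1)) * prod(
        r + a_j - 1 - nu for nu in range(min(n_repl, r - 1) + 1))
    second = prod(r - nu for nu in range(n_repl + 1)) * prod(
        t - r + a_jp - 1 - nu for nu in range(min(n_repl, t - r - 1) + 1))
    return first - second


def threshold(t, n_repl, a_j, a_jp):
    """Largest natural number T <= t - n_repl with T(r) >= 0 for all r <= T."""
    best = 0
    for r in range(t - n_repl + 1):
        if t_sign(r, t, n_repl, a_j, a_jp) < 0:
            break
        best = r
    return best


def delta(n_h, n_repl, a_j, a_jp, a_rest):
    """Literal transcription of Equation (15); a_rest plays the role of alpha[j, j']."""
    total = Fraction(0)
    for t in range(n_h + 1):
        big_t = threshold(t + n_repl + 1, n_repl, a_j, a_jp)
        for r in range(max(big_t - n_repl, 0), min(big_t, t) + 1):
            total += dmn_summand(n_h, t, r, a_j, a_jp, a_rest)
    return total


if __name__ == "__main__":
    # Two choices with a symmetric prior, two honest voters, no replays.
    print(delta(2, 0, 3, 3, 0))
```

Under this reading, the inner sum collects at most $n_{repl} + 1$ neighbouring probabilities around the threshold. Exact fractions keep the sketch simple but are only practical for small $n_{V}^{h}$ ; larger instances would need a log-space evaluation.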

Finally, the absolute KTV-privacy $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal}$ is the maximum over all $δ_{j j^{'}}$ for $j, j^{'} \in C$ (cf. Equation (11)). This maximum is reached for the two smallest $\vec{α} [j]$ , $\vec{α} [j^{'}]$ . The proof of this maximality statement is included in Appendix A.2. Overall we get the following main theorem on the KTV-privacy level:

Theorem 6.

Let j, $j^{'}$ be such that $1 ⩽ \vec{α} [j] ⩽ \vec{α} [j^{'}] ⩽ \vec{α} [l]$ for all $l \neq j$ . Then, the ideal privacy loss is given by $δ_{C, n_{V}^{h}, \vec{p}, n_{repl}}^{ideal} = δ_{j j^{'}}$ where $δ_{j j^{'}}$ is defined by Equation (15).

7.3. Asymptotics

As in our asymptotic analysis of the multinomial model, we want to take a closer look at the asymptotic behaviour of the privacy level. We will therefore analyse the explicit formula in Equation (15).

For large $n_{V}^{h}$ (and fixed $\vec{α}$ ) we will make use of the approximation $Γ (n_{V}^{h} + y) \approx Γ (n_{V}^{h}) \cdot {(n_{V}^{h})}^{y}$ for $y = α_{0}$ and $y = \vec{α} [j]$ , $j \in C$ . The inner sum of Equation (15) can then be approximated by $(n_{repl} + 1) t^{\vec{α} [j] + \vec{α} [j^{'}] - 2}$ , since its summands are centered around the mode of a beta-binomial distribution and therefore scale with the maximal value. A simple integral approximation of the remaining outer sum leads to Theorem 7 below. A detailed proof of this result is contained in Appendix A.2.

Theorem 7 (Asymptotics).

Let C, $n_{V}^{h}$ , $\vec{α}$ , $n_{repl}$ be as above. Let $n_{repl}$ , $\vec{α} [j] ≪ n_{V}^{h}$ and $\vec{α} [j] > 1$ for all $j \in C$ . Then, we have $\begin{aligned} (16) & δ_{C, n_{V}^{h}, \vec{p}}^{ideal} \sim \frac{n_{repl} + 1}{n_{V}^{h}} . \end{aligned}$

This asymptotic behaviour heavily relies on the validity of the approximation $Γ (x + y) \approx Γ (x) \cdot {(x)}^{y}$ for large x, e.g. for $x = n_{V}^{h}$ and $y = α_{0}$ . In the proof of Theorem 7 we use this approximation for large $n_{V}^{h}$ and comparatively small $α_{0}$ , $\vec{α} [j]$ . If, however, $α_{0}$ (and the individual parameters $\vec{α} [j]$ ) are larger than $n_{V}^{h}$ , the same approximation with x and y interchanged becomes more accurate, e.g. $x = α_{0}$ , $y = n_{V}^{h}$ . In this case we can approximate the Dirichlet-multinomial distribution by the multinomial distribution as in the remark of Section 7.2. In particular, for $\vec{α} [j] ≫ n_{V}^{h}$ for all $j \in C$ we recover the known square-root behaviour of the multinomial model, i.e. $\begin{array}{r} δ_{C, n_{V}^{h}, \vec{p}}^{ideal} \sim \frac{n_{repl} + 1}{\sqrt{n_{V}^{h}}} . \end{array}$
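The convergence of the Dirichlet-multinomial distribution to the multinomial distribution for $\vec{α} = c \vec{p}$ and growing c can be illustrated numerically. The sketch below is an illustration only; the function names and the example counts are assumptions chosen here, not values from the paper. It evaluates both probability mass functions in log space and compares them for a small and a large concentration c.

```python
from math import exp, lgamma, log


def log_multinomial_coeff(counts):
    """log of n! / (k_1! ... k_m!) for n = sum(counts)."""
    return lgamma(sum(counts) + 1) - sum(lgamma(k + 1) for k in counts)


def multinomial_pmf(counts, p):
    """Multinomial probability of `counts` under vote probabilities `p`."""
    return exp(log_multinomial_coeff(counts)
               + sum(k * log(q) for k, q in zip(counts, p)))


def dmn_pmf(counts, alpha):
    """Dirichlet-multinomial probability of `counts` (all alpha[i] > 0)."""
    n, a_0 = sum(counts), sum(alpha)
    log_pmf = (log_multinomial_coeff(counts)
               + lgamma(a_0) - lgamma(a_0 + n)
               + sum(lgamma(a + k) - lgamma(a) for a, k in zip(alpha, counts)))
    return exp(log_pmf)


if __name__ == "__main__":
    counts = (3, 5, 2)        # hypothetical result of 10 honest votes
    p = (0.3, 0.5, 0.2)       # hypothetical vote distribution
    for c in (10.0, 2500.0, 1e7):
        alpha = [c * q for q in p]
        print(c, dmn_pmf(counts, alpha) / multinomial_pmf(counts, p))
```

For small c the Dirichlet-multinomial model is noticeably more dispersed than the multinomial one, while for c much larger than the number of votes the quotient approaches 1.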

Fig. 8.

Privacy loss ratios (i.e. privacy loss with $α_{0} = 2500$ divided by privacy loss with $α_{0} = \infty$ ) for KTV and strong privacy, for varying numbers of honest voters.

This behaviour is also illustrated in Fig. 8, where we consider $α_{0} = 2500$ with 5 candidates and varying numbers of honest voters. We see that for a low number of honest voters $n_{V}^{h}$ , the privacy loss in the multinomial model, i.e. $α_{0} = \infty$ , and in the approximate model are almost the same. On the other hand, once the number of honest voters becomes larger, and in fact larger than 2500, the privacy loss in the multinomial model is much greater than in the approximate model.

This confirms our theoretical prediction in Equation (28) that the ratio between the privacy loss in the approximate model and in the multinomial model varies as $\frac{1}{\sqrt{n_{V}^{h}}}$ for large $n_{V}^{h}$ , i.e. decreases by a factor of $\sqrt{10} \approx 3.16$ if we increase $n_{V}^{h}$ by a factor of 10. This behaviour is almost identical for KTV and strong privacy. In the next section we will see that for real-world use cases the size of constituencies is often small compared to the accuracy of polls (cf. Estonia, Germany, US in Fig. 11), and so the privacy loss in the approximate model is similar to the privacy loss in the multinomial model described in Section 6.
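As a worked check of this scaling, assume that the two asymptotic forms stated in Section 7.3 hold with comparable constants (an assumption, since the $\sim$ statements do not fix them), and write $ρ (n_{V}^{h})$ for their ratio, a symbol introduced here only for illustration: $\begin{aligned} ρ (n_{V}^{h}) = \frac{(n_{repl} + 1) / n_{V}^{h}}{(n_{repl} + 1) / \sqrt{n_{V}^{h}}} = \frac{1}{\sqrt{n_{V}^{h}}} , \qquad \frac{ρ (10 \, n_{V}^{h})}{ρ (n_{V}^{h})} = \frac{1}{\sqrt{10}} \approx 0.316 . \end{aligned}$ For instance, going from $10^{4}$ to $10^{5}$ honest voters shrinks $ρ$ from $10^{- 2}$ to about $3.16 \cdot 10^{- 3}$ , independently of $n_{repl}$ .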

7.4. Examples

In the discussion above we concluded that the adversary can expect to achieve a value of $α_{0}$ of approximately 2500 (if he can do no better than pre-election opinion polling). Since this estimate is necessarily imprecise, we begin this section with a sensitivity analysis, shown in Figs 9 and 10 for KTV and strong privacy, respectively.

Fig. 9.

Comparison of privacy losses depending on the uncertainty in α for KTV privacy, for an election with 10000 honest voters, 10 replays and 2–10 candidates, uniform distribution.

Fig. 10.

Comparison of privacy losses depending on the uncertainty in α for strong privacy, for an election with 10000 honest voters, 10 replays and 2–10 candidates, uniform distribution.

As expected, we observe that lower values of $α_{0}$ correspond to lower privacy loss. Note, however, that even a value of 1000 (i.e. two and a half times lower than our fairly conservative estimate) gives a substantial but still order-1 reduction in the privacy loss. Note also that the effect of varying $α_{0}$ does not seem to change significantly with different numbers of candidates, whereas we can see from Fig. 8 that the effect of finite $α_{0}$ does vary dramatically with the number of honest voters.

We now turn to the real-world elections considered in Section 6. There we saw that an adversary with access to the actual vote probabilities can break privacy with a fairly low number of replays. We now repeat this analysis for an adversary with the Dirichlet prior and our estimated value of $α_{0} \approx 2500$ ; the results are shown in Fig. 11.

Fig. 11.

Ideal privacy losses with $n_{repl} = 0, 1, 5, 10$ replays based on real election data from Examples 1–4 with an uncertainty defined by $α_{0} = 2500$ . “KTV” denotes the KTV privacy definition and “SP” denotes the strong privacy definition.

By comparison with Fig. 6, which describes the case $α_{0} = \infty$ , we see that the privacy loss for the Estonian, German and US elections is only slightly lower than before. This confirms our theoretical discussion in Section 7.3 and our empirical analysis in Fig. 8 that if the size of a constituency is small, and in particular is similar to or less than the value of $α_{0}$ , then the exact model ( $α_{0} = \infty$ ) and the approximate model ( $α_{0} \approx 2500$ ) lead to very similar privacy losses for both KTV and strong privacy. On the other hand, for the UK election where we have a large constituency size of 85,270 votes cast, the difference between the exact and the approximate model is more significant.

In summary, we see that the privacy loss in the approximate model is smaller than in the exact model. This behaviour is completely natural and expected: A replay attack by an adversary who has less information is less successful. However, we also see that an adversary who only gets access to polls can still perform very successful replay attacks, especially for typical smaller constituency sizes.

Footnotes

Acknowledgments

David Mestel was supported by the Luxembourg National Research Fund (FNR) under grant number INTER FNRS/15/11106658/SeVoTe. Johannes Müller was supported by FNR under the CORE Junior project FP2 (C20/IS/14698166/FP2/Mueller). Pascal Reisert was supported by the DFG through grant KU 1434/11-1 and by the CRYPTECS project funded by the German Federal Ministry of Education and Research under Grant Agreement No. 16KIS1441 and by the French National Research Agency under Grant Agreement No. ANR-20-CYAL-0006.

Proofs

References

1. Adida, Helios: Web-based open-audit voting, in: Proceedings of the 17th USENIX Security Symposium, San Jose, CA, USA, July 28–August 1, 2008, P.C. van Oorschot, ed., USENIX Association, 2008, pp. 335–348.

2. M.S. Alvim, Chatzikokolakis, Palamidessi and Smith, Measuring information leakage using generalized gain functions, in: Proceedings of the 2012 IEEE 25th Computer Security Foundations Symposium, CSF’12, IEEE Computer Society, USA, 2012, pp. 265–279. doi:10.1109/CSF.2012.26.

3. Bana, Biroli, Dervishi, F.E. Orche, Géraud-Stewart, Naccache, P.B. Rønne, P.Y.A. Ryan and Waltsburger, Time, privacy, robustness, accuracy: trade offs for the open vote network protocol, IACR Cryptol. ePrint Arch., 2021:1065.

4. J.D.C. Benaloh, Verifiable secret-ballot elections, PhD thesis, 1987.

5. Bernhard, Cortier, Galindo, Pereira and Warinschi, SoK: A comprehensive analysis of game-based ballot privacy definitions, in: 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17–21, 2015, 2015, pp. 499–516. doi:10.1109/SP.2015.37.

6. Bernhard, Cortier, Pereira and Warinschi, Measuring vote privacy, revisited, in: ACM Conference on Computer and Communications Security (CCS 2012), Yu, Danezis and V.D. Gligor, eds, ACM, 2012, pp. 941–952.

7. Bernhard, Pereira and Warinschi, How not to prove yourself: Pitfalls of the Fiat–Shamir heuristic and applications to Helios, in: Advances in Cryptology – ASIACRYPT 2012 – 18th International Conference on the Theory and Application of Cryptology and Information Security, Beijing, China, December 2–6, 2012, Proceedings, Wang and Sako, eds, Lecture Notes in Computer Science, Vol. 7658, Springer, 2012, pp. 626–643.

8. Blazy, Fuchsbauer, Pointcheval and Vergnaud, Signatures on randomizable ciphertexts, in: Public Key Cryptography – PKC 2011 – 14th International Conference on Practice and Theory in Public Key Cryptography, Taormina, Italy, March 6–9, 2011, Proceedings, Catalano, Fazio, Gennaro and Nicolosi, eds, Lecture Notes in Computer Science, Vol. 6571, Springer, 2011, pp. 403–422.

9. Boyen, Haines and Müller, Epoque: Practical end-to-end verifiable post-quantum-secure e-voting, in: IEEE European Symposium on Security and Privacy, EuroS&P 2021, Vienna, Austria, September 6–10, 2021, IEEE, 2021, pp. 272–291.

10. Bursuc, C.C. Dragan and Kremer, Private votes on untrusted platforms: Models, attacks and provable scheme, in: IEEE European Symposium on Security and Privacy EuroS&P 2019, Stockholm, Sweden, June 17–19, 2019, IEEE, 2019, pp. 606–620.

11. Burton, Culnane, Heather, Peacock, P.Y.A. Ryan, Schneider, Teague, Wen, Xia and Srinivasan, Using Prêt à Voter in Victoria state elections, in: Electronic Voting Technology Workshop/Workshop on Trustworthy Elections, EVT/WOTE’12, Bellevue, WA, USA, August 6–7, 2012, J.A. Halderman and Pereira, eds, USENIX Association, 2012.

12. Carback, Chaum, Clark, Conway, Essex, P.S. Herrnson, Mayberry, Popoveniuc, R.L. Rivest, Shen, A.T. Sherman and P.L. Vora, Scantegrity II municipal election at Takoma Park: The first E2E binding governmental election with ballot privacy, in: 19th USENIX Security Symposium, Washington, DC, USA, August 11–13, 2010, Proceedings, USENIX Association, 2010, pp. 291–306.

13. Chaidos, Cortier, Fuchsbauer and Galindo, BeleniosRF: A non-interactive receipt-free electronic voting scheme, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24–28, 2016, E.R. Weippl, Katzenbeisser, Kruegel, A.C. Myers and Halevi, eds, ACM, 2016, pp. 1614–1625. doi:10.1145/2976749.2978337.

14. Chaum, Essex, Carback, Clark, Popoveniuc, A.T. Sherman and P.L. Vora, Scantegrity: End-to-end voter-verifiable optical-scan voting, IEEE Secur. Priv. 6(3) (2008), 40–46. doi:10.1109/MSP.2008.70.

15. Chaum, P.Y.A. Ryan and S.A. Schneider, A practical voter-verifiable election scheme, in: Computer Security – ESORICS 2005, 10th European Symposium on Research in Computer Security, Milan, Italy, September 12–14, 2005, Proceedings, S.D.C. di Vimercati, P.F. Syverson and Gollmann, eds, Lecture Notes in Computer Science, Vol. 3679, Springer, 2005, pp. 118–139.

16. M.R. Clarkson, Chong and A.C. Myers, Civitas: Toward a secure voting system, in: 2008 IEEE Symposium on Security and Privacy (S&P 2008), 18–21 May 2008, Oakland, California, USA, IEEE Computer Society, 2008, pp. 354–368. doi:10.1109/SP.2008.32.

17. Cortier, Galindo, Küsters, Müller and Truderung, SoK: Verifiability notions for e-voting protocols, in: IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22–26, 2016, 2016, pp. 779–798. doi:10.1109/SP.2016.52.

18. Cortier and Smyth, Attacking and fixing Helios: An analysis of ballot secrecy, in: Proceedings of the 24th IEEE Computer Security Foundations Symposium, CSF 2011, Cernay-la-Ville, France, 27–29 June, 2011, IEEE Computer Society, 2011, pp. 297–311. doi:10.1109/CSF.2011.27.

19. Dreier, Lafourcade and Lakhnech, Vote-independence: A powerful privacy notion for voting protocols, in: Foundations and Practice of Security – 4th Canada–France MITACS Workshop, FPS 2011, Paris, France, May 12–13, 2011, Revised Selected Papers, García-Alfaro and Lafourcade, eds, Lecture Notes in Computer Science, Vol. 6888, Springer, 2011, pp. 164–180.

20. Fiat and Shamir, How to prove yourself: Practical solutions to identification and signature problems, in: Advances in Cryptology – CRYPTO’86, Santa Barbara, California, USA, 1986, Proceedings, A.M. Odlyzko, ed., Lecture Notes in Computer Science, Vol. 263, Springer, 1986, pp. 186–194.

21. Gelman and T.C. Little, Poststratification into many categories using hierarchical logistic regression, Survey Methodology 46(1) (1997).

22. Gnedenko, Theory of Probability, 6th edn, Taylor & Francis, 1998.

23. Helios voting. Attacks and defenses, https://documentation.heliosvoting.org/attacks-and-defenses (accessed 11.04.2022).

24. Hirschi, Schmid and D.A. Basin, Fixing the Achilles heel of e-voting: The bulletin board, in: 34th IEEE Computer Security Foundations Symposium, CSF 2021, Dubrovnik, Croatia, June 21–25, 2021, IEEE, 2021, pp. 1–17.

25. IACR, IACR elections, 2020, https://www.iacr.org/elections/ (accessed 11.04.2022).

26. Iovino, Rial, P.B. Rønne and P.Y.A. Ryan, Universal unconditional verifiability in e-voting without trusted parties, in: 33rd IEEE Computer Security Foundations Symposium, CSF 2020, Boston, MA, USA, June 22–26, 2020, IEEE, 2020, pp. 33–48.

27. Khazaei and Wikström, Randomized partial checking revisited, in: Topics in Cryptology – CT-RSA 2013 – the Cryptographers’ Track at the RSA Conference 2013, San Francisco, CA, USA, February 25–March 1, 2013, Proceedings, Dawson, ed., Lecture Notes in Computer Science, Vol. 7779, Springer, 2013, pp. 115–128.

28. Küsters, Liedtke, Müller, Rausch and Vogt, Ordinos: A verifiable tally-hiding e-voting system, in: IEEE European Symposium on Security and Privacy, EuroS&P 2020, Genoa, Italy, September 7–11, 2020, IEEE, 2020, pp. 216–235.

29. Küsters, Müller, Scapin and Truderung, SElect: A lightweight verifiable remote voting system, in: IEEE 29th Computer Security Foundations Symposium, CSF 2016, Lisbon, Portugal, June 27–July 1, 2016, 2016, pp. 341–354. doi:10.1109/CSF.2016.31.

30. Küsters, Truderung and Vogt, Verifiability, privacy, and coercion-resistance: New insights from a case study, in: 32nd IEEE Symposium on Security and Privacy, S&P 2011, 22–25 May 2011, Berkeley, California, USA, 2011, pp. 538–553. doi:10.1109/SP.2011.21.

31. Küsters, Truderung and Vogt, Verifiability, privacy, and coercion-resistance: New insights from a case study, IACR Cryptol. ePrint Arch., 2011:517.

32. Küsters, Truderung and Vogt, Formal analysis of Chaumian mix nets with randomized partial checking, in: 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18–21, 2014, 2014, pp. 343–358. doi:10.1109/SP.2014.29.

33. B.E. Lauderdale, Bailey, Blumenau and Rivers, Model-based pre-election polling for national and sub-national outcomes in the US and UK, International Journal of Forecasting 36(2) (2020), 399–413. doi:10.1016/j.ijforecast.2019.05.012.

34. Lee, Boyd, Dawson, Kim, Yang and Yoo, Providing receipt-freeness in mixnet-based voting protocols, in: Information Security and Cryptology – ICISC 2003, 6th International Conference, Seoul, Korea, November 27–28, 2003, Revised Papers, J.I. Lim and D.H. Lee, eds, Lecture Notes in Computer Science, Vol. 2971, Springer, 2003, pp. 245–258.

35. D.A. Levin, Peres and E.L. Wilmer, Markov Chains and Mixing Times, American Mathematical Society, 2006.

36. McIver, Rabehaja, Wen and Morgan, Privacy in elections: How small is “small”?, J. Inf. Secur. Appl. 36(C) (2017), 112–126.

37. Mestel, Müller and Reisert, How efficient are replay attacks against vote privacy? A formal quantitative analysis, in: 35th IEEE Computer Security Foundations Symposium, CSF 2022, Haifa, Israel, August 7–10, 2022, IEEE, 2022, pp. 179–194.

38. K.W. Ng, G.-L. Tian and M.-L. Tang, Dirichlet and Related Distributions: Theory, Methods and Applications, 2011.

39. Sako and Kilian, Secure voting using partially compatible homomorphisms, in: Advances in Cryptology – CRYPTO’94, 14th Annual International Cryptology Conference, Santa Barbara, California, USA, August 21–25, 1994, Proceedings, Desmedt, ed., Lecture Notes in Computer Science, Vol. 839, Springer, 1994, pp. 411–424.

40. Schoenmakers, A simple publicly verifiable secret sharing scheme and its application to electronic voting, in: Advances in Cryptology – CRYPTO’99, 19th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15–19, 1999, Proceedings, M.J. Wiener, ed., Lecture Notes in Computer Science, Vol. 1666, Springer, 1999, pp. 148–164. doi:10.1007/3-540-48405-1_10.

41. Smartmatic, Online voting successfully solving the challenges, 2021, https://www.smartmatic.com/fileadmin/user_upload/Whitepaper_Online_Voting_Challenge_Considerations_TIVI.pdf (accessed 11.04.2022).

42. Smyth, Replay attacks that violate ballot secrecy in Helios, IACR Cryptol. ePrint Arch., 2012:185.

43. Swiss Post, E-voting and security, 2021, https://www.post.ch/en/business-solutions/e-voting/security-given-top-priority (accessed 11.04.2022).

² We presented the first four contributions in the conference paper [37], whereas our fifth contribution (efficiency analysis of replay attacks under the assumption of approximate prior knowledge) is novel.

³ We restrict our attention to the ballots’ ciphertexts and put further primitives (signatures etc.) aside for simplicity.

⁴ What we call privacy loss in this work was in the original paper [30] called privacy level. Because the privacy bound δ is higher when more private information is leaked, we prefer to use the term privacy loss for δ.

⁸ Our series approximation becomes weak for $n_{repl}^{2} ⩾ n_{V}^{h}$ . Obviously $δ_{j j^{'}}$ is bounded by 1.

⁹ Observe that $n_{repl} ≪ \sqrt{n_{V}^{h}}$ covers the interesting cases because the privacy loss is obviously close to 1 for large $n_{repl}$ .

¹⁰ We published our implementation at http://hdl.handle.net/10993/51209.

¹¹ See https://rk2019.valimised.ee/en/voting-result/local-municipality-0482-voting-result.html (accessed 11.04.2022).

¹³ See https://www.wahlen.rlp.de/de/ltw/wahlen/2021/ergebnisse/2242350410700.html (accessed 11.04.2022).

¹⁴ See https://www.electoralcommission.org.uk/who-we-are-and-what-we-do/elections-and-referendums/past-elections-and-referendums/eu-referendum (accessed 11.04.2022).

¹⁵ See https://electionstats.state.ma.us/elections/view/140751/filter_by_county:Suffolk (accessed 11.04.2022).

¹⁶ Since the case $s = 0$ has zero measure, we can ignore it.