Privacy is a notoriously difficult property to achieve in complicated systems and especially in electronic voting schemes. Moreover, electronic voting schemes is a class of systems that require very high assurance. The literature contains a number of ballot privacy definitions along with security proofs for common systems. Some machine-checked security proofs have also appeared. We define a new ballot privacy notion that captures a larger class of voting schemes. This notion improves on the state of the art by taking into account that verification in many schemes will happen or must happen after the tally has been published, not before as in previous definitions.
As a case study we give a machine-checked proof of privacy for Selene, which is a remote electronic voting scheme which offers an attractive mix of security properties and usability. Prior to our work, the computational privacy of Selene has never been formally verified. Finally, we also prove that MiniVoting and Belenios satisfies our definition.
Confidence in the validity of the outcome and privacy of the votes is supremely important for elections. We build confidence in elections by using carefully selected methods, routines, and election officers. In particular, extensive use of various forms of auditing helps build confidence.
For the analysis of a voting mechanism, we need to know what security means and why the mechanism is secure. The former requires a so-called security notion, while the latter is best achieved with a security proof, a mathematical argument for why the mechanism satisfies the security notion.
It is easy to have an intuitive notion of what security should mean, but defining privacy for voting mechanisms is non-trivial, as shown by the many attempts to do so in the literature. (Bernhard et al. [5] has a good overview of privacy definitions.)
One problem is that many security notions are highly specialised for a particular class of voting mechanisms. But there are a large number of cryptographic voting mechanisms and they exhibit great variety in their form and shape. Defining security notions that usefully and uniformly capture a larger class of voting mechanisms is a good thing in principle, but it is also an essential task if existing security notions do not cover the voting mechanism of interest. We need security notions that capture a larger class of voting mechanisms.
Once we have a security notion, we return to the problem of creating and auditing a security proof. Machine-checked proofs is a good way to increase assurance for security proofs. One system designed for handling security proofs is EasyCrypt. A large set of security notions, cryptographic constructions and corresponding security proofs have been written in EasyCrypt, and the system has seen extensive use. We need machine-checked security proofs for voting mechanisms.
Selene [21] is a voting mechanism designed to provide a simple method for verification, while at the same time mitigating the threat of coercion. The key idea in Selene is that every voter is assigned a personal tracking number, and when the election period is over and everyone has cast their vote, the tracking numbers and the votes are published in plaintext on a web bulletin board. This gives the voters a direct and easy to understand way of verifying that their vote was correctly included in the tally. The key innovation in Selene is how to give the voter a tracking number so that no single party know what tracking number was given to the voter (other than the voter themselves) and the voter can plausibly lie about which tracker they received. The first property is achieved by mixing the trackers as part of the setup.
There is a danger of coercion here, namely that a coercer requires a voter to hand over her tracking number, so that the coercer himself can verify that the voter fulfilled his demands. However, the coercer has a limited window of opportunity, because he needs the coerced voter to hand over her tracker before the trackers and the votes are published. Otherwise, the coerced voter could simply find a vote corresponding to the coercer’s demand, and give the coercer the tracker next to this vote. This observation is used in Selene to counter the threat of coercion: the voters first learn their tracking numbers after the trackers and votes are published on the web bulletin board.
Abstractly, Selene has a different order of operations than many existing voting mechanisms. Schemes like MiniVoting (which in some sense models a large class of voting mechanisms including variants of Helios) do voter verification before tallying. For Selene voter verification must happen after tallying, since the personal tracking numbers do not appear until after tallying. This means that Selene does not fit very well into existing security notions for voting mechanisms. This is also true for a number of other systems where voters or their delegates first can (or choose to) verify after tally. The Selene verification mechanism has also been trialled with a commercial partner [22]. We need a high-assurance security proof for Selene.
Our contribution
This paper extends upon the work by Drăgan et al. [14], where we define the new security notion delay-use malicious-ballotbox ballot privacy () to capture the security of schemes that delay the use of verification information to a post-tallying verification step. This is necessary for tracker based schemes (like Selene [21], Electryo [20] and sElect [19]), for in-person voting schemes where the verification is first done later at home, but further it also applies to e-voting schemes where the verification step is not made mandatory before tallying, or often happens after tally, e.g. when verification is delegatable. To construct our definition, we build upon a recent ballot privacy definition called [13].
We model our new security notion in the proof assistant EasyCrypt [1] (https://github.com/easycrypt/easycrypt), and to validate our security notion, we also model the Labelled-MiniVoting scheme [10] and Belenios [12] and verify that these schemes satisfy ballot privacy both under our new definition and under the original . Furthermore, we model the Selene voting system, and prove that this scheme satisfies ballot privacy under our new security definition. The EasyCrypt code is available at https://github.com/mortensol/du-mb-bpriv.
Here, we extend the original work by investigating the relation between our ballot privacy definition and the notion of individual verifiability, as well as the corresponding relation for the definition (Section 6). Furthermore, we highlight a few lessons that we learned during our effort to formalise the security definitions and proofs in EasyCrypt (Section 7), and which we believe will be useful for future efforts on formalising security notions and security proofs in EasyCrypt. In particular, we discuss the following:
Lesson 1: how to properly restrict adversarial access to an oracle’s internal state, to avoid vacuously true results due to false axioms;
Lesson 2: dealing with random oracles in EasyCrypt;
Lesson 3: how small changes in definitions can lead to large changes in proofs; and
Lesson 4: why it is necessary to split the voters’ internal states into two parts when formalising voting systems (and related proofs) where voter verification happens after the tally has been computed, for security definitions using recovery algorithms.
Related work
Many authors have tried to capture the notion of ballot privacy using standard cryptographic games. Bernhard et al. [5] gives a good general overview of such notions. We give an overview of the history leading up to the recent definition of in Section 2.7, directly preceding our new security notion.
The need for assurance with respect to voting systems makes cryptographic voting schemes a natural target for formalized security proofs, either through symbolic models and automatic verification or via proof assistants. While symbolic models have historically yielded good insights into the analysis of cryptographic protocols, see e.g. [8,25] for symbolic analysis of Selene, we prefer a cryptographic analysis.
EasyCrypt [1] is a proof assistant focused on formalizing computational security proofs in the style of Shoup’s Sequences of Games [24]. EasyCrypt supports constructive proofs of concrete security – leaving the complexity analysis of the constructed reduction to be done by hand. For simplicity in the rest of this paper, we discuss asymptotic notions. The formalized proof is concrete.
Cortier et al. use EasyCrypt to prove that Helios is BPRIV-secure [10], and that Belenios is BPRIV-secure and verifiable [11]. Our proof builds upon their framework – we in fact prove that Labelled MiniVoting and Belenios meet our new privacy definition, and further formalize their security in .
Background
In this section, we first introduce some basic cryptographic models, primitives and algorithms that make up a voting system, before we move on to describe earlier definitions of ballot privacy.
Random oracle model
In our analysis, we model hash functions as random oracles [2]. That is, to compute the value of a hash function at a point x, any party can make a call to an oracle , implementing a random function from some domain D to some range R. The oracle maintains an initially empty table T, and whenever someone calls for some , the oracle checks if there is an entry in T for some . If so, it returns y; if not, randomly generates a , adds to T and outputs . We will use the Random oracle model implicitly below when modelling the non-interactive zero-knowledge proofs that help ensure privacy in e-voting.
Public key encryption
Public key encryption systems are often used in voting protocols, to help protect the privacy of the votes, and possibly other things. In Selene, for instance, both the votes and the voter’s personal tracking numbers are encrypted using some form of public key encryption. Formally, a public key encryption system (PKE) is defined as follows:
A public key encryption scheme (PKE) is a triple of algorithms ; where
is a probabilistic algorithm that takes as input a security parameter λ and outputs a key pair ,
is a probabilistic algorithm that on input a public key and a plaintext m outputs a ciphertext c,
is a deterministic algorithm that on input a secret key and a ciphertext c outputs either a plaintext m or a special error symbol ⊥ indicating that something went wrong.
We require that decryption “undoes” encryption, i.e. for any key pair output by , and any plaintext m, we have that .
A labelled public key encryption scheme (LPKE) extends the notion of a “regular” PKE by adding some additional, non-malleable data called a label [23]. One important property for a labelled PKE is that decrypting a ciphertext using a different label than the one used for encryption, should not reveal anything about the original plaintext. Formally, a labelled PKE is defined similarly to how we define a PKE in Definition 1, but a label ℓ is given as additional input in the encryption and decryption algorithms.
The security of the schemes we analyze rely on a security notion for labelled public key encryption called indistinguishability under chosen ciphertext attack with one parallel decryption query () [3]. Informally, this notion captures that any efficient adversary is unable to distinguish between encryptions of two messages of the same length, when given access to a batch decryption oracle that can be called once.
A similar security notion, namely , allows the adversary to make up to n challenge queries, for some polynomially bounded integer n. This notion is formalized in Fig. 1, where an adversary is given access to an encryption oracle and a decryption oracle . The adversary can make n challenge queries to , who encrypts one of two plaintexts, depending on the bit β. The adversary can query at most once, and the oracle then decrypts a list of ciphertexts. For any ciphertexts created by the encryption oracle, the decryption oracle returns ⊥.
The advantage of a adversary against a labelled public key encryption scheme is defined as
and we say that the labelled PKE is n-challenge -secure if the advantage defined above is negligible in λ for all efficient adversaries .
As noted in [10], if , security is essentially reduced to security. Indeed, it is possible to prove, through a hybrid argument, that a labelled PKE is secure if and only if it is secure. This fact was also verified in EasyCrypt [10], and we were able to reuse this framework in our formalization of the ballot privacy of Selene.
A commitment protocol allows a prover to commit to some value b, and send the commitment to a verifier . The verifier can ask the prover to open the commitment at some later point and verify the output value. Two important properties of a commitment protocol is that it should be binding and hiding. The first property informally means that once has committed to a value, he should not be able to open the commitment to another value. The second property informally means that before the commitment is opened, should not be able to determine what was committed to.
More formally, a commitment protocol is defined as follows:
(Commitment protocol).
A commitment protocol is a triple of algorithms , where
takes as input a security parameter λ and returns a pair of user public and secret keys,
takes as input a user public key and a value we want to commit to, and returns a commitment and an opening key d,
takes as input a commitment and an opening key and returns the committed value.
Commitments are an important part of the coercion mitigation strategy in Selene, where the election officials make commitments to personal tracking numbers for each voter. These commitments are opened by the voters at the end of the election, allowing the voters to verify that their vote was included in the tally, without being able to hand over their tracker to a coercer before all votes and trackers are published. For coercion-resistance, Selene actually employs trapdoor commitments and the voters have secret trapdoor keys that allow them to open the commitment to a tracker satisfying the coercer.
Proof systems
We now describe proof systems which are used in Selene to ensure that various operations are performed correctly. We say that a binary relation is a subset , where X is a set of statements and W is a set of witnesses. A proof system for the relation is a pair of efficient algorithms , where is called the prover and is called the verifier. The prover and verifier work on a common input , and the prover has some additional input . In a non-interactive proof system, uses his input to compute a proof Π. He sends the proof to , who, on input produces a verification output in .
A proof system is said to be complete if the prover can produce a valid proof whenever the statement is true. More formally, for any , if Π is a proof output by , then outputs 1 with probability 1.
A proof system is sound if a prover is unable to convince a verifier that a false statement is true.
A proof system is zero-knowledge if the proof leaks no information beyond the fact that the relation holds. More formally, we demand the existence of an efficient algorithm , called the simulator, that produces valid-looking proofs for a statement without access to the witness w. Formally, we consider a zero-knowledge adversary in the following experiments: The advantage of the zero-knowledge adversary over the proof system and simulator is defined as
and we say that a proof system is zero-knowledge if, for any adversary , there exists a simulator such that the advantage defined above is negligible.
Voting systems
We define a voting system as being built upon a tuple of algorithms
where the different algorithms informally work as follows:
Returns a pair of public and secret data, typically including a public encryption key and a secret decryption key, respectively, but this data might also contain other things.
Takes as input a user identity and some public and secret data and returns a public credential and a secret credential for that user.
Takes as input some public data, a user’s public and secret credentials, and a vote, and returns the user’s public credential, a ciphertext encrypting the vote and a state that the voter later can use for verification.
Checks the validity of the ballot box .
Computes the result r of the election, along with a proof Π of correct tallying.
Run by a voter to check whether or not her ballot was included in the tally.
Checks that Π is a valid proof of correct tally, with respect to the result r and the public part of the ballot box .
Returns a public part of the ballot box .
Voting friendly relations
In the voting systems we analyze in this paper, proof systems are used to compute and validate proofs of correct tally. In our analysis and EasyCrypt formalization, we keep the relation abstract, and thus, we need to ensure that the relation is compatible with the result of the election. For this, we adopt the notion of voting friendly relations, as defined by Cortier et al. [10] and generalize it a bit, so that it also accommodates schemes like Selene. Very informally, a relationship is voting friendly if for any adversarially chosen bulletin board it is possible to find a corresponding tally such that the pair (bulletin board and tally) are in the language.
The relation being compatible with the result of the election means that if is a voting system, is the public and secret data generated by the algorithm, the result r of the election corresponds to the tally of the votes obtained by decrypting the ciphertexts in the ballot box , and if is the public part of , then the relation holds. In other words, it is possible to prove that r is the correct result. More formally, a voting friendly relation is defined as follows:
Let be a voting system and let be a proof system for some relation . We say that is a voting friendly relation if, for any efficient adversary , the following experiment returns 1 with negligible probability:
In the above experiment, the algorithm decrypts the ballot box line by line using the secret data , and returns a list of votes and some additional information , e.g. voter identities as in MiniVoting or tracking numbers as in Selene. We say that ρ is a counting function that takes in a list of the form described above, and returns a result.
Early definitions of ballot privacy
In 2015, Bernhard et al. [5] conducted a survey of existing game-based ballot privacy definitions and found that they were all unsatisfactory. Some of the definitions were too weak, declaring protocols that intuitively did not have ballot privacy to be secure. Some definitions were too strong, making any voting protocol with even minimal verifiability impossible to prove private. Finally, some definitions were too limited, restricting the class of captured voting protocols and privacy breaches too much.
Based on this survey, Bernhard et al. [5] proposed a new definition of ballot privacy which was named . The definition captures the idea that a voting system should not leak any information about the votes that are cast, beyond what can be derived from the result of the election. This is formalized by having an adversary attempting to distinguish between two worlds. In one world, the adversary gets to see a ballot box containing real ballots submitted by honest voters, as well as any ballots the adversary has submitted on behalf of dishonest voters. The adversary then gets to see the result corresponding to these ballots and a proof of correct tally. In the other world, the adversary gets to see a fake ballot box, but he still gets to see the result as tallied on the real ballot box. It is also assumed that there exists a simulator that can simulate a proof of correct tally corresponding to the real result, but with respect to the fake ballot box. The adversary also gets to see this simulated proof.
As the name suggests, ballot privacy only captures the privacy of the ballots and not of the tally. To account for this Bernhard et al. say that any scheme satisfying should also satisfy a property called strong consistency which captures the idea that the tally produced by the scheme should not leak more information than an idealised tally function. The exact idealised tally function is a parameter of the definition. We have proved the strong consistency of Selene as part of our work.
To avoid having to trust the voting server, the strategy in many voting systems is to encrypt the votes under a key for which the corresponding decryption key is split into several parts and distributed among several authorities. However, as is pointed out in [13], this trust assumption is not properly captured in the definition. In , the adversary plays a game where he can control the votes cast by honest parties, but he cannot control the resulting ballots once they are put in the ballot box. This means that assumes that every ballot that is put in the ballot box stays in the ballot box and is not tampered with in any way.
To address this, Cortier et al. introduced a new definition, which they called [13]. The main idea in is similar to : the adversary has to try and distinguish between two worlds: one where he sees real ballots and the real result, and one where he sees fake ballots, but still sees the real result. The difficulty is that an adversary who is in control of the ballot box is able to remove or tamper with any ballots submitted by honest voters. Since we perform the tally on the real ballots in both situations, we need to somehow determine which of the real ballots to perform the tally on. A bad choice would make distinguishing trivial for the adversary.
Cortier et al.’s solution is to parameterize their security definition by a recovery algorithm, an approach we also adopt in our definition. Informally, the idea is to use the recovery algorithm on the adversary’s board in the fake world, to determine how the adversary has tampered with the ballots on the fake board. We then perform the same transformation on the real board, and tally the resulting board.
Formally, Cortier et al. define the aforementioned transformation as a selection function, and recovery algorithm as the process of finding the transformation.
For integers , a selection function for m and n is any mapping
The selection function π represents how the adversary constructs a bulletin board with n ballots, given a bulletin board with m ballots. For ,
, with means that this is the jth element in ,
means that this element is .
The function associated to π maps a bulletin board of length m to a board of length n such that
for any .
A recovery algorithm is any algorithm that takes as input two bulletin boards and and returns a selection function π for and n for some integer n.
We will sometimes abuse notation and write to denote the process of determining the transformation from to , and applying this transformation to , to get the board .
In , the tally occurs only if the adversary’s board is valid, and if none of the voters are unhappy after they perform some kind of verification. This means that the verification process needs to occur before the tally, so does not accommodate voting systems like Selene. Indeed, in Selene, the tally occurs before the verification as a way of mitigating the threat of coercion. Therefore, there is still need for a new privacy definition, that both allows for the voting server (and thus the ballot box) to be dishonest, and that accommodates voting protocols where the verification phase happens only after the tally has been computed.
New security notion
The new security notion for ballot privacy against a dishonest ballot box.
In this section, we present a new definition of ballot privacy against a malicious ballot box, which we call delay-use malicious ballotbox ballot privacy (). This definition essentially extends the range of applicable voting schemes to include those where the verification can be delayed, i.e. happening after tallying, and also includes schemes where a secret key is needed in the verification step. Our new definition is similar to , the most notable difference being that in our definition, the adversary gets to see the tally after we check if his board is valid, and then he gets to see the result of the verification phase. A formal description of the security game for is found in Fig. 2. The relation between and is studied in more detail in Section 3.2.
In the following, let be a set of voter identities, partitioned into a set of honest voters and a set of dishonest voters. Furthermore, let be partitioned into the set of voters who we assume will perform some verification check and the set of voters who we assume will not verify.
The security experiment begins with the generation of some public and secret data and (which typically includes the public and secret keys used to encrypt and decrypt the votes). Then, a number of voters in a set are registered. In the registration phase, public and secret credentials and are generated for each voter, and stored in finite maps and , respectively. Furthermore, we store the secret credentials of a set of dishonest voters in a finite map .
The adversary is now given , and as input. In addition, he gets access to a vote oracle, that on input computes two ballots for this user. The first ballot is stored in a list and the second ballot is stored in a list . The vote oracle also records a state, containing any information the voter later needs to verify that her vote was correctly cast and counted; we split the state into two components. The first component () covers information checked which is generated before tallying and the second component () covers information generated after tallying; this is necessary due to recovery which ensures that information after tallying (including the tally) always acts as if . The adversary may also call on a publish oracle, allowing him to see, essentially, for a secret bit β.
Using this information, the adversary creates a public bulletin board . If the board is invalid, we output a random bit. If the board is valid, we allow the adversary to make an initial guess at the secret bit β, based on the information he has seen so far. This guess is stored in a variable , possibly to be returned by the experiment at a later point.
The adversary now gets access to the tally, and is allowed to add some extra information to the bulletin board, namely a result and a proof that this result corresponds to the votes. If this result and proof fails to pass verification, we output a random bit.
Otherwise, the adversary gets to make a guess d, given access to a verification oracle . The verification oracle records the users who have verified in a set called , and the users who are happy with the verification are recorded in the set . If anyone who we expect should verify actually does not verify, we output a random bit. If a voter is unhappy with the verification process, we output the initial guess the adversary made before seeing the tally. Otherwise, the experiment outputs the guess d that the adversary made after calling the verify oracle.
When given access to the tally oracle, the adversary can call this oracle only once, and the behavior of the tally oracle depends on whether we are in the left world () or in the right world (). If we are in the left world, the tally is performed directly on the board created by the adversary. The adversary then gets to see a real result and a proof of correct tally. In the right world, however, we first run the recovery algorithm to detect how the adversary has tampered with the ballots in , to create . We then change the ballots on accordingly, yielding a new board , which we tally. The adversary then gets to see the result r corresponding to and a simulated proof of correct tally, with respect to r and the adversarial board .
Let be a voting system, and let be a recovery algorithm. We say that satisfies with respect to if there exists an efficient simulator , such that for any efficient adversary ,
is negligible in the security parameter λ, where is the game defined in Fig. 2.
with the recovery functions originally suggested implies the satisfying scheme is equivalent to some ideal functionalities. Future versions of may wish to prove similar results in which a different recovery function is likely necessary.
Recovery function
The definition is well defined for many recovery functions, but three are suggested in the body of [13]. All three recovery functions when used with are only satisfied by schemes which fulfil strong constraints on their verification procedure, and two of them presumes that voters are authenticated, see also discussion in Section 6.
We introduce a simple new recovery function (Fig. 3) which we call which is essentially identical to and in [13], but does not require ballot authentication and without explicit constraints on the verification procedure. with the recovery functions originally suggested implies the satisfying scheme is equivalent to some ideal functionalities. Future versions of may wish to prove similar results in which a different recovery function is likely necessary. In Section 6 below, we will discuss the precise relation between and verifiability, depending on the chosen recovery function.
The algorithm.
Comparison of to
In this section we briefly analyse the differences between the definition and our new definition. As mentioned above, the main difference is that the verification oracle is first available after the tally oracle has been called. This accommodates schemes where the verification first happens after tally and allows a secret key to be used for the verification process, however, it naturally also applies to schemes with early verification. We have changed some parts of the definition to adapt to the delayed use of the verification, but also to make it optimised and precise enough for EasyCrypt.
The main differences are:
We only have one voter map but the state stored depends on secret bit β, see line 4 in the definition of in Fig. 2. However, the state is split into a part relevant before tally and a post-tally part (only relating to which is the board used for tallying). This is necessary for the state handling in EasyCrypt oracles, and was not necessary in since the stateful verification happened before tallying.
If the adversary does not output a valid board, the experiment outputs a random guess bit, whereas allows the adversary to output a guess but without tally access. In both cases this corresponds to real life, where a board will not be tallied if it is not valid. Outputting a random bit makes our proofs in EasyCrypt slightly easier, and it is actually equivalent to unless the algorithm always outputs ⊥.
In our definition the experiment also outputs a random guess bit if the verification of the tally fails via the algorithm . Again, this corresponds to not allowing any adversarial advantage when the tally fails publicly. This case was not explicitly considered in , but is natural in our case where the verification step will not proceed on an invalid tally output.
In the verification status of the honest voters contained in is directly output to the adversary. In this is not defined precisely. In both definitions the appearance of failed verifications inside the set of checking honest voters will in any case imply that the guess bit has to be determined without knowing the tally result. This is to punish the adversary for creating a board with verifications failing which would cause complaints in real life protocols.
Consider a voting scheme where the verification step does not depend on a secret key and can be done before the tally. For such schemes both privacy definitions can be applied to the scheme. We claim that under reasonable conditions our definition is stronger. Further, for voting schemes where the outcome of the individual verifications can be computed by the adversary using the data from the bulletin board, as e.g. happens in Helios, we have equivalence of the two definitions. We now sketch why this is the case.
We show that privacy implies assuming that does not constantly output ⊥, and that never fails on an honestly computed tally. Finally, we also assume that the verification status is not output to the adversary in or that the verification status can be computed using public data, as e.g. happens in Helios. To prove the implication we assume that we have an attack algorithm for . We use the vote choices from the algorithm. If the attack algorithm outputs an invalid board, we change the board that is being output to a valid board which we have assumed exists. Since the tally also does not fail we are allowed to output a guess and we use the one from the attack algorithm and will win with the same advantage since in this case no verification will be done in . If the attack algorithm outputs a valid board, we use the same board in the experiment. In line 12 the adversary has to output a guess bit which will be used if a verification fails for the honest verifier set . Here we use what will be output from the attack algorithm in case of failed verifications, which by assumption either does not depend on , which we do not yet have access to at this stage in the experiment, or it can be computed from the public data on the board. In the experiment line 13 we now get the tally before verification (and the tally verifies by assumption) but we ignore this at first and choose the same verifying voters as in the attack algorithm. At this point the two experiments will be equivalent. If the verification fails we are forced to go back to but this was from the attack algorithm and will be equivalently successful. If no verification fails in then we output the guess from the attack algorithm using the tally result we got earlier as input. The advantage will be the same as for the attack algorithm. This was the important implication direction since it demonstrates that our definition is not too weak, i.e. if an early verification scheme is declared then it is also private which in turn has ideal functionality implications under assumptions such as strong consistency [13].
Considering whether privacy implies , the main problem is that the choice of verifying voters in could depend on the tally output. However, for voting schemes where the outcome of the honest voters’ individual verification can be computed by the adversary, we can prove the implication. We thus assume we have an attack algorithm for with non-negligible advantage. In the experiment we use the votes and output the board from this attack algorithm. If the board is not valid, we simply output a random bit in the experiment , which is equivalent to what happens in . If the board is valid, the next step in is to use the verification oracle. Here we simply let all the required honest voters perform verifications. These will also all have to verify in the attack against to get an advantage since the experiment will otherwise output a random bit. If a verification fails we will use the output in the attack algorithm which was computed without tally access. If none of the verifications fail, we will get tally access in the experiment . If more voters were chosen to verify in the attack algorithm we can compute the outcomes by assumption. If we encounter a failure in the verification here, we again output , otherwise the output from the attack algorithm. The experiments will be equivalent at this point.
Labelled-MiniVoting and Belenios
Algorithms defining the Labelled-MiniVoting scheme for the labelled PKE , and the proof system .
The MiniVoting scheme was first introduced by Bernhard et al. [6] as an abstraction meant to capture several existing constructions in the literature. It is based on two building blocks: public key encryption and a zero-knowledge proof system, and assumes the ballot has the form , for a voter’s identity and for a ciphertext c – that simply encrypts the voter’s choice (or vote) v.
This scheme was later refined by Cortier et al. [10], resulting in the Labelled-MiniVoting scheme. Here, the class of captured voting schemes was broadened by introducing some public information associated to the users, called labels, and creating a strong link between a voter identity in a particular election and its ciphertext – now parameterized by this label via the use of labelled public key encryption. The labels can be used to represent generic information about the election (as in the case of Helios) and/or information pertaining to a voter’s public persona: pseudonym or public verification key (as in Belenios). The ballot in Labelled-MiniVoting takes the form with the voter’s identity and the label-ciphertext pair created by the labelled public key encryption scheme.
A predominant feature of the MiniVoting class of schemes is the enforcement of unique label-ciphertext pairs, via a procedure called weeding [12]. This step, done by the algorithm, prevents trivial attacks on privacy, where an adversary can cast copied ciphertexts (and their label) and observe the changes in the election result.
In Fig. 4 we refine Labelled-MiniVoting to align with the voting notations from previous sections, such that we treat the public credential as the label and consider the following ballot format . The removal of doesn’t have a significant impact, as we use the operation to model the link between identity and public credential:
Typical instantiations for this operator depend on the assumptions over the voter’s identity and the degree of separation we want to capture in the public credential. Here, we consider the voter’s identity a pseudonym or a public encryption key and as such we implement as the identity function ; an approach also taken by Selene. Other options may consider the real voter’s identity (e.g. email, name) as input and create a pseudonym or a public credential (especially if signatures are considered). In all cases, there has to be an injectivity assumption over , which is trivially satisfied by the identity function.
Labelled-MiniVoting is further parameterized by three other classic operators:
.
Checks if the label-ciphertext pair is well-formed.
.
This function returns the election result in a predefined format, e.g. lexicographic order for mixnet tally or a value for homomorphic tally. This is done by first deciding which votes to keep using , and then computing the result over the votes kept by using .
is fixed to “last vote counts” for a particular voter, in the modelling of both Labelled-MiniVoting and Selene.
is left abstract for Labelled-MiniVoting, and made concrete for Selene.
.
This is an abstraction of the public bulletin board, and most of the times it is identical to the ballot box.
Let be a secure labelled PKE, and be a zero-knowledge proof system. Given the operators , , ρ, defined as above, we define the Labelled-MiniVoting scheme
as the single-pass voting scheme whose algorithms are presented in Fig. 4, and which we informally present below:
.
Given the security parameter λ it returns the output of the key generation algorithm for the encryption scheme.
.
It applies over to build the public credential , and does not consider a secret credential, i.e. .
.
Encrypts a vote v using the public credential as the label, and outputs the ciphertext together with the voter’s public credential. The secret credential is omitted from the input just for simplicity as it is unused.
.
Returns true if all ballots are well-formed, according to , and that each ciphertext is always used with the same public credential (weeding property is respected); and false otherwise.
.
Computes the result r of the election by applying the counting function ρ over the list of , where is the decryption of from the ballot box. It also provides a proof of correct decryption Π using the algorithm of the proof system Σ.
.
It checks if its last ballot is in the ballot box . The voter may have a state with all cast ballots, but the verification is done with respect to the last vote that was cast.
.
Run the algorithm of the proof-system to check if the tally proof is valid w.r.t the given statement.
:
Calls the operator over the ballot box .
We prove in EasyCrypt that satisfies under standard cryptographic assumptions for the encryption scheme and proof system Σ. We also consider the identity function for , and “last vote counts” for . The following theorem corresponds to lemma du_mb_bpriv in the MiniVotingSecurity_mb.ec file.
Letbe defined as in Definition
7
. Then, for anyadversary, there exists a simulatorand three adversaries,and, such that
for Belenios
We have used Labelled-MiniVoting to validate our privacy definition, and to infer proof strategies and assumptions that then can be applied to other e-voting systems, e.g. Belenios [12] and Selene. Belenios can be viewed as an instance of Labelled-MiniVoting with some concrete decisions for operators and algorithms. This translates into directly applying the EasyCrypt proof developed for MiniVoting to Belenios without the need to re-do it – as highlighted in Belenios.ec.
We take the labelled encryption scheme and realize it by combining the ElGamal encryption scheme with a zero-knowledge proof system . As the public credential is included in the statement for the proof system we use to express this fact. Encrypting the vote v under some public credential by becomes , and decrypting by returns the decryption of by only if the proof verifies, and ⊥ otherwise.
This form of the ciphertext also has an impact on the algorithm that now uses to decide if a ciphertext is well-formed.
For the result of the election, we only need to instantiate the operator, as we already consider the last vote policy. We can model any type of ideal counting function that can be performed over votes, and instantiate it to lexicographic order to model an ideal verifiable shuffle.
In Belenios, the public bulletin board computed by shows only the last ballot cast by a voter (policy applied over the public credential) together with a hash of that ballot. With verification done against the hash compared to the entire ballot. However, nothing prevents voters from checking their full ballot against the ballot box (as we have modelled in Labelled-MiniVoting). Moreover Cortier et al. [11] have modelled in EasyCrypt different options of for Belenios and as one would expect the two approaches are equivalent modulo hash collisions.
Let be a secure labelled PKE, and Σ be a zero-knowledge proof system. Given the operators , , ρ, defined as above, we define as
The privacy result for Belenios follows directly by simply applying Theorem 1 with the concrete values highlighted here.
Letbe defined as in Definition
8
. Then, for anyadversary, there exists a simulatorand three adversaries,and, such that
Both Theorem 1 and 2 are proven with respect to the recovery algorithm described in Section 3.1.
for MiniVoting and Belenios
We also stated the original definition [13] in EasyCrypt and proved that Theorem 1 and 2 also hold with respect to this privacy definition. The proof strategy was essentially the same, but the proofs had to be re-worked due to the differences between the definitions, especially when the verification happens. This constitutes the first machine-checked proof of . The corresponding EasyCrypt lemma mb_bpriv is found in the MiniVotingSecurity_omb.ec and Belenios_omb.ec files.
Selene
Although Selene offers properties such as verification and coercion mitigation, we focus – in this paper – on formalizing its ballot privacy properties. Like Cortier et al. [10,11], we abstract away the verifiable shuffles (assuming they are non-interactive) and the ElGamal + PoK construction (assuming instead an abstract -secure labelled PKE).
More precisely, we replace the verifiable shuffle – which is used in the tally phase to mix the encrypted votes and trackers while erasing connections between the votes and the voters – with a parametric distribution, which takes as parameters a list of vote/tracker pairs , calculates all possible permutations of the original list, and defines the uniform distribution over the result. Sampling in captures the semantics of a perfect shuffle on ℓ, with a proof of correct shuffle computed separately.2
We note that itself is not probabilistic polynomial time. We treat it as an ideal functionality for verifiable shuffles, whose complexity would normally be probabilistic polynomial time.
Formalizing proofs for interactive protocols, such as verifiable shuffles, in EasyCrypt remains a complex task, and is somewhat orthogonal to our contributions here. In particular, in a setting – like ours – in which the tallier is honest, the shuffle is indeed indistinguishable from our idealization. Prior work [16,17] has proved that the interactive variants of the verifiable shuffles suggested for use with Selene are zero-knowledge proofs which leak no information; this would suffice to prove equivalence with our idealisation. However, this has not been machine checked for the interactive variant due to issues the currently available tools have with handling random oracles.
Before being cast, the votes in Selene are encrypted using the ElGamal public key encryption system [15], and a non-interactive proof of knowledge of the underlying plaintext is appended to the ciphertext. In our EasyCrypt formalization, we abstract away details of the underlying cryptosystem, and encrypt the votes using an abstract labelled PKE that we assume is secure. The ElGamal with proof of knowledge construction used in Selene has in fact been proven to be secure [4,7]. This proof remains out of reach of machine-checking due to its use of the rewinding lemma.
In addition, since we focus on privacy in this paper, we also remove the signatures Selene3
We also have an EasyCrypt proof for Selene with signatures. This is available with the other proofs at https://github.com/mortensol/du-mb-bpriv.
includes on cast ballots to prevent ballot stuffing. The signatures cannot compromise privacy: The signing keys are independent of the encryption of the ballot, the ciphertexts that are signed are public, and the signatures are anyway stripped before shuffling, so they cannot be correlated to any plaintext ballot.
These simplifications, taken together, yield the following model for Selene.
Let be a secure labelled PKE, let be an secure PKE, let be a proof system for a relation and let be a commitment protocol. We define
as the voting system built upon the algorithms given in Fig. 5, which we informally describe below.
Algorithms defining the voting scheme, given an secure labelled PKE scheme , an secure homomorphic encryption scheme , zero-knowledge proof systems , and for the tally proof, the tracker shuffle proof and the proof that commitments are correctly formed, respectively, a commitment scheme , and the abstract operator .
Takes as input a security parameter λ and returns a key pair used to encrypt and decrypt votes, a key pair used to encrypt and decrypt trackers, a list of encrypted trackers, finite maps , , and from voter identities to encrypted trackers, public commitment keys, tracker commitments and openings, respectively, as well as a proof of correct shuffle of the trackers and a proof that the tracker commitments are correctly formed. The public data is
and the secret data is
Takes as input a voter identity and a pair of public and secret data, and returns the voter’s public commitment key, the commitment to her tracker and an opening to the commitment.
Outputs the public part of the ballot box.
Encrypts a vote v and outputs the ciphertext together with the voter’s public credential.
Run the algorithm of the proof-system to check if the tally proof is valid.
For each element in the ballotbox , we perform three checks: every ballot contains a unique voter identity, every ballot is well-formed, and every public credential corresponds to the correct identity.
Decrypts every encrypted vote in the ballot box and the tracker for each voter. Returns the multiset of all vote/tracker pairs .
To verify, a voter opens her tracker commitment and checks if her vote v and tracker is in the list of vote/tracker pairs returned by .
The following theorem establishes that Selene satisfies .
EasyCrypt lemma establishing that Selene satisfies . and G are independent random oracles. S is the ZK simulator.
Let, whereforand any public encryption key, identity, public credentialand vote v. For anyadversary, there exists a simulatorand four adversaries,,and, such that
Theorem 3 corresponds to lemma du_mb_bpriv in SeleneBpriv.ec. Figure 6 displays the EasyCrypt formulation of Theorem 3. The lemma itself is inside a section which quantifies over all core components: Et and Ev denote the encryption schemes used for the trackers and the votes, respectively, P and Ve denote the NIZK’s prover and verifier, C denotes the algorithm, CP denotes the commitment protocol. A is the adversary. The zero-knowledge simulator S and the random oracles for tracker encryption (HRO.ERO) and for the NIZK proof system (G) are defined concretely in the code, but not fully displayed in this paper. The VFR, zero knowledge and security experiments are denoted by VFRS, ZK_L, ZK_R and Ind1CCA, respectively, while the respective reductions are denoted by BVFR, BZK and BCCA. These are also given concrete definitions. Also note that the Ind1CCA module is parameterized by a left-or-right module, representing the case where and the case where , respectively. The security experiment is parameterized by a recovery algorithm, and we use the concrete recovery algorithm described in Section 3.1.
We now sketch the proof of Theorem 3. The EasyCrypt formalization of the full proof is found in the file SeleneBpriv.ec. Unless explicitly stated, all the modules and lemmas we refer to in the following are also found in this file.
In our EasyCrypt formalization, we split the security experiment into two different games, one for and one for . The difference between the two games is, as described earlier, what the tally algorithm does. These games are modeled in the modules DU_MB_BPRIV_L and DU_MB_BPRIV_R, respectively, which are found in the file VotingSecurity_mb.ec.
Starting out from the left side security experiment, the first step is to replace the tally proof produced by the prover in the proof system, by a simulated proof produced by the simulator . This change is modeled in the game G1L. Provided that the proof system is zero-knowledge, the adversary cannot distinguish between the original game and the game where we simulated the proof, except with negligible probability.
We then define a new game, G2L, where we stop decrypting honestly created ciphertexts, and instead use the ciphertexts stored in (as described in Section 3). Ciphertexts submitted by the adversary are decrypted as usual. We also remove one of the bulletin boards from the vote oracle, so that only the left-side votes are stored. We prove that G1L and G2L are equivalent. The equivalence follows from the correctness property of the encryption scheme used to encrypt the votes, and the fact that the adversary only gets to see in the left side security experiment.
Starting out from the right side security experiment (DU_MB_BPRIV_R), we first stop decrypting honest ciphertexts, and prove that the resulting game (G1R) is equivalent to DU_MB_BPRIV_R. The intuition is the same as for the equivalence between G1L and G2L.
We then define a game G2R where we stop performing recovery on the adversarially created ballot box, and simply perform the tally on the ballot box the adversary outputs. We prove that G1R and G2R are equivalent. Intuitively, this holds because the honest votes no longer come from the adversary’s board, but from , and the ballots submitted by the adversary are present both on the adversary’s board and on the recovered board, by definition of our recovery algorithm.
In the final game, G3R, we remove one of the bulletin boards in the vote oracle. This is similar to what we did in G2L, but now only the right side votes are stored. We prove that G2R and G3R are equivalent. This also shows that the final game on the right side, G3R, is completely equivalent to DU_MB_BPRIV_R.
Finally, we show that the probability in distinguishing between the games G2L and G3R is equivalent to the probability of winning the game.
Verifiability
In this section we will consider the relation between and individual verifiability. Indeed, in [13] it was proven that a scheme satisfying and strong consistency will satisfy a special form of individual verifiability which depends on the chosen recovery function in .
We will here shed more light on this relation both for and , in particular, the relation is hardcoded into the recovery function and it is instructive to understand this in more detail.
The algorithm from [13]. Here extracts the voter identity from the public credential .
Actually, the recovery function not only defines which attacks are allowed by the adversary, it also determines which properties the individual verification has to verify, by directly allowing attacks if the verification fails to check for this. Consider the two recovery functions in Fig. 7 and in Fig. 8. In the attacker is allowed to delete and reorder votes of honest non-checking voters, and in to delete, reorder and replace ballots of honest non-checking voters. For both it is hardcoded that if an honest checking voter’s ballot is deleted, replaced or changed4
We note that all the proposed recovery functions are too strong, in the sense that if a ballot is modified in any way but without changing the plaintext, this would give an attack, but the vote result would be unchanged and hence this does not constitute an attack. E.g. Belenios-RF [9] or Selene would not detect this during verification, but would still be private in practice.
in any way there will be different tally results in the two worlds. Reordering would not cause different results. Thus if the verification procedure does not detect these changes, the adversary will be able to tally and win the game. In turn, this hardcoding of what the verification should achieve will also give the relation to verifiability, as we will prove below. However, what we define for the verifying voters, may not be related to what is defined to be legal actions for the attacker for the honest non-verifying voters, which makes the definition somewhat opaque.
The algorithm from [13]. This is identical to from Fig. 7, except that votes from are also considered as cast.
Following the argument of [13] e.g. deleting votes does have implications for privacy, basically reducing the remaining anonymity set – in the extreme case an attacker could delete all votes except one to reveal that vote with the tally. However, above we did not demand that verification detects vote re-ordering which could potentially have just as large consequences for privacy, e.g. in the case where all voters cast two ballots and the attacker knows all the first ballots. Swapping ballots in a last-vote-counts policy then decreases privacy just as deleting votes would do.
In our recovery function we do not put a constraint on what verification should do, but treat all honest voters equally. This means that replacement of voter ballots is allowed for verifying voters, contrary to where it would lead to an attack. This, in principle, allows us to model attacks where the adversary learns the outcome of a plaintext verification after submitting a ballot with a given candidate on behalf of the voter. As an example, if the adversary submits a ballot for candidate A on behalf of a voter and observes that the verification is successful, then the adversary can deduce that the voter voted for A. This does not mean that with is too weak, because replacement is ruled out for verifying voters, but it does mean that this high precision privacy attack would be overlooked, due to concerns of less privacy-violating attacks.
On the other hand, since does not give an explicit constraint on the verification, this also means that the with our recovery function is not in general directly implying verifiability. On a technical level: A verifiability attack that changes ballots would in most cases be seen as cast ballots by the recovery function and hence be both on the left and right world boards. Hence the tally would not change.
To investigate the relation to verifiability, we will thus focus on the recovery functions from [13]. We note that in [13] the implication to individual verifiability was proven by first using that strong consistency and implies an ideal functionality (depending on the chosen recovery function) which in turn implies individual verifiability. It is however instructive to prove this relation directly, which also allows a comparison of the advantage.
We note that a privacy definition can only ever imply a weak kind of verifiability, since the adversary is not allowed to know the secret key. Normally, we want to ensure all types of verifiability even if there are inside attackers. There are several examples of e-voting systems not being secure when the adversary knows the election secret key. We thus label this as weak individual verifiability.
In Fig. 9 we define the experiment for weak individual verifiability with post-tally verification as an updated and stronger version of the individual verifiability in [13]. We here use to control the information the adversary gets from vote casting, just as in , whereas in [13] the adversary gets the output of the vote casting algorithm. and are the list of identity-vote pairs from the checking and non-checking honest voters, respectively. The relation defines the property on the honestly cast ballots that the verifiability should protect.
Weak individual verifiability with post-tally verification for maliciously generated board and tally.
The reason that our individual verifiability definition is stronger than [13] is that their definition assumes an honestly created tally, however, in our case the adversary can output the tally result and proof, but we will check whether the proof is valid using .
(Weak individual verifiability).
Let be a voting system and a relation. We say that fulfills weak individual verifiability w.r.t. against a malicious board and tally if for any polynomial adversary we have that is negligible in λ.
Further, we need strong consistency. We here refer for App. A in [13] for the precise definition. However, what it says is that there exist extractors which extract an identity and a plaintext vote from a general ballot. The extractors are denoted from the identity, which was already used in the recovery functions above, and for extracting the vote, which may use the secret election key in . When applied to honestly generated ballots the extractors are correct with overwhelming probability. In general, the extractors can return ⊥ on invalid ballots. Further, with overwhelming probability, the tally of a bulletin board submitted by the adversary is the result function on the votes extracted from the ballots. That is, if the submitted bulletin board is of the form then if the board is valid we will with overwhelming probability have that the result is the result function computed on the extracted votes:
where we have combined the two extractors into one.
The recovery function used in will decide which relation is secured in the weak individual verifiability. We will only focus on the two recovery functions above. Following [13], also for the notation, the relation corresponding to is
Here ≈ means that the list is the same up to ordering, i.e. they are the same in a set-sense. As discussed above, the recovery function does not recover the order for the checking voters. In words, each verifying voter will have one of their votes counted.
For the relation is
Here ⊑ means inclusion in a set-sense. In words, the verifying voters will have one of their votes counted and the non-verifying voters can have their votes deleted, but not changed (in practice this would have to be ensured via authentication). In both cases, we assume that the result function is stable in the sense of [13], namely is independent of the ordering of votes, as long as the votes for each appear in the same order.
We are now ready to state our theorem that and strong consistency implies weak individual verifiability with the corresponding relation function. Since we go directly from privacy to verifiability, and not via an ideal functionality as in [13], we can directly relate the adversarial advantages, and further, as discussed above, we descend to a stronger version of verifiability with non-honest tally. For the proof there is a technical obstacle in that the verification in the right world still uses the state from the left world for the post-tally verification, a fact that we will discuss in lesson 4 in the next section. This gives a degradation factor .
Letbe a strongly consistent voting system satisfyingwith respect to the recovery functionrespectivelyand let the result function be stable, thensatisfies weak individual verifiability w.r.t. to the relationrespectively. We here assume that these relations are computable in polynomial time. Given an adversarywith advantageagainst weak verifiability, we can build an adversarywithagainst.
Let be an adversary against individual verifiability w.r.t. to the relation and advantage . We then take a challenge from and feed it to against the verifiability experiment. When calls we call in the privacy experiment, and we answer ’s calls to by calling it in the privacy experiment. For we output a random value. In the same way we answer from the privacy experiment, and we get a result and proof from . If we are in the left world, will succeed with probability greater than . We can now again feed queries to from the privacy experiment. In the left world the two games are aligned and with probability the individual verification will succeed, but the result will not fulfill the relation. If the attack does not succeed, we can output a random guess (happens either automatically because a verification fails, or we see the relation is fulfilled). On the other hand, if we are in the right world, the recovery function will make sure that the board always fulfils the relation: For it will make sure that all votes from verifying voters will appear on (but might be reordered). By strong consistency the tally result will thus contain the votes from the honest voters and fulfil the relation. For , the votes from the checking voters will likewise be added back in by the recovery function. For the ballots for the honest non-verifying voters, they will either be kept if they were untouched or thrown away if they were changed in any way. Again by strong consistency, the result will contain all votes from checking voters, and a subset from non-checking honest voters, and hence fulfil the relation. Since we assume that the relation can be computed in polynomial time, we can let the adversary output 1 if the relation does not hold, and 0 if it holds. Since the recovery function changes the ballots to the left hand ballots, and we use the left state for verification, the algorithm will most likely work, however, this cannot be guaranteed. Hence we do not know in the right world how often the board will pass verification. However, when we reach a verified board, we can distinguish the two worlds using the relation, in all other cases the outputs are manifestly random. We thus have that
up to negligible contributions. □
We note that a weaker form of strong consistency would have been sufficient for this proof, but might be needed for more general recovery functions and relations.
We note that the fact that our proof of the implication from privacy to verifiability is direct means that it should be possible to formalise the proof in EasyCrypt. This is in contrast to the proof in [13] which relies on the intermediate ideal functionality, which would be much harder to formalise in EasyCrypt. We leave such a formalisation as future work.
Lessons learned
In this section we highlight and discuss a few important lessons that we learned in the process of formalising the security definition and the security proof of Selene in EasyCrypt. We believe that these lessons will be useful to develop and evaluate future efforts on formalising both security proofs for electronic voting protocols and other security proofs in EasyCrypt.
Lesson 1: Vacuity in adversary quantifications
EasyCrypt has proved useful in formalising – and sometimes spotting issues in – existing security proofs, related to standard security notions. This work and the series of work on the security of electronic voting that precedes it also make definitional contributions, which are developed in parallel with their formal counterparts.
In a setting where any change is costly, having definitions and proofs evolve together is a risky proposition, and can very easily lead to very complex proofs of vacuous results. Such issues can obviously arise, from stating trivially false axioms under which the results are proved to hold. But EasyCrypt proofs also offer less obvious pitfalls in the form of universal quantification over adversaries.
Our top-level results, expressed on paper, are simply of the form
In EasyCrypt, the universally-quantified adversary above very often needs to be further restricted to:
Not share memory with other parts of the system – often the security experiment and related oracles; and
Terminate with probability 1 when run with access to oracles that terminate with probability 1.
The first restriction accounts for a convenience feature of EasyCrypt’s memory model, which makes all global variables visible to all modules – an issue when trying to hide their value from an adversary, but a convenience when writing a reduction that peeks into an oracle’s state.
The second restriction accounts for the fact that termination cannot be either assumed (because it is possible to write a non-terminating adversary) or discharged automatically. Making the assumption explicit where needed gives some flexibility in instantiating complex arguments. It is possible to write – in intermediate proof steps that hold regardless of the adversary’s termination status – non-terminating adversaries (or adversaries whose termination is hard to prove, such as those that make use of rejection sampling). But in places where termination is needed, a termination proof must be provided.
As we discovered after completing our proof, it is very easy to state these restrictions too strongly – and sometimes strongly enough as to be vacuous.
More concretely, let us consider the case of some adversary that expects oracle access to some oracle . A natural way of expressing restriction 2 above on some fixed adversary would be as follows.
This natural expression is, however, too strong: it expects that should terminate even if the oracles she can query can interfere with her memory (and even if she, in turn, can interfere with her oracles’ internal memory). This would, pathologically, prevent a theorem proved under this restriction from applying to an adversary who calls its oracle inside a simple while loop – where the oracle could simply bump the loop index back every time it is called to prevent the adversary from terminating. The correct way of stating the restriction above in EasyCrypt is (in prettified syntax) as follows, taking care to prevent from interfering with the execution of .
Spotting such missing restrictions is not easy: our first proof fell prey to one of them. Such assumptions are often present in top-level statements, which are not immediately instantiated. As a consequence, the hypothesis is not discharged – which would reveal it as too strong – and simply remains as a “hidden axiom” which severely limits the applicability of the formal result.
This issue is one of a few issues with EasyCrypt’s approachability to new users which may cause more than frustration, and we have fed it back to the development team. We will keep in touch with them to design and develop new ways of expressing adversary constraints that are less vacuity-prone.
Lesson 2: How to deal with random oracles in EasyCrypt
EasyCrypt already provides a library for working with Random Oracle Models (ROM) that formalizes non-programmable eager and lazy models, bounded eager and lazy models, and programmable lazy models. Moreover, they also provide a lemma that shows that eager and lazy ROM can be used interchangeable within a non-programmable model. This result proves to be invaluable, especially when dealing with modules that have access to a random oracle, like the non-malleable encryption scheme we had to deal with in our formalisation and proofs.
As EasyCrypt does not allow one to write invariants using modules we had to introduce operators to capture the same behavior while taking care to model exactly how these modules interact with the random oracles. For example, on paper one may write the following invariant independent of the type of ROM considered:
to express that for any ballot that we have in we store the plaintext in when the adversary has called the oracle . However, to do this in EasyCrypt we had to introduce an operator dec_cipher_vo that is equivalent to the call as long as the state (or map) of the random oracle has not changed or is an eager random oracle. Thus, we used the following invariant:
As an additional benefit of this formalisation and using the non-programmable eager ROM, we could start using these type of invariants as soon as calls to are made and reason on the outputs of and maintain that invariant until we need to discharge it later in the tally phase.
Another important lesson was on understanding that e-voting protocols can use more than one ROM, and the adversary and proof strategy may rely on having different assumptions over them. This is where the lesson on ROM from [10] proved invaluable as we had to model the following ROM for our e-voting protocols , and :
– the ROM that the encryption scheme used,
– the ROM used by the zero-knowledge proof of correct tally, and
– the ROM defined by the simulator .
This separation allowed to accurately model the cryptographic primitives that require specific ROM access, and to have efficient proof reduction steps, e.g., replacing the zero-knowledge proof system withe the simulator also implied replacing ROM with the ROM of . That meant, for example in the case of Selene going from this definition
to this type of definition undetected by the adversary
Lesson 3: Small changes lead to large changes
Proofs in EasyCrypt are tied to exact module and type formats, and even trivial changes can have drastic impact. One such example was when we adapted the proof of Selene to work for “Selene with signatures”. More precisely, we enriched the ballot considered by Selene to by including a signature s verifiable using the public credential of the voter providing the encryption c of the vote v. That meant, given the signature scheme we changed the Selene from Definition 9 to
and updating all previous steps to handle the signature. However, it still amounted to definitions and proof changes of 2500 line out of 7070 lines for the Selene proof.5
Our EasyCrypt formalization has 19340 lines of code with 1310 line for CORE (e.g., ElGamal, LPKE, proof system, hybrid argument), 7070 line for the Selene proof, 6860 lines for Selene with Signature proof, and 4100 lines for Belenios and MiniVoting proof.
The main reason for this significant number of updated lines is due to the fact all EasyCrypt proofs are manually produced. EasyCrypt only provides automatic guarantees on the proof verification. For example, any modification in an invariant (independent of the scale, e.g. to ) leads to re-proving all statements involving that invariant.
Additionally, the experience of previous EasyCrypt development [10,11] has helped in avoiding making direct assumptions even when we were confident on the statements, e.g., non-malleability (or IND-1-CCA) over the entire ballot b compared to just . As an alternation to that ballot form to makes this malleable due to the presence of the signature.
Lesson 4: Split the voter state into two parts in the presence of a recovery algorithm
In Section 3 we briefly mention that we split the voter state into two different parts, and . We now discuss this further and elaborate on the reason why this is necessary.
First, recall that in the definition, the adversary gets access to a vote oracle when creating the ballot box. The vote oracle records, among other things, a voter state, which is used for verification later on. The state can for example be a ciphertext encrypting the voter’s ballot (as in Labelled-MiniVoting) or a plaintext ballot (as in Selene). The reason why we need to split the state into two parts is partly due to the fact that we want to accommodate schemes where post-tally verification is not enforced (e.g. Labelled-MiniVoting) and schemes where verification can only happen after the tally has been computed (e.g. Selene), and partly due to the presence of the recovery algorithm.
For schemes where post-tally verification is not enforced, i.e. schemes where it is typically possible to do verification both before and after tally, the information needed to verify is typically a ciphertext, which the voter checks whether or not is included in the ballot box. In other words, the voter state is checked against some data that is available prior to tallying. On the other hand, in schemes where verification has to happen after tally, the verification check is typically performed on data that is produced by the tally algorithm (e.g. a list of plaintext ballots).
Now, one could imagine that it would be sufficient to only have one voter state, as this state would (for example) be a ciphertext in schemes where post-tally verification is not enforced. It would still be a ciphertext that is checked against some available data, independent of whether or not the voter actually performs the verification check before or after the tally. However, due to the recovery algorithm, it is indeed necessary to split the state into two parts.
The first part of the state, , should depend on the secret bit β. Since this part of the state is used in schemes where verification could occur before tallying, the voter should verify against the ballot box created by the adversary, both in the real world and in the fake world. For example, in Labelled-MiniVoting, if we are in the real world, the voter should check that the left side encrypted ballot is a member of the left side ballot box , and if we are in the fake world, the voter should check that the right side ballot is a member of the right side ballot box .
However, for schemes with post-tally verification, the situation is slightly different. Recall first how the recovery algorithm works, and what the goal of the adversary is. The adversary is tasked with distinguishing between two different worlds. In the real world, the adversary gets to see real ballots and the real result as well as a proof of correct tally. In the fake world, the adversary gets to see fake ballots, but he still gets to see the result as tallied on the real ballots, as well as a simulated proof of correct tally. As the adversary is allowed to tamper with the ballot box, the recovery algorithm is needed to decide which of the honest ballots to include in the tally. Thus, for schemes where post-tally verification is enforced, the voter should always verify against data as if she was in the real world. For this reason, the vote oracle should always record the second part of the state as if it was in the real world.
In summary, in the presence of a recovery algorithm, it is necessary to split the voter’s state into two parts: one for where verification occurs on the bulletin board, and one for where verification occurs on data produced during the tally. As the state should depend on the secret bit β when verification happens with respect to information produced before tallying and it should be independent of β when verification happens with respect to information produced by the tally algorithm, the vote oracle records when , and when .
Note that we discuss state splitting only for electronic voting protocols. Examining the applicability of this approach to other kinds of protocols is out of scope for this paper, but an interesting area of future research.
Concluding remarks and future work
In this work we presented a refined version of the privacy definition which we call delay-use malicious-ballotbox ballot privacy (). Our new definition allows the modeling of schemes (such as Selene) where verification occurs after tallying. The security claim is also more explicit. We formalised our new definition in the interactive theorem prover EasyCrypt and showed that labelled MiniVoting, Belenios and Selene all satisfy the definition. We also proved that MiniVoting and Belenios satisfy the original privacy definition. Furthermore, we have discussed the relation between and individual verifiability (and the corresponding relation for ) and proved that a voting system satisfying and a property called strong consistency, also satisfies a form of individual verifiability which depends on the choice of recovery function. Finally, we have highlighted and discussed a few important lessons that we believe will be useful for future efforts on formalising electronic voting protocols and related security properties in EasyCrypt.
While we have encoded Selene correctly in what we firmly believe is the most appropriate privacy definition in literature, our work highlights certain defiances in privacy definitions which future work should address. The defiances are fairly deep and addressing them is far out of scope for this work. We will, however, briefly mention two principal defiances of and related definitions. First, the definition, while handling a malicious bulletin board, assumes honest setup. Secondly, and related definitions are highly calibrated to schemes where the auxiliary data produced by tally to be used in verification are zero-knowledge proofs. In particular, schemes like Selene which output trackers to use for verification are difficult to express in these definitions.
Further, the style of definition implies certain restrictions. As an example, the adversary can only see the result and verification from the left side board which constrains the attacker model. In particular, this means that we cannot detect privacy attacks relying on inducing candidate-specific errors for an observed voter while giving the adversary access to whether the corresponding voter verification fails or not, see e.g. [18] for such a style of attack. Of course, such attacks can be ruled out by considering recovery functions preventing any changes to honestly cast votes as in [13], but it is not the case in general. An important line of future research is thus to find alternative definitions capturing both more general and more transparent attacker models, e.g. by decreasing the generality of the definition or to consider simulation based security.
Footnotes
Acknowledgments
We thank the anonymous reviewers at IEEE Computer Security Foundations Symposium for their helpful comments. This work was supported by the Luxembourg National Research Fund (FNR) and the Research Council of Norway (NFR) for the joint project SURCVS (FNR project ID 11747298, NFR project ID 275516), and by the FNR project STV (C18/IS/12685695/IS/STV/Ryan). Thomas Haines is the recipient of an Australian Research Council Australian Discovery Early Career Award (project number DE220100595). Constantin Cătălin Drăgan was supported by the EPSRC grant EP/W032473/1 (AP4L), and European Union’s Horizon grants 101069688 (CONNECT) and 101070627 (REWIRE) – funded by UK government Horizon Europe guarantee and administered by UKRI under grants 10043743 (CONNECT) and 10043730 (REWIRE). Further, this work received funding from the France 2030 program managed by the French National Research Agency under grant agreement No. ANR-22-PECY-0006. We thank the (rest of the) EasyCrypt team for the continued development of the tool and its libraries.
References
1.
G.Barthe, F.Dupressoir, B.Grégoire, C.Kunz, B.Schmidt and P.-Y.Strub, EasyCrypt: A Tutorial, Springer International Publishing, Cham, 2014, pp. 146–166.
2.
M.Bellare and P.Rogaway, Random oracles are practical: A paradigm for designing efficient protocols, in: ACM CCS 93, D.E.Denning, R.Pyle, R.Ganesan, R.S.Sandhu and V.Ashby, eds, ACM Press, 1993, pp. 62–73.
3.
M.Bellare and A.Sahai, Non-malleable encryption: Equivalence between two notions, and an indistinguishability-based characterization, in: CRYPTO’99, M.J.Wiener, ed., LNCS, vol. 1666, Springer, Heidelberg, 1999, pp. 519–536.
4.
D.Bernhard, Zero-knowledge proofs in theory and practice, PhD thesis, University of Bristol, 2014.
5.
D.Bernhard, V.Cortier, D.Galindo, O.Pereira and B.Warinschi, SoK: A comprehensive analysis of game-based ballot privacy definitions, in: 2015 IEEE Symposium on Security and Privacy, IEEE Computer Society Press, 2015, pp. 499–516. doi:10.1109/SP.2015.37.
6.
D.Bernhard, V.Cortier, O.Pereira, B.Smyth and B.Warinschi, Adapting helios for provable ballot privacy, in: ESORICS 2011, V.Atluri and C.Díaz, eds, LNCS, Vol. 6879, Springer, Heidelberg, 2011, pp. 335–354.
7.
D.Bernhard, O.Pereira and B.Warinschi, How not to prove yourself: Pitfalls of the Fiat–Shamir heuristic and applications to Helios, in: ASIACRYPT 2012, X.Wang and K.Sako, eds, LNCS, Vol. 7658, Springer, Heidelberg, 2012, pp. 626–643. doi:10.1007/978-3-642-34961-4_38.
8.
A.Bruni, E.Drewsen and C.Schürmann, Towards a mechanized proof of Selene receipt-freeness and vote-privacy, in: International Joint Conference on Electronic Voting, Springer, 2017, pp. 110–126. doi:10.1007/978-3-319-68687-5_7.
9.
P.Chaidos, V.Cortier, G.Fuchsbauer and D.Galindo, BeleniosRF: A non-interactive receipt-free electronic voting scheme, in: ACM CCS 2016, E.R.Weippl, S.Katzenbeisser, C.Kruegel, A.C.Myers and S.Halevi, eds, ACM Press, 2016, pp. 1614–1625.
10.
V.Cortier, C.C.Dragan, F.Dupressoir, B.Schmidt, P.-Y.Strub and B.Warinschi, Machine-checked proofs of privacy for electronic voting protocols, in: 2017 IEEE Symposium on Security and Privacy, IEEE Computer Society Press, 2017, pp. 993–1008. doi:10.1109/SP.2017.28.
11.
V.Cortier, C.C.Dragan, F.Dupressoir and B.Warinschi, Machine-checked proofs for electronic voting: Privacy and verifiability for Belenios, in: CSF 2018 Computer Security Foundations Symposium, S.Chong and S.Delaune, eds, IEEE Computer Society Press, 2018, pp. 298–312.
12.
V.Cortier, D.Galindo, S.Glondu and M.Izabachène, Election verifiability for helios under weaker trust assumptions, in: ESORICS 2014, Part II, M.Kutylowski and J.Vaidya, eds, LNCS, Vol. 8713, Springer, Heidelberg, 2014, pp. 327–344.
13.
V.Cortier, J.Lallemand and B.Warinschi, Fifty shades of ballot privacy: Privacy against a malicious board, in: CSF 2020 Computer Security Foundations Symposium, L.Jia and R.Küsters, eds, IEEE Computer Society Press, 2020, pp. 17–32.
14.
C.C.Drăgan, F.Dupressoir, E.Estaji, K.Gjøsteen, T.Haines, P.Y.A.Ryan, P.B.Rønne and M.R.Solberg, Machine-checked proofs of privacy against malicious boards for Selene & Co, in: 2022 IEEE 35th Computer Security Foundations Symposium (CSF), 2022, pp. 335–347. doi:10.1109/CSF54842.2022.9919663.
15.
T.ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, in: CRYPTO’84, G.R.Blakley and D.Chaum, eds, LNCS, vol. 196, Springer, Heidelberg, 1984, pp. 10–18.
16.
T.Haines, R.Goré and B.Sharma, Did you mix me? Formally verifying verifiable mix nets in electronic voting, in: IEEE Symposium on Security and Privacy, IEEE, 2021, pp. 1748–1765.
17.
T.Haines, R.Goré and M.Tiwari, Verified verifiers for verifying elections, in: ACM CCS 2019, L.Cavallaro, J.Kinder, X.Wang and J.Katz, eds, ACM Press, 2019, pp. 685–702.
18.
S.Kremer and P.B.Rønne, To du or not to du: A security analysis of du-vote, in: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), IEEE, 2016, pp. 473–486. doi:10.1109/EuroSP.2016.42.
19.
R.Küsters, J.Müller, E.Scapin and T.Truderung, Select: A lightweight verifiable remote voting system, in: 2016 IEEE 29th Computer Security Foundations Symposium (CSF), IEEE, 2016, pp. 341–354. doi:10.1109/CSF.2016.31.
20.
P.B.Rønne, P.Y.Ryan and M.-L.Zollinger, Electryo, in-person voting with transparent voter verifiability and eligibility verifiability, 2021, arXiv preprint arXiv:2105.14783.
21.
P.Y.A.Ryan, P.B.Rønne and V.Iovino, Selene: Voting with transparent verifiability and coercion-mitigation, in: FC 2016 Workshops, J.Clark, S.Meiklejohn, P.Y.A.Ryan, D.S.Wallach, M.Brenner and K.Rohloff, eds, LNCS, Vol. 9604, Springer, Heidelberg, 2016, pp. 176–192.
22.
M.Sallal, S.A.Schneider, M.Casey, C.C.Dragan, F.Dupressoir, L.Riley, H.Treharne, J.Wadsworth and P.Wright, VMV: Augmenting an internet voting system with Selene verifiability, 2019, CoRR abs/1912.00288.
23.
V.Shoup, A proposal for an ISO standard for public key encryption, Cryptology ePrint Archive, Report 2001/112, 2001. https://eprint.iacr.org/2001/112.
24.
V.Shoup, Sequences of games: A tool for taming complexity in security proofs, Cryptology ePrint Archive, Report 2004/332, 2004. https://eprint.iacr.org/2004/332.
25.
M.-L.Zollinger, P.B.Rønne and P.Y.Ryan, Short paper: Mechanized proofs of verifiability and privacy in a paper-based e-voting scheme, in: International Conference on Financial Cryptography and Data Security, Springer, 2020, pp. 310–318. doi:10.1007/978-3-030-54455-3_22.