Symbolic protocol verification generally abstracts probabilities away, considering computations that succeed only with negligible probability, such as guessing random numbers or breaking an encryption scheme, as impossible. This abstraction, sometimes referred to as the perfect cryptography assumption, has shown very useful as it simplifies automation of the analysis. However, probabilities may also appear in the control flow where they are generally not negligible. In this paper we consider a framework for symbolic protocol analysis with a probabilistic choice operator: the probabilistic applied π-calculus. We define and explore the relationships between several behavioral equivalences. In particular we show the need for randomized schedulers and exhibit a counter-example to a result in a previous work that relied on non-randomized ones. As in other frameworks that mix both non-deterministic and probabilistic choices, schedulers may sometimes be unrealistically powerful. We therefore consider two subclasses of processes that avoid this problem. In particular, when considering purely non-deterministic protocols, as is done in classical symbolic verification, we show that a probabilistic adversary has – maybe surprisingly – a strictly superior distinguishing power for may testing, which, when the number of sessions is bounded, we show to coincide with purely possibilistic similarity.
Automated symbolic protocol verification, based on the seminal work of Dolev and Yao [26], has nowadays reached a level of maturity enabling successful use on complex real-world security protocols, including TLS [7,23], Signal [19], authentication protocols of the 5G standard [4], or EMV’s secure payment protocols [5] to name only a few. In the symbolic model, a non-deterministic, computationally unbounded attacker is assumed to have complete control of the network, being able to intercept any messages, and forge new ones. As a counterpart, cryptography is idealized and the attacker can only use predefined rules to manipulate messages that are represented by terms, e.g., expressed by an equation stating that a message m encrypted with k can be decrypted with the same key. This treatment of cryptography is in opposition to computational models where we assume a probabilistic polynomial time attacker, messages are represented by bitstrings and assumptions that an arbitrary such attacker has at most negligible probability of breaking a cryptographic primitive. Similarly, in the symbolic model, random values, such as keys or nonces, are chosen freshly from an infinite domain, rather than chosen randomly from a sufficiently large domain. These symbolic abstractions of cryptography and randomness have even been shown sound [22] (under rather strong assumptions) and significantly ease the automation of proofs. Hence, symbolic modeling of messages is arguably useful for formally analyzing cryptographic protocols.
However, the above-described abstractions of randomness only apply to the messages, and not to the control flow. Typical examples which crucially rely on randomized control flow are mechanisms for providing anonymity, such as the dining cryptographers protocol [14], mix-nets [13] or Crowds [32]. In this paper, we will investigate indistinguishability properties, expressed as equivalences in a cryptographic process calculus, the applied π-calculus [1], extended with a probabilistic choice operator. Typically, the testing equivalence expresses that two processes are equivalent if they exhibit the same behaviour when put in parallel with an arbitrary attacker process. Our work presents foundations for a model that (i) extends the scope of symbolic protocol analysis to probabilistic protocols, and (ii) allows to consider a probabilistic attacker (even on non-probabilistic protocols). In particular, when we consider purely concurrent processes – without probabilistic behavior – the equivalence we obtain is strictly stronger than the standard testing equivalence on such purely concurrent processes; in other terms, probabilistic adversaries are – for good reasons, as we will argue – more powerful in order to distinguish such processes than the purely concurrent adversaries considered in existing works and tools.
Summary of the relationship between preorders.
Our contributions. In a first part we introduce a probabilistic applied π-calculus and its semantics, which has similarities to [28], with two major differences. (i) We express our semantics in terms of general non-deterministic probabilistic transition systems (NPLTS) – also called probabilistic automata in the literature – which allows us to benefit from a large body of existing results on these systems [9,10,25,27,31,33,34]. (ii) More importantly, we differ in the way non-determinism is resolved: unlike [28] we allow for randomized schedulers – rather than choosing one particular non-deterministic choice, we allow the scheduler to choose an arbitrary distribution on the available non-deterministic choices.
Second, we define several notions of preorders and equivalences and study their relations. The main results are also summarized in Fig. 1, focusing on preorders (with similar relations between corresponding equivalences). We show, in particular, that
unlike in the purely non-deterministic case, the may-testing preorder () is strictly stronger than the trace equivalence preorder () (Theorem 1);
simulation () and observational pre-order (), respectively bisimilarity and observational equivalence, coincide for randomized schedulers (Theorem 2);
for non-randomized schedulers, these equivalences ( and ) do not coincide (Lemma 4), which provides a counter-example to one of the main results in [28].
Third, a well-known phenomenon [2,12] in process calculi that are both probabilistic and non-deterministic is the existence of some nonrealistic schedulers that are able to use the internal probabilistic choices done by an agent in order to schedule another agent’s non-deterministic choices, i.e., the scheduler leaks the probabilistic choices. Therefore, we study two important subclasses of processes that avoid this phenomenon.
We first consider the classical class of non-probabilistic processes (denoted ), as in the original applied π-calculus, but in the presence of probabilistic adversaries. We show that, if we additionally bound the number of sessions (denoted ),
may-testing with probabilistic adversaries coincides with the classical, purely possibilistic notion of similarity (Proposition 5 and Theorem 2). This also provides a contextual characterization of the notion of similarity which is reminiscent of [25] in the setting of CSP;
verification of testing equivalence with probabilistic adversaries is co-NEXPTIME complete for a large class of cryptographic primitives, relying on results from [18].
We next consider a class of purely probabilistic processes with a bounded number of sessions (denoted ), which is reminiscent of a probabilistic version of simple processes in [16,20], and a slight generalization of the processes in [11]. We show that trace equivalence as considered in [11], is
weaker than may-testing, but
coincides with a version of may-testing with determinate attackers: attacker processes are restricted by disallowing replication, parallel, and non-deterministic choice, but allowing probabilistic choices (Theorem 3).
Next, we present an algorithm for deciding trace equivalence by extending the procedure of the DeepSec verifier [18]. Our procedure inherits some limitations (bounded number of sessions, the class of admissible rewrite systems) but provides a more general setting than Bauer et al. [6] who additionally bound the size of input messages. We have implemented our procedure in the open-source tool DeepSec available at [35].
Finally, we illustrate our framework by studying the Dining Cryptographers protocol. We notably provide a pen and paper proof that the protocol guarantees anonymity (expressed as may-testing equivalence). Using the DeepSec tool we also show that anonymity is not provided when coins are biased.
For readability, we often only provide proof sketches and omit some of the most technical proofs. A full version with more detailed proofs is available at [17].
Probabilistic applied π-calculus
In this section we introduce the probabilistic applied π-calculus, a probabilistic variant of the applied π-calculus introduced by Goubault-Larrecq et al. [28].
Message as terms
Atomic values such as keys and nonces are modelled by names. We assume an infinite set of such names and partition it into two disjoint infinite sets and . The set of private names is a priori unknown to the attacker and models, e.g., honest keys in a protocol. The set of public names models public values, known to the attacker. The distinction between public and private names is analogous to the distinction between free and bound names in the original applied π-calculus. We also define an infinite set of variables . Finally, we consider a finite set of function symbols each equipped with their arity . Function symbols model cryptographic operations, e.g., is a binary symbol that could be used to model encryption. Terms are defined as names, variables, and function symbols applied to other terms. For instance, given two names , represents the encryption of a with the key k. For any , and , the set of terms built from and by applying function symbols in is denoted by .
We also suppose that terms are equipped with a binary relation ≐ that expresses that two terms evaluate to the same result, and a predicate that is intended to hold when evaluation succeeds. How ≐ and are precisely defined is not relevant for the results of this paper and we wish to capture several formalisms. ≐ can for instance be defined by an equational theory, as in the applied π-calculus [1] (where would evaluate to true on any term), by a constructor-destructor rewrite system, allowing evaluation to fail when a destructor application does not reduce, as in the DeepSec tool [18], or a combination of these as in the ProVerif tool [8].
Formally, we require that ≐ is symmetric, transitive, and closed under substitution of terms for names and variables, as well as application of function symbols. Moreover, for all , if and only if . is supposed to hold on any names, be closed under renamings and implies that and . Finally, we require that implies .
For example, the ≐ relation could capture that for any m, k modelling that decryption cancels out encryption when the same key k is used; one may also define as false to express that decryption fails if the ciphertext argument is not an encryption with the matching key.
Syntax of the process calculus
The syntax for processes is defined as follows:
processes
0
nil
output
input
parallel composition
replication
restriction
conditional
non-deterministic choice
probabilistic choice
where , , and . As usual, in examples we will omit trailing 0 processes and branches. A process P is closed when all variables in P are bound by an input. We also denote by the set of free names in P, i.e., the names not bound by a restriction, and by the set of all names occurring in P.
As an example, consider the process P:
P consists of two parallel processes. The left process outputs on a channel c with probability the name k and with probability the name a. The right process inputs a value on channel c and binds this value to x. If x equals k then it outputs the constant .
We denote by the set of all processes in the probabilistic applied π-calculus, and by the set of all multisets over .
Operational semantics
We will now define the semantics of the probabilistic applied π-calculus. We opt for a different presentation of the semantics than Goubault-Larrecq et al. [28] relying on existing formalisms for transition systems. Moreover, we allow for a more general class of schedulers.
Let be an arbitrary set. We denote by the set of all finitely supported probability distributions over and by the set of all sub-probability distributions over (observe that ). For , and sub-distributions D, E, we define the measure
When , the resulting sub-distribution does not depend on E, and we simply write instead of .
If , we denote by the support of D, i.e., the set of all elements such that . If , we define . Finally, we denote by the Dirac distribution on x.
The operational semantics of processes is defined by a relation between multisets of processes and probability distributions on multisets of processes, denoted . This relation is defined in Fig. 2.
One may note that our calculus offers a non-deterministic choice operator that is resolved internally. This differs from the standard π-calculus [30] where the non-deterministic choice operator is resolved externally. Note that the original applied π-calculus [1] does not contain non-deterministic choice.
Semantics of the calculus.
In the following, we define the operational semantics of our calculus using well studied probabilistic systems. We choose the formalism of non-deterministic probabilistic labelled transition systems (NPLTS) used for instance in [9]. A NPLTS allows to represent states that allow both internal and external non-deterministic behavior. It can be noted that it coincides with the notion of simple probabilistic automata of Segala et al. [33].
A NPLTS is a triple , where
is a set of states,
is a set of labels, and
is a transition function: for each state in , and each label in , is a set of (finitely supported) distributions.
The label τ is the internal action and the labels in are the external actions. For , , we write when .
In the remaining of this paper, we may define a NPLTS by only its transition function, i.e., we will say that is the NPLTS .
We now express our operational semantics as a NPLTS without external actions, i.e., . External actions will be used to express our labeled semantics in Section 4.1.
The operational semantics is the NPLTS where for every , .
Note that the states of the NPLTS contain all possible multisets of processes and how they are executed. Obviously, is thus an infinite transition system. In examples illustrating transitions of a multiset of processes , we only show the relevant fragment of that contains .
The complete execution of the process P, introduced in Example 1 is given in Fig. 3.
Behavioral equivalences
In this section we define probabilistic versions of two classical equivalences: may-testing and the stronger observational equivalence. In order to do so we first introduce the notion of resolution (also known as scheduler), i.e., how internal non-determinism is resolved, and the notion of barb, which models an observational action.
Resolving the internal non-determinism
Resolutions express how internal non-determinism of a NPLTS is resolved. Intuitively, resolving the non-determinism means, for each state, restricting the transition system either by choosing one particular internal transition, or else by leaving the choice of a non-deterministic external action. The resulting transition system is called a Reactive Probabilistic Labelled Transition System (RPLTS) and has still external, but no internal, non-determinism. It can be noted that this model is equivalent to Labelled Markov Chains when extended with internal actions.
A RPLTS is a triple , where
is a set of states,
a set of labels, and
is a transition function that assigns to each state in
either a unique distribution for the label τ (the deterministic internal action);
or a function mapping labels in to a failure (⋆) or a distribution over (the non deterministic external actions).
States such that are called external states, while the ones such that are called internal states. Given a RPLTS R, we denote by and the sets of external and internal states of R respectively. For a more homogeneous notation, when s is an internal state, we sometimes write instead of .
In the particular case of a NPLTS with no external action, resolving the internal non-determinism results in a RPLTS without any non-determinism. This is the case of the operational semantics . Such a purely probabilistic system is typically equivalent to the notion of Markov Chain. By abuse of notation, the transition function
of such RPLTS is rewritten as
as for any set X, the cardinality of the set is 1.
Before defining the notion of resolution – or schedulers –, we need to introduce two classical notions in probabilistic models: the convex hull of a set of distributions and the probabilistic lifting of a function.
Let be a set of states. The convex hull of , denoted , is the set of distributions such that
Intuitively, rather than choosing one distribution in Δ, each element in corresponds to a distribution over the distributions in Δ. This will be useful for defining randomized schedulers.
Next, we lift functions to distributions: applying a function f to a distribution simply defines a new distribution that transfers, according to f, the probability weight of elements in the domain of f to its image, possibly summing these weights when f maps several inputs to a same output.
Let , be two sets of states and . We define the function to be the probabilistic lifting of f, where
When obvious from context, we will overload the notation and write f instead of .
We now define resolutions for a NPLTS that allow to solve the internal, but not external, non-determinism: a resolution describes one of the possible ways of turning an NPLTS into a RPLTS. It means that for each state s, a resolution should choose whether s is an internal state or external state; in the first case, a unique post-transition distribution must be chosen; in the second case, for each external action a, the resolution must choose to either stop (i.e., ) or a unique distribution D such that (i.e., ). Due to the possible existence of cycles in the NPLTS, a scheduler that visits multiple times a certain state s must be able to choose differently how to resolve the non determinism every time it visits s. This leads to the notion of correspondence function.
A randomized resolution for a NPLTS is a pair where
is a RPLTS, and
is the correspondence function such that for all states , implies .
Given a NPLTS we denote by the set of randomized resolutions. Additionally, we denote . Figure 4 shows an example of a resolution from for the process P from Example 2.
Example of a randomized resolution for process P from Example 2 where the correspondence function is the identity.
Computing the probability to reach a barb
The notion of barb is a classical way of expressing observables. Intuitively a state of , i.e., a multiset of processes, exhibits a barb c whenever an output on channel c is possible.
For and , we say that exhibits barb c when there exists a process in where and . We denote by the set of all multisets of processes that exhibit the barb c.
We next define the probability of reaching a state in a set of target states, in a fully probabilistic transition system, i.e., in a transition system where all non-determinism–internal or external–has already been resolved. We first define the probability of reaching such a state in at most n steps, and then we take the probability of reaching them eventually as the limit of the n-step reaching probabilities.
Let be a RPLTS, a set of states, and s an initial state. For every we define the probability of reachingfrom s in at most n steps as:
We define the probability of reachingfrom s as:
Note that, as is an increasing function in n we can replace the limit by the supremum on .
Given , we denote by the probability:
Defining may testing equivalence
Intuitively, two processes are may-testing equivalent if they exhibit the same observations when executed in the presence of any attacker process. This models the inability of an arbitrary process to distinguish them. More formally, two multisets of processes and are may testing equivalent when the attacker has the same probability over all schedulers to exhibit the barb c in both and .
(May testing equivalence).
Let . We say that iff:
We say that , are may testing equivalent, denoted , when and .
One could also consider a more fine-grained definition of may testing pre-order that guarantees the equality of probabilities between two schedulers rather than comparing the probabilities over all schedulers. Formally, this pre-order, denoted , requires that for all resolutions and state s of R such that , there exist a resolution and a state of such that and
However, the resulting relation is counter-intuitive and distinguishes processes
Indeed, we can show that : for , there exists a resolution such that
but for every resolution such that ,
Defining observational equivalence
In this section, we define observational preorders and equivalence which are stronger than may testing. When studying cryptographic protocols we suppose that internal actions are not observable and therefore only study weak equivalences hiding whether such internal actions take place or not. To define the observational preorder we need to introduce a weak relation for internal actions. In a purely non-deterministic system this simply corresponds to the reflexive, transitive closure . However, in our setting we need to compute the corresponding distributions.
Let be a NPLTS and .
We write when there exists and such that
, , ,
.
To define the observational preorder, we additionally need to lift relations defined on a given set to relations on sub-distributions over this set.
(Lifting of a relation).
Let R be a binary relation on a discrete set . We define the lifting of R to sub-distributions as the binary relation on , denoted , defined as:
where .
Using these notions of weak transition and lifting of relations to sub-distributions we can define observational equivalence.
The observational preorder is the largest relation on such that implies:
;
if and then , and ;
∀ closed such that . .
The observational equivalence is defined by additionally requiring to be symmetric.
Note that we slightly diverge from the original definition of observational equivalence [1] where an evaluation context is of the form
In our definition we simply consider a parallel process, and no additional name restriction. However, we now show that these two definitions coincide. Intuitively, restricting names whose scope includes the adversarial process A does not provide additional distinguishing power and these names could as well be public. While this result is of independent interest, the reader may safely skip the remainder of this section without hindering their understanding of the rest of the paper.
Our distinction between private and public names enforces that all restricted names are in . Applying an adversarial context with restriction in our formalism corresponds to applying a renaming from public to fresh private names. Hence, in our formalism the original observational equivalence can be defined as follows.
The original observational preorder is the largest relation on such that implies:
for all , ;
if then and ;
for all closed , for all renaming ρ, if , , and then .
The original observational equivalence is defined by additionally requesting to be symmetric and in the second bullet point, by requesting both and to hold.
We now show that the two notions coincide. Intuitively, restricting names whose scope includes the adversarial process corresponds to making previously public channels invisible to the attacker at later steps, which does not provide additional distinguishing power.
and.
Taking ρ to be the empty renaming, we have that and .
Define the relation as iff there exists , and a renaming ρ such that , , , , and .
As ≐ is closed under substitution of names by terms, we can show that iff . This can be propagate to schedulers, i.e. if and only if . Hence, we derive that
if and only if
We now show that satisfies the three items in Definition 11:
Let . If then c does neither occur in nor in . Hence, . Otherwise, where for all , . Since, and do not contain public names from , they can never reach states from . Hence, . Since , we deduce that
If then . By , we have and . This implies that with .
Let and be a renaming such that , , and . Note that the domains of ρ and may not be disjoint. Similarly, the process may contain public names from .
Hence, let be a renaming such that , . Hence, we define .
We have since . Moreover, as , we have that
Therefore, . This allows us to deduce that . Similarly, we have .
We know that . Hence which allows us to conclude that and so .
□
Labelled semantics and equivalences
As usual in π-calculi, and in the applied π-calculus, we can define a labelled semantics. The intent of the labels is to capture adversarial actions and avoid the universal quantification over processes in equivalence definitions.
Labelled semantics
A state in this labeled semantics is defined by associating a multiset of processes with a frame, modeling the adversary’s knowledge. We consider a new set of variables distinct from that will act as pointers to messages that were previously output.
An extended process is a pair , where and ϕ is a ground substitution
such that , and for .
We denote by the set of all extended processes.
A recipe is a term from representing how an attacker can deduce a message.
If D is a distribution over , and ϕ a frame, we write for the distribution over extended processes defined as .
We now define the NPLTS for the labelled semantics. External actions model interactions with the attacker.
The labelled semantics is the NPLTS where
is the set of labels , , and with ξ, ζ recipes and ;
Note that when we lift to extended processes we suppose that the freshness requirement of a new name in the (New) rule of Fig. 2 also applies to the frame ϕ, i.e., must not appear in ϕ.
It should be noted that we deal with static equivalence in a different way as done usually in the applied π-calculus, or implicitly in the probabilistic applied π-calculus [28]: we encode static equivalence into the NPLTS by a countable set of actions–all the tests and their negations– instead of just one action testing static equivalence. The motivation behind this choice is to be able to represent every action from the NPLTS by an elementary action of the adversary. As shown later, this choice has no effect on the definition of the simulation pre-orders or on bisimulation, but it leads to a slightly different notion of trace equivalence, that is closer to may testing equivalence.
Defining trace equivalence
We first define the probability of executing a trace for a given resolution. As we are interested in weak trace preorder (where internal actions cannot be observed), traces are sequences of external actions only. Our definition uses the previously introduced notation : recall that this denotes the probability of reaching state t from state s using only internal actions for some resolution .
Let be a RPLTS. Let be a trace, i.e., a finite word on the alphabet . For all states , we define the probability of executing w starting from s in R as:
Given a NPLTS , we denote by the probability:
This allows us to define trace equivalence of and : intuitively any trace that can be executed in can be executed with at least the same probability in and vice-versa.
(trace equivalence).
Let , . We say that iff
and are trace equivalent, denoted , iff
Unlike, in the purely possibilistic case, in our probabilistic setting trace preorder is (strictly) weaker than the may testing preorder.
Letbe two processes.Moreover there exist processessuch that
, such that .
Figure 6 witnesses that the implication is strict. and output each 3 encrypted bits (in a non-deterministic order). outputs twice the encryption of 0; twice the encryption of 1. The (randomized) encryption ensures that these three values are indeed indistinguishable. We give the adversary a single access to a decryption oracle . Intuitively, trace equivalence holds, as the scheduler can ensure that matching encryptions are sent to . However, may-testing does not hold: the attacker chooses uniformly at random one of the three encryptions to submit. The probability to hit 0 will be in and only in .
Observe that Theorem 1 holds for any processes and does not require them to be image-finite, contrary to usual results in the literature, e.g., [16]. This discrepancy comes from our choice of labelled actions for static equivalence (see Remark 5): a trace cannot test directly static equivalence, but can only do a finite numbers of recipe tests. We believe this variant definition of trace equivalence to be of independent interest as it provides an exact characterization of may testing in the purely non-deterministic case.
Defining (bi)simulations
In this section, we define simulations on probabilistic processes and corresponding equivalences. Our definition of simulation preorder is similar to the definition of randomized weak simulation preorder introduced by Segala and Lynch for probabilistic automata [33]. We reuse the lifting of a relation and the weak relation for internal actions defined in Section 3.4 but applied to the NPLTS . In particular, given an action and two distributions , we write when , and for some , and where denotes . Here is the natural lifting of the transition function of , i.e., with .
A relation is
a simulation if implies that for all ,
a bisimulation if R is a simulation and R is symmetric.
The simulation preorder, denoted , is the largest simulation and bisimilarity, denoted , is the largest bisimulation. We define similarity, denoted , as .
As usual in the field of (bi)simulation, it can be shown that , respectively , exists [27] and that it is a pre-order, i.e., a reflexive and transitive relation, respectively an equivalence, i.e., a reflexive, symmetric and transitive relation [34].
The following proposition from [34] states that, as usual in the non-probabilistic case, the weak arrow can replace the single arrow in the definition of simulation.
Let R be the largest binary relation onsuch thatimplies that for every,We have.
Observe that our choice of labelled actions for static equivalence (see Remark 5) has no impact on the resulting simulation and bisimulation. Indeed, if two -states and are in the simulation pre-order or bisimulation, then ϕ is statically equivalent to ψ. Indeed, for all , implies that and . Since neither the a transition or τ transition modifies the frame, the former indicates that , and , with for all , . The former ensures that . Therefore, for all ξ, ζ, if or holds on ϕ then it also holds on ψ respectively. It implies that ϕ and ψ are statically equivalent.
We now show that observational preorder and equivalence are exactly characterized by the simulation preorder and bisimilarity.
Let,two processes in.
We here provide the main intuitions of the proof that iff .
(⇒). To show that observational preorder implies simulation, we need to represent the frame of an extended process as a process: we output in parallel the terms , with , on a public channel , distinct for each i and not occurring anywhere in . Thus, we build the relation such that
with and pairwise distinct and not occurring in , , ϕ, .
As the public channels do not occur anywhere else, any internal transition on
must correspond to an internal transition on ; and similarly for .
For all visible actions, we rely on being closed by composition with an adversarial process. For example, when the action is the test , we compose with the adversarial process that (i) reads the frame, (ii) applies the test, and (iii) outputs on a fresh public channel if the test succeeds:
where are fresh variables and . We then consider the transition
and the fact that to conclude. Indeed, for
to exist with , must also have passed the test in the conditional branching. Hence and so .
When the visible action is an output or an input, the process is more complicated. The adversarial process starts by reading the frame as before and executing the action. The last part of the adversarial process consists in outputting again the frame so that we re-enter the relation . Assume for instance the action . By definition, with , and .
We consider the following adversarial process :
where are fresh public names pairwise distinct not occurring anywhere else. We will conclude by considering the transition
where . Indeed, for
to exist with , must also have applied an internal transition executing the construct which allows for the labeled action to be executed on .
(⇐). Showing that simulation implies observational preorder is more straightforward. We build a relation such that when there exist two extended processes , with compatible frames (i.e. ), a renaming ρ from to , and a multiset of adversarial processes such that:
names in do not occur in , ϕ, and ;
;
;
.
The renaming ρ replaces the private names that are generated by the processes in (through the construct ) with public names that are chosen fresh (i.e. not in , ϕ, and ). □
Randomized vs non-randomized schedulers
All our equivalence notions are based on randomized schedulers where the non-determinism is solved by picking a distribution from the convex hull of the available distributions. In the literature, more restrictive non-randomized schedulers have also been considered when defining observational equivalence and bisimilarity [28]. A non-randomized scheduler solves the non-determinism by choosing directly one of the available distributions. Formally, in Definition 4, instead of requiring that implies , a non-randomized resolution requires that implies and is injective on the support of D.
Denoting by the set of all non-randomized schedulers of , we can naturally update the notions used to define behavioural equivalences to non-randomized schedulers. For instance, we denote by the probability of reaching from s over all non randomized schedulers . Similarly, denotes the probability of executing the trace w from s over all schedulers from . Updating the definitions results into may-testing and trace preorder for non-randomized schedulers, denoted and respectively. We now show that and do not depend on the whether schedulers are randomized or not (unlike simulation based notions as we will see below).
May testing and trace preorders with randomized and non-randomized resolutions coincide:
The core of the proof is the following fact: when we fix a randomized resolution R, an initial state s, and , it is possible to decompose the behaviour of R from state s and during the n first execution steps into a weighted family of non-randomized resolution (where the weight is a coefficient in , in the sense that for every set of processes , . The construction of this decomposition is defined inductively on n. □
This result is of interest as it is often easier to manipulate non-randomized schedulers, and we expect automated verification to be more convenient as well.
When considering observational equivalence, simulation and bisimulation, non-randomized schedulers raise a number of issues. First, as highlighted for instance in [24,27], when considering bisimulation or simulation on general NPLTSs, non-randomized resolutions result into relations that are not transitive. We show that even on the specific NPLTS , simulation is not transitive.
Simulation for non-randomized scheduler is naturally defined by extending the notation to non-randomized scheduler, denoted : from Definition 8, we require to be in and additionally require an injectivity property on the correspondence function, i.e., is injective on the support of . We denote the resulting simulation with non-randomized schedulers by (and similarly for , and ).
Fragments of showing but .
is not transitive.
Consider processes
The corresponding fragment of is displayed in Fig. 7a. We will show that
It is easy to see that and so . The difficult part of the proof of is to match
This is achieved by the scheduler displayed in Fig. 7b.
Finally, we prove by showing that the transition cannot be simulated in . □
Note that the definitions of bisimilarity in [28] rely on non-randomized schedulers. Even though this does not necessarily imply that their relation is not transitive (as they focus directly on the semantics of processes) we show in the next lemma that and do not coincide, hence disproving [28, Theorem 2]. This reenforces our belief that it is preferable to use randomized schedulers in our definition.
Fragments of NPLTS showing and .
There exist processessuch that
,
, and
.
We consider the following processes:
Both and are proved by showing that the binary relation R, defined as the reflexive, symmetric and transitive closure of , is a bisimulation (see Fig. 8a).
To prove that , we show that . In particular (see Fig. 8b), we build a non-randomized scheduler such that where . However, there is no distribution E such that , and . □
Remark that we have cast the definitions of [28] in our own framework. In the full version [17] we show that processes P, Q in Lemma 4 can be adapted to obtain the counterpart of Lemma 4 in the exact framework of [28]. The failure of the proof of [28, Theorem 2] can be traced back to the auxilliary lemma [28, Lemma 3] that states that bisimilarity is closed under application of closing evaluation contexts. No proof of this lemma is however provided, and it is actually false: as shown in the proof of (our) Lemma 4, the extended processes and defined there are bisimilar (with respect to non-randomized schedulers), but it is not the case of the extended processes and .
Well behaved subclasses of protocols
It is a well-known phenomenon that non-determinism and probabilistic choices do not interact well: a particular scheduler may for instance leak a secret probabilistic choice. Such schedulers are generally deemed unrealistic, and several papers aim at restricting schedulers, e.g., [2,12]. We illustrate this phenomenon on the following example.
Consider the processes
One may, intuitively, consider that these two processes exhibit the same behaviour. Q takes an input and then with probability decides to either output on or on . P on the other hand first choses a branch with probability . Each branch performs an input and, depending on the input value, outputs either on or on . As the two branches make opposite choices on the output according to the input value, one might expect the probability to output on to be .
However, P and Q are not may testing equivalent and can be distinguished by the adversary . Indeed, we can show that:
Intuitively, this results from the fact that the resolution may leak the probabilistic choice through the non-deterministic choice of the attacker to output 0 or 1. The resolution chooses the attacker to output 0 in the first probabilistic branch of P and 1 in the second.
In this section we identify two subclasses of processes that avoid this problem. The first such subclass is that of non-probabilistic processes, i.e., without the operator (we denote by all the multisets of such processes). This is the class of the original applied π-calculus which also enjoys good tool support. Figure 6 already illustrated that even on non-probabilistic processes, probabilistic adversaries have a stronger distinguishing power for the may testing equivalence. We formally characterize this distinguishing power when restricting protocols to a bounded number of sessions (denoted ), i.e., considering processes without replication: for this subclass, may-testing coincides with similarity. We therefore inherit from [18] the fact that deciding may-testing is coNEXP complete for a large class of cryptographic primitives.
The second subclass considers purely probabilistic processes with (nearly) no non-determinism. We show that trace equivalence in this class (as considered for instance in [11]) corresponds to may-testing with a restricted, determinate adversary process. We also sketch how the algorithms of the DeepSec prover [18] could be adapted to check trace equivalence in this probabilistic setting.
Non-probabilistic processes
May-testing with non-probabilistic adversaries and trace equivalence coincide
Our definitions of may testing and trace equivalence coincide with the classical definitions of the original, purely non-deterministic applied π-calculus when all processes are non probabilistic. As a first step, we observe that the weak operational semantics we defined in Section 4.3 is a conservative extension of the weak (non-probabilistic) operational semantics: indeed, when considering non-probabilistic processes, all distributions in the (labeled) operational semantics are Dirac distributions.
We write for the set of all non-probabilistic extended processes. We write , respectively , for the one-step reduction relation we obtain when we restrict the NPLTS to , respectively to . For , we write when there exists a sequence .
Let.
We look now at preorder relations between non-probabilistic processes. We first recall formally how may testing, trace equivalence and bisimulation are defined for non-probabilistic processes (Definition 17 below). Those definitions are coherent with those from the literature, e.g [16], (up to the difference on static equivalence, discussed in Remark 5).
If b is a barb, we write when there exists such that , and . For , we write when . If is a trace, we write when there exists a sequence .
We define the binary relations , , on as follows:
when . implies ;
when for every trace α, implies ;
is the largest reflexive and transitive relation such that implies that for every , and , there exists such that and .
The preorders and –and the corresponding equivalence relations–are conservative extensions of and . As expected, the preorder is not a conservative extension of , because of the additional expressive power of probabilistic adversaries. Nonetheless, we can recover when we restrict the adversaries in the definition of to non-probabilistic adversaries.
Let.
iff;
iff;
iff
For may-testing and trace preorder, the proof uses crucially the fact that it is enough to consider non-randomized distributions (thanks to Lemma 2), and from there, we conclude using Lemma 5. Since it is not possible to consider only non-randomized schedulers in the definition of simulation, the proof of the first point in Proposition 2 is more subtle, and uses well-structured properties of the lifting of a relation from Definition 9. We need to show (1) that the (non-probabilistic) simulation is a probabilistic simulation in the sense of Definition 16, and (2) that is also a non-probabilistic simulation in the sense of Definition 17.
Suppose that for . Let , such that . Looking at the way we defined (and since is non-probabilistic), we also see that for every is the support of D. From there, we obtain that for each i, there exists such that , and . At that point, we build , and we can see that . Moreover, the structural properties of the lifting allows us to go from to . Hence, we have shown that is indeed a probabilistic simulation in the sense of Definition 16.
Suppose that for . Let , and such that . This transition carries over to , i.e., . We obtain that there exists a distribution E such that , and . But by structural property of the lifting, we have that for every element in the support of E. Moreover, since is non-probabilistic, it holds that for every element in the support of E. Since E is a distribution, there exists at least one such element , thus we can conclude.
□
The following result indicates that may testing and trace equivalence coincide in non-probabilistic settings. In particular, we recover the fact that for the classical definitions in non-probabilistic settings, trace equivalence implies may-testing, as shown in [16].
Let.
May-testing and simulation coincide for bounded processes
We rely on a modal characterization of strong simulation on image finite labeled transition systems (LTS) by a Hennessy–Milner logic [29] (HML).
We can rely on strong simulation as it is a well-known fact ([3]) that simulation for a LTS can be expressed as strong simulation on the corresponding weak LTS, that is, in our case all transitions are merged into a single transition.
A LTS is image finite when the LTS cannot infinitely branch from a state and a label. Therefore, as we consider only bounded processes and by denoting the strong simulation relation on a LTS , we can build an image finite LTS such that for all :
Our HML characterization consists in expressing strong simulation preorder by the means of satisfaction of logical formulas by the LTS.
Let be a countable set of actions. We define the set of logical formulas as:
In our case, the set of actions corresponds to that is indeed countable. The satisfaction of such formulas by a LTS is defined as follows.
Let be a LTS. We say that satisfies a formula F, written , if for all ,
;
when and ;
when and .
The following proposition shows how to relate strong simulation with satisfiability of logical formulas.
(HML characterisation of simulation).
For an image finite LTS,
In order to prove that simulation coincides with may-testing for bounded non-probabilistic processes, we show that we can emulate any logical formula by a probabilistic adversary: for all formulas F, we build a probabilistic adversary such that for all bounded non-probabilistic extended processes ,
The construction of is defined as follows.
Let F be a formula in , such that and . We let and define by induction on the syntax of F:
if then .
if then when and otherwise.
if then when , and otherwise.
if then when and otherwise.
if then when and otherwise.
if , then .
In Definition 20, the integer n and the conditions on the variables of ξ, ζ ensure that the adversarial process is closed (no free variables). This is not a restriction as implies that F satisfies these conditions. In particular, we note that conjunction is encoded by probabilistic choice and on formula ⊤, the adversary process exhibits the barb c. The main result of this section follows almost directly.
Let.
Cheval et al. have shown [18] that both deciding trace equivalence and bisimilarity is coNEXPTIME complete when cryptographic primitives are modelled by a subterm convergent destructor rewrite system and the number of sessions is bounded. (We refer the reader to [18] for a precise definition of this class of rewrite systems.) The hardness proof reduces Succint 3Sat to both trace equivalence and bisimilarity using a same encoding which also proves hardness of similarity. The coNEXPTIME decision procedure for bisimilarity can be directly adapted to the case of similarity, hence, we have the following result.
Let. Decidingis coNEXPTIME complete when ≐ is defined by a subterm convergent destructor rewrite system.
May-testing and simulation do not coincide for unbounded processes
We consider the following two simple LTS and , that are an asymetric variant of the LTSs used in [36] to show that the finitary HML does not characterize bisimulation.
Though both and may produce an unbounded number of a transitions, the initial transition in decides on an arbitrary, but fixed number of a transitions in the rest of the execution. This in particular shows that does not simulate , i.e., .
In the applied π-calculus, can be represented by the process . Modeling is more complex: we non-deterministically choose an integer , and output times a. The non deterministic choice of n is realized through the following process that outputs the unary encoding of n:
Channel d represents a memory cell initiated with a public name b (encoding 0). When reading on d the current value x, the process non deterministically chooses to increment x (updating the cell with ) or to select the value x by outputting it on channel e.
It remains to model the process that given an integer x, produces outputs of a:
Similarly to , relies on a private to increment b until reaching the value x read on e. It is easy to see that the process outputs times a.
Let. We have:
is shown by relying on the same idea used to show . Before the first output on c, Q must produce a communication on e, hence fixing the value of x used in which bounds the number of following outputs on c by . As may output on c an unbounded number of times, we conclude.
To prove that , we unfold the definition of and focus on where is a resolution, and . One can notice that by definition, the number of transitions from s in the computation of is bounded by n. Thus, the resolution may at most unfold n times the process . By denoting , we can build a resolution whose probability to exhibit the barb from is the same as . We then rely on and Theorem 2 to conclude that which allows us to conclude. □
Purely probabilistic processes
At the opposite of the spectrum of purely non-deterministic processes, we study purely probabilistic processes with (nearly) no non-determinism. A similar class of processes has been considered in [6,11] to model various protocols relying on randomization (e.g., Crowds [32], mix-net [13], electronic voting [15]). They consider systems that are built as the parallel composition of independent agents, called roles, and where all communications are mediated by the adversary. Moreover, the internal behavior of each role is deterministic; the only non-determinism is controlled by the adversary–thus external–and consists in the adversary’s choice for scheduling the communications. We model purely probabilistic processes as follows.
A process is fully determinate if it does not contain the operators +, |, nor !.
is purely probabilistic when:
each is fully determinate;
there exist distinct public channels such that for all , all input and output actions in are on .
The class of purely probabilistic multisets of processes is denoted by .
can also be seen as a probabilistic extension of the class of simple processes introduced in [21] to show that trace equivalence coincides with observational equivalence for such processes.
Removing scheduling of τ-actions
In [6,11], the authors consider trace equivalence for a slightly restricted fragment of purely probabilistic processes. More precisely, all processes have exactly the same control structure which removes the necessity of scheduling honest τ-actions and allows to directly consider strong trace equivalence, as exactly the same τ-actions occur. In this work, we lift this restriction on the shape of the processes and show instead that the non-determinism related to honest τ actions is inconsequential when deciding trace equivalence. Indeed, in a multiset of processes , a τ action may be available simultaneously in multiple components. However, all such τ-action are in fact purely deterministic (e.g., conditional branching, probabilistic choice). Moreover, as the s do not contain parallel composition and all input and output occur on distinct channels , no internal communication between processes and is possible.
We show that to compute the probability of executing a trace w, we only need to consider a single maximal resolution on the NPLTS . Such a resolution always executes an action when at least one is available. Formally, a resolution with on purely probabilistic processes is maximal when for all , if there exists in for some a, D then for some .
Letbe an extended purely probabilistic process. For all maximal resolutionson, for allwith,
May-testing and trace equivalence coincide for fully determinate adversaries
As illustrated in Example 3, may-testing is strictly stronger than trace equivalence even on purely probabilistic processes due to the non-determinism in the adversarial process. However, by restricting the adversarial process to be fully determinate, we can show that may-testing and trace equivalence coincide. We define the resulting determinate may testing preorder, denoted , exactly as in Definition 7 but additionally restrict the adversary process to be a singleton where A is fully determinate.
Let.
To prove that implies we encode any trace w into a determinate adversary where c is fresh. is defined in a similar way as (Section 5.1), e.g.,
In particular, on the empty trace the adversary process exhibits the barb c: . We obtain that
The other implication is more difficult as the adversarial process is allowed to use probabilistic choices which cannot be directly encoded in a trace. Instead, we show that any adversarial process aiming to exhibit a barb c corresponds to a multiset of weighted traces , built inductively on . For instance, when and
then
For other constructs, the set of weighted traces is built as expected, e.g., and . For outputs or inputs, we additionally test if the channel corresponds to the barb c, i.e., when and ,
This construction yields the following property:
Intuitively, exhibiting the barb on c does not require any interaction with the adversary, or this interaction is correctly encoded in . Using this property we easily conclude. □
Deciding trace equivalence and tool support
As previous mentioned, Cheval et al. [18] designed a decision procedure for trace equivalence when cryptographic primitives are modelled by a subterm convergent destructor rewrite system and a bounded number of sessions. This procedure is based on constraint solving techniques that represent the infinite set of all possible concrete executions of the processes and an arbitrary attacker as a finite symbolic tree, called the partition tree. Intuitively, each node of this symbolic tree represents the state of the two processes after executing a trace . Due to non-determinism, a node may contain several constraint systems corresponding to every possible interleaving allowing the execution of a given trace . Deciding trace equivalence between processes A and B, in the original, non-probabilistic setting, requires to check that each node of the symbolic tree contains at least one constraint system derived from process A and one from process B; or the node is empty.
We show how to extend this procedure in order to decide trace equivalence in a general setting where both probabilistic and non-determinism behavior may co-exist in the process. Obviously, we inherit the setting of a bounded number of sessions and cryptographic primitives modeled by a subterm convergent destructor rewrite system.
Following Lemma 2, proving is equivalent to proving . Thus, by definition, we focus on the computation of for all . A main difficulty stems from the presence of universal quantification over resolutions. However, as is bounded, we can completely ignore resolutions and focus on the labelled semantics, as shown by the following property.
Let. Let. Let.where
Note that this inductive definition is well founded as the size of the processes strictly decreases at each semantics step since there is no replication.
As mentioned above, partition trees [18] are a finite symbolic representation of the concrete executions of the two initial processes. Each node contains all reachable states of the two processes after executing some trace w. However, to compute the probability , Lemma 7 also requires to know the different semantics steps that led to the process states after executing w. In other words, it requires the history of each extended processes. Therefore, to simplify the probability computation, we extended the labelled semantics by adding the history of transitions leading to the extended process. To further simplify, we assume that each input, output, probabilistic choice and non-deterministic choice are decorated with a label , denoted , and , respectively. Additionally, we assume that all labels are distinct in the initial processes.
We define history entries as elements from . A history, usually denoted , is a sequence of history entries.
Extended processes and the labelled semantics can naturally be extended to include history. In particular when and , we define where:
when and ;
when and ;
when and ;
when and ;
when , ;
when and and ;
when and and ;
otherwise.
Given , we write when and w is with the τ labels removed.
Since labels occur at most once in the initial process, intuitively, two extended processes with a shared prefix executed the same semantics steps. In other words, if
then there exist , and an extended process such that and
We now describe how we can compute the probability that executes a trace w from the set of histories obtained after executing w, i.e. where denotes the empty sequence.
Let S be a set of histories. We denote by . We define inductively as follows:
, i.e., if S is the singleton containing the empty sequence
otherwise
The correspondence between and the function is given in the following lemma.
Letbe an extended process,and. We have that.
This lemma provides the core property that allows us to update the partition trees generated by the procedure in [18] as well as the test performed on each node of this partition tree to conclude trace preorder.
Letand ≐ be defined by a subterm convergent destructor rewrite system. Trace preorderis decidable.
First, we show that we can restrict the set of traces we need to verify. In particular, it suffices to look at traces that can be split in two parts: where only consists of actions of the form or , and of any other kind of actions. In other words, we can push static equivalence tests towards the end of traces. Moreover, for all concrete traces not containing static equivalence tests, it suffices to check a finite number of concrete traces built only on static equivalence tests. To do this, we use a proof technique similar to the proof in [16] showing that trace equivalence and may testing coincide for processes with bounded number of sessions. Let us define the sets and . Since P and Q are bounded processes, , and are finite (here ∼ refers to static equivalence of frames). Therefore, for all , we can define a finite sequence of actions built only on actions of the form or that exactly defines the equivalence class of in , i.e., for all , if and only if . Thus, if we denote by W this set of traces sufficient for proving trace preorder, we have that if and only if for all , .
Second, similarly to how we augmented our labelled semantics with histories, we also augment the symbolic semantics used in [18] to generate partition trees with histories. As the addition of histories in symbolic extended processes does not impact in any way the generation of the partition trees, we can deduce from [18] that there exists a partition tree from P and Q, denoted . Thus, each node η of the partition tree will now contain a set of symbolic processes of the form where is an open extended processes, is a set of constraints and is a history. The set can furthermore be divided into the sets of symbolic processes coming from P and from Q, respectively and . If we denote by the set of histories of the (symbolic) extended processes in S, our decision procedure consists in checking that for all nodes , .
To show that it indeed decides trace preorder, we recall that soundness of partition trees ensures that the nodes of the partition tree rooted by and contain the symbolic processes representing all concrete extended processes reached after executing some trace w and that are statically equivalent. Moreover, completeness of partition trees ensures that for all concrete traces , there exists a node in the partition tree such that if or then contains a symbolic process representing .
Formally, we show that:
Soundness: , s.t. and
Completeness: , s.t. and
Applying Lemma 8, we obtain that
which allows us to conclude the proof. □
As a direct corollary we obtain that we can decide determinate may testing for purely probabilistic processes as it coincides with trace equivalence.
Letand ≐ be defined by a subterm convergent destructor rewrite system. Determinate may testing preorderis decidable.
We have implemented the procedure described above in the DeepSec tool. The input language has been extended with a probabilistic choice operator where p is a real number in . When trace equivalence between 2 processes is queried, the tool uses the above procedure for verification. The development is available on DeepSec’s official github repository at [35].
As we will see in Section 7, we can use DeepSec to show that the dining cryptographers protocol does not provide anonymity when coins are biased.
An example: Dining cryptographers
In this section, we illustrate our framework by proving anonymity of the well-known dining cryptographers protocol. This protocol has been designed to solve the following problem: three participants have a secret bit , such that at most one of these bits equals 1, i.e. the sum s of the three bits is either 0 or 1. The goal is to determine whether while hiding any additional information of each ; in particular, if , it must be impossible to identify for which i. To do so, we suppose that each pair of participants has a private coin, that they can toss privately, i.e., only and can read the result. To solve the above described problem, participants proceed as follows:
each pair of participants tosses its coin, and privately shares the result ;
each participant outputs their result , computed as the exclusive or (xor) of their secret bit , and the result of both their left and right coins;
each participant can compute s by doing the xor of all the three .
We suppose that the secret bits of each participant are determined by an assignment , and denote by the process (defined below) modeling the protocol with the given assignment. As we are interested in anonymity we focus on the case where one secret bit is 1 and we denote by the set of 1-weighted assignment functions, that is those functions that assign map exactly one participant to 1. Then, the security property for the dining cryptographers protocol can be stated as
that is no adversary should be able to distinguish the systems and whenever one of the secret bits equals 1.
Our contributions in this section are the following. First, we model the dining cryptographers protocol as a process in the probabilistic applied π-calculus. Then we give a manual proof that (when coins are unbiased) and –where B, are assignment functions as above–are may-testing equivalent. Finally, we show, using the tool DeepSec, that when the coins are biased those systems are not trace equivalent, thus not may-testing equivalent (Theorem 1).
Dining cryptographers in the probabilistic applied π-calculus
We split the definition of in several components. First, we model the fact that two adjacent participants can toss a coin, and privately share the result. We model this by the oracle process (indexed by channels , ): it performs a fair probabilistic choice, and sends the result in parallel on channel on its right, and channel on its left.
We now define the process that models one agent : it is indexed by three channels–(private) channels and for communicating with the oracle placed to its left, respectively to its right, and a (public) channel c for sending its final result. It also depends on the agent’s secret bit .
where ⊕ models bitwise xor.
We fix a set of public names and we will identify each participant to their public channel .
Moreover, we define the set of oracles (or edges), that we note . Observe that has three elements, that we call , , for convenience.
We also fix a set of 6 private names that will serve as the private channels between oracles and participants; each of the three participants has access to two oracles. We formalize the association between channels, oracles and participants by three functions
Function c associates to an agent or an edge the pair of channels (we denote by and the first and second projection of c respectively). Functions e and a associate to a channel, the corresponding agent, respectively edge.
Dining cryptographers: graph representation.
The overall system using these notations is depicted in Fig. 9. Given we can now define the process that puts in parallel three participants, and their three coin toss oracles. The agent channels used by the participants to communicate their final results are public. By contrast, the channels used to communicate with the oracles are hidden from the adversary, thus private and bound by .
We can now formally state that the dining cryptographers protocol does ensure anonymity.
In order to show Theorem 5, we are going to show use the labelled semantics and show a stronger result, i.e. that and are bisimilar.
Labelled semantics for the dining cryptographers protocol
The labelled semantics for our model of the dining cryptographers protocol is however complex: one of the reasons is that we do not enforce a particular order for the communications between the oracles and the participants . Thus, we have both probabilistic and non-deterministic concurrent behaviors. As a consequence, it is difficult to prove our equivalence result directly in the NPLTS . To solve this problem, we define an auxiliary simplified NPLTS , and show that:
is bisimilar to ;
for every the -states corresponding to and are bisimilar in .
The main idea for the construction of the auxiliary NPLTS is that only relevant steps should appear. More precisely, we note that a run of the protocol should be entirely characterized by the oracles’ probabilistic choices, and the order in which participants output their results. By contrast, the order of internal communications between participants and oracles occur–on the private channels –is irrelevant.
We define sets of internal events Λ and external events as:
The event means that oracle is sampling. The event means that participant is outputting b on their public channel. We call history a pair , where s is a set of internal events in Λ and l is a list of external events in such that no participant occurs twice in l. We denote by the set of all such h.
To each history , we associate a frame when . Moreover, for , and history , we write when for some boolean b, and otherwise. We define similarly and for . We denote the empty history by .
We are now ready to define our simplified NPLTS . has exactly the same set of actions as , the NPLTS built from the labelled semantics. A state of encapsulates information about both the (relevant) history, and the assignment function of secret bits . These information are enough to decide how a history is modified by an action. More precisely three kinds of actions can be enabled in a given state.
The first kind are actions of the form , that in perform an equality test on the frame. In the equality test is performed on , the frame built unambiguously from the history as defined in Notation 8. The second kind of action is the internal action τ: in τ-actions correspond to a step where an oracle, that does not already appear in the history, randomly samples a boolean b. After such a τ-action, which is inherently probabilistic, the information that this oracle has sampled b is added to the history. The last kind of actions are the (external) output actions: some participant can output a result b – if no output by a is already recorded in the history – when the history contains the information of the boolean chosen by both oracles adjacent to a, and that moreover b is equal to the xor of these two booleans with the secret bit of a.
We define a NPLTS where , , and the transition function is defined as:
We now follow the road-map that we outlined in the beginning of the present section: first we show that we can replace the study of the labelled semantics for by the study of , by exhibiting a bisimilarity between those two NPLTSs (Proposition 7). Then, we show that the states (corresponding to) and are bisimilar as soon as (Proposition 8).
There exists an NPLTS bisimulation R onsuch that for every, it holds that.
Ifthen the-statesandare bisimilar.
From Propositions 7 and 8, we can show that the process and are equivalent for the labelled bisimulation from Definition 16 in Section 4. We state this result formally below.
(Security for the dining cryptographers protocol).
Ifthen.
Recall that by definition of , we need to show that the states and are bisimilar in . Since we know by Proposition 8 that and are bisimilar in , we also have (because and have disjoint state spaces). By Proposition 7, we also have that , and . Since is an equivalence relation, we can combine all this to obtain that . From here (again because and have disjoint state spaces), we conclude that . □
As an immediate consequence of Corollary 3, and since labelled bisimulation is stronger than may-testing equivalence, we obtain a proof of Theorem 5, i.e. that , .
Analysis with DeepSec
We have analyzed the Dining Cryptographers protocol using our probabilistic extension of the DeepSec tool [35]. The encoding of processes for is straightforward and follows directly the modeling above. Although DeepSec does not support xor in general, we can easily encode the ternary version using 8 rewrite rules for and . Restricting to a ternary xor is not a loss of generality as the protocol does not take any input from the adversary and only honest users construct xor-terms.
As a first result we can show that
This result is actually not satisfactory, as trace equivalence is strictly weaker than may testing (which also motivated the pen and paper proof of the previous section). More interesting is the fact that we can show that anonymity is broken when we use a biased coin, i.e., we replace the probability 0.5 in the coin tossing oracle by a probability . If we denote the modified process by we can for instance show, using DeepSec, that, as expected,
where assigns to 1 and remaining bits to 0. Given that trace equivalence is strictly weaker than may-testing this implies that may testing does not hold either.
Given the rather small size of the processes DeepSec performs these verifications in about 1 second each.
Conclusion and future work
In this paper we introduced a framework to reason about indistinguishability properties, modelled as process equivalences, in symbolic models enhanced with probabilities. Defining such a framework turns out to rely on subtle technicalities such as the need for randomized schedulers, overlooked in previous attempts. In addition to solving technical problems, we believe that randomized schedulers capture more faithfully the idea that one cannot predict how non-determinism is resolved. Randomized schedulers generalize the idea that a schedulers chooses a particular distribution among the ones available by allowing an arbitrary combination (in the convex hull) of the available distributions.
We define different, classical behavioral and labelled equivalences and show their precise relations. As usual in models mixing non-determinism and probabilities, the resulting equivalences may be considered as too strong: indeed arbitrary schedulers may leak the (private) probabilistic choices of the processes and give the attacker an unrealistically strong distinguishing power. Defining more restricted schedulers that are only allowed partial knowledge of the current state, such as in [12], is orthogonal to our work. We however believe that our work provides a convenient framework for defining such more fine-grained notions of schedulers and consider this an interesting direction for future work.
We therefore study two classes of protocols where this problem is avoided. First, we study protocols that do not make probabilistic choices, but allow the adversary to do so. This class of non-probabilistic protocols corresponds to the classical setting and captures all major case studies performed in the context of symbolic models. Our results highlight that the classical notion of may-testing, considered rather intuitive as it models an arbitrary attacker running in parallel, does not take into account attackers that make probabilistic choices. Interestingly, when bounding the number of sessions, (non-probabilistic) similarity exactly captures such probabilistic attackers and offers an attractive target for automated analysis. Second, we study a class of fully probabilistic protocols, also considered in [11], and show that trace equivalence on such protocols coincides with may testing in the presence of a (syntactic) class of determinate attackers. One may indeed argue that determinacy removes artificial non-deterministic choices that the attacker could exploit and that correspond to unrealistic behaviors. When protocols can be expressed in the class of purely probabilistic processes, from a formal analysis point, it seems appealing to do so as it also simplifies the analysis.
We also show how deciding trace equivalence can be automated in the presence of probabilities. We propose a decision procedure that extends previous work by Cheval el al [18] and implement this procedure in the DeepSec tool. Hence, we provide tool support for proving determinate may-testing on the class of purely probabilistic processes when the number of sessions is bounded and cryptographic primitives are modeled as a subterm destructor rewrite system. On more general classes of processes, our tool can be used for attack finding: disproving trace equivalence implies that may testing (and all stronger equivalences) does not hold either, therefore violating security properties stated as an equivalence.
Finally, we illustrate our framework by studying the well-known Dining Cryptographers protocol. We model the protocol and its anonymity property in the probabilistic applied π-calculus. Then, we use our framework to prove that anonymity holds, and demonstrate DeepSec’s attack finding ability on a variant of the protocol that uses a biased coin.
Our work paves the road towards several future works, in addition to exploring restricted schedulers mentioned above. The insight that (purely possibilistic) similarity takes into account probabilistic adversaries (as it coincides with may testing) when the number of sessions is bounded and protocols are non-deterministic motivates adding support for (bi)similarity in a tool such as DeepSec (which currently only verifies trace equivalence). A different direction going beyond the subclasses considered in this paper is to investigate restrictions of the scheduler (building, e.g., on ideas from [2,12]) in our framework to limit the adversary’s power without restricting the class of protocols. Finally, a more prospective direction is the use of more quantitative equivalences, i.e., distances between processes, that might be interesting to compare different protocols that try to achieve a same property.
Footnotes
Acknowledgments
We thank Alwen Tiu and the anonymous reviewers for their helpful comments and suggestions. This work has been partly supported by the ANR Research and teaching chair in AI ASAP with support from the region Grand Est and by France 2030 program managed by ANR (ANR-22-PECY-0006).
References
1.
M.Abadi, B.Blanchet and C.Fournet, The applied pi calculus: Mobile values, new names, and secure communication, Journal of the ACM (JACM) (2017).
2.
M.S.Alvim, M.E.Andrés, C.Palamidessi and P.van Rossum, Safe equivalences for security properties, in: Theoretical Computer Science – 6th IFIP TC 1/WG 2.2 International Conference, TCS 2010, Held as Part of WCC 2010, C.S.Calude and V.Sassone, eds, IFIP Advances in Information and Communication Technology, Vol. 323, Springer, 2010, pp. 55–70. doi:10.1007/978-3-642-15240-5_5.
3.
R.Amadio, Operational methods in semantics, 2016.
4.
D.A.Basin, J.Dreier, L.Hirschi, S.Radomirovic, R.Sasse and V.Stettler, A formal analysis of 5G authentication, in: ACM SIGSAC Conference on Computer and Communications Security (CCS’18), D.Lie, M.Mannan, M.Backes and X.Wang, eds, ACM, 2018, pp. 1383–1396. doi:10.1145/3243734.3243846.
5.
D.A.Basin, R.Sasse and J.Toro-Pozo, The EMV standard: Break, fix, verify, in: 42nd IEEE Symposium on Security and Privacy (S&P’21), IEEE, 2021, pp. 1766–1781. doi:10.1109/SP40001.2021.00037.
6.
M.S.Bauer, R.Chadha, A.P.Sistla and M.Viswanathan, Model checking indistinguishability of randomized security protocols, in: International Conference on Computer Aided Verification (CAV’18), Springer, 2018, pp. 117–135. doi:10.1007/978-3-319-96142-2_10.
7.
K.Bhargavan, B.Blanchet and N.Kobeissi, Verified models and reference implementations for the TLS 1.3 standard candidate, in: IEEE Symposium on Security and Privacy (S&P’17), IEEE, 2017, pp. 483–502.
8.
B.Blanchet, Modeling and verifying security protocols with the applied pi calculus and ProVerif, Foundations and Trends in Privacy and Security (2016).
9.
F.Bonchi, A.Sokolova and V.Vignudelli, The theory of traces for systems with nondeterminism and probability, in: 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), IEEE, 2019, pp. 1–14.
10.
V.Castiglioni, Trace and testing metrics on nondeterministic probabilistic processes, in: Combined 25th International Workshop on Expressiveness in Concurrency and 15th Workshop on Structural Operational Semantics and 15th Workshop on Structural Operational Semantics, (EXPRESS/SOS) 2018, Vol. 276, 2018, pp. 19–36.
11.
R.Chadha, A.P.Sistla and M.Viswanathan, Verification of randomized security protocols, in: 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS’17), IEEE, 2017, pp. 1–12.
12.
K.Chatzikokolakis and C.Palamidessi, Making random choices invisible to the scheduler, Inf. Comput.208(6) (2010), 694–715. doi:10.1016/j.ic.2009.06.006.
13.
D.Chaum, Untraceable electronic mail, return addresses, and digital pseudonyms, Commun. ACM24(2) (1981), 84–88. doi:10.1145/358549.358563.
14.
D.Chaum, The dining cryptographers problem: Unconditional sender and recipient untraceability, J. Cryptol.1(1) (1988), 65–75. doi:10.1007/BF00206326.
15.
D.Chaum, P.Y.A.Ryan and S.A.Schneider, A practical voter-verifiable election scheme, in: 10th European Symposium on Research in Computer Security (ESORICS’05), S.D.C.di sVimercati, P.F.Syverson and D.Gollmann, eds, Lecture Notes in Computer Science, Vol. 3679, Springer, 2005, pp. 118–139. doi:10.1007/11555827_8.
16.
V.Cheval, V.Cortier and S.Delaune, Deciding equivalence-based properties using constraint solving, Theor. Comput. Sci.492 (2013), 1–39. doi:10.1016/j.tcs.2013.04.016.
17.
V.Cheval, R.Crubillé and S.Kremer, Symbolic protocol verification with dice: Process equivalences in the presence of probabilities (extended version), 2023, https://hal.inria.fr/hal-03683907/document.
18.
V.Cheval, S.Kremer and I.Rakotonirina, DEEPSEC: Deciding equivalence properties in security protocols – theory and practice, in: Proceedings of the 39th IEEE Symposium on Security and Privacy (S&P’18), IEEE Computer Society Press, San Francisco, CA, USA, 2018, pp. 525–542. doi:10.1109/SP.2018.00033.
19.
K.Cohn-Gordon, C.Cremers, B.Dowling, L.Garratt and D.Stebila, A formal security analysis of the signal messaging protocol, Journal of Cryptology33(4) (2020), 1914–1983. doi:10.1007/s00145-020-09360-1.
20.
H.Comon-Lundh and V.Cortier, Computational soundness of observational equivalence, in: ACM Conference on Computer and Communications Security (CCS’08), P.Ning, P.F.Syverson and S.Jha, eds, ACM, 2008, pp. 109–118. doi:10.1145/1455770.1455786.
21.
V.Cortier and S.Delaune, A method for proving observational equivalence, in: 22nd IEEE Computer Security Foundations Symposium, (CSF’09), IEEE Computer Society, 2009, pp. 266–276. doi:10.1109/CSF.2009.9.
22.
V.Cortier, S.Kremer and B.Warinschi, A survey of symbolic methods in computational analysis of cryptographic systems, Journal of Automated Reasoning46(3–4) (2010), 225–259. doi:10.1007/s10817-010-9187-9.
23.
C.Cremers, M.Horvat, J.Hoyland, S.Scott and T.van der Merwe, A comprehensive symbolic analysis of TLS 1.3, in: ACM SIGSAC Conference on Computer and Communications Security (CCS’17), ACM, 2017, pp. 1773–1788.
24.
Y.Deng, Axiomatisations and types for probabilistic and mobile processes, PhD thesis, École des Mines de Paris, 2005.
25.
Y.Deng, R.Van Glabbeek, M.Hennessy and C.Morgan, Testing finitary probabilistic processes, in: International Conference on Concurrency Theory, Springer, 2009, pp. 274–288.
26.
D.Dolev and A.C.Yao, On the security of public key protocols, IEEE Trans. Inf. Theory29(2) (1983), 198–207. doi:10.1109/TIT.1983.1056650.
J.Goubault-Larrecq, C.Palamidessi and A.Troina, A probabilistic applied pi-calculus, in: Asian Symposium on Programming Languages and Systems, Springer, 2007, pp. 175–190. doi:10.1007/978-3-540-76637-7_12.
29.
M.Hennessy and R.Milner, On observing nondeterminism and concurrency, in: Automata, Languages and Programming (ICALP’80), J.de Bakker and J.van Leeuwen, eds, Springer, 1980, pp. 299–309. doi:10.1007/3-540-10003-2_79.
30.
R.Milner, J.Parrow and D.Walker, A calculus of mobile processes, I, Inf. Comput.100(1) (1992), 1–40. doi:10.1016/0890-5401(92)90008-4.
31.
A.Parma and R.Segala, Logical characterizations of bisimulations for discrete probabilistic systems, in: International Conference on Foundations of Software Science and Computational Structures, Springer, 2007, pp. 287–301.
32.
M.K.Reiter and A.D.Rubin, Crowds: Anonymity for web transactions, ACM Trans. Inf. Syst. Secur.1(1) (1998), 66–92. doi:10.1145/290163.290168.
33.
R.Segala and N.Lynch, Probabilistic simulations for probabilistic processes, Nordic Journal of Computing2(2) (1995), 250–273.
34.
M.Stoelinga, Alea jacta est: Verification of probabilistic, real-time and parametric systems, PhD thesis, University of Nijmegen, the Netherlands, 2002.
35.
The DeepSec tool, 2023, https://github.com/DeepSec-prover/deepsec/tree/probability.
36.
R.J.van Glabbeek, Bounded nondeterminism and the approximation induction principle in process algebra, in: Annual Symposium on Theoretical Aspects of Computer Science, Springer, 1987, pp. 336–347.