Abstract
There is an increasing number of cyber-systems (e.g., systems for payment, transportation, voting, critical infrastructures) whose security depends intrinsically on human users. In this paper, we introduce a novel approach for the formal and automated analysis of security ceremonies. A security ceremony expands a security protocol to include human nodes alongside computer nodes, with communication links that comprise user interfaces, human-to-human communication and transfers of physical objects that carry data, and thus a ceremony’s security analysis should include, in particular, the mistakes that human users might make when participating actively in the ceremony. Our approach defines mutation rules that model possible behaviors of a human user, automatically generates mutations in the behavior of the other agents of the ceremony to match the human-induced mutations, and automatically propagates these mutations through the whole ceremony. This allows for the analysis of the original ceremony specification and its possible mutations, which may include the way in which the ceremony has actually been implemented or could be implemented. To automate our approach, we have developed the tool X-Men, which is a prototype that builds on top of Tamarin, one of the most common tools for the automatic unbounded verification of security protocols. As a proof of concept, we have applied our approach to three real-life case studies, uncovering a number of concrete vulnerabilities. Some of these vulnerabilities were so far unknown, whereas others had so far been discovered only by empirical observation of the actual ceremony execution or by directly formalizing alternative models of the ceremony by hand, but X-Men instead allowed us to find them automatically.
Introduction
Context and motivation
Ellison [23] introduced the concept of security ceremony as an extension of the concept of security protocol, with human nodes alongside computer nodes and with communication links that include UI, human-to-human communication and transfers of physical objects that carry data. In particular, Ellison remarked that “what is out-of-band to a protocol is in-band to a ceremony, and therefore subject to design and analysis using variants of the same mature techniques used for the design and analysis of protocols”.
However, in contrast to security protocol analysis, for which a plethora of mature approaches and tools exist, security ceremony analysis is a discipline that is still in its childhood, with no widely recognized methodologies or comprehensive toolsets. State-of-the-art approaches and tools for security protocol analysis (e.g., [3,14,24,33,38]) cannot be directly employed for security ceremonies as they take a “black&white” view and formalize protocols by
considering one or more attackers that can carry out whatever actions they are able to in order to attack the protocol, but then modeling all other protocols actors (regardless of whether they are computers or human users) as honest processes that behave only according to the protocol specification.
When considering security ceremonies, in which humans are first-class actors, it is not enough to take this “black&white” view. It is not enough to model human users as honest processes or as attackers, because they are neither. Modeling a person’s behavior is not simple and requires formalizing the human “shades of gray” that such approaches are not able to express nor reason about. It requires modeling the way humans interact with the protocols, their behavior and the mistakes they may make, independent of attacks and, in fact, independent of the presence of an attacker.
Some preliminary approaches have been proposed for security ceremony analysis (e.g., [6,7,9,16,18,29,32,41,42,54]), but they have barely skimmed the surface of taking into account human behavioral and cognitive aspects in their relation with “machine” security.
Contributions
In this paper, we introduce a novel approach for the formal analysis of security ceremonies that focuses on the vulnerabilities that result from the mistakes that human users might make.1
This paper extends and supersedes a preliminary conference version that appeared in [46].
1. Formalization. We define a formal approach that allows security analysts to model possible mistakes by human users as mutations with respect to the behavior that the ceremony originally specified for such users. For concreteness, we focus on four main human mutations of a ceremony and their combinations,
skipping one or more of the actions that the ceremony expects the human user to carry out (such as sending or receiving a message), adding an action (e.g., sending a message twice), replacing a message with another one (e.g., sending a sub-message of the original message), neglecting to adhere to one or more internal behaviors expected by the ceremony (such as neglecting to carry out an internal action that is visible only to the agent itself, like a check on the contents of a message),
but our approach is open to extensions with other mutations.
We formalize algorithms for human mutations, which take in input a ceremony specification and produce in output a ceremony or a set of ceremonies that result from the mutated actions and/or behavior of the human agent. These algorithms cannot focus only on the mutations of the human, since human ceremony mutations will possibly have an effect also on the other agents of the ceremony, honest or malicious as they may be, humans or processes as they may be. Given a human mutation, we can distinguish two cases:
the other agents are able to reply to the human mutation because the changes are not too relevant or because the ceremony has somehow made provision for it, or the other agents are not able to reply to the human mutation. when the ceremony involves the human agent interacting independently with two or more other agents, and the human skips the steps with one agent but still has enough information to exchange messages with the other agents of the ceremony (e.g., when the ceremony is split into independent phases and the human skips phase 1 with agent 1, but is still able to engage in phase 2 with agent 2, without agent 2 noticing that phase 1 did not occur), when the human agent adds some message exchanges to a ceremony and the other agents can still participate in the communication (e.g., when the human sends the same message twice and the intended recipient is able to receive both message instances, possibly considering them as messages that are part of two different executions of the same protocol), when the human agent replaces a message with another one that can however still be received by the intended recipient (e.g., when the human agents replaces a nonce in the message with a constant that the recipient is however not able to check for freshness), when the mutation is only internal to the mutating human agent (e.g., when the agent neglects to check the contents of a message he received), or when the other agents are still able to carry out their roles in the ceremony (e.g., when their role includes an if-then-else that captures both original and mutated human behavior; this could happen, for instance, in the implementation of the ceremony, but also in the original ceremony specification where the mutation forces the human to visit by mistake only the else branch of the if-then-else).
Case (1) may occur, for instance,
In all these subcases, we can simply carry out the analysis of the ceremony after the human mutation.
Human mutations will, however, often yield non-executable models as the agents playing the non-human roles simply won’t reply, thus thwarting attacks implicitly or explicitly caused by the human mutation. One might therefore be tempted to discard case (2) as not interesting, but we argue that that is not the case, especially when one considers the way in which the ceremony has actually been implemented or could be implemented.2
This is interesting not just when the implementation is actually available (in which case one could even try to extract a specification from the code by reverse engineering or model inference), but also as a means to show proactively the pitfalls of a possible implementation.

The workflow of our approach.
2. Tool support. We have developed a prototype tool called X-Men (the name was chosen to suggest that it considers human mutations), which fully automates the workflow of our approach, except for the green box “Security Analyst Checks” in Fig. 1, which is carried out by hand by the analyst. As shown in Fig. 2, X-Men creates mutated models that are then automatically used as input to Tamarin [33], one of the most advanced tools for the automatic unbounded verification of security protocols. X-Men can be used with human mutations only (without matching, e.g., as in the case of the Coach Service ceremony), or it can be used with matching mutations that are propagated to create an executable trace that can be analyzed in search for attacks (as for our two other case studies).

The workflow of the X-Men tool: from models to mutated models that are input to Tamarin.
X-Men fully automates the process of generation and analysis of mutated security ceremony models: it takes in input the specification of a ceremony and its goals, along with the choice of mutations to be considered, generates a set of mutated ceremony models, inputs them into Tamarin and then produces a report of the analysis listing all those models for which Tamarin identified vulnerabilities, those for which Tamarin’s analysis timed out after 10 minutes, and those that Tamarin was able to verify (i.e., those models for which the analysis terminated without finding any vulnerability). The models for which Tamarin identified vulnerabilities will then need to be inspected by a security analyst to check whether these are vulnerabilities that apply also to the original ceremony and whether the mutations are representative of interesting real-life scenarios. This is similar to what happens with mutation-based testing approaches, where manual intervention of a security analyst is needed (but we point out that tests have shown that we have been able to improve X-Men with respect to the previous version [46] to reduce by almost 50% the number of ceremonies that a security analyst needs to manually investigate). As we remarked above, we leave for future work the extensions that would allow X-Men to generate test cases from the attack traces and to apply them on the ceremony implementation; here we focus on the analysis of models, but we remark that the generation of test cases from attack traces could be done along the lines of [40,51].
Like for the case in which the automated analysis of a security protocol does not terminate, a security analyst will need to inspect also the models for which Tamarin did not terminate as these might still lead to vulnerabilities (although 10 minutes is often a long enough time for Tamarin to produce an output for short protocols/ceremonies such as the ones we consider in this paper).
3. Proof-of-concept. As a proof-of-concept, we have applied our approach to three real-life case studies, the Oyster Card ceremony, the SAML-based Single Sign-on for Google Apps protocol [4] and a Coach Service ceremony, uncovering a number of concrete vulnerabilities. Some of these vulnerabilities were so far unknown, whereas others had so far been discovered only by empirical observation of the actual ceremony execution or by directly formalizing alternative models of the ceremony by hand, but X-Men instead allowed us to find them automatically.
In Section 2, we introduce our motivating and running example, and the other two case studies. In Section 3, we describe the intuitions that underlie our approach. In Section 4, we describe how we formally model security ceremonies and then, in Section 5, we describe how we formally model human mutations of a ceremony. In Section 6, we describe X-Men and its proof-of-concept. In Section 7, we report on the analysis of the three case studies. In Section 8, we discuss related work. In Section 9, we draw conclusions and discuss future work. All our formal models and the code of X-Men are available online at [57]. For readability, we have moved some of the longer figures to the appendix.
Three case studies
In this section, we describe the three real-life case studies that we consider in this paper, the Oyster Card ceremony, the SAML-based Single Sign-on for Google Apps protocol and a Coach Service ceremony. We begin by introducing the Oyster Card ceremony, which we will use as a motivating and running example.
The Oyster Card ceremony
The Oyster Card (or just Oyster, for short) is a plastic credit-card-sized, rechargeable, stored-value, contactless smart card used on public transport in Greater London in the United Kingdom. The Oyster is a form of electronic ticket that can hold pay-as-you-go credit, travel cards and passes for underground and overground trains, buses and trams. It is promoted by Transport for London (TfL) and since its introduction in June 2003, more than 86 million cards have been used [47]. Similar systems are in use in a large number of other countries in almost all continents, such as France, Italy, Denmark, Finland, South Africa, Argentina, Chile, Australia and Japan, and, interestingly, most of them suffer from problems similar to the ones of the Oyster that we will discuss in the next sections.
As shown in Fig. 3a, the Oyster is used by touching it on an electronic reader when entering and leaving the transport system in order to validate it or deduct funds. Actually, this touch-in/touch-out is part of the ceremony used on the London underground (nicknamed the Tube) and trains, which is what we focus on in this chapter, whereas on London buses passengers touch in their Oyster only when boarding (passengers are instead required also to touch out when they alight the bus in Sydney, Australia, and in some other countries like Denmark, for instance). Fig. 3b shows an entrance/exit gate of the Tube.

Using the Oyster Card in the Tube.

The ceremonies for the Tube.
Fig. 4a gives a Message Sequence Chart (MSC) of the main Oyster ceremony for the Tube, which is carried out by 3 roles: the human passenger H, the entrance gate
The human passenger H touches the Oyster on the reader at the entrance gate, which amounts to H sending the Oyster number
The reader writes an identifier on the Oyster, which amounts to
At the end of the journey, the passenger touches the Oyster on the reader at the exit gate, which amounts to H sending to
Some remarks are in order. First of all, note that we did not obtain this specification from TfL, but rather we modeled our own experience of using the Oyster. This is fine as we do not need our example to be real but rather realistic enough to showcase the main features of our approach; still, the vulnerabilities that we identify are actual problems that the real Oyster system suffers from.
Second, even though the Oyster is based on the MiFare chip, which in its first version (Mifare Classic family) used the proprietary encryption algorithm Crypto-1, our specification does not use any kind of encryption for the messages. This does not represent a lack of accuracy as we actually aim to model the ceremony in a way that is independent of the low-level cryptographic details, thereby also keeping in mind that our approach focuses on what is under direct influence and control of the human, and cryptography most likely is not. However, it would not be difficult to include encryption and decryption in our models, and in fact the language that we describe below does contain cryptographic operators.3
Note also that initial versions of the MiFare chip, and thus of the Oyster, suffered from a number of attacks [17,20,26], but the current version of the Oyster does not suffer from these problems any more since it is based on the new MiFare DESFire family that uses stronger encryption algorithms.
Third, we focused only on the core message-passing of the ceremony and did not include the information that is displayed on the screens that are placed above the gate’s reader, which show, e.g., the credit on the card when entering and exiting, and the fare of the trip when exiting.
Fourth, the ceremony in Fig. 4a is actually one of the possible ceremonies that could be considered for the use of the Oyster and several variants could be modeled, such as: a ceremony in which the reader at the exit gate does not immediately synchronize with the system, a ceremony in which the passenger does not have enough credit for the entrance gate to open (if the Oyster’s balance is too low, the gate would display a message to the passenger asking them to top up the credit on the card), or a ceremony in which the passenger changes from an overground train to an underground train or vice versa, and thereby touches the Oyster at an intermediate gate to register the change of train. Again, we aim to be realistic rather than real and, in fact, our approach generalizes to these variants quite straightforwardly.
Finally, passengers are nowadays able to pay not only with the Oyster but also with a contactless credit or debit card (possibly associated with an Apple Pay or Google Pay device). In that case, the ceremony is the same as the one in Fig. 4a but without the

Warnings issued to the Tube passengers.
Before we continue with the discussion of how we formally model security ceremonies in our approach, let us return to Fig. 3a, where the sticker beside the reader reminds passengers to always touch in and out. In fact, the London underground is quite full of posters like the ones in Fig. 5. The poster on the left of Fig. 5 reminds Tube passengers that in order to pay the right fare, they need to touch in at the start and touch out at the end of all journeys; if they do not, then TfL will not know where the passenger has traveled, so they cannot charge the right fare for the journey. This is called an incomplete journey and the passenger could be charged a maximum fare ranging between £8.00 and £19.80 [48]. Passengers who do not touch in at the start of a journey are also liable to pay a penalty fare (or could even be prosecuted).
The poster on the right of Fig. 5 warns passengers that if they touch on a reader their purse or wallet containing two or more cards (be they Oyster cards or contactless payment cards), then they could experience card clash [49]. This means that when the card reader detects two cards, it could take payment from a card that the passenger did not intend to pay with, or, more dangerously, that the passenger could be charged two fares for his journey or even two maximum fares for his journey (this happens when a passenger mistakenly touches in with one card and touches out with another card, resulting in two incomplete journeys).
It is interesting to observe that, in both these cases, security is “pushed” from the system to the human user. But humans do mistakes and this might endanger their security, which here means that they possibly have to pay considerably more than they should. Our approach allows us to show (in a formal and automated way) that indeed if passengers forget to touch in or out, or touch with two or more cards at the same time, then they will be billed unfairly.
Before we explain in detail how we formally model and reason about security ceremonies, let us first introduce our two other case studies.
Single Sign-on (SSO) protocols enable entities (companies, associations, institutions or universities, etc.) to establish a federated environment in which clients sign in the environment once and are then able to access services offered by different providers in the federation. The Security Assertion Markup Language 2.0 Web Browser SSO Profile is one of the standards on which several established software companies have based their SSO implementations. Google, for instance, developed a SAML-based SSO for its Google Applications Premier Edition, a service for using custom domain names with several Google web applications such as Gmail, Google Calendar and Google Docs.

SAML 2.0 web browser SSO profile (from [4]).
The profile is shown in Fig. 6, which, like the other figures for this case study, we have taken directly from [4]. The profile works as follows: a client C aims at getting access to a service or a resource that is located at the address
Carrying out a digital signature of a long message such as the authentication assertion is computationally expensive and might slow down the interaction between the user and the machine, so, in their implementation of this protocol in 2008, Google’s engineers simplified the authentication assertion by removing two of its parameters,

The SAML-based single sign-on for Google apps protocol (from [4]).
Armando et al. [4] formalized a model of this protocol and, by using SATMC, a model checker for security protocols that is embedded in both the AVANTSSAR Platform [3] and the SPaCIoS Tool [51], they were able to discover the attack that is illustrated in Fig. 8 by considering the following scenario. Let

The man-in-the-middle attack on the SAML-based SSO for Google applications (from [4]).
The attack is due to
Armando et al. were able to discover the attack by writing a formal model of the protocol as it had been implemented by Google, with the wrongly simplified authentication assertion, and then analyzing the model using SATMC. In this paper, instead, we show how the simplified and vulnerable version of the authentication assertion can be obtained by a human
People get to and from airports thanks to different transportation services such as underground, trains, taxis or coaches (i.e., buses or similar vehicles). As our third case study, we consider a security ceremony for (the inspection of tickets in) coach services.
The majority of coach services offer essentially two types of tickets: paper tickets and electronic tickets, or e-tickets for short (we do not consider smart-card tickets because they would require a different kind of attack and, anyway, smart-card solutions are not as widely adopted for coach services, contrarily to urban buses). Customers can buy fixed-price paper tickets physically in shops at the airports, in coach stations or directly from the coach driver on the day of travel. Alternatively, they can buy e-tickets online on the website (or the app) of the coach service. The procedure to buy an e-ticket requires the customer to choose the journey specifying the date of travel (or dates in case of a return ticket, unless it is an “open” return ticket, which has a flexible validity), insert the customer’s details and pay for the ticket. Once the payment is concluded, the customer receives a confirmation email for the payment, which also acts as the e-ticket for the journey.
Different coach services offer e-tickets that share several similarities in terms of information present on the ticket itself. We considered these similarities to define a realistic standalone e-ticket based on the information that many different tickets of many different coach services have in common. Our standalone e-ticket has the following fields: customer name, ticket number, departure date, return date (if it is a return ticket), departure time, expected arrival time, from (which specifies where the journey starts), to (which specifies where the journey ends) and the price of the ticket. In case the e-ticket is a return ticket, there are additionally fields for the return departure time and return arrival time.
A number of coach services (as well as other services that offer e-tickets) enrich the e-ticket with one or more visual fields like a barcode or a QR code: these aim to simplify and speed up the ticket inspection process. To that end, the driver is equipped with a device capable of recognizing the information within the codes so that such information does not have to be controlled manually. The e-ticket may also contain other information and the terms of condition, which we have omitted here as they are not relevant for the description nor for the attack that we will discuss later.
There is no available generic and public specification of the ceremony for coach services and how tickets are bought, delivered and inspected. Thus, rather than considering the specific ceremony of a specific service, we have observed many different such services in different cities and then synthesized them into a single plausible Coach Service ceremony.4
So, both our ceremony and tickets are realistic rather than real, but this is more than enough for our purposes in this paper. We have reviewed a handful of coach services in the UK and abroad, but note that we cannot provide more details on our observations as that might allow readers to identify the specific coach services or, even worse, identify specific drivers and ticket inspectors.

The ticket inspection sub-ceremonies.
The sub-ceremony for paper tickets is shown in Fig. 9a. The driver receives the paper ticket from the customer and then checks the journey information such as from/to and the departure time. If this information is correct, then the driver admits the customer on the coach. Note that there is no real check on the genuineness of the ticket.
The sub-ceremony for e-tickets keeps the same spirit of the sub-ceremony for paper tickets, but delegating some of the checks to technology (or so one would expect). Here too, upon request of the driver, the customer shows the e-ticket that comes as the confirmation email of the payment for the journey. The driver performs a preliminary analysis of the e-ticket to understand the type of the e-ticket checking also the presence of possible visual fields, such as the check of a QR-code as is shown in the example of sub-ceremony in Fig. 9b. Based on the presence of visual fields, the driver can make two different choices in order to validate a ticket:
check if unique details printed on the ticket (e.g., ticket number) are present on a list of allowed tickets for that journey; use the visual fields reading them with an electronic reader.
If the driver decides to check the unique details in the ticket manually, he performs a visual inspection, looking for a match of any of the unique details (e.g., ticket number, journey references, service numbers, travel route references, etc.) with a list in his possession (e.g., electronic list using a pad, a printed list). Once he finds a match, the driver validates the ticket and admits the customer. For instance, the driver can first identify the ticket number on the e-ticket, and then check whether the ticket number matches a list of allowed ticket numbers for that journey on an electronic pad (that synchronizes regularly with the main system); once (and if) a match is found, the driver validates the ticket number to make it unavailable for the future and admits the customer.
If the driver decides to use the visual fields, he needs to scan them (or a selection, e.g., QR code, barcode, etc.) with a device: once the device confirms that the ticket is valid, the driver admits the customer.
For concreteness, we can then consider the Coach Service ceremony for e-tickets that is shown in Fig. 10. The ceremony has three roles: the
The
The
The
The actual online purchase process might, of course, suffer from several possible vulnerabilities, including some that are potentially due to human mistakes, but we do not consider them in this paper.
The
The
The

The Coach Service ceremony (for e-tickets).
A preliminary security analysis of the tickets and of the ceremony is beneficial to understand possible vulnerabilities. In particular, we looked at flaws that could allow an attacker to travel for free using forged tickets bypassing the ceremony’s security measures.
Typically, coach services can rely on the difficulty of physically forging accurate copies of paper tickets, which would require specific equipment such as the proper paper, a printing machine, the identical layout the coach service is using on a digital file, etc., which are not easy to obtain. These difficulties are reflected in the lower number of checks that the Paper Ticket Inspection sub-ceremony puts in place with respect to the e-ticket one. Fig. 9a shows that, typically, no specific visual checks are performed, in particular, no checks of unique fields (e.g., ticket number). The checks are limited to the plausibility of the information only.
E-tickets, on the other hand, represent a different scenario. Coach services rely (or should rely) on the difficulty of obtaining unique information displayed on the e-ticket to discourage forging. This unique information can also be used by the coach service companies to generate visual fields that can be printed on the tickets. However, during our investigation of the processes followed by different services, we encountered a number of visual fields (e.g., a machine-readable representation of numerals and characters such as a barcode or a QR code) that, even though they could have potentially stored relevant information for the ceremony, did not contain any information but were just used as graphical means to corroborate the structure of the e-tickets. This is a potential attack vector: a coach service that offers customers to choose between different types of e-tickets such as QR code or barcode with no information (e.g., in case of specific ticket such as discount tickets) can be exploited to execute a successful forging attack.
In the case of e-tickets, the precision with which the checks are carried out is crucial. The inspector (e.g., the driver) has to perform proper checks on the unique fields displayed on the ticket. However, as we observed, drivers often apply the Paper Ticket Inspection sub-ceremony also for e-tickets, even though the Paper Ticket sub-ceremony is not designed to be secure against forged e-tickets. As mentioned above, the Paper Ticket sub-ceremony does not consider checks on visual fields and relies on the complicated procedure to forge paper tickets.
Forging e-tickets, on the other hand, is considerably easier and does not require much expertise or technical knowledge. One can use a raster graphics editor such as Adobe Photoshop (or even simpler editors like GIMP) or one can directly use an email client. In fact, after having purchased an e-ticket, a confirmation email is sent to the customer’s email address. This email also works as ticket, which can also be saved/exported as PDF file (as most of the email clients allow one to do). Once in possession of the email, an attacker can forge an e-ticket by modifying it after having exported or received it in PDF format. Alternatively, a less elaborated method to forge e-tickets is for the attacker to modify the confirmation email using the email client and to forward the new email to the attacker’s own address.
If the driver, by mistake, applies the Paper Ticket Inspection sub-ceremony to an e-ticket accurately forged, then the driver will accept the forged e-ticket and the attacker will be able to use the service for free. To validate these insights, we formalized a mutation that captures the behavioral pattern that gives rise to mistakes similar to the one made by the driver (namely, to neglect carrying our proper checks on the data received) and we have applied it to analyze the Coach Service ceremony in Fig. 10.
The standard way to formally model and analyze a security protocol/ceremony is to formalize how agents (attempt to) execute the roles of the protocol/ceremony to achieve one or more security goals in the presence of an attacker. Roles are sequences of events (sending or receiving messages, generating fresh values, etc.), which are usually represented graphically by a structure generated by causal interaction such as strands [25] or the vertical lines in MSCs and Alice&Bob notation [2], or less graphically by a process in a process algebra such as in the Applied Pi Calculus [1]. In Tamarin (and thus in the X-Men tool), a role is formalized by a so-called role script, which is basically the projection to an individual role of an extended Alice&Bob specification, and corresponds to a strand or an Applied Pi Calculus process.
We can represent this graphically by viewing the roles/strands of a ceremony as separate lines of assembled jigsaw puzzle pieces that can be connected with each other as shown in the example in Fig. 11. When complete, the jigsaw puzzle produces a complete picture: a trace of the ceremony.

A simple ceremony between a User (left) and a System (right) depicted as a jigsaw puzzle.
Now, we have all been there: you are trying to assemble one of those really difficult jigsaw puzzles, one of those where the resulting image is so complex that it is difficult to understand which pieces you should actually interlock. You start from the borders, trying to complete at least one line and proceed from there, but even that is proving to be difficult as you do not understand which pieces do really fit together. So, what do you do? You try. You try to interlock pieces that appear to fit together even though this will turn out to be wrong as they will not allow you to produce the desired image – but you do not know that yet. Or maybe you simply do a mistake and append a piece that does not belong there.
This is illustrated by Fig. 12: if the human user does not know how long the edge should be, then he could terminate it by attaching the piece pictured in red as in Fig. 12b; or the human could add one more piece to the edge as in Fig. 12c; or the human could append a wrong piece pictured in red as in Fig. 12d, which raises the question of how the remaining pieces would fit (they are thus drawn with dotted lines).

A human carrying out the role User... and mutating it, by mistake or lack of understanding.
Returning to our running example, the human user might not fully understand the ceremony role that he is supposed to carry out and
skip some actions, e.g., touching out with an Oyster without having touched in with any card, as illustrated in Fig. 12b by the anticipated termination of the role; add some actions, e.g., touch in with two cards, as illustrated in Fig. 12c by the additional piece; replace an action with another one, e.g., using a contactless credit card to touch out instead of the Oyster he used to touch in, as illustrated in Fig. 12d by the different outgoing connector, which represents a different message being sent; neglect to carry out one or more role-actions, e.g., neglecting to carry out a check on the contents of a message (such as forgetting to check if the balance is positive), as illustrated in Fig. 12e.6 Role-actions, which we will define at the beginning of Section 4.2, are internal actions carried out by the roles in the ceremony, such as sending or receiving messages, generating fresh messages or checking the contents of messages received.
It is, however, not enough to simply allow the human to carry out these unforeseen actions (add, skip, replace or neglect some parts of the role). In order to reason about what would happen if the human carried out these mutations, we need to capture the fact that a mutation of the human behavior will likely have an effect also on the other agents of the ceremony. More specifically, note the difference between the role-action in the neglect mutation, which is an “internal” action as it only involves the mutating agent, and the actions in the skip, add and replace mutations, which are instead actions that involve not only the mutating agent, but also, albeit indirectly, other agents in the ceremony.
To illustrate this, consider again, for simplicity, the ceremony between User and System in Fig. 11 and consider the scenario in which a human playing the role User replaces an event of his role with a different one, i.e., sends a message
In the first case, the System is able to reply to
It could even happen because also the role System is played a human who does a mistake, as we discuss in the case of the neglect mutation.
In the second case, the System is not able to reply to
Note that this does not mean that the User is fully aware of this. The User might just be aware of (or have been instructed about) the “then branch” of the System’s role, which captures the User’s normal behavior; think of the Oyster User who follows the touch-in-touch-out ceremony as expected. Hence, the User might, unknowingly and unwillingly, fall into the “else branch” of the System’s role (e.g., by touching out with a contactless card instead of the Oyster card that was used for touch in) and thus be billed much more than expected. The problem with these “else branches” is that they often were not present in the original specification of the whole ceremony and were added to the implementation as an afterthought, after having observed the “wrong” behavior of users, as was likely the case for the Oyster ceremony. Warnings like the ones in Fig. 5 are meant to alert the users about the “else branch” of the ceremony. We believe that rather than adding the “else branch”, it would have been better to change (the specification and) the implementation of the ceremony to forbid these mistakes (e.g., by programming the gates to warn the users that they are touching with the wrong card or with two cards), but we recognize that this might not always be possible, especially if all the software and hardware components of the System have already been deployed and installed.
This is in line with mutation-based testing [15,19,21,28], which is an approach to design software tests where mutants are based on well-defined mutation operators that either mimic typical programming errors (such as using the wrong operator or variable name) or force the creation of valuable tests (such as dividing each expression by zero). In our approach, mutants are based on mutation operators that mimic typical human mistakes and that force the creation of mutations in the other ceremony agents to match the human mutation. For concreteness, for the ceremony between User and System in Fig. 11, our approach mutates the role of the System as shown in Fig. 13 to match the human User’s skip mutation of Fig. 12b. The mutation of the step of the System to match the mutated step of the human User possibly entails a mutation of the subsequent steps of the System role (the mutation is propagated), which we again illustrate with dotted lines.

The skip mutation of the role User (as in Fig. 12b) and the matching mutation of the role System.

The add mutation of the role User (as in Fig. 12c) and the matching mutation of the role System.

The replace mutation of the role User (as in Fig. 12d) and the matching mutation of the role System.
Similarly, for the same ceremony between User and System in Fig. 11, our approach mutates the role of the System as shown in Fig. 14 to match the human User’s add mutation of Fig. 12c, and it mutates the role of the System as shown in Fig. 15 to match the human User’s replace mutation of Fig. 12d.
Finally, it is interesting to note that there is no matching mutation for the human User’s neglect mutation of Fig. 12e as that mutation is local to the human and has no effect on the other roles of the ceremony.
If these matching mutations lead to an attack, then we can check (possibly discussing with the ceremony designers) whether the mutated specification makes sense and, in any case, use the obtained attack trace to generate concrete test cases to be applied to the ceremony’s implementation. This will allow us to check whether the attack entailed by the mutations is a false positive or a real attack.
So, summarizing, our approach takes as input the specification of a ceremony and the goal(s) it should achieve, and then generates both mutations of the human agent’s role (allowing him to add, skip or replace human actions or neglect role-actions) and the matching mutations of the other roles of the ceremony. The resulting mutated ceremony models are then fed into Tamarin to search for attacks. We leave the step of concretizing the attack traces found into test cases as future work (although we expect this to be not too difficult by proceeding along the lines of [40,51]).
We import, adapt and extend notions that are used in many of the state-of-the-art approaches and tools for the formal analysis of security protocols. For concreteness, our tool X-Men builds on top of the Tamarin prover [6,7,33] to model and analyze security ceremonies with mutations caused by human users, but our approach is for the most part general and independent of Tamarin and could be applied similarly to other tools such as ProVerif [14], Maude-NPA [24], the AVANTSSAR Platform [3] and its follow-up the SPaCIoS Tool [51].
We first summarize some basic notions (importing them from papers in which Tamarin is presented and used) and then discuss the formal specification of ceremonies, the execution model, the modeling of human agents, and the security goals.
Messages
(Algebra of messages).
The term algebra of messages is given by the pairing the first projection the hash the symmetric encryption the symmetric decryption the asymmetric encryption the asymmetric decryption the signature
The function
A message term m is ground when it contains no variables. We will also call a message term simply message or term. □
Having defined the algebra of messages, we can now define what is a submessage of a message and what is the format of message.
(Submessage and format of a message).
We say that if m has no top-level function symbol, then if if if m is
The format
We adopt the equational theory for messages proposed by Dolev and Yao [22], which defines equations on the relationships between operators for composing and decomposing messages.
(Dolev–Yao-style equational theory).
Messages are composed and decomposed using the standard Dolev–Yao-style equational theory for the functions in
However, as we do for instance for the Oyster ceremony in Section 7.1, our approach allows us also not to consider explicitly the presence of a (Dolev–Yao) attacker, and instead focus on capturing the way human agents might interact insecurely with the other ceremony agents even in the absence of an attacker. So, all our agents behave honestly and follow the steps of the ceremony, but the human(s) might make mistakes. In other cases, such as in the SSO ceremony and the Coach Service ceremony (see Section 7.2 and Section 7.3, respectively), we add an explicit attacker who intentionally tries to make the ceremony insecure.9
We control the Dolev–Yao attacker by using (or not) appropriate channels. The messages used in the Oyster ceremony are not encrypted, but there is no reason why they could not be. The SSO ceremony, in contrast, includes explicit cryptographic operations.
To formally specify and analyze security ceremonies, we adopt standard notions used for security protocol analysis in Tamarin, e.g., [6,7,33]. We will try to be thorough but also succinct and readers interested to know more about Tamarin are advised to refer to the papers, to Tamarin’s user manual [50] and its website
Formally, a role script is a sequence of events
Send and receive events are of the form
In the
The fresh event
The start event is the first event of a role script and occurs only once:
As a concrete example, let us return to the Generalized Main ceremony for the Tube. As shown in Fig. 4b, this ceremony has 3 roles: the human H and the entrance and exit gates

The role scripts for the roles of the Generalized Main ceremony for the Tube.

The rules for the human agent in the Generalized Main ceremony for the Tube.
We take advantage of constants in Tamarin to identify values received and sent during a ceremony. In [7], constants are used to define “tags” in order to represent the interpretation of the values in the knowledge of a human agent. We also make use of constants but we use them to define a basic notion of types, denoted in Tamarin’s notation by writing the name of the type in single quotes, e.g., ‘type’. In this paper, we only consider types of ground terms, such as the type ‘
This is enough for all ceremonies that we have encountered so far, so we leave a more thorough investigation of types to future work.
from now on we will often omit constants in role scripts and rules, so that when we write m, the reader should please mentally replace it with the constant-message pair

Agent rules for the

Agent rules for the
Our approach is based on Tamarin’s execution model [33], which is defined by a multiset term-rewriting system like in many other security protocol analysis tools (e.g., [3,24,35]). In the following, we recall from the Tamarin manual and papers the definitions of the notions of states, facts, traces, protocol specification and its rules. A state is a multiset of facts that model resources, including the information that agents know and exchange. More formally:
(States and facts).
A system state is a multiset of facts: linear facts model exhaustible resources and they can be added to and removed from the system state, persistent facts model inexhaustible resources and can only be added to the system state (persistent fact symbols are prefixed with “!”). The initial system state is the empty multiset. □
A trace is a finite sequence of multisets of role-actions that is generated by multiset rewriting.
(Traces).
A trace
Consider, for instance, a generic ceremony between a human agent H and possibly pairwise distinct agents
(Protocol model).
A protocol model consists of the agent rules, the fresh rule, channel rules and attacker rules. □
These rules are defined in the remainder of this section. The fresh rule
Tamarin comes equipped with standard Dolev–Yao attacker rules and with channel rules (introduced in [6]) to model the sending and receiving of messages over authentic/confidential/secure channels, and thus control the ability of the attacker (who, e.g., cannot send, read or replay messages on a secure channel, although he might still be able to interrupt the communication).
Agent rules specify the agents’ state transitions and communication. For instance, the rules for the human agent in the Generalized Main ceremony for the Tube are shown in Fig. 17, whereas the rules for the
The translation of the different channels into Tamarin is quite natural, e.g., by means of rules such as
We have given the rules of the Oyster case study. Before we introduce the goals of the case studies and how they can be formalized in Tamarin’s language, let us give the rules for the two other ceremonies we consider in this paper.
The agent rules for the
Rules of the Coach Service ceremony
The agent rules for the
Goals
Goals express the security properties that a ceremony is supposed to guarantee. However, many ceremonies, such as the Oyster ceremony, as we discussed in Section 2.1, “push” security from the system to the human agents. This is made evident by the three goals that we define and analyze for the Oyster ceremony:
the human ends his journey touching in and out; the human ends his journey using the same card to touch in and out; the human does not touch two cards in and out.
These goals refer to a single journey, i.e., a single ceremony run. We formalize this in our Tamarin specifications by including explicit restrictions (through the OnlyOnce restriction [50] in the Setup phase; see our models in [57]) to force the human to carry out a single journey in the ceremony. While these three goals capture the main warnings issued by TfL (cf. Section 2.1), one could of course consider a number of other goals, e.g., variants of GO3 that formalize the case in which the human touches in with two cards (regardless of whether he then touches out with one or two of these cards) or touches out with two cards (regardless of whether he touched in with one or two of these cards), or goals that span multiple journeys (i.e., multiple ceremony runs). We leave the investigation of such goals for future work as our aim here is not to carry out an exhaustive analysis of the Oyster ceremony or of the other case studies, but rather to use the goals we consider as examples for our proof-of-concept on how one can use our approach for the analysis of security ceremonies.
We can formalize these three goals using Tamarin’s notation as shown in Section 4.4.1, whereas in Section 4.4.2 and Section 4.4.3 we formalize some of the goals of the SSO ceremony and of the Coach Service ceremony, respectively.
Goal for the Oyster ceremony
Goal GO1 is formalized in Tamarin’s language by the lemma in Listing 1, which, in addition to
The security goal GO1 can be read as follows: if an agent

Lemma for the security goal GO1
Goal GO2 is formalized in Tamarin’s language by the lemma in Listing 2, which can be read as follows: if an agent

Lemma for the security goal GO2
Goal GO3 is formalized in Tamarin’s language by the lemma in Listing 3, which uses the role-action

Lemma for the security goal GO3
It is important to note here the connection between the role-action

Lemma for the security goal of SSO
For the SSO ceremony, we have formalized the goal “
This security goal can be read as follows: if some parameters
Goal for the Coach Service ceremony
We consider one security property: authenticity of the ticket. Other properties (e.g., ensuring that customers are billed properly or the other security properties for the Oyster ceremony) can be modeled and analyzed similarly. This goal is formalized in Tamarin’s language by the lemma in Listing 5, which uses, in particular, the following role-actions:

Lemma for the authenticity of the ticket in the Coach Service ceremony
The Coach Service ceremony verifies the lemma
As for the Oyster ceremony, this goal refers to a single ticket inspection ceremony, i.e., a single ceremony run, that comprises also the purchase phase in which that specific ticket has been bought. If an agent is able to end the ceremony with a ticket that has not been bought for that specific journey (e.g., the agent has changed the date of an old legit ticket to make the ticket valid once again), then the security goal is falsified as the two date fields in the role-actions
The mutations that humans carry out when executing a ceremony have repercussions also on other agents and thus on the whole ceremony. As we remarked above (cf., in particular, Fig. 1), we thus need to define not only the human mutations, which modify a ceremony trace by mutating the subtrace of the human agent, but also the mutations on the subtrace(s) of the other agent(s) that are likely (albeit not necessarily) required to “match” the mutations of the human; for instance, to receive the new or modified message sent by the human or to mirror the skip of the human actions. By introducing the human mutations, matching them appropriately (if and when needed), and propagating these mutations (if and when needed) throughout the specification, we will eventually obtain fully mutated ceremony traces, which our tool feeds into Tamarin, first to check if they are executable and then to analyze them with respect to the corresponding goal(s). Of course, in some cases, depending on what is mutated, a human mutation might yield a full ceremony trace without having to match this mutation and propagate the changes.
We first introduce, using a fairly high-level language, the notion of a generic human mutation along with the notion of a matching mutation.
(Generic human mutation and matching mutation).
A generic human mutation is a function
A matching mutation for a human mutation is a mutation
The combination
In the following subsections, we will instantiate these generic definitions to define the four human mutations
Slightly abusing notation, we will write
This is in the spirit of the step compression technique that is adopted in several security protocol analysis tools, such as [3]. The idea is that some actions can be safely lumped together. For instance, we can safely assume that if a role is supposed to send a reply to a message it received, then we can compress the receive and send actions into a single transition.
Then, the trace (1) can be rewritten as follows, where we now embed
We can now instantiate the generic human mutation in Definition 7 to the case in which a human skips one or more actions
(Skip mutation).
A
For example, Fig. 20 shows a human subtrace in which H skips the

Examples of mutations in the case of the Oyster ceremony.
We have identified five different
We describe the five
In this case, H, having arrived at state
For simplicity but without loss of generality, in the following we assume that the (fresh and “other”) facts in
where μ denotes the mutation composed of
This allows us to illustrate the need for matching and propagation on a concrete example. We namely need to consider if and how the mutated trace can be completed, for instance when H receives from if if
Our tool implements these options as described in the pseudo-code in Algorithm 1 and Algorithm 2; note that comments are written in italics and their start is denoted by ⊳.

The pseudo-code is hopefully quite explanatory, also thanks to the comments in the algorithms, but there are a couple of steps that deserve clarification. First of all, what does it mean that
we consider only mutations of a message m that preserve the format of m.

Matching mutation
In this case, H skips
The pseudo-code for this case is in Algorithm 8 and in Algorithm 9; for the sake of readability, we give these and most of the following algorithms in the appendix.
The
mutation
In this case, H skips
The pseudo-code for this case is in Algorithm 10 and in Algorithm 11.
The
mutation
In this case, H skips both
The pseudo-code for this case is in Algorithm 12 and in Algorithm 13.
The
mutation
In this case, H skips both
The pseudo-code for this case is in Algorithm 14 and in Algorithm 15.
The
mutation
There are two different cases for this mutation: the human could
send at any time any message that is created using the elements of one of the sets in
duplicate a send action.14
Note that we only consider mutations initiated by a human agent; as a consequence, we do not consider the situation in which the human agent initiates a mutation of the ceremony by adding a receive action as that would require another agent (human or not) to have added the corresponding send action first. In contrast, the approach of [7] allows human agents to receive arbitrary messages but without considering mutations for the human or other agents.
Note that if we allowed the human to send any message that he can build out of his current knowledge, then we would have to deal with an infinite set of options since, even if the human knows only one thing, such as his name, the message generation and analysis rules will allow him to generate an infinite set of messages (we will have the same problem with the
An
Consider the beginning of the trace (2). The mutation
The add mutation is best exemplified when combined with the other mutations, e.g.,
sending at any time any message that the human knows can be combined with a duplicating a send action can be combined with a
For example, H could start the Oyster ceremony touching in with one card
Our tool implements this mutation as described in the pseudo-code in Algorithm 3 along with Algorithm 4 for the matching mutation.


Matching mutation for
Note that in this matching mutation algorithm we allow the agent A to receive two copies of the message sent by the human, but do not duplicate the ensuing send action by A. This is in line with what we wrote above about the add mutation being best exemplified when combined with a replace mutation, and indeed this is what happens in our Oyster example, in which add&replace allows us to consider the case in which the human touches in with two cards, an Oyster card and a credit card as in Fig. 20, and other similar cases. One could consider an alternative matching mutation algorithm in which A’s send action is duplicated too and this is propagated accordingly, as well as other algorithms that mix different ways of adding and duplicating transitions.
This mutation captures the fact that a human user may send a message in place of another one (the case in which H replaces an action of a ceremony with another one is obtained by combining a
A
Again, we show the effect of the
We give only one algorithm for
For example, H could start the Oyster ceremony with one card


Matching mutation for
This mutation captures the fact that a human user may not adhere to one or more internal behaviors expected by the ceremony, e.g., the case in which the coach
A
In the specific case that we are considering in this paper, the mutation of a Tamarin role-action in a transition rule
We cannot illustrate this mutation by means of the beginning of the trace (2) as we need to show the role-actions explicitly, but we can use the Coach Service ceremony and, in particular, consider the following mutation of the rule
This mutated rule reflects a mistake of the
Our tool implements the
since we defined a strong correlation between some of the events in Tamarin and some facts (cf. Section 4.3), the
the
Another possibility would be to extend the mutation to deal with the removal of the Tamarin role-actions necessary for a security goal, but we leave this to future work.
if the
in the role scripts of the other agents there is nothing that corresponds to the events of another agent (so the other agents will not notice if one or more Tamarin actions of the human are removed).

Note that in this algorithm if the human neglects a check at some point, and thus the corresponding role-action is removed, then the algorithm also removes the same check (and role-action) in all the following transitions in which the check appears (if at all, since in fact it will depend on the ceremony, and on how the modeler models it, whether the check is carried out only once or repeated in many transitions). This models the fact that the human neglects that check altogether. An interesting variant that could be considered in the future is the case in which the role action is removed in only one transition but kept in all subsequent ones, thereby modeling a momentary lapse.
In this section, we describe the X-Men Tool and how it generates the mutated models that are then analyzed using Tamarin.
As we remarked in the introduction, X-Men fully automates the workflow of our approach, except for the green box “Security Analyst Checks”, which is carried out by hand by the analyst (cf. Fig. 1). As shown in Fig. 21, X-Men fully automates the process of generation and analysis of mutated security ceremony models that are then automatically used as input to Tamarin. X-Men can be used with human mutations only (without matching, e.g., as in the case of the Coach Service ceremony), or it can be used with matching mutations that are propagated to create an executable trace that can be analyzed in search for attacks (as for our two other case studies).
Our approach works in three fully-automated phases as shown in Fig. 2:
the preprocessing phase, which prepares the specification file for the mutations, the mutation phase, which generates the mutated specifications, and the analysis phase, which invoke Tamarin to carry out the analysis and outputs a report of the results.
The preprocessing phase
The entire execution of the X-Men Tool is managed by two scripts, Wolverine and Xavier, which are written using the Python programming language and which collaborate with each other to make it possible to generate mutated models and analyze them automatically. The starting point of the entire process is the Wolverine script which, in Fig. 2, is shown as “Slicer/Joiner”: Wolverine’s primary functions are to take care of the slicing, and consequently the joining, of the input ceremony models.17
We named the script after the mutant Wolverine, a member of the X-Men who possesses retractable adamantium claws that can slice through (almost) anything.
In addition to these files, another file is created to preserve the original model file as a backup copy. This file will be restored at the end of the execution of the X-Men Tool to allow the security analyst to execute the tool as many times as desired with minimum hassle when dealing with modifications of the original files.
a first part that contains everything from the beginning of the model until the channel rules (included),
a second part that contains the functions and all the agent rules,
a third part that contains the restrictions and lemmas.
The mutation phase is managed by the X-Men Tool, which is represented in Fig. 2 by the black rectangle with a yellow-black “X” containing the behavioral patterns library. X-Men is written using the Java programming language following the Model–View–Controller software design pattern. To be able to read model files written in the Tamarin syntax, X-Men takes advantage of another component called ANTLR [37], which stands for

The GUI of the X-Men tool.
Once Wolverine has invoked X-Men, a Graphical User Interface (GUI) appears as shown in Fig. 21a. The GUI is quite clear and simple, and guides a security analyst through the few steps needed to generate the mutated models. In fact, when opened, the GUI allows the security analyst only to upload a model. Once a model has been uploaded, a new interface tab allows the security analyst to select which mutations he wants to apply. The new tab, shown in Fig. 21b, allows the security analyst to select the four mutations (skip, replace, add, neglect), their variants (e.g., S, SR, R for skip, submessages or type for replace, etc...) and their combinations, and any other mutation that will be defined in X-Men’s library of behavioral patterns in the future.
Once the security analyst has selected the mutations he wants to apply to the model, he can execute the generation of the mutated models by clicking on the button that is shown in Fig. 21a. Once X-Men has terminated the generation of the mutated models, Wolverine takes back the control of the process and performs the joining of each mutated model created from part (2), which includes the functions and all the ceremony agent rules, with the other two parts, part (1) containing the initial definitions and the channel rules and part (3) containing the restrictions and lemmas. Once the joining has terminated, the X-Men Tool completes the generation of the mutated models and feeds them to the analysis phase.
As we discussed above and as shown for our three case studies in Table 1, X-Men generates a large number of mutated models. We have implemented the Python script Xavier to tackle this issue.19
We named the script after the mutant Professor X (Charles Francis Xavier), the founder and leader of the X-Men.
whether one or more properties have been falsified (and thus a vulnerability identified),
whether Tamarin has timed out before finding a proof (we have set a timeout of 10 minutes, which is usually a long enough time for Tamarin to produce an output, but of course a longer timeout could be set, especially for large and complex ceremonies), or
whether the model has been successfully verified.
Table 1 summarizes the reports generated by Xavier for our three case studies. The first column shows which mutation is considered, the second column shows which version of that specific mutation is considered, the third column shows if the attacker is enabled. The following columns display the results for each case study: how many mutated models were generated by X-Men, how many mutated models had a security property (or more properties) falsified by Tamarin, how many models made Tamarin time out, and how many models were verified by Tamarin.
Mutated models generated by X-Men (“Yes” indicates that the attacker is activated, “No” indicates that the attacker is not activated, “–” indicates that that mutation is not considered in that case study)
It is important to note that, regardless of the performance of the X-Men Tool in the generation of mutated models, the overall performance of the entire analysis process depends on the actual performance of Tamarin and its prover, in addition to the actual performance of the computer on which X-Men is run. As is commonly the case in security protocol analysis, the security analyst will need to carry out an inspection of those models for which Tamarin timed out. Moreover, even though X-Men automates the entire process, the security analyst will need to inspect the models for which Tamarin identified vulnerabilities in order to check whether these are vulnerabilities that apply also to the original ceremony and whether the mutations are representative of interesting real-life scenarios. This manual analysis is made easier by Tamarin’s graphic mode, which displays attack traces in a graphical and thus human-friendly way.
As we remarked above, this need for the manual intervention of a security analyst is similar to what happens with mutation-based testing approaches [15,19,21,28,40,51], and we leave for future work both extensions that would allow for more automation of the manual inspection and extensions that would allow X-Men to generate test cases from the attack traces and to apply them on the ceremony implementation.
In this section, we show how our formalization can be used effectively for finding attacks that are due to the (mis-)behavior of human agents in security ceremonies. As proof-of-concept, we have applied our tool X-Men to three case studies, the Oyster ceremony, the Single Sign-on ceremony and the Coach Service ceremony. As remarked above and shown in Table 1, X-Men generated a large number of mutated models for these ceremonies. These mutated models are then automatically analyzed by Tamarin thanks to Xavier activating the attacker rules when necessary (for the SSO and the Coach Service case studies but not for the Oyster one). The next three subsections summarize the results of the analyses of the three case studies, and in Section 7.4 we then discuss how to exploit the results of an analysis of a security ceremony.
Analysis of the Oyster ceremony
Table 2 shows some of the attacks found with the models obtained by applying the mutations skip, replace and add&replace to the Oyster ceremony. The table does not show the results pertaining to the 92 models generated by the add mutation: the report generated by the analysis indicates that 42 of these models are verified and 2 cause Tamarin to time out, whereas 48 models are falsified and require a check for false positives by the security analyst. We carried out that check and concluded that all these 48 falsified models are not realistic attacks as they represent cases in which the human agent systematically tries all the possible combinations of parameters without following the logic behind the ceremony. As we will discuss in more detail below, although many generated models can be excluded automatically as they are verified, there are still many models that are falsified (or time out) for which an inspection by the security analyst is required and it will be useful to provide additional automated support for this inspection. While it will be challenging to fully automate the inspection, we are currently working at devising techniques to identify common features among such models and thus group them into a small(er) equivalence classes, so that the analyst would then need to consider only one model per class. Distinguishing the falsified models that require matching mutations (like all these 48 models do) from those that do not require matching already provides some very initial but useful classification.
Some of the attacks found on the models obtained using the mutations applied to the Oyster ceremony (“✓” indicates that an attack has been found, “×” indicates that no attack was found, “∙” indicates that the functional goal is verified)
Some of the attacks found on the models obtained using the mutations applied to the Oyster ceremony (“✓” indicates that an attack has been found, “×” indicates that no attack was found, “∙” indicates that the functional goal is verified)
The column “Mutated model” of Table 2 lists the file identifier of the generated file (as used at [57]) and the table also provides mutation details as well as a brief explanation of the human agent’s behavior for each model. In addition to the three goals discussed in Section 4.4, we have used Tamarin to check the functional goal (a.k.a. executable goal) that the mutations did not create models that are not able to complete the message passing terminating with the last rule. All the models considered in the table passed this check.
We describe three interesting attacks that Tamarin has been able to find out of the many mutations generated.
Attack #1. The MSC of the attack (Fig. 22) shows how the human agent H may execute the Oyster ceremony without touching-in at the entrance (as shown by the dotted arrows representing the human agent who is not touching-in). H touches-out the
This is a real scenario that occurs when the passenger forgets to touch-in, e.g., when the station has no proper gates but only card readers at the station entrance, when the gates are already open (TfL sometimes opens the gates to speed up entry/exit during rush hour or when there are a large number of passengers), or when the reader is not working properly and does not read/write the Oyster card.

Attack that represents the Incomplete journey scenario for the Oyster ceremony.
Attack #2. H may use two different cards in a single journey, touching-in with the first and touching-out with the second, so that GO2 fails with two incomplete journeys. This may appear to be an uncommon scenario, but several tourists and even Londoners suffered from this problem, and still do. For instance, a passenger might have two Oyster cards in her pocket and confuse them, or the passenger might use Apple/Google pay (cf. Section 2.1) but using two different devices, say smartphone and smartwatch, which will cause two incomplete journeys because the Device Account Number is unique for each device and is used by TfL as the identifier for a single journey.
Attack #3. The MSC in Fig. 23 shows how H may use two cards (e.g., Oyster and a contactless card), simultaneously touching them both in/out when entering/exiting (as shown by the dotted arrows representing a parallel second execution of the Oyster ceremony), so that GO3 fails due to a card clash (cf. Section 2.1). This occurs, e.g., when a passenger touches with a wallet that holds all the passenger’s cards that the system considers to be valid payment cards.

Attack that represents the Card clash scenario for the Oyster ceremony.
The attacks on the Oyster ceremony were found using the mutations generated by X-Men as shown in Table 2. The analysis did not require the activation of a Dolev–Yao attacker as the system, through the matching mutations, replied and billed the passengers “wrongly” due to their mistakes. Hence, in this case, the matching mutations represent the concrete behavior of the implementation of the TfL system. While these attacks are, to some extent, known to TfL (cf. their warnings in Fig. 5) and can be gathered empirically by observing the concrete behavior of the Tube passengers, it is important to stress that X-Men allows us to find them automatically (based on an analysis of the specification, rather than out of observations in practice). Other attacks might be found by considering other goals or other mutations. Moreover, in the style of model-based testing (see the end of Section 3), it is possible to use the attack traces to generate concrete test cases to be executed on the code of the ceremony (if that is available).
We have specified SSO as a ceremony in X-Men, considering what would happen if
The attack found on the models obtained using the mutations applied to the Single Sign-on ceremony (“✓” indicates that an attack has been found, “×” indicates that no attack was found, “∙” indicates that the functional goal is verified)
The attack found on the models obtained using the mutations applied to the Single Sign-on ceremony (“✓” indicates that an attack has been found, “×” indicates that no attack was found, “∙” indicates that the functional goal is verified)
Note that, in fact, SSO is a security protocol that does not involve human agents; in particular, the IdP is not a human, but in other contexts humans might indeed serve as identity providers and they might make mistakes like the ones we consider here, so we believe this example to be quite useful to illustrate this issue. When we were looking for a case study that would allow us to show the importance of the replace mutation, we choose the SSO example since Google engineers had implemented the protocol by replacing the original message in the standard specification with a submessage. In [4], the attack was found by human analysts (Armando et al.) inspecting the implementation and writing a specification that would represent the mistake done by the engineers. In a sense, Armando et al. had perceived that removing some fields from the messages might lead to attacks and they carried out an automated analysis to validate their intuition. We decided to use the SSO example to illustrate that, in contrast, our approach allows us to avoid having to rely on intuition and instead directly start from the actual specification and then consider an admittedly large number of mutations that however includes the one that gives rise to the attack. We could have looked for other case studies, but we wanted to show that our approach would have allowed security analysts to find the possible attack starting from the specification as it was written in the standard.
The specification of the Coach Service ceremony includes:
a phase in which the customer buys a genuine e-ticket (this phase is needed to obtain a first e-ticket that will be modified later, for other journeys) and
a ticket inspection phase performed by a driver of a coach service.
We have specified the Coach Service as a ceremony in X-Men, considering what would happen if
The decision to use the attacker rules is due to the following facts:
the neglect mutation focuses on showing a mistake caused by not carrying out a check rather than the creation of some messages, the creation of a forged ticket by the the use of another mutation of the ceremony that affects a different actor, in this case a mutation of the
We considered the scenario in which an attacking
The attack found on the models obtained using the mutations applied to the Coach Service ceremony (“✓” indicates that an attack has been found, “×” indicates that no attack was found, “∙” indicates that the functional goal is verified)
The attack found on the models obtained using the mutations applied to the Coach Service ceremony (“✓” indicates that an attack has been found, “×” indicates that no attack was found, “∙” indicates that the functional goal is verified)

Attack on the Coach Service ceremony (for e-tickets): the
We now briefly discuss how to exploit the results of the analysis of a security ceremony carried out using our approach. Similar to what happens for the formal analysis of security protocols, if the tool terminates and verifies the model under consideration, then we can provide a security guarantee. In the case of security ceremonies, our approach allows us to provide such guarantee not only for the original ceremony but also for its mutations for which Tamarin’s analysis terminates with a proof. To some extent, this applies also when the analysis times out, although in that case, like in the case of a non-terminating analysis of a security protocol, an intervention of the security analyst is needed since the tool has not been able to give a definitive answer.
If the analysis of a security protocol model instead terminates with the discovery of an attack, typically the attack trace can be used to distill a fix to the protocol specification; one would also usually wish to check whether the attack on the model also applies for the concrete implementation (assuming that it is available), so the attack trace can also be used to devise test cases for the implementation (see, e.g., [27,40,51]). The same applies in the case of our mutated ceremonies when the model did not need to match and propagate mutations. Then, we can use the attack trace to distill a fix and generate test cases. If the model contains matching mutations, then the security analyst should check whether the matching of mutations yields a false positive that makes little sense in real life or whether the attack is a real attack (and then one would want to generate test cases similar to what is done in mutation-based testing).
In both these cases (real attack or false positive), given the presence of human users, one can also think of using the attack trace to distill recommendations and guidelines for the users of the ceremony so that they interact with it in a way that does not endanger security. After noticing the issues caused by the unexpected use of the Oyster/credit cards at the in/out gates, the posters in Fig. 5 were hung in the London underground to remind users about the expected use of the cards, and one could use our approach to distill similar recommendations from a formal analysis before the system is actually deployed and the issues observed in practice. Similarly, one could use the attack traces as a basis to devise specific training for the ticket inspectors of the coach ceremony (as well as for humans acting as identity providers like in the SSO example), pointing out which behaviors and mistakes will lead to attacks. The rules of [7] that restrict how the human can deviate from the protocol specification are a good example for such guidelines. In future work, we aim to investigate if and how recommendations and guidelines could be generated (semi-)automatically from the analysis of security ceremonies and their mutations (similar to the generation of test cases), and how they could be communicated to human users in an effective way. To that end, we plan to exploit also our works on how to provide security explanations to laypersons [52,53,55,56] and on how to beautify security ceremonies [11–13] and thus make their secure use more appealing to human users.
Related work
In this section, we discuss related work, expanding on the discussions that we have already given in previous sections. We compare our approach with [7], which is the most closely related work, but we also consider research that has inspired our work or that might provide interesting future synergies.
In [38], Paulson introduced the Oops rule to model mistakes done by agents when executing a security protocol, such as the loss (by any means) of a session key. However, the notion of security ceremony and the explicit investigation of the consequences of explicitly considering human agents and their mistakes was introduced by Ellison in [23], one of the pioneers of socio-technical security.
One of the first formal approaches to investigate security ceremonies is the concertina model introduced in [9], which spans over a number of socio-technical layers, focusing in particular on the socio-technical protocol between a user persona and a computer interface, but without explicitly considering human mistakes nor accounting for an explicit attacker. Similarly, the approaches in [16,31] provide a formal model to reason about how a Dolev–Yao-style attacker can attack the communication between humans and computers, including storing of human knowledge, but without explicitly considering human mistakes.
In contrast, Basin et al. [7] provide a formal model for reasoning about some errors that humans involved in security protocols may make. They specify rules formalizing different types of humans (untrained, infallible or fallible humans), modeling a human who can send and receive any messages, resulting in attacks because a human discloses information, but also in attacks because the human just enters the same information on the wrong device or accepts a received message he should not. They successfully applied their model to analyze some authentication protocols. Although their approach is similar in spirit to ours and there are some affinities, there are fundamental technological, methodological and philosophical differences between our approach and that of [7]. The four main differences are the following ones.
First, there are possible human behaviors that we consider that [7] does not. For instance, they consider a rule that allows humans to send “controlled” messages that are in their current knowledge. This overlaps with our add mutation and, to some extent, with the replace mutation, but their approach does not have an exact equivalent for the skip mutation nor for the neglect mutation. They can model a situation in which a human sends a message that he was meant to send much later in a ceremony, and that amounts indirectly to a form of skip, but our skip mutations cover a more general landscape. The approach of [7] includes an
Second, Basin et al. only consider scenarios in which the Dolev–Yao attacker actively attacks the protocol, whereas our approach works also when the attacker is not present thanks to the matching mutations. Fundamentally, as is standard in security protocol analysis, their approach takes the point of view of the Dolev–Yao attacker, in the sense that they look for what the Dolev–Yao attacker can do when human agents make the mistakes that they model in their approach. We do not take this point of view, but instead, in a sense, take the one of the human agent as we consider whether attacks can occur due to human mutations even in the absence of the Dolev–Yao attacker, as we do for instance in the case of the Oyster case study. This is not just technologically different (as the modeling is different) but also philosophically as via the mutations the human agent “forces” the attack on herself even in the total absence of an attacker (as in the Oyster example). We also include the possibility in which the Dolev–Yao attacker might be present and exploit the human’s mistakes, as in the case of the SSO and Coach Service examples. In fact, the Coach Service ceremony also lends itself well to consider two humans, a customer who attacks by replacing some fields of a ticket and a ticket inspector who neglects to carry out all checks. While our mutations could possibly be modeled by restricting or extending the Dolev–Yao behavior, this would make us lose the philosophical difference that our approach works also in the absence of the attacker.
Third, while our approach allows a security analyst to consider only human mutations like [7], we formalize also matching and propagating mutations. Many of the human mistakes that we consider, if left to themselves, would not lead to an attack as the ceremony analysis would not terminate. As we discussed earlier, by matching and propagating the mutations, we investigate how they could lead to attacks, thus providing a wider (and in a sense also deeper) analysis. It is thanks to these mutations that X-Men is able to (re-)discover automatically the attacks on Oyster, SSO and Coach Service. If we did not have these mutations, then our approach, like a Dolev–Yao-based one that only considers the original specification even in the presence of human mistakes, would not be able to discover many of them as the ceremony execution would not terminate. Even if we took the Dolev–Yao-centric approach, we would still need to extend the Dolev–Yao attacker to be able to match the mutation (e.g., to be able to receive any message, in addition to being able to send any message). So, our approach, in contrast to that of [7] and instead similar to mutation-based testing, is not sound in the sense that we get false positives that the security analyst should check to identify attacks that would not exist if, e.g., the specification was implemented as originally specified since then the other agents would not respond to the human mutations. In return, however, our approach also encompasses the cases in which the original specification would allow them to respond, and we do so for more possible human mistakes.
Fourth, there are features of the approach of [7] that our approach does not yet capture. For instance, the approach of [7] allows human agents to receive arbitrary messages (with a certain structure). Since we only consider mutations initiated by a human agent, we do not consider the situation in which the human agent initiates a mutation of the ceremony by adding a receive action as that would require another agent (human or not) to have added the corresponding send action first. We will consider adding this case in the future.
Moreover, the rules of [7] that restrict how the human can deviate from the protocol specification provide guidelines for users to interact with a protocol in a secure way. As we remarked above, in future work we plan to investigate how to extend our mutation-based approach to generate recommendations and guidelines, and how they could be communicated to human users in an effective way.
Another quite closely related work is that of Bella, Giustolisi and Schürmann in [10], which was partly inspired by our previous joint work [45] as well as the preliminary version of this paper [46]. In contrast to our approach, all technical components in their approach are assumed to behave as intended, and they consider explicitly distributed and interacting human threats, so that every human may misbehave for his personal sake, without any fixed prescription to collude with others, yet may directly favor someone else. There are also differences in the formal modeling of security ceremonies, since they employ epistemic modal logic, which allows them to formalize ceremonies, threats, and properties in a way that many readers will find more intuitive than when using Tamarin’s constructs, but this comes at the cost of an extra effort for their encoding into the prover of choice (Tamarin or another tool).
We have also joined forces with Bella and Giustolisi in previous works, in particular [45], in which we have given a systematic definition of an encompassing method to build the full threat model chart for security ceremonies from which one can conveniently reify the threat models of interest for the ceremony under consideration. To demonstrate the relevance of the chart, we formalized this threat model using Tamarin and analyzed three real-life ceremonies that had already been considered, albeit at different levels of detail and analysis, in the literature: MP-Auth [7], Opera Mini [47], and the Danish Mobilpendlerkort ceremony [31]. The full threat model chart suggested some interesting threats that had not been investigated although they are well worth of scrutiny. In particular, we found out that the Danish Mobilpendlerkort ceremony is vulnerable to the combination of an attacking third party and a malicious phone of the ticket holder. The threat model that leads to this vulnerability had not been considered before and arose thanks to our charting method. We are currently working at combining the threat model chart with the mutation-based approach that we presented in this paper, which we believe will help us identify novel vulnerabilities but also, we hope, reduce the number of mutations that need to be considered.
Other related approaches are those of [8,18,29,39,41]. Similar to [7], Curzon et al. [18] propose a formal human model that includes a specific attacker able to exploit the errors against the human user. The errors considered are those caused by the humans’ interpretation of the system and by the design of the interfaces, but not those entailed by human choices or mistakes as we do. Moreover, they do not consider communication channels.
Johansen and Jøsang [29] define probabilistic processes to model the actions of a human agent, separating the model of the human and that of the user interface. They introduce a “compilation” operation in order to capture the interaction of the human agent and the user interface. Their probabilistic model for the human agent is an extension of the persona model [44]. Their approach provides only a preliminary formalization without a security analysis.
Beckert and Beuster [8] provide a formal semantics for GOMS models augmented with formal models of the application and the user’s assumptions about the application, but they do not consider human mistakes in detail.
Pavlovic and Meadows [39] employ actor-networks as a formal model of computation and communication in networks of computers, humans and their devices, but they too do not consider human mistakes in detail.
Radke and Boyd [41] introduce the notion of human-followable security wherein a human user can understand the process and logic behind authentication protocols. They focus on showing how to transform existing authentication protocols into protocols with human-followable security.
While our approach is quite radically different from the research in [8,18,29,39,41], we believe that there might be interesting synergies between our mutations and the way in which they model the assumptions and perceptions of the human users, which we plan to investigate in future work.
Conclusions
At the end of this long paper, let us take stock, and summarize the main points and discuss some future work. Our approach allows us to treat humans as first-class actors in security ceremonies as they should be considered to be, since it is not enough to take the “black&white” view of security protocol analysis, in which there is a Dolev–Yao attacker (the black agent) against a set of honest agents (the white agents). It is namely not enough to model human users as “honest processes” or as attackers, because they are neither, and we need to model the behavior of human users of ceremonies as “shades of gray”. Our approach allows us to model, by means of mutations, some of the ways humans interact with the other agents, their behavior and the mistakes they may make, independent of attacks and, in fact, possibly independent of the presence of an attacker since attacks may occur even without the presence of an attacker.
Our approach does, however, come at a price, explicited mainly by the large amount of mutations that are generated and by the need for the security analyst to intervene with a manual inspection to check for potential false positives or in case Tamarin times out. While the latter is a common problem in the analysis of security protocols/ceremonies, where many of the tools do not terminate as the search space is infinite in the presence of a Dolev–Yao-style attacker, the former is an issue akin to what happens in mutation-based testing. As future work, we plan to investigate how to improve the efficiency of our approach by
reducing the number of generated mutated models (e.g., by identifying isomorphic models), automatically or at least semi-automatically checking whether attacks are real or not (thereby reducing the effort required of the security analyst in the green box in Fig. 1).
We are also investigating how to link our formal analysis to mutation-based testing by generating test cases out of the attack traces, which will allow us to test concrete implementations of the ceremonies.
In Section 7.4, we have discussed how to exploit the results of the analysis of a security ceremony carried out by means of X-Men and in there and in the rest of the paper, we have already mentioned a number of directions for future work. We believe that the most interesting ones are the following ones.
Our current mutations are restricted by a number of constraints that we have imposed to be able to manage them efficiently, such as constraints on the types and formats of the messages. We plan to improve them by weakening of some of these constraints, say to consider other controlled notions of “sendable” message or the case in which the right message is composed in a wrong format.
We also plan to extend X-Men’s library of behavioral patterns with other mutations, which will hopefully allow us to identify novel vulnerabilities in the ceremonies that we considered here and in the ones that X-Men will be applied to in the future. For instance, in this paper we do not model the fact that a human might forget some information as we follow the standard approach in which the knowledge of the agents increases monotonically, be they “machines” or humans. Forgetting could be modeled by a fifth human mutation, which would remove terms from the current knowledge of a human and thus limit the number of messages that the human can send at that stage. For example, this could be useful to model ceremonies for password recovery.
In addition to improving the four mutations considered here and to considering new mutations, we believe that it will be interesting to formalize complex combinations of mutations, in the spirit of the add&replace that we employed in this paper. To that end, it will be useful to attempt to prove compositionality results, e.g., showing under which conditions the analysis of composed mutations can be decomposed into the simpler analysis of the individual mutations (similar to what can be done for the analysis of composed security protocols, which, in some cases, can be split into the simpler problem of the analysis of the individual component protocols [36]).
As we mentioned in the introduction, security ceremony analysis is a discipline that is still in its childhood. Our mutation-based approach is, in our opinion, very promising and can investigate scenarios that were not possible before, but there are still a number of open problems, including the more philosophical question if mutations are the optimal way to go. They are possibly not, but they do help model the behavior of human users, and we are currently working with colleagues expert in psychology to investigate if their behavioral studies can be of help. After all, behavioral mutations are investigated commonly in psychology studies (as well as in other disciplines such as epigenetics).
To tackle security ceremonies in full there is thus much more work that needs to be carried out. Other interesting extensions of this work, and of security ceremony analysis in general, are the ones that consider additional abilities of the attacker (e.g., as in [5]) or that consider explicitly the content of messages instead of handling them symbolically as we did here (similar to the case of cyber-physical systems, where message integrity is one of the crucial security properties [30]).
Finally, we plan to consider other, even more complex, case studies, ideally starting from real ceremonies’ models or implementations.
