Abstract
We propose a formal and automated approach that allows one to (i) reason about vulnerabilities of web applications and (ii) combine multiple vulnerabilities for the identification of complex, multi-stage attacks. We have developed WAFEx, an automatic tool that implements our approach and we show its efficiency by applying it to real-world case studies. WAFEx was able to generate, and exploit, previously unknown attacks.
Keywords
Introduction
Context and motivation. A number of different approaches have been proposed to test the security of web applications (web apps, for short), ranging from vulnerability assessment [46] and penetration testing [10], which are the two main approaches that security analysts typically undertake when assessing the security of a web app and other systems under analysis (see also [13,22,24,39]), to the more formal model-based testing [23,45] that has been steadily maturing into a viable alternative and/or complementary approach.
During a vulnerability assessment, automatic scanning tools are used to search for common vulnerabilities of the system under analysis. However, it is well known [20] that state-of-the-art automatic scanners do not detect vulnerabilities linked to the logical flaws of web apps. This means that even if a vulnerability is found, no tool can link it to logical flaws leading to the violation of a security property. The result of the vulnerability assessment is thus used to perform a second and more complicated step: during a penetration test (pentest), the security analyst defines an attack goal and manually attempts to exploit the discovered vulnerabilities to determine whether the attack goal he defined can actually be achieved. A pentest is meant to show the real damage on a specific system under analysis resulting from the exploitation of one or more vulnerabilities. Consider the following example, which is simple but also fundamental to understand the motivation for the analysis that we propose.
Trustwave SpiderLabs found a SQL injection vulnerability in Joomla! [28], a popular Content Management System (CMS). In [43], Trustwave researchers show two things: the code vulnerable to SQL injection and how the injection could have been exploited for obtaining full administrative access. The description of the vulnerable code clearly highlights an inadequate filtering of data when executing a SQL query. The description of the damage resulting from the exploitation of the SQL injection shows that an attacker might be able to perform a session hijacking by stealing session values stored as plain-text in the database. The result of this analysis points out two problems: Joomla! is failing in (1) properly filtering data used when performing a SQL query and (2) securely storying session values. Problem (1) could have been identified by vulnerability scanners (e.g., sqlmap is able to identify the vulnerability) and a static analysis tool might be able to identify problem (2). However, we are not aware of any automatic tool that is capable of combining the two vulnerabilities in order to compromise the security of the web app through a multi-stage attack. Moreover, the need to reason about multi-stage attacks is crucial when assessing the threats of vulnerabilities: combining two (or more) vulnerabilities whose severity is medium might result in a highly critical issue that needs to be addressed immediately.
To the best of out knowledge, the analysis of such a combination of vulnerabilities is nowadays effectively carried out only by a manual pentesting session. However, manual pentesting relies on the security analyst’s expertise and skills, making the entire pentesting phase easily prone to errors. An analyst might underestimate the impact of a certain vulnerability leaving the entire web app exposed to attackers. This is why it is important to develop new approaches capable of automatically identifying complex attacks to web apps.
Contribution. Our contributions are two-fold. First, we propose a formalization to represent the behavior of the vulnerabilities of web apps that are listed as most dangerous by the OWASP Top 10 project [31], which, at the time of writing, are: SQL injection (SQLi), Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF) and file-system related vulnerabilities. In particular, we show how the canonical Dolev–Yao (DY) attacker model [18] can be used to search for multi-stage attacks where multiple vulnerabilities are combined together to violate security properties of web apps. A number of formal approaches based on the DY attacker model have been developed for the security analysis of web apps, e.g., [1,4,11,16,37,47], but combinations of multiple vulnerabilities have never been taken into consideration by formal approaches before. It is crucial to point out that our approach does not search for payloads that can be used to exploit a particular vulnerability, but rather we analyze the security of web apps by exploiting multiple vulnerabilities that lead attackers to violate a security property of a web app. Our approach can successfully exploit and combine vulnerabilities related to SQLi, XSS, CSRF and file-system access, but the approach is general enough to be, fairly easily, extended to cover more classes of vulnerabilities.
Second, to show that our formalization can effectively generate multi-stage attacks where multiple vulnerabilities are exploited, we have developed a prototype tool called WAFEx (Web Application Formal Exploiter [ 49 ]). WAFEx follows the canonical model-based testing approach (model creation, model analysis, concretization of abstract attack traces, testing) and is capable of automatically generating different attack traces that violate the same security property. This is illustrated by the main boxes in Fig. 1, which we will discuss in detail in Section 4, where we explain all the seven steps of the workflow of our approach and thus of our tool.
WAFEx is capable of generating, and exploiting, complex multi-stage attacks that, to the best of our knowledge, no other state-of-the-art-tool for the security analysis of web apps can find in their entirety. We tested WAFEx on well-known vulnerable web apps used to test the skills of pentesters and WAFEx was able to identify all the vulnerabilities covered by our formalization. We then applied WAFEx to three real-world case studies: a fresh installation of Joomla! v3.4.4, Cittadiverona (a public web app provided by Virtuopolitan S.r.l.), and WordPress v4.8.1. WAFEx was able to automatically identify attacks that security analyst have so far been able to find only by means of manual analysis. Moreover, WAFEx was able to identify previously unknown attacks to Cittadiverona, which we disclosed to the provider before writing them up in this paper.

Workflow of our approach (and of the WAFEx tool).
Organization. We proceed as follows. In Section 2.1, we provide useful background on the vulnerabilities considered here. In Section 3, we present our formalization and in Section 4 we describe the implementation of WAFEx that puts our formalization into practice. In Section 5 we discuss experimental results. In Section 6 we discuss related work, and conclude by summarizing and discussing future work in Section 7.
Attacking web apps
In this section, we give an overview of the main issues relevant to the security of web apps, focussing in particular on the different vulnerabilities that we consider here; this will provide a coherent and uniform stepping stone to reason about vulnerabilities of web apps that will also be used in Section 3.
We collect the vulnerabilities in three categories representing the three main components of web apps: database, file-system and client. Along with the vulnerabilities relevant for that component, we describe mitigation techniques to ensure a secure interaction with that component. More specifically, we define vulnerabilities that violate two security properties:
Authentication bypass: the attacker has access to a restricted area without providing valid credentials. Confidentiality breach: the attacker has access to content stored in the web app that is not meant to be publicly available.
We focus on these two specific types of vulnerabilities due to their importance to companies: based on our work experience in the security testing of web apps, authentication bypasses and confidentiality breaches are the two main issues that companies require to be tested during a traditional web app pentest.
In the following subsections, we give a brief overview of database-related vulnerabilities, file-system-related vulnerabilities and client-side-related vulnerabilities. We limit our discussion to what is relevant for the rest of this paper; more details, as well as the different techniques that have been devised to prevent these vulnerabilities, can be found in the literature that we cite.
Database-related vulnerabilities
SQL injection (SQLi) is one of the most widespread security vulnerabilities of the web: according to the Open Web Application Security Project OWASP, SQLi is the most critical threat for the security of web apps [31], and the MITRE corporation lists improper SQLi neutralization as the most dangerous mistake [14]. A variety of SQLi scanners, such as sqlmap [40] and sqlninja [41], have been proposed in recent years to search for SQLi injection points and payloads, and a number of general classifications based on the payloads of the SQLi (and the scenario in which they are exploited) have been put forth, e.g., in [26,31]. Based on these classifications, SQLi techniques can be divided into six different categories: Boolean-based blind SQLi, Time-based SQLi, Error-based SQLi, UNION Query-Based SQLi, Second-Order SQLi and Stacked Queries SQLi. Avoiding SQLi attacks is quite straightforward: developers can use sanitization functions or prepared statements, where, roughly speaking, the general idea is to not evaluate the injected string as a SQL command.
File-system-related vulnerabilities
Modern web apps often make intensive use of functionalities for reading and writing content from the web app’s file-system (i.e., the file-system of the web server that hosts the web app). Reading from and writing to the file-system are routine operations that web apps perform for many different tasks. For instance, the possibility of dynamically loading resources based on runtime needs is commonly adopted by developers to structure the web app’s source code for better and stronger re-usability. Similarly, for what concerns writing, an increasing number of web apps allow users to upload (write) content that can be shared with other users or can be available from a web browser as in a cloud service. Reading and writing functionalities are offered by most server-side programming languages for developing web apps such as PHP [35], JSP [42] or ASP [7]. Modern database APIs also provide a convenient way to interact with the file-system (e.g., backup or restore functionalities), but such APIs also increase the attack surface an attacker could exploit. Whenever an attacker finds a way to exploit vulnerabilities that allow him to gain access to the web app’s file-system, the security of the whole web app is put at high risk. Indeed, both OWASP [31] and MITRE [14] list vulnerabilities that compromise the file-system among the most common and dangerous vulnerabilities that afflict the security of modern software.1
The Top 10 compiled by OWASP is a general classification and does not include a specific category named “file-system vulnerability”; however, “Injections”, “Broken authentication and session management”, “Security misconfiguration” (just to name a few) can all lead to a vulnerability related to the file-system.
We have identified five categories of vulnerabilities of web apps that lead to compromising the file-system: Directory Traversal (a.k.a. Path Traversal), SQLi, File Inclusion, Forced Browsing (a.k.a. Direct Request) and Unrestricted File Upload. Such classification comes from searching and analyzing the Common Weakness Enumeration (CWE) database [14] for vulnerabilities related to web apps for which the common consequence is reading files or directories. CWE contains a list of common weaknesses that afflict the security of software. Each entry in the CWE database has a section called “Common Consequences” that, as the name suggests, lists the common consequences resulting from exploiting that specific vulnerability. Let us describe the three categories that are relevant for our formalizations and the attacks that we found.
SQLi. Most modern databases provide APIs that extend SQL’s expressiveness by allowing SQL code to access a web app’s file-system for reading and writing purposes. Reading APIs allow developers to produce code that retrieves content stored in the web app’s file-system and loads it in the database. Writing APIs allow developers to produce code that saves content from the database to the web app’s file-system. SQLi could then be used to exploit reading and writing APIs to access the underlying file-system [15].
File Inclusion. All programing languages for the development of web apps support functionalities for structuring code into separate files so that the same code can be reused at runtime by dynamically including files whenever required. A File Inclusion vulnerability refers to a lack of proper sanitization of user-supplied data during the creation of a file location string that will be used to locate a resource meant to be included in a web page. When the file location depends on user-supplied data, an attacker can exploit it and force the inclusion of files different from the ones intended by the developers. An attacker could exploit File Inclusion to access arbitrary resources stored on the file-system and to execute code.
Unrestricted File Upload. A widespread feature provided by web apps is the possibility of uploading files that will be stored on the web app’s file-system. An Unrestricted File Upload vulnerability refers to a lack of proper file check when a web app allows for uploading files. The consequences can vary, ranging from complete takeover with remote arbitrary code execution to simple defacement, where the attacker is able to modify the content shown to users by the web app.
Modern web apps deliver content that takes advantage of the technologies provided by client-side web browsers. We now describe the improper use of two client-side technologies employed to improve the way users interact with web apps: (i) client-side scripting is used to deliver an interactive experience that resembles the one of a stand-alone app, (ii) HTTP Cookies are used to circumvent the stateless nature of the HTTP protocol, thus allowing a state to persist when a user interacts with the web app. While the vulnerabilities related to the database and the file-system are performed by an attacker who is directly targeting the web app by exploiting a false sense of trust by the web app on the data received by users, the vulnerabilities related to the client side involve the interaction with a client where the attacker exploits a false sense of trust by the client on the data received from a web app.
Cross-Site Scripting (XSS). XSS vulnerabilities are a type of injection vulnerabilities that force a web app into sending malicious code that is then executed by web browsers; thus, the web app hosts the payload while the main target is the web browser. For an attack to be successful, a user has to visit an infected page containing malicious JavaScript, which is then executed by the web browser of the user.2
JavaScript is the most widely used language for exploiting XSS since it is widely supported by all modern web browsers, but any supported client-side language could be used.
A Reflected XSS happens when a web app implements a dynamic page that takes user-supplied input and simply displays it in a web page. User supplied input might contain malicious client-side code that is thus executed by the browser.
In a Stored XSS the attacker sends a request containing malicious client-side code to a web app, the web app stores (typically in the database) the value received in the request, and then, when other users are browsing the web app, the malicious content is retrieved and delivered to users.
A DOM-Based XSS happens when the attacker’s payload is processed by the client-side code running in the victim’s browser, thus no requests are sent to the web app and the payload affects the Document Object Model of the web page.
Cross-Site Request Forgery (CSRF). In CSRF, the attacker takes advantage of another user’s session by forcing that user into performing arbitrary HTTP requests to the web app. HTTP Cookies are automatically managed by web browsers and whenever a new request is performed to a web app that generated a cookie, the browser sends the cookie along with the request. A CSRF attack exploits the trust of a web app believing that the request it received was indeed willingly generated by the user who performed the request.
The modeling language used in our approach is ASLan++ [8,48], the formal and typed security protocol specification language used by the AVANTSSAR Platform [4] and the SPaCIoS Tool [47], but in fact our approach is general and, mutatis mutandis, it could be used with other specification languages and/or other reasoners implementing the DY attacker model. We have chosen ASLan++ because of three main characteristics. First, it is a fully formal language, which saves us from introducing an abstract formal language to give the intuition for our models and then use a specification language for the actual models. Second, and perhaps even more importantly, ASLan++ was designed to be close to specification languages for security protocols and web services, but also to procedural and object-oriented programming languages, so that it can be employed by users who are not experts of formal protocol/service specification languages. In fact, ASLan++ was developed in collaboration with engineers at IBM, SAP and Siemens to meet their modeling needs and abilities, and ASLan++ and the tools have been embedded in some of their products. Third, ASLan++ is close, in spirit albeit not always syntactically, to other role-based languages used for formal analysis of security protocols and services, such as the Applied Pi calculus and BPMN, for which translations to ASLan++ exist.
A complete description of the ASLan++ language can be found in [8]. Here, we only discuss the features of ASLan++ that are needed to understand the ASLan++ code of our formalizations, starting from the fact that ASLan++ code is written with typewriter fonts and ASLan++ keywords are written in boldface. Similar to object-oriented languages, an ASLan++ specification consists of a hierarchy of entity declarations, which are similar to Java classes or roles in security protocol specification languages. The top-level entity is typically called
ASLan++ supports variables (capital letter symbols), constants (lower case symbols), and functions and predicates that can be distinguished by the return type, respectively message and fact (described afterwards).
An entity is composed by two main sections:
The body of an entity contains three different types of statements: assignments, message send and message receive.
Assignments are of the form
Sending and receiving of messages are expressed in Alice-and-Bob notation:
It is possible to use different kinds of channels:
The section
The
The two statements
There are various data types in ASLan++, e.g.:
ASLan++ supports facts retraction with the keyword
In addition to
In ASLan++ the term intruder is used instead of attacker, thus the letter
Concrete examples of ASLan++ formalizations will be given in the next sections, where the syntax summarized here will be put at work. We will also explain other ASLan++ keywords and notations as the need arises.
We now describe how we formally represent the behavior of a vulnerable web app and how the DY attacker model can successfully exploit vulnerabilities of web apps. It is important to remark two things. First, our formalization can be carried out in a black-box scenario, i.e., the security analyst does not need to have any knowledge of the source code of the web app in order to create a formal model of it. Second, in order to keep the model of a web app simple and avoid state-space explosion, we do not make an explicit distinction between the server-side and the client-side parts of a web app; this is not a limitation since, as will be clear later on, our formalization is capable of dealing with vulnerabilities such as Cross-Site Scripting which affect the client-side part of a web app.
Our approach can successfully exploit and combine vulnerabilities related to: SQL injection, File-system access, Cross-Site Scripting (reflected and stored) and Cross-Site Request Forgery.4
Although we focus on these vulnerabilities, the approach that we propose is general enough and it can be, fairly easily, extended to cover more classes of vulnerabilities. In order to do so, one has to research the main aspects that characterize the class of vulnerabilities that needs to be represented and include those aspects into the model of the web app similarly to what we have done in this paper.
the web attacker, which interacts with the web app and the honest browser,
the file-system, which interacts with the web app and the database for reading and writing content,
the database, which interacts with the web app and the file-system, providing a means to query a database,
the web app, which defines the interaction with the web attacker, the file-system, the database and the honest browser,
honest browser, which interacts with the web app and the web attacker,
In general, the web app manages the session handling and the interactions between the entities, and the web attacker formalizes the DY attacker. The other entities allow us to capture the three main components of web apps: honest browser allows us to formalize the client, file-system allows us to consider file-system-related attacks, database allows us to consider injection attacks.
In the following, we first define the data types and the communication model we adopt in our formalization, and then we describe the main entities of our formalization.
As already mentioned in Section 2.2, in ASLan++ data types can have subtypes. This is particularly useful to keep the analysis simple and efficient. In this subsection, we define the subtypes we use in our formalization: Listing 1 defines the subtypes

ASLan++ code of the subtypes used in our formalization.
We define messages, the basic communication units, as follows.
Messages consist of variables V, constants c, concatenation
In this paper, we need not distinguish between different kinds of encrypted messages, but we could do it by following standard practice. Here we do not even need to consider explicitly encrypted messages, but we add them for completeness.
We define that
As illustrated in Fig. 2, for the communication model of our formalization, we assume:
that the honest browser and the web app communicate over an authentic and confidential channel, that the web app, the file-system and the database are on the same physical machine i.e., no attacker can read or modify the communication between them.
These assumptions derive from the fact that we are not interested in finding attacks against the communication between honest browser and web app but rather we want to exploit vulnerabilities of the web app — this is not a limitation as our formalization could be quite straightforwardly extended to include attacks to the communication channel. Furthermore, the file-system and the database entities are not real network nodes and thus no attacker can put himself between the communication, i.e., man-in-the-middle attacks are not possible.

The communication model between the honest browser, the web app, the file-system and the database.
In order to express this assumption in ASLan++, we formalize the communication between the honest browser and web app by using a confidential and authentic channel, represented in ASLan++ by means of the
The communication between web app, file-system and database does not involve the exchange of any message (as we assumed that they are on the same physical machine). We represent the communication between these entities by means of facts and facts retraction. Every operation that can be performed on either the database or the file-system is associated to a fact. The web app uses a fact associated to an action (e.g., perform a query, read a file) and, once the operation is completed, the entity entitled to execute that operation (the database or the file-system) uses fact retraction to remove the fact and restore execution to the web app.
The web attacker that we propose can be reused in every specification and is based on the canonical model by Dolev and Yao [18], which defines an attacker that has total control over the network but cannot break cryptography: he can intercept messages and decrypt them if he holds the corresponding keys for decryption, and he can generate new messages from his knowledge and send messages under any agent name. Message generation and analysis are formalized by derivation rules, expressing how the attacker can derive messages from his knowledge. Our web attacker originates from two fundamental aspects of our work: (1) as stated, we are not interested in generating the payloads that will exploit vulnerabilities, but rather we want to represent that a vulnerability can be exploited and what happens when it is exploited, and (2) we want to make the models as simple as possible so as to avoid state-space explosion when carrying out the analysis with the model checker.
We do not explicitly represent payloads since we are not interested in finding the right payload; we thus introduce a constant

Horn clause representing an attacker’s attempt to exploit a vulnerability.
Our approach allows the DY attacker to exploit the following attacks:
SQLi for extracting the content of the database,
SQLi for creating a tautology,
SQLi for reading from the file-system,
SQLi for writing to the file-system,
SQLi for inserting new content into the database,
file inclusion for reading arbitrary content from the file-system,
remote code execution by uploading an arbitrary malicious file,
XSS (stored and reflected) for redirecting a user,
XSS (stored and reflected) for stealing the user’s knowledge,
CSRF for forcing the user into performing a request chosen by the attacker.
We now give the formalization of the file-system entity that can be used in any specification when searching for attacks related to file-system vulnerabilities of web apps. The file-system is always actively listening for incoming requests. As already stated, we assume the file-system to be on the same physical machine of the web app (Fig. 2), thus no attacker can put himself between the communication (i.e., man-in-the-middle attacks are not possible).
Our formalization aims to abstract as many concrete details as possible, while still being able to represent the exploitation of file-system vulnerabilities, and so we do not represent the entire file-system structure but rather formalize messages sent and received along with reading and writing behavior. This allows us to give a compact formalization so as to avoid state-space explosion problems when carrying out the analysis with the model checker. The ASLan++ code representing the file-system entity is given in Listing 9 — note that, from now on, to ease readability, long specifications are not given in-line but rather in the appendix.
File-system content
To exploit vulnerabilities related to reading and writing operations, we need to represent the files available in the file-system. We represent the existence of a file by means of a set of messages formalized by the constant
File-system operations
We consider only reading and writing operations, for which we use facts
Reading and writing behavior
Before detailing the formalization of reading and writing behaviors, we recall that the goal of our approach is to test the security of a web app and thus we do not need to consider access control policies/models for the file-system as they are external to the web app. Hence, we assume that every file that is in the file-system can always be read and that every writing operation will always succeed.
The file-system defines two variables
We need not define a malicious writing operation since we assumed a writing operation to always succeed due to not considering access control policies.
Whenever the fact
Whenever the fact
Finally, the file-system defines the behavior in case of a malicious attempt at accessing the file-system where both
We now give the formalization of the database entity that can be used in any specification when searching for SQLi attacks in web apps. Just like the file-system, the database is always actively listening for connections and the communication with it is assumed secure (i.e, man-in-the-middle attacks are not possible, cf. Fig. 2).
We aim at giving a formalization that can be used in any specification and that is both compact, to avoid state-space explosion problems, and general enough not to be tailored to a given technology (e.g., MySQL or PostgreSQL). Hence, we do not represent the entire database structure, the SQL syntax nor access policies specified by the database. Rather, we formalize messages sent and received, the tables of the database and queries. The ASLan++ code representing the database behavior is given in Listing 10.
Database content
The database is defined as a set of sets, represented by the constant
Database operations
Since we do not consider the SQL syntax, we explicitly define the type of queries supported by our formalization, for which we use the following facts:
The behavior of queries
The database entity formalizes a generic database so that this entity can be reused in every scenario where a web app interacts with a database. To formalize the behavior of queries, the database entity defines, within the symbols definition (line 10 of Listing 10), variables for: a table (line 10), tuples (line 11) and a file (line 12). We represent the database entity to be always ready to execute new queries (line 15). More specifically, we define the database entity with six
SQLi for extracting the content of the database,
SQLi for creating a tautology,
SQLi for reading from the file-system,
SQLi for writing to the file-system,
SQLi for inserting new content into the database.
When a query needs to be executed (line 18), the database verifies if
When an insert query needs to be executed (line 44), the database can perform a legitimate insert query (line 46) or one of the malicious behaviors that we described previously (lines 50–52). In case of a legitimate insert query, the database inserts the value
When a delete query needs to be executed (line 57), the database can perform a legitimate delete (line 59) or one of the malicious behaviors that we described previously (lines 63–65). In case of a legitimate delete query, the database deletes
When an update query needs to be executed (line 70), the database can perform a legitimate delete (line 72) or one of the malicious behaviors that we described previously (lines 77–79). In case of a legitimate update query, the database deletes the old tuple
When a query for reading from the file-system needs to be executed (line 84), the database can perform a legitimate query for reading from the file-system (line 86) or one of the malicious behaviors that we described previously (lines 89–91). In case of a legitimate query for reading from the file-system, the database treats the value
When a query for writing to the file-system needs to be executed (line 96), the database can perform a legitimate query for writing to the file-system (line 98) or one of the malicious behaviors that we described previously (lines 102–104). In case of a legitimate query for writing to the file-system, the database treats the value
The web app
The web app is a node of the network that can send and receive messages. In our formalization, the web app can communicate with the honest browser by means of the HTTP protocol and with the file-system and database. The file-system and the database entities do not depend on a specific scenario and thus they can be reused in every model. The web app formalization, on the other hand, does depend on the scenario being modeled and thus it is not possible to provide a single entity that can be reused in every model. However, we can give the skeleton of a specification and a series of guidelines on how to represent the web app’s behavior for testing the interaction with the client, the file-system and the database.
The HTTP protocol. HTTP is the base communication protocol for interacting with a web app. We follow the same approach that we adopted for the file-system and the database entities, and use
Client communication. An HTTP request (and response) header comprises different fields that are needed for the message to be processed by the server and the browser. In our formalization, we do not need to represent all the information of a real request (or response) header as they are not relevant for the analysis. We introduce two uninterpreted functions two variables (sender and receiver), a constant for the requested page, a concatenation of variables for the parameters, and a concatenation of variables for the cookies.
For an HTTP response, we represent:
two variables (sender and receiver);
a constant for the response page,
a concatenation of constants, variables and uninterpreted functions for the response body, and
a concatenation of variables for the cookies.
File-system and database communication. As already stated in Section 3.4, whenever the web app has to read content from the file-system, it uses the predicate
Similarly, when the web app has to perform a SQL query on the database, it uses one of the six predicates representing a database query supported by our formalization (see Section 3.5) and waits until they are retracted. The result of executing a file-system or database operation is stored in a shared variable
Sessions. To circumvent the stateless nature of HTTP and experience a stateful interaction with a web app, developers make use of sessions to bound one request to another (HTTP cookies as described in Section 2.1.3). In order to represent sessions, we introduce a set called
Remote code execution. Our formalization can represent scenarios where the attacker is able to write arbitrary files to the file-system, which might lead to arbitrary remote code execution. As described above, the model of a web app is represented as a sequence of

ASLan++ code representing a remote code execution example.
Our formalization can address attacks involving the interaction of the DY attacker with an honest browser. More specifically, as we stated in Section 3.3, we aim at representing XSS attacks (stored and reflected) and CSRF attacks that require the attacker to interact with an honest browser. In the following, we thus define our representation of an honest client/browser.
The honest browser represents the basic functionalities provided by a web browser. More specifically, it gives us the possibility of executing arbitrary HTTP requests along with an internal knowledge (represented by the set
The ASLan++ code representing the behavior of the honest browser is given in Listing 11. Within the symbols definition, the honest browser defines a variable used to represent a page (line 3), a variable for representing HTTP parameters (line 4), a variable for representing a cookie (line 5), a variable for representing the body of a response (line 6) and a variable for representing a fresh nonce (line 7). Like the other entities, the honest browser is also actively ready to communicate with the web app (line 10). The behavior of the honest browser is formalized by two
To perform a general request (line 12), the honest browser initializes a fresh nonce variable
Finally, in order to represent CSRF attacks, the formal model of our honest browser reacts to messages sent directly by the attacker. More specifically, the attacker can force the honest browser into performing any HTTP request by simply sending that request to the honest browser. Whenever the honest browser receives a request from the attacker, it will blindly execute it, thus representing the exploitation of a CSRF attack (line 24). In this case, the honest browser creates a nonce to ensure a fresh communication (line 25) and non-deterministically decides the value for the variable

The MSCs of the Cittadiverona case study.

(Continued.)

(Continued.)

(Continued.)

(Continued.)

(Continued.)
We now show how our formalization can be used to create the model of Cittadiverona, which is a web app developed and maintained by Virtuopolitan S.r.l., a communication agency focused on the development and managing of touristic web apps. Cittadiverona has been active since 2005 and is now the main point of reference for locals and foreigners who are looking for information about the many events offered by different organizations in the city of Verona (Italy) and its surroundings. It currently has 969.000 registered users and around 15 new events are posted every week.
Cittadiverona is depicted in Fig. 3 as a series of Message Sequence Charts (MSCs) in which there are four entities: honest browser, web app, file-system and database. Note that we follow standard notational conventions: constants begin with a lower case character (e.g.,
Cittadiverona is a web app based on the Joomla! CMS that over the years has gone through many changes that were required to maintain the web app functioning and meet new requirements. Some of the changes made to Cittadiverona allowed it to be highly ranked on Google. Cittadiverona provides the following features:
a public registration page that allows users to create an account on the web app (Fig. 3(i)); a public HTTP login page that allows users to log in the web app by providing username and password (Fig. 3(j)); a public page that users can use to query the database looking for events — users can search events based on different filters, such as description, time period, location, etc. (Fig. 3(k)); a restricted page where only logged in users can submit new events — once the user sends a new event, the administrators can view and edit it (Fig. 3(l)).
Furthermore, Cittadiverona provides administrative functionalities that are accessible only to specific users identified as administrators:
a page where administrators can login (Fig. 3(a));
a backup page that can be used to save the entire content of the database and the file-system (Fig. 3(b));
a page where administrators can view events (Fig. 3(c)) and edit events (Fig. 3(d));
a user-management area where users can be created (Fig. 3(e)) and edited (Fig. 3(f));
a file-management area that can be used to upload new files (Fig. 3(g)) and read from the file-system, i.e., retrieve files (Fig. 3(h)).
The first phase of our analysis was to create a model of the Cittadiverona case study. We then used our Model Creator Plugin to create the skeleton of the web app (see Section 4.1) and we manually refined the skeleton to reflect the functionalities offered by Cittadiverona. Listing 13 shows the resulting ASLan++ specification, which we describe in the following subsection.
The specification
The symbols used in the formalization of Cittadiverona are given in lines 3–8 of Listing 13. Administrators are allowed to log in using a dedicated web page
Administrators can perform a series of actions. In particular, they can show events saved in the database (line 30). The web app receives a request for page
Administrators can also edit events. The web app receives a request for page
Administrators can also upload new files to the file-system. The web app receives a request for the page
Administrators can also read content from the file-system. The web app receives a request for the page
Administrators can create new users (line 38) and edit user details (line 45). Creating a new user is achieved by sending a request to page
Finally, administrators can retrieve the entire database and file-system as part of a backup functionality. The web app receives a message for the page
Registered users can log in by sending a request to the page
Normal users can register to the web app by sending a request to the page
Logged in users can insert new events in the database by sending a request to the page
Users that are not logged in the web app can query the database to search for events. The web app receives a request for the page
Finally, our specification of the Cittadiverona web app handles two additional requests for allowing remote code execution as formalized in Section 3. The first requests allows the attacker to access the entire file-system (lines 127–128) and the second allows the attacked to access the entire database (lines 130–131) provided that he managed to upload a malicious file identified by
Security properties
The last component of the formalization of a web app, is the enumeration of the security properties (or goals) that should be verified. In particular, as we discussed in Section 3, we are interested in defining security properties that are related to the exploitation of authentication bypass and confidentiality breach vulnerabilities. An authentication bypass represents the possibility for the attacker to access some part of the web app that should be protected with some sort of authorization mechanisms. A confidentiality breach represents the possibility for the attacker to obtain information that is “leaked” from the web app. Such “leakage” can happen from either the file-system, the database of the honest browser.
We use the LTL “globally” operator

Confidentiality goal for file inclusion.
We have implemented, using the Python3 language, a prototype tool called Web Application Formal Exploiter (WAFEx [
49
]) to support our formal approach and show that it can indeed be used to generate and exploit attacks that so far security analysts were able to find only by manual inspection. As depicted in Fig. 1, WAFEx implements a model-based testing approach that consists of two main phases:
a model creation phase, in which the security analyst creates a model of a web app, and a concretization phase, in which the model is analyzed and tested on the real web app.
We applied WAFEx to a number of case studies that are discussed at length in Section 5. Before we do that, we describe the two phases in more detail.
Model creation
The model creation phase implemented in WAFEx consists of a plugin for the Burp Proxy [36] (Burp, for simplicity) that helps the security analyst in the creation of a specification in ASLan++. As described in Section 3.6, we represent a web app at the HTTP level. Hence, in order to create the model of a web app, the security analyst does not need to have any knowledge of the source code of the web app. We tried to automate the model creation phase in WAFEx as much as possible, but, as is typical in model-based testing approaches, the model creation requires some knowledge of the security analyst on how the web app works. More specifically, the way a formal model is created follows two main steps:
The security analyst records an HTTP trace of requests and responses. This trace will represent how a client interacts with the web app and is automatically translated into a formal model of the web app. The security analyst completes the formal model automatically generated in step one and defines how the web app interacts with the database and with the file-system. This information cannot be automatically extracted from the HTTP requests and responses, so the security analyst has to fill the gap. In order to do so, the security analyst needs to use his experience and perform educated guesses on whether or not a specific request is interacting with the database and/or with the file-system.
It is quite unfeasible to determine how difficult it can be to fill the gap and complete the model of a web app. Such an evaluation would require WAFEx to be used by a large number of security analysts so as to collect feedback on the model creation phase. Such an experimental evaluation is beyond the scope of this paper. Still, we can provide results based on our own experience on using WAFEx. We performed many different experiments in order to test the capability of WAFEx and this required the creation of several different models performing several different operations. Based on this, we can say that completing a formal model was rather easy as the security analyst actually does not even need to have any kind of security knowledge. This phase does not involve the identification of payloads and does not require attacking the web app. All that is required is to perform educated guesses on how the web app works, a task that, we believe, can be carried out by any modeler who has a basic knowledge of web app development. We now describe how our plug-in can help in the creation of a formal model.
The security analyst records an HTTP trace by using Burp ( 1 in Fig. 1) and with the Model Creator Plugin [ 50 ] generates an ASLan++ skeleton of the web app along with a concretization file ( 2 in Fig. 1). The skeleton of the ASLan++ model is created following the formalization we presented in Section 3, while the concretization file is meant to tie together abstract requests in ASLan++ with details needed to perform the real requests on the web app. In order to do so, we performed some minor changes to the ASlan++ formalization we described in Section 3. More specifically, we performed the following two changes:
Every HTTP request is associated with a progressive constant
Instead of having one constant
The concretization file is represented in the JSON data format and its structure is shown in Listing 5. For every request in the ASLan++ model, an identification tag is created in the concretization file (line 2). The details of a request are:
the HTTP method used to perform the request (line 3),
the real URL of the request (line 4),
the parameters used in
the parameters used in
the cookies that are also represented as a JSON object (line 9) mapping an abstract cookie with the corresponding pair consisting of key and value (line 10).

JSON structure of the concretization file.
Our implementation of the Model Creator Plugin provides a graphical user interface (Fig. 4) that can be used to edit the skeleton of an ASLan++ model and the concretization file. The security analyst can thus further refine the specification to include details such as the interaction with the database or the security property to be tested. In particular, referring to Fig. 4, the user interface of the Model Creator Plugin provides: a table collecting the requests to translate ( 1), a panel showing details of a selected request ( 2), a file selector for specifying the database file ( 3), two tabs for ASLan++ and the concretization editors ( 4), and the actual editor that also provides syntax highlight ( 5).

WAFEX model creator: main interface area.
The security analyst interacts with WAFEx ( 3 in Fig. 1) to start the testing of a web app. WAFEx makes use of the model checker CL-AtSe [44] to analyze an ASLan++ model ( 4 in Fig. 1) and, in case an attack is found, CL-AtSe generates an Abstract Attack Trace (AAT) represented as a MSC ( 4 in Fig. 1). An AAT shows the requests that, when executed, violates the security property defined on the model of the web app. However, since the model of the web app represents an abstraction of the real web app, it is required to test the AAT on the real web app to ensure that the attack found can actually be carried out. WAFEx does so automatically by reading the AATs, along with the concretization file ( 6 in Fig. 1), and performs the attack on the real web app ( 7 in Fig. 1). WAFEx is then capable of understanding what kind of request should be performed at a given step of an AAT thanks to the additional details in the ASLan++ model that we described in Section 4.1. In case a vulnerability has to be exploited, WAFEx uses state-of-the-art tools such as Wfuzz [51] and sqlmap [40] for the generation of a malicious payload, otherwise it uses the information in the concretization file to perform the request.
Limitations
The current implementation of WAFEx is a prototype that is not yet ready for full industrial deployment, as it presents some limitations that we discuss in this subsection (but note that these limitations do not actually afflict the formal approach that we propose and the results obtained). One limitation is due to the fact that the model checking phase of our approach employs the CL-AtSe model checker, which does not allow for the generation of multiple AATs (nor do the other back-ends of the AVANTSSAR Platform). Thus, whenever a trace was found in our analysis of the case studies, we disabled the branch corresponding to the attack in the
The concretization phase also suffers from some limitations, specifically in the generation of XSS payloads. It is quite well known that automatic XSS exploitation is challenging, especially when it comes to Stored XSS. State-of-the-art scanners, such as Burp, search for XSS by injecting a predefined set of payloads and analyzing whether the payload is reflected in the responses. We have implemented the same approach in WAFEx defining a database of common XSS payloads. Whenever a certain XSS needs to be exploited, WAFEx selects an appropriate payload and tries to perform the exploitation. This approach, however, is limited in its ability of exploiting Stored XSS as there is no automatic way to verify whether or not a specific payload was successful. We believe that the generation of XSS payloads for Stored XSS might benefit from our formal representation of web apps, specifically, it will benefit from the AAT that can be used to better understand where the payload is going to be stored, thus allowing for executing multiple requests in order to find the appropriate payload. We are currently investigating how to extend WAFEx by combining the generation of XSS payloads with model checking.
One may wonder whether the approach is limited by the use of WFuzz or sqlmap. This is not the case since the multi-stage attack traces are generated before the use of either WFuzz or sqlmap. The application of WFuzz and sqlmap is only needed during the concretization phase where an attack trace is tested on the real web app. It might happen that WFuzz or sqlmap are not be able to exploit a vulnerability, but, as stated above, this does not limit the attack traces that can be generated. Moreover, payload generation is not the main focus of our research as we focus our attention on the generation of multi-stage attacks.
Experimental results
In this section, we show how our formalization can be used effectively for representing and testing attacks involving the exploitation of multiple web app vulnerabilities. During the development of WAFEx, we first applied it to some well-known vulnerable web apps to check whether WAFEx was able to identify the vulnerabilities we formalized. In particular, we applied WAFEx to: DVWA [21], WebGoat [33] and Gruyere [25], which provide state-of-the-art environments for pentesters to improve their skills and tools. WAFEx has been able to correctly identify all the vulnerabilities described in Section 2.1, which allowed us to refine the concretization phase of WAFEx to deal with the actual exploitation. After this initial testing, we took a step further and considered concrete and complex case studies in order to show that our approach can indeed be applied to real web apps but more importantly that WAFEx can be used to perform an analysis that no other tool is currently capable of. The case studies that we consider in detail are: Joomla! (Section 5.1), Cittadiverona (Section 3.8) and WordPress (Section 5.3). In particular, we applied WAFEx to Cittadiverona (
Case study: Joomla!
In Section 1, we described a case study related to a quite recent version of Joomla! (specifically, versions 3.2 through to 3.4.4) vulnerable to SQLi and how the security team at Trustwave was able to find the vulnerability and exploit it to compromise the session of an active user. In this section, we show how our approach can be used to represent such a scenario and, more importantly, how it can find the attack automatically, without the need of human interaction. In a nutshell, the SQLi vulnerability lies in the
Our formal model represents a fresh installation of Joomla!, meaning that our case study does not have content, customization nor third-parties plug-ins. We have thus created a formal model of the three main parts that one can interact with in such a basic installation: (1) the login, (2) the administrative area and (3) the components loader.
The specification
The formal model of the Joomla! case study in Listing 12 answers to two main requests: login (line 14) and component loader (line 38). Users are allowed to login via a dedicated page
Our model also represents how it is possible to access the administration page (line 33). Specifically, the web app receives a request for page
The components loader accepts six parameters that are needed to load a component in Joomla!, i.e.,
Finally, our specification of the Joomla! case study handles two additional requests for allowing remote code execution as formalized in Section 3. The first request allows the attacker to access the entire file-system (lines 46–47) and the second allows the attacker to access the entire database (lines 49–50) provided that he managed to upload a malicious file identified by
The abstract attack trace

Authentication goal for the Joomla! case study.

AAT for performing a SQLi and hijack a user’s session.
We ran WAFEx on the ASLan++ specification to test whether or not the attacker could have access to the administrative area, as formalized by the goal in Listing 6. WAFEx was able to generate one AAT (Fig. 5) that shows the same attack that was carried out by Trustwave. It shows an honest browser logging in as an administrator and an attacker performing a SQLi attack in order to steal the administrator’s session. The honest browser sends a request for page
The attacker now interacts with the web app by sending a request for the
WAFEx performed the concretization phase trying to attack the real web app and exploit the SQLi in order to hijack a session. The execution of the attack was successful and WAFEx was able to hijack the session. WAFEx first logged in as an administrator and then performed the SQLi to steal the administrator’s session cookie stored in the database. Finally, it used the stolen session cookie to start the attacker instance and log in as a legitimate administrator. This case study shows how the manual investigation performed by Trustwave in [43] can be fully automated by using WAFEx.
Case study: Cittadiverona
We already discussed our formalization for Cittadiverona in Section 3.8, so we now focus out attention to describing the results of our analysis.
The analysis of Cittadiverona
In collaboration with Virtuopolitan S.r.l., we cloned Cittadiverona into a secure environment where we were able to freely test the web app without causing any harm to the real system. We started the analysis by performing a vulnerability assessment by using common vulnerabilities scanners, in particular we used Arachni [3] and OWASP Zed Attack Proxy (ZAP) [32]. This assessment allowed us to discover a number of common vulnerabilities, which are summarized in Table 1.
However, a proper security analysis does not stop after such a vulnerability assessment but continues with a penetration testing phase that aims at understanding how the identified vulnerabilities could be used to harm the web app. This is where WAFEx plays a fundamental role by helping the security analyst in generating AATs where vulnerabilities are combined to violate a security property.
Vulnerabilities identified in the Cittadiverona case study
Vulnerabilities identified in the Cittadiverona case study
We ran WAFEx on the ASLan++ specification to test whether or not the attacker could access the database, as formalized by the goal in Listing 7.

Confidentiality goal for accessing the database in the Cittadiverona case study.
WAFEx was able to generate a total of five AATs, which we describe in the following.
Abstract Attack Trace #1. This first AAT (Fig. 6) shows a simple attack where the attacker might be able to exploit a SQLi in the login phase to directly access the entire database. In this AAT, the attacker sends to the web app a request for the page

AAT #1 for accessing the database in Cittadiverona.
Abstract Attack Trace #2. We disabled the branch that allows the attacker to force the dump of the entire database and ran the model checker again to generate a different AAT (Fig. 7). In this AAT, the attacker sends to the web app a request for the page

AAT #2 for accessing the database in Cittadiverona.

AAT #3 for accessing the database in Cittadiverona.
Abstract Attack Trace #3. We disabled the branch that allows the attacker to force the database into writing to the file-system and ran the model checker again to generate a different AAT (Fig. 8). In this AAT, the attacker sends to the web app a request for page

AAT #4 for accessing the database in Cittadiverona.
Abstract Attack Trace #4. WAFEx generated a fourth, more complex AAT, which shows how the attacker can have access to the database even when no SQLi attack is possible. To generate this AAT (Fig. 9), we removed the possibility for the attacker to perform any kind of SQLi on the database. In this AAT, the interaction starts with the honest browser sending a request for page
Abstract Attack Trace #5. We disabled the possibility for the attacker to perform any SQLi and run the analysis on the model to see if the attacker could find a way to access the entire database. WAFEx then generated the AAT in Fig. 10. The attacker logs in the web app by using credentials
It is worth noting that the message from the web app to the attacker ( 8) contains the same variables sent by the attacker ( 5) representing the new event plus the constant

AAT #5 for accessing the database in Cittadiverona.
This last AAT shows the exploitation of a Stored XSS in order to perform a session-hijacking attack on the administrator so that the attacker can log in as administrator and access the database via the backup functionality. We believe this AAT to be particularly representative of the strength of the possibilities provided by our approach. While the attack might look obvious to the most skilled penetration testers, it is important to observe that, except for WAFEx, no automatic tool is currently able to identify such an multi-stage attack and only a manual investigation can be effective.
We ran WAFEx on a Mac Book laptop (Intel i5-4288U with 8 G RAM and Python3.5). For instance, the execution time of the model-checking phase for generating the AAT in Fig. 10 was around 80s. The AAT was successfully exploited by WAFEx showing that it was indeed possible to perform the attack and thus have access to the entire database.
Case study: WordPress
WordPress is a very popular, free and open-source content management system (CMS) that is used by millions of websites. WordPress is similar to Joomla! and, very much like Joomla!, it provides functionalities that might be vulnerable to web app vulnerabilities and, in fact, in September 2017, a security research firm found a Stored XSS vulnerability in the core of WordPress version 4.8.1. WordPress comes with different user roles built in its code, including an editor role and an administrator role. Editors are allowed to log in, publish articles and perform basic operations approved by the administrators. Administrators, on the other hand, have complete and full access to every functionality provided by WordPress. The Stored XSS vulnerability can thus be exploited by users that are editors with the purpose of attacking the administrators and hijacking their sessions.
The specification
Listing 14 shows our ASLan++ specification for the WordPress case study. Editors are allowed to log in using the same approach as the one used by administrators. More specifically, editors are allowed to log in using a dedicated web page
Administrators are allowed to log in using a dedicated web page

Authentication goal for the WordPress case study.
We ran WAFEx on the ASLan++ specification to test whether or not the attacker could have access to the administrative area, as formalized by the goal in Listing 8. WAFEx was able to generate one AAT (Fig. 11) that shows the same attack that was generated for the Cittadiverona case study. This AAT shows the exploitation of a Stored XSS in order to perform a session-hijacking attack on the administrator so that the attacker can log in as administrator. The attacker logs in the web app by using the credentials

AAT for accessing the administrator area in WordPress.
The honest browser now logs in the web app as an administrator user, providing credentials
WAFEx performed the concretization phase trying to attack the real web app and exploit the Stored XSS in order to hijack another user’s session. In this case, the execution of the attack was unsuccessful, meaning that the attack was a false positive. The analysis of the real web app pointed out that the session cookie in WordPress version 4.8.1 is generated with the HttpOnly flag, thus preventing any JavaScript code from being able to access it and stealing it. We believe this case study to be significant since it shows how WAFEx can generate the same attack trace regardless of the web app. As we remarked throughout the paper, our approach does not aim at finding vulnerabilities nor finding payloads. The purpose of WAFEx is to generate complex attacks by combining vulnerabilities of web apps in order to violate a security goal. WAFEx then generates an AAT showing how the provided security goal could be violated by exploiting multiple vulnerabilities. However, whether or not the AAT generated by WAFEx can actually be exploited depends on whether or not the vulnerabilities exploited in the AAT exist and can be exploited.
Related work
To the best of our knowledge, this paper is the first attempt to show how model-checking techniques and the standard DY attacker model can be used for the generation of attack traces where multiple vulnerabilities are used to violate security properties of web apps. There are, however, a number of previous works that inspired the work we presented in this paper and that are thus worth discussing.
As we remarked in the introduction, formal model-based security testing has been steadily maturing into a viable alternative and/or complementary approach to penetration testing. Although industry were initially quite reluctant to adopt model-based security testing (and, more generally, model-based testing) due to the complexity of providing formal models (as we also discussed above), this has been changing quite rapidly in recent years: leading companies such as SAP, Siemens and UTRC have been adopting it for the company-internal analysis of their commercial products, and there are a number of examples of successful application of model-based security testing in industry as described in particular in [23,47] and at the website [29], which lists over 100 model-based security testing approaches, including a summary of the maturity of each evaluated system and the corresponding evidence measures and level. The ability of finding more complicated vulnerabilities, such as the multi-stage attacks that are the focus of this paper, will contribute to convincing industry to adopt WAFEx and other model-based security testing approaches and tools.
A combination of vulnerability assessment and penetration testing remains the leading methodology for the security analysis of web apps. The vulnerability assessment phase has received much attention and several different tools have been proposed to help the security analyst in identifying the presence of a vulnerability. On the other hand, penetration testing has typically been left to the experience of the security analyst. This is because the human component is crucial in evaluating the security of the web app. Related to the vulnerability assessment phase, we mention the main, freely available, tools that greatly support the security analyst in finding the presence of vulnerabilities in web apps: sqlmap, sqlninja, DotDotPwn, WFuzz, xss-proxy, Burp Proxy (Community Edition) and OWASP Zed Attack Proxy (ZAP). The main purpose of these tools is to find vulnerabilities by generating the payloads that can be used to exploit them. The purpose of our work, on the other hand, isn’t payload generation but the exploitation of multiple vulnerabilities so to generate multi-stage attacks. As a result, none of these tools is directly comparable to WAFEx. However, the widespread usage of these tools cannot be overlooked and thus we discuss them here.
sqlmap [40] and sqlninja [41] are tools used to find SQLis entry points. sqlmap supports many different databases, whereas sqlninja only supports the Microsoft SQL server and provides features that are specific to attacking the Microsoft SQL server that sqlmap does not provide. DotDotPwn [19] and WFuzz [51] are used for fuzz testing when searching for injection points. DotDotPwn is specifically tailored to test for directory traversal vulnerabilities, whereas WFuzz is a general purpose fuzzer that given a library of payloads and one or more injection points, performs many requests to the web app trying every payload in every injection point. On the client side, it is worth to mention the tool xss-proxy [52], which is used for advanced XSS attacks. HTTP/S proxies also play an important role in understanding the behavior of a web app. Specifically, it is worth to mention Burp Proxy (Community Edition) [36] and OWASP Zed Attack Proxy (ZAP) [32].
As we already stated, none of these tools give any clue on how a vulnerability can be used in the web app under analysis and they do not say if an attack that uses that vulnerability can actually be carried out. WAFEx, on the other hand, aims at helping the security analyst understand how vulnerabilities related to web app can violate a given security property, actually helping the penetration testing phase rather than the vulnerability assessment phase.
In [1], Akhawe et al. presented a methodology for modeling web apps and considered five case studies modeled in the Alloy [27] language. The idea is similar to our approach, but they defined three different attacker models that should find web attacks, whereas we have shown how the standard DY attacker can be used. They also represent a number of HTTP details that we do not require. Finally, and most importantly, they do not take combination of attacks into consideration.
In [11], Büchler et al. presented SPaCiTE, a formal approach for the security analysis of web apps. SPaCiTE is a model-based security testing tool that starts from a secure ASLan++ specification of a web app and, by mutating the specification, automatically introduces security flaws. SPaCiTE implements a mature concretization phase, but it mainly finds vulnerability entry points and tries to exploit them, whereas our main goal is to consider how the exploitation of one or more vulnerabilities can compromise the security of the web app.
The “Chained Attack” approach of [12] considered multiple attacks to compromise a web app. The idea is close to the one we present in this paper. However, the “Chained Attack” approach does not consider file-system vulnerabilities nor interactions between vulnerabilities, which means that with that formalization it would be impossible to represent a SQLi to access the file-system. Finally, the “Chained Attack” approach requires an extra effort of the security analyst, who should provide an instantiation library for the concretization phase, while we use well-known external state-of-the-art tools.
In [37], Rocchetto et al. model web apps to search for CSRF attacks. They limit the analysis to CSRF only but their work inspired the introduction of CSRF in our formalization. Still, we did not use their formalization as it is since we wanted to create a more comprehensive representation of the honest browser that could be used in every specification.
The methodology proposed by Armando et al. in [5,6] is focused on binding the specification of security protocols to actual implementations. The methodology starts with the definition of an abstract model (written using a role-based language) of the HTTP messages composing the security protocol. A model checker is then used to derive a counterexample violating some given security properties. The abstract messages, contained in the counterexample, are then mapped to concrete messages used to test the web app. The results differ from ours since WAFEx deals with web apps and vulnerabilities related to them rather than with the security of protocols that are employed by web apps.
In [34], Pellegrino et al. introduce Deemon, an automated testing framework to discover CSRF attacks. The approach they propose is based on an analysis of a web app to create a model that uses property graphs to represent information such as execution traces and data flows. Deemon is specifically tailored to finding CSRF and requires access to the source code of the web app, whereas WAFEx can be used without having access to the source code; moreover, we do not focus on finding any specific vulnerability but we rather combine multiple vulnerabilities to violate a security property.
In [9], Backes et al. introduce an analysis technique for PHP applications based on code property graphs. They start from the source code of a PHP web app and create a graph structure representing a canonical representation of the web app incorporating the syntax, control flow, and data dependencies. Finally, they use queries for graph databases to describe web apps vulnerable to specific vulnerabilities. They claim that their approach works with high-level, dynamic scripting languages such as PHP but it is unclear if it can be applied to non scripting and typed languages such as Java servlet. In contrast, we designed WAFEx to be as technology-independent as possible and thus we are not tied to a specific programming language or environment. Moreover, as already stated for [34] and throughout this paper, the goal of WAFEx is to find attacks that, combining multiple vulnerabilities, violate a security property of a web app.
In [2], Alhuzali et al. describe a static analysis approach that tackles web applications composed of several dynamic features with the intent of identifying vulnerabilities of web applications. Our approach can also tackle web applications composed of several dynamic features, as clearly shown by our case studies, but the main difference between the two works lays in the fact that the purpose of our approach is not to identify single vulnerabilities but to combine multiple vulnerabilities with the intent of generating multi-stage attacks.
Finally, we compare with our own preliminary work. The analysis we first introduced in [16] was limited to SQLi vulnerabilities, whereas the work in [17] introduced file-system vulnerabilities and combined SQLi and file-system vulnerabilities to show a preliminary result of how multiple vulnerabilities can be automatically combined to support the penetration testing phase. The present paper extends [16,17] by including XSS and CSRF attacks, a better concretization phase and a Burp Proxy plug-in to help the security analyst in the creation of a formal model of a web app. It also provides the detailed discussion of two real-life case studies that show that WAFEx can be used to perform an analysis that no other tool is currently capable of.
Conclusions and future work
In this paper, we have presented a formal, model-based testing approach for the security analysis of web apps. More specifically, we have formalized the most dangerous vulnerabilities of web apps (SQLi, XSS, CSRF and file-system related vulnerabilities) and shown how the DY attacker model can be used in order to find and exploit these vulnerabilities. Moreover, our approach is able to find multi-stage attacks to web apps that, to the best of our knowledge, no other tools can find, which involve the combined exploit of the vulnerabilities we have formalized. We have implemented a Burp Proxy plugin to ease the creation of a formal model for a web app and a prototype tool called WAFEx that takes an ASLan++ specification of a web app and a concretization file, generates an AAT and automatically tests the resulting AAT against the real web app. WAFEx does not find payloads for exploiting vulnerabilities but rather makes use of state-of-the-art tools (Wfuzz [51] and sqlmap [40]) in order to find the proper payload. We have applied WAFEx to real-world case studies (Multi-Stage and Cittadiverona) and we were able to generate AATs representing previously unknown attacks that no other tool for the security analysis of web apps is able to identify. The results we have presented in this paper clearly show that model-checking techniques can be used to identify complex attacks that so far only the manual analysis carried out by a penetration tester could find. The results are promising but further research is required for such an approach to be applied in a professional environment. We thus plan to further extend the approach by focusing on two main aspects: model creation and AATs generation. The creation of a model is a difficult task as it might introduce errors on the specification or might result in non-termination of the analysis. We envision to extend our model creation phase to make it fully automatic, and we are currently considering approaches for model-extraction such as JModex [30]. We also aim to improve the model-checking phase by (1) employing a model checker that is able to generate multiple traces and (2) considering new approaches such as [38] that might allow for a faster analysis. Finally, there is also some theoretical work that would be interesting to carry out. In this paper, we have considered, in quite a general way, multi-stage attacks that occur when multiple vulnerabilities are combined together to violate security properties of web apps. In future work, we plan to provide a full-fledged formal characterization of the different kinds of multi-stage attacks, mapping out the possible combinations, their relative difficulty (detailing both how complex it is for the attacker to carry out the attack and how complex it is be for our approach to detect the vulnerability) and the severity of the resulting attack.
