Abstract
Design Patterns are now widely accepted and used in software engineering; they represent generic and reusable solutions to common problems in software design. Security patterns are specialised patterns whose purpose is to help design applications that must meet security requirements. The enthusiasm surrounding security patterns has led to the emergence of several catalogues, which currently list up to 180 different patterns. This growing number makes it increasingly difficult to choose the most appropriate patterns for a given design problem. We propose a security pattern classification to facilitate the security pattern choice and a classification method based on data integration. The classification exposes relationships among software attacks, security principles and security patterns. It expresses the pattern combinations that are countermeasures to a given attack. This classification is semi-automatically inferred by means of a data-store integrating disparate publicly available security data. The data-store is also used to generate Attack Defense Trees. In our context, these illustrate, for a given attack, its sub-attacks, steps, techniques and the related defenses given in the form of security pattern combinations. Such trees make the pattern classification more readable even for beginners in security patterns. Finally, we evaluate on human subjects the benefits of using a pattern classification established for Web applications, which covers 215 attacks, 66 security principles and 26 security patterns.
Introduction
The World Wide Web is continuously expanding with new unstructured or semi-structured information, especially since the recent explosion in the digitalisation of almost everything, e.g., music, books, encyclopaedias, documents, etc. In the domain of software security, many documents, knowledge bases and papers are now publicly available to help engineers develop secure applications. These numerous digitalised resources are presented with different viewpoints (attackers, defenders, etc.), formats (text, database, etc.), abstraction levels (security principles, attack steps, exploits, etc.) or contexts (system, network, etc.). Furthermore, these different documents are meaningful at different stages of the software life cycle. For instance, the Exploit base [18] gathers scripts that can be used to test whether an application is vulnerable. In another context, the notion of security patterns, which is one of the topics of this paper, aims at providing guidelines to help design secure systems [8,24,27,36]. Security patterns are defined as reusable elements to design secure applications, which will enable software architects and designers to produce a system that meets their security requirements and that is maintainable and extensible from the smallest to the largest systems [24]. Schumacher also postulates that security patterns relate countermeasures to threats and attacks in a given context [27].
This plethora of (often complex) documents makes it increasingly hard for a designer to select the appropriate solution in a given context. Indeed, designers cannot be experts in all fields and they usually lack guidance for designing secure software or systems.
Several authors recently focused on security patterns to better guide designers. Security patterns were organised along the following categories: by security principles [2,37], by application domains [5] (software, network, user, etc.), by vulnerabilities [1,3] or by attacks [1,34]. Despite the benefits brought by these classifications, they suffer from several limitations, which prevent their adoption in industry. Firstly, these classifications were manually devised by directly comparing textual descriptions of different security concepts (patterns, principles, vulnerabilities, attacks, etc.). As these descriptions are generic and have diverse abstraction levels, a pattern can only be categorised when there is an evident relationship with a security property. In addition, as these classifications are not deterministic (no strict definition of the classification process [2]), upgrading them is often difficult. Yskout et al. also reported that security pattern adoption is limited, possibly due to a sub-optimal quality of the documentation [38]. We indeed believe that security pattern classifications lack Navigability and Comprehensibility, two quality criteria proposed in [2] and respectively related to the ability to direct a software designer among collaborative and related patterns, and the ease with which both a novice and an expert developer can understand patterns.
From these observations, we propose in this paper a method for classifying security patterns based on the concept of data integration: we combine security data coming from different sources and provide a unified view of this data. To make this classification navigable and comprehensible, we propose to automatically infer Attack Defense Trees [13], which illustrate the security pattern combinations that can be used to prevent an attack. More precisely, our contributions are as follows.
We present a data-store meta-model and a data integration method consisting of six steps, which extract data from various publicly accessible sources and store relationships among attacks, security principles and security patterns. The method breaks down security properties into sub-properties and associates them to achieve a precise classification.
Our security pattern classification is automatically derived from this data-store. For an attack, the classification expresses the pattern combinations that can be integrated in the application model to later prevent the attack from being successfully carried out.
We automatically generate Attack–Defense Trees (ADTrees) which aim at supplementing the classification with illustrations depicting, for a given attack, its (more concrete) sub-attacks, steps and techniques along with defenses expressed with security patterns. These ADTrees aim at improving the understanding of the previous classification. They can also be used as security requirement documents for threat risk modelling.
As a proof of concept, we have generated a data-store and a security pattern classification specialised to the Web application domain. The classification is composed of 215 CAPEC attacks, 26 security patterns and 66 security principles covering various security aspects. We also provide a tool to generate ADTrees. This classification and the ADTree generator are available in [21]. We employed them to evaluate, on 24 human subjects, the benefits of using our pattern classification and ADTrees with regard to the following criteria: Comprehensibility, Effectiveness and Accuracy.
The remainder of the paper is organised as follows. Section 2 presents the related work and the motivations of our approach, and introduces some security notions and data used throughout the paper. The method used to integrate security data and to build a data-store is given in Section 3. Section 4 shows how the pattern classification and ADTrees are automatically extracted from the data-store. We list the quality criteria met by the classification and discuss its limitations in Section 5. We evaluate the classification and ADTrees in Section 6, where we also discuss the threats to the validity of the evaluation. We finally conclude and give some perspectives for future work in Section 7.
Background
Related work
Several security pattern catalogues are available in the literature [17,28,38], gathering a total of 176 patterns. These growing catalogues make it difficult to choose the appropriate patterns for overcoming a security problem.
Many classifications were proposed in the literature to ease the pattern choice with regard to a given context. The classifications proposed in [1,30,31,34] expose pattern categories by focusing on the attacker side and attacks. This choice of categorisation seems quite interesting and meaningful as attacks are increasingly well known and examined by software designers. Initially, Wiesauer et al. presented in [34] a short taxonomy of security design patterns made from links between attack textual descriptions and security pattern purposes. Tondel et al. presented in [30] the combination of three formalisms of security modelling (misuse cases, attack trees and security activity models) in order to give a more complete security modelling approach. In their method of building attack trees, they linked some activities of attack trees with CAPEC attacks; they also connected some activities of SAGs (security activity diagrams) with security patterns. The relationships among security activities and security patterns are manually extracted from documentation and are not explained. Shortly after, Alvi et al. presented a natural classification scheme for security patterns putting together CAPEC attacks and security patterns for the implementation phase of the software life cycle [1]. They analysed some security pattern templates available in the literature and proposed an augmented template composed of the essential elements needed by designers. They manually completed the CAPEC attack documentation with a section named “Relevant security patterns” composed of some patterns [1]. After inspecting the CAPEC base, we observed that this section is seldom available, which limits its use and interest. Uzunov et al. introduced in [31] a classification of security threats and patterns specialised for distributed systems. They proposed a library of threats and their relationships with security patterns in order to reduce the expertise level required for the design of secure applications. They considered that their “threat patterns” are abstract enough to encompass security problems related to the context of distributed systems.
In the papers [2,5], the authors discuss some limitations exposed by pattern classifications. Alvi et al. outlined 24 pattern classifications, including security pattern classifications, and established a comparative study to point out their positive and negative aspects [2]. They chose 29 classification attributes (purpose, abstraction levels, life-cycle, etc.) and compared the classifications against a set of desirable quality criteria (Navigability, Comprehensibility, Usefulness, etc.). They observed that several classifications were built w.r.t. a unique classification attribute, which appears to be insufficient. They indeed concluded that the use of multiple attributes enables the pattern selection in a faster and more accurate manner. Bunke et al. presented a systematic literature review of the papers dealing with security patterns between 1997 and 2012. In addition, they listed a set of classification criteria and compared design pattern and security pattern classifications [5]. They finally proposed a classification based on the application domains of patterns (software, network, user, etc.). Yskout et al. also reported that the security pattern adoption is limited possibly due to a sub-optimal quality of the documentation [38].
After reviewing these classifications, we indeed believe that security pattern classifications lack Navigability and Comprehensibility. We also observed that they are all manually conceived by interpreting different documents to find abstract relationships. Justifying these classifications or extending them is often difficult. Furthermore, the relations among patterns are often not given, yet we noticed that some patterns are compatible with each other and that others conflict. As a consequence, a designer may still be confused about the pattern choice. As in [1], we propose a pattern classification expressing which security patterns can be used to prevent an attack step from being successfully executed on an application (and hence an attack, even though a step is more precise). Our classification proposes a more precise and accurate mapping between patterns and attacks. It is more accurate in the sense that we translate the meaning of the patterns and attacks into smaller properties, i.e., strong points, attack steps, techniques, and countermeasures. We establish relations among these properties with respect to security principles, which identify the meaning of these relations. In addition, the classification is completed with the inter-pattern relationships found in [37]. This is why we claim that our classification is more precise. Another contribution of this paper lies in the presentation of a classification process based on data integration. This process includes six manual and automatic steps, which offer the advantage of justifying the soundness of the pattern classification and reduce the effort required to add new patterns or attacks to the classification. Finally, we complete the classification with ADTrees, making the classification readable even for novices in patterns or security.
We now compare this paper with some of our own previous work. Initially, we proposed in [22] a security pattern classification grouping patterns according to the weaknesses that they can cure. That paper can be seen as a first step toward the approach presented here. We indeed exploited the weaknesses listed in the CWE base [16] to categorise patterns. However, unlike the present paper, we did not focus on attacks or attack trees. In [20], we extended the previous method to organise security patterns in relation to attacks. Attacks are collected from the CAPEC base, as in the first step of the method developed in this paper. Then, attacks are associated to weaknesses, themselves linked to patterns. The difference with the present paper firstly lies in the data integration process. We do not consider weaknesses but other kinds of security properties. We indeed consider that an attack is composed of techniques and sequential steps, which we associate to countermeasures. We semi-automatically group these countermeasures with a text mining approach. The resulting clusters are finally associated to patterns. These relations are strictly modelled by a new meta-model presented in this paper, which structures our data-store. We also generate ADTrees illustrating some relations of the meta-model, in particular the sequences of attack steps. Furthermore, we present an experiment performed with 24 participants to estimate the benefits of using our classification.
Publicly accessible resources for the data integration
We present below the publicly accessible resources (documents, databases, research papers) we studied to devise a data-store for the security pattern classification generation.
Security pattern documents
We firstly recall that security patterns provide guidelines for secure system design and evaluation [8,36]. Generally, they are presented textually or with schemas, e.g., UML diagrams, and are characterised by a set of structural and behavioural properties. Schumacher defines more precisely security patterns as triples
Several security pattern catalogues are available in the literature [17,28,38], themselves extracted from other papers. In these documents, a security pattern is usually characterised with its solutions (a.k.a. intents), its interests called forces and the consequences of applying the pattern to an application. The quality of a pattern and its classification can be established by means of its strong points, which are sub-properties of the pattern [10] related to its features. Strong points are manually extracted from the forces and consequences of a security pattern, given in its description.
In addition, a security pattern can be documented to express its relationships with other patterns. These properties may noticeably help combine patterns and avoid devising unsound composite patterns. Yskout et al. proposed the following annotations between two patterns
“depend” means that the implementation of
“benefit” expresses that implementing
“impair” means that the functioning of
“alternative” expresses that
“conflict” encodes the fact that if both

Class layout of the security pattern “Secure Logger”.
“Secure Logger” is a security pattern example whose primary objective is to store application events in a centralised way so that it should be impossible to alter log files. Figure 1 depicts a class diagram of this security pattern. This schema implies that “Secure Logger” provides a means of decoupling the implementation details of the logger from the remainder of the application. This corresponds to a strong point of the pattern. Its strong points are summarised below:
logs sensitive information that should not be accessible to unauthorised users;
ensures the integrity of the logged data to determine if it was tampered with by an intruder;
captures output at one level for normal operations and at other levels for greater debugging in the event of a failure or an attack;
centralises control of logging in the system for management purposes;
code must be adaptable and extensible to protect against both current and future threats;
performs all of the necessary security processing prior to the actual logging of the data, which allows management of each function independently of the others without the risk of impacting overall security.
This pattern can be implemented by means of two other security patterns [38]: “Audit Interceptor” or “Secure Pipe”. The former may be used to collect the events that are stored by “Secure Logger”. The latter may be used to guarantee that the data is not tampered with in transit to a secure store.
The Common Attack Pattern Enumeration and Classification (CAPEC) is an open database offering a catalogue of attacks in a comprehensive schema [15]. Attack patterns are descriptions of common approaches that attackers might take to attack software or systems. An attack pattern, which we refer to here as a documentation (to avoid confusion with security patterns), consists of several sections. The section “Related attack patterns” shows the interdependence among attacks having different levels of abstraction. The first two levels (denoted Category and Meta pattern) give attack mechanisms; the last two levels (Standard pattern and Detailed attack pattern) detail attacks.
Different binary relations are given between two attacks. Among them, we noted:
a is member of/child of b: when the attack a is a refinement of the attack b,
a has member/parent of b: when the attack a is more abstract than b.
Besides, the attacks of the last two levels have specific paragraphs to describe other properties, e.g., impact, prerequisites, severity, required attacker skills, etc. Another section lists the security principles affected by the attack. However, we observed that the listed principles are often highly abstract, which makes them difficult to interpret precisely.
The section called “Attack Execution Flow” provides a sequence of steps that has to be followed to successfully accomplish an attack. The first step is often called “Explore”; it is often followed by the steps “Experiment” and “Exploit”, themselves composed of sub-steps. All of these are sequential, which means that if a step cannot be achieved then it is assumed that the attack cannot be applied. In other words, it is pointless to try the next step. Every step is accompanied by a sub-section called “Security controls”, which lays down some effective security controls that should be used to prevent or to counter the attack step. Furthermore, the CAPEC base provides some techniques (or combinations of techniques) with each step. When one technique is successfully applied, the step is satisfied.
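The semantics of an execution flow (sequential steps, each satisfied by any one of its techniques) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Step`, `step_satisfied` and `attack_succeeds`, the technique identifiers and the notion of a set of "blocked" techniques are hypothetical and are not part of the CAPEC schema; combinations of techniques are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of an attack execution flow (hypothetical structure)."""
    name: str
    techniques: list[str] = field(default_factory=list)


def step_satisfied(step: Step, blocked: set[str]) -> bool:
    # A step is satisfied as soon as one of its techniques is not blocked.
    return any(t not in blocked for t in step.techniques)


def attack_succeeds(steps: list[Step], blocked: set[str]) -> bool:
    # Steps are sequential: the first unsatisfied step stops the attack.
    for step in steps:
        if not step_satisfied(step, blocked):
            return False
    return True


# Placeholder technique identifiers, not taken from CAPEC.
flow = [
    Step("Explore", ["t1", "t2", "t3"]),
    Step("Experiment", ["t4", "t5"]),
    Step("Exploit", ["t6", "t7"]),
]
```

With this reading, blocking a single technique of a step is not sufficient to stop the attack, whereas blocking every technique of any one step is.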
Security principles
We briefly recall that a security principle is a desirable property, structure or behaviour of software that aims at reducing the impact and the likelihood of a threat realisation [32]. Such principles offer insight into the nature of closely related security tasks, without taking their contexts into consideration.
Numerous works have focused on security principles over the last four decades. Saltzer and Schroeder firstly proposed a set of eight best practices for system security [25], which were widely expanded to form security principles [7,14,26,32]. Most of the papers dealing with security principles reveal that a security principle has a level of abstraction; it may be the realisation of other security principles, or have subordinate principles.
Data-store
We present in this section the meta-model of the data-store we devised to store relationships among different security concepts. This data-store is then used to automatically generate a security pattern classification providing the set of patterns that can be used as countermeasures against a given attack (in reference to the security pattern definition of Schumacher [27]). Then, we present the data integration steps.
Data-store meta-model

Metamodel 1 of the data-store.

Metamodel 2 of the data-store.
Instead of finding some direct relations among attacks and security patterns by reading documents, we chose to decompose the security concepts available in these documents into more detailed properties, which can be interconnected in an explicit manner.
We surveyed the literature and some attack bases [15,16,19] to list relationships among security properties. This study has confirmed to us the importance of the following associations: an attack can be documented with more concrete attacks, which can be themselves segmented into steps. These steps can be performed with techniques and can be prevented with countermeasures. All these properties and associations are modelled with entities in the meta-model of Fig. 2. Taking another viewpoint, an attack also exploits a weakness, which may be composed of several more concrete weaknesses. Some actions can be applied to reduce the impact of these weaknesses. We call them mitigations here to underline the fact that these actions differ from the countermeasures used to neutralise an attack. These other associations are modelled in the meta-model of Fig. 3.
Security patterns can be characterised with strong points, which are criteria of software engineering quality partially deduced from the section “consequences” in the pattern descriptions. In the context of security patterns, these correspond to desirable security properties. Besides, a security pattern can have relations with other patterns. Figures 2 and 3 depict these properties and relations with entities in the same way.
Countermeasures, mitigations and strong points refer to the notion of attack prevention. But directly finding relations among them is still a laborious task as these properties, which have diverse purposes, are explained with different keywords. To solve this issue, we have chosen to focus on security principles as mediators. Indeed, as introduced by Wassermann et al. in [33], security patterns are classifiable w.r.t. security principles. Here again, we consider that security principles are organised into a hierarchy, which shows the materialisation of a principle with more concrete ones. A principle hierarchy offers a lot of flexibility to reach the same abstraction level among strong points, principles, countermeasures and mitigations. Countermeasures and mitigations are often detailed security properties. We observed that gathering them into groups (clusters) often reduces the effort required to find connections with security principles. But the cluster granularity, i.e., the size of the groups, has to be chosen carefully to avoid setting wrong associations. These last security properties and associations are identically modelled in the meta-models of Figs 2 and 3.
Both meta-models could be used to structure our data-store. We have chosen to focus on the first meta-model because it offers the possibility to store more details about attacks (decomposed into steps, techniques and related to countermeasures). A designer can follow how an attack is sequentially performed. As attack steps are associated to security principles and finally to security patterns, he or she can also obtain and select the list of patterns that are required to counter every attack step, one after the other. Hence, the relations among attacks, steps, principles and patterns offer a refinement in the pattern choice that the second meta-model does not provide.
Now that we have a data-store meta-model, the next section shows how the data integration is performed in order to extract a security pattern classification and ADTrees.
Security data are integrated into the data-store in six steps, which aim at establishing the different relations depicted in Fig. 2. Steps 1 to 5 produce databases, and Step 6 consolidates them so that every entity of the meta-model is related to the others as expected. Steps 1 and 6 are performed automatically with tools, whereas the remaining ones require some manual intervention to supervise the digitalisation of key concepts or texts.
We implemented these steps and applied them to the Web application context as a proof of concept. The tools and databases are available in [21]. The implementation is mostly based upon the tool Talend,1
We also illustrate these steps with the attack “CAPEC-34: HTTP Response Splitting”, which refers to a malicious HTTP request that causes a vulnerable web server to produce two separate responses instead of one. The target, i.e., the client, may interpret the second response and display maliciously-crafted contents.
We have chosen to focus on the CAPEC base to extract information about security attacks because it appeared to be the most complete base, with the largest number of attacks accompanied by a lot of details (steps, techniques, risks, security controls, etc.).
We extracted attacks of the CAPEC base and organised them into a single tree that describes a hierarchy of attacks from the most abstract to the most concrete ones, so that we can get all the sub-attacks of a given attack. For that purpose, we rely on the relationships among attack descriptions found in the CAPEC section “Related Attack Patterns”. More precisely, by scrutinising all the CAPEC documents, it becomes possible to develop a hierarchical tree whose root node is unlabelled and connected to the attacks of the type “Category”. These nodes may also be parents of attacks that belong to the type “Meta Attack pattern” and so on. The leaves are the most concrete attacks of the type “Detailed attack pattern”. Then, for every attack, we collected from the CAPEC base (Section “Attack Execution Flow”) its steps, which may be composed of more concrete sub-steps, and for each step, the corresponding techniques and security controls, the latter referring to countermeasures.
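The construction of the attack hierarchy can be sketched as follows. This is only an illustration, assuming that the “child of” relations have already been read into (child, parent) pairs; the identifiers and the function names `build_tree` and `sub_attacks` are placeholders and do not correspond to real CAPEC entries or to the tools used in this work.

```python
from collections import defaultdict

# (child, parent) pairs as they could be read from "Related Attack Patterns";
# the identifiers below are placeholders, not real CAPEC entries.
CHILD_OF = [
    ("meta-1", "category-1"),
    ("standard-1", "meta-1"),
    ("detailed-1", "standard-1"),
    ("detailed-2", "standard-1"),
]


def build_tree(child_of):
    """Index children by parent; parentless attacks hang under an unlabelled root."""
    children = defaultdict(list)
    has_parent = set()
    for child, parent in child_of:
        children[parent].append(child)
        has_parent.add(child)
    nodes = {n for pair in child_of for n in pair}
    children[None] = sorted(nodes - has_parent)  # None plays the unlabelled root
    return children


def sub_attacks(children, attack):
    """All descendants of an attack, depth first."""
    found = []
    for child in children.get(attack, []):
        found.append(child)
        found.extend(sub_attacks(children, child))
    return found


tree = build_tree(CHILD_OF)
```

Calling `sub_attacks(tree, "meta-1")` returns every more concrete attack below `meta-1`, while a leaf such as `detailed-1` yields an empty list.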
This data extraction was automatically performed and yields a database
The attack CAPEC-34 has no sub-attacks as it belongs to the section “Detailed Attack” of the CAPEC base. The attack is carried out by following three main steps called Explore, Experiment and Exploit. The third one is itself composed of two steps. Three techniques are listed to achieve the first step; the other steps are linked to two techniques each. Any of the available techniques can be used to accomplish the related step.
Step 2: Countermeasure hierarchical clustering
The number of countermeasures grows quickly while reading the attacks of the CAPEC base. Many of them have a close meaning though, which can be explained by the number of different contributors that added them. We hence group these countermeasures into families to later associate them with security principles.
We semi-automated this process by applying a hierarchical clustering technique of documents. We firstly used the tool KHcoder2
the tool POS Tagger (included in KHcoder) is called to sort the keywords found in the countermeasure descriptions (log, input, credentials, etc.) by their frequencies and types (noun, verb, adverb, etc.);
from the frequencies, weights are computed and scaled with the Jaccard coefficient to measure distances (a.k.a. dissimilarities) among countermeasures. The distance between two countermeasures
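As an illustration of a Jaccard-based dissimilarity, the sketch below computes the plain set-based Jaccard distance between keyword sets. It is a simplification under stated assumptions: it ignores the frequency-based weights mentioned above and may therefore differ from what KHcoder computes; the keyword sets and the function names are made up for the example.

```python
def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(docs: list[set[str]]) -> list[list[float]]:
    """Symmetric matrix of pairwise Jaccard distances."""
    return [[jaccard_distance(x, y) for y in docs] for x in docs]


# Made-up keyword sets standing for three countermeasure descriptions.
cm = [
    {"log", "audit", "access"},
    {"log", "audit", "file"},
    {"validate", "input"},
]
D = distance_matrix(cm)
```

Here the first two countermeasures share two of four distinct keywords, giving a distance of 0.5, whereas the third shares none with the others and lies at distance 1.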
Afterwards, we chose to apply Ward's method, an agglomerative hierarchical clustering method [35], to semi-automatically build a hierarchy of countermeasure clusters. Ward's method merges groups piece by piece instead of directly providing large clusters. In our case, large clusters would tend to cover too many disparate countermeasures, which would later be associated with too many security principles. Instead, Ward's method successively constructs levels of clusters, each level somehow expressing a level of abstraction. Its algorithm is summarised in Algorithm 1. At the beginning, the algorithm takes a distance matrix, here previously computed by KHcoder. Every countermeasure is placed into a new cluster. The algorithm then merges the pair of clusters having the closest distance into a new cluster, and so forth. Every time a new cluster is created, the algorithm updates the distance matrix. The distance between two clusters is calculated with the formula

Hierarchical clustering

Hierarchical clustering of 23 countermeasures into 4 clusters using KH Coder.
Finally, the level to consider in the cluster organisation (and implicitly the number of clusters to keep) is selected manually, as the choice of the number of clusters is supervised in the domain of natural languages [29]. The level can be selected with a dendrogram. Figure 4 illustrates an example of a dendrogram, obtained with 23 countermeasures. At the lowest level, the dendrogram shows all the countermeasures and its top level represents one final cluster. Choosing the number of clusters to keep comes down to drawing a horizontal line in the dendrogram and counting the vertical lines it cuts. There are two basic criteria to consider when inserting the line: a low cut is divisive, i.e., it may place two similar countermeasures in different clusters; a high cut is agglomerative, i.e., it may put two unrelated countermeasures in the same cluster. Therefore, in order to get a coherent clustering, the most suitable level has to be chosen after some iterations by checking whether the countermeasures obtained in the clusters refer to the same security principle or set of principles. In this example, we obtained four clusters.
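The agglomeration and the choice of a cut level can be sketched as follows. The code uses the standard Lance-Williams recurrence for Ward's criterion on squared distances, which is an assumption made for illustration: it is not necessarily the implementation used by KHcoder, and Ward's criterion is strictly justified only for Euclidean distances. The function name `ward_clusters` and the toy distance matrix are hypothetical; stopping at `k` clusters plays the role of the horizontal cut in the dendrogram.

```python
def ward_clusters(dist, k):
    """Merge clusters with Ward's criterion until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    # Squared distances between clusters, keyed by (smaller id, larger id).
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > k:
        i, j = min(d2, key=d2.get)  # closest pair of clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        dij = d2.pop((i, j))
        merged = clusters.pop(i) + clusters.pop(j)
        for m in list(clusters):
            nm = len(clusters[m])
            dim = d2.pop((min(i, m), max(i, m)))
            djm = d2.pop((min(j, m), max(j, m)))
            # Lance-Williams update for Ward's method.
            d2[(m, next_id)] = (
                (ni + nm) * dim + (nj + nm) * djm - nm * dij
            ) / (ni + nj + nm)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Toy example: four items placed on a line at positions 0, 1, 10 and 11.
points = [0, 1, 10, 11]
toy = [[abs(p - q) for q in points] for p in points]
```

Calling `ward_clusters(toy, 2)` groups the two items on the left and the two on the right; lowering or raising `k` corresponds to moving the cut up or down.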
The resulting clusters are stored into the database
We manually collected security patterns and their strong points from the catalogue given in [38]. Strong points are seldom explicitly provided, and have to be deduced from the pattern descriptions, more precisely from their forces and intents. Then, we manually established two relations among patterns and strong points:
the first one is a many-to-many relation between security patterns and strong points, each pattern being characterised by a set of strong points, which can be shared with other patterns;
the second relation defines inter-pattern connections based upon the annotations “depend”, “benefit”, “impair” or “alternative” [37]. With P a set of patterns, this relation is defined as a mapping from
These data and relations, which provide connections among security patterns and strong points, are encoded into the database
Step 4: Security principle integration
We collected 66 security principles related to Web applications from the papers [7,14,25,26,32]. Then, we organised them into a hierarchy, from the most abstract to the most concrete principles. This principle organisation gives a complete hierarchical view of the security mechanisms that are required to counter an attack and that are, at the same time, provided by security patterns. As principles are hierarchically organised, we can link a strong point and a countermeasure cluster through this principle organisation even if they do not have exactly the same level of abstraction. For instance, consider a strong point and a cluster that are linked to two principles located at two different levels of the hierarchy. If one principle is a child of the other, it is possible to find an association between the strong point and the cluster.
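The way a hierarchy lets a strong point and a cluster meet at different abstraction levels can be sketched as follows. This is a minimal illustration, assuming that the hierarchy is stored as child-to-parent links and that the association extends from children to more distant descendants; the principle names, the `PARENT` table and the function names are placeholders rather than the content of the actual hierarchy.

```python
# child -> parent links; the principle names are illustrative placeholders.
PARENT = {
    "A.1": "A",
    "A.1.1": "A.1",
    "B.1": "B",
}


def ancestors(principle):
    """The principle itself followed by its parents up to the root."""
    chain = [principle]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def related(p, q):
    """True if p and q are equal or one is an ancestor of the other."""
    return q in ancestors(p) or p in ancestors(q)
```

A strong point attached to `A.1.1` and a cluster attached to `A` would thus be associated, whereas principles lying in different branches (`A.1` and `B`) would not.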
The resulting hierarchy is certainly not exhaustive but covers the security patterns dealt with in the catalogue given in [38]. Figure 5 depicts the security principle hierarchy, which is stored in the database

Fig. 5. Hierarchical organisation of security principles.
In this step, we established the many-to-many relation between strong points and security principles. We have chosen to manually integrate this relation because strong points and principles are mostly presented in an abstract manner, with textual documents. We observed that the abstraction level of the strong points better fits with the most concrete principles, which are the leaves of the hierarchical organisation depicted in Fig. 5.
Afterwards, we established the many-to-many relation between countermeasures clusters and security principles. After Step 3, the clusters include countermeasures sharing the same security concepts. Once these concepts are known, linking clusters to security principles becomes straightforward, as principles are often defined with regard to these same concepts.
These relations are materialised with the database
Returning to our example of attack CAPEC-34, its first step “Explore” aims to explore a Web application to record its user-controllable input points. A countermeasure of this attack step consists in storing and auditing all the application accesses to detect the application exploration. Only the administrator should be able to perform this task. This countermeasure belongs to a cluster that is associated with the principles “Audit”, “Log” and “File Authorization”. We associated “Log” with the strong point “log sensitive information that should not be accessible to unauthorised users”, which finally belongs to the security pattern “Secure Logger”.
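The chain of associations followed in this example can be written down as a small lookup, sketched here with plain dictionaries. The dictionary names and the cluster identifier "cluster A" are assumptions made for illustration; only the principle names, the strong point and the pattern come from the example above.

```python
# Countermeasure -> cluster (the cluster identifier is hypothetical).
CLUSTER_OF = {
    "store and audit all application accesses": "cluster A",
}
# Cluster -> security principles (from the CAPEC-34 example).
PRINCIPLES_OF_CLUSTER = {
    "cluster A": {"Audit", "Log", "File Authorization"},
}
# Principle -> strong points.
STRONG_POINTS_OF_PRINCIPLE = {
    "Log": {
        "log sensitive information that should not be accessible "
        "to unauthorised users"
    },
}
# Strong point -> security patterns.
PATTERNS_OF_STRONG_POINT = {
    "log sensitive information that should not be accessible "
    "to unauthorised users": {"Secure Logger"},
}

def patterns_for(countermeasure):
    """Follow countermeasure -> cluster -> principle -> strong point -> pattern."""
    cluster = CLUSTER_OF[countermeasure]
    found = set()
    for principle in PRINCIPLES_OF_CLUSTER[cluster]:
        for strong_point in STRONG_POINTS_OF_PRINCIPLE.get(principle, ()):
            found |= PATTERNS_OF_STRONG_POINT.get(strong_point, set())
    return found

print(patterns_for("store and audit all application accesses"))
```

Principles without a recorded strong point in this sketch ("Audit", "File Authorization") simply contribute no pattern.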
Step 6: Data consolidation
The previous databases
In our implementation, this step is automatically performed by the tool Talend by means of the meta-model given in Fig. 2. As two databases do not share more than one entity of the meta-model, this process does not raise any particular issue. This step produces the final database
Security pattern classification and ADTree generation
The final database
Security pattern classification
By means of the relations defined in the meta-model of Fig. 2, we extract from
the information about the attack (name, identifier, description);
the tree
for each step
for each principle
After the data extraction, we obtain a security pattern classification presented in a tabular form. The data integration steps and the classification extraction offer the advantage of semi-automatically achieving a security pattern classification that may be updated. For instance, if one wants to add a new attack, Steps 1, 2 and 5 have to be followed. Likewise, if a new security pattern is proposed in the literature, Steps 3, 4 and 5 have to be applied. The classification extraction can then be re-executed every time the data-store is updated. From our database

Fig. 6. Extraction of the pattern classification for the attack CAPEC-34.
Figure 6 depicts an extraction example for the attack CAPEC-34. The first column gives the attack ID. This attack has no sub-attacks (otherwise, the next columns would list them too). Columns 2 to 4 index the attack steps and techniques. To ease readability, we only illustrate the step Experiment here. The security patterns that prevent the step are given in Column 5. These four patterns have to be integrated in the application model and implemented to prevent the attack. The last two columns list the security patterns associated with the patterns of Column 5 and their relations. For instance, Fig. 6 reveals that “Application Firewall” and “Input Guard” are alternative patterns, hence using one of them is enough (although using both is not incorrect). Figure 6 also illustrates that “Secure Logger” may benefit from the security patterns “Secure Pipe” or “Audit Interceptor”.
A designer can interpret this extraction to select security patterns in a precise manner step after step. For instance, the attack step Experiment refers here to the sending of malicious requests by means of the application entry points (URLs, forms, etc.). These entry points were identified by the attack step Explore earlier. The first security solution is to validate requests either with the security pattern “Input Guard” or “Application Firewall”. The choice of the pattern mostly depends on the application design and features. “Application Firewall” allows the decoupling of the input validation from the remainder of the application. But it is also more cumbersome to implement than “Input Guard” as it aims at filtering all the application requests and responses.
Figure 6 reveals that our classification does not list all the data available in the data-store and related to an attack, e.g., countermeasures or strong points. We stated in Section 3 that this information is mostly used to establish direct relations among security concepts to eventually generate links between attacks and security patterns. We have chosen to generate a classification that includes these links but not all the underlying details to make it more readable. However, the data-store can still be queried to extract more related details about attacks (techniques, counter-measures, affected principles) or patterns (strong points, principles).
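As an illustration of such a query, the sketch below uses an in-memory SQLite database. The table and column names are hypothetical and do not reflect the actual schema of the data-store; the rows are illustrative and only reuse names that appear in the CAPEC-34 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified schema (not the schema of the real data-store).
    CREATE TABLE step_pattern (attack TEXT, step TEXT, pattern TEXT);
    CREATE TABLE strong_point (pattern TEXT, description TEXT);

    INSERT INTO step_pattern VALUES
        ('CAPEC-34', 'Explore', 'Secure Logger'),
        ('CAPEC-34', 'Explore', 'Audit Interceptor');
    INSERT INTO strong_point VALUES
        ('Secure Logger',
         'log sensitive information that should not be accessible to unauthorised users');
    """
)

# Patterns of an attack step together with the strong points recorded for them.
# LEFT JOIN keeps patterns for which no strong point is stored in this sketch.
rows = conn.execute(
    """
    SELECT sp.pattern, s.description
    FROM step_pattern AS sp
    LEFT JOIN strong_point AS s ON s.pattern = sp.pattern
    WHERE sp.attack = ? AND sp.step = ?
    ORDER BY sp.pattern
    """,
    ("CAPEC-34", "Explore"),
).fetchall()

for pattern, description in rows:
    print(pattern, "->", description)
```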
Raw tables may not be easily comprehensible for beginners in security or in patterns. Actually, this classification representation may contradict the criterion Comprehensibility, which refers to the ability of experts or novices to use the classification. This is why we supplement the classification with graphical models called ADTrees to improve readability.
Attack Defense Trees “are graphical representations of possible measures an attacker might take in order to attack a system and the defenses that a defender can employ to protect the system” [13]. We recall that ADTrees have two different kinds of nodes: attack nodes (red circles) and defense nodes (green squares). A node can be refined with child nodes with disjunctive or conjunctive refinements. The former is recognisable by edges going from a node to its children. The latter is graphically distinguishable by connecting these edges with an arc. Here, we extend these two refinements with the sequential conjunctive refinement of attack nodes, defined by the same authors in [11]. This operator expresses the execution order of child attack nodes. Graphically, a sequential conjunctive refinement is depicted by connecting the edges going from a node to its children with an arrow.
We generate ADTrees having the general form illustrated in Fig. 7(a). An ADTree root node is labelled by an attack. This root node may be disjunctively refined with sub-attacks. When an attack is defined with steps, its node is refined with child nodes labelled by these steps (sequential conjunctive refinement). The most concrete steps are graphically represented with attack nodes refined with other attack nodes labelled by techniques (disjunctive refinement). A node labelled by an attack step has one child defense node (in green in Fig. 7(a)), which may be the root of a defense sub-tree expressing security pattern combinations.

Fig. 7. Generated ADTree forms.
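The general form described above can be captured by a small data structure, sketched here under stated assumptions: the class name `Node`, its fields and the textual rendering are illustrative choices and are not taken from our generator; the labels "Step 1", "Technique 1a", etc. are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    kind: str = "attack"        # "attack" or "defense"
    refinement: str = "OR"      # "OR" (disjunctive), "AND" (conjunctive), "SAND" (sequential)
    children: List["Node"] = field(default_factory=list)
    countermeasure: Optional["Node"] = None  # defense node attached to an attack step

def render(node, depth=0):
    """Return an indented textual view of the tree, one line per node."""
    marker = "(A)" if node.kind == "attack" else "[D]"
    suffix = f" <{node.refinement}>" if node.children else ""
    lines = ["  " * depth + f"{marker} {node.label}{suffix}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    if node.countermeasure is not None:
        lines.extend(render(node.countermeasure, depth + 1))
    return lines

# Defense sub-tree: a conjunctive combination of two security patterns.
defense = Node(
    "Pattern Composition", kind="defense", refinement="AND",
    children=[
        Node("Audit Interceptor", kind="defense"),
        Node("Secure Logger", kind="defense"),
    ],
)
# Steps are refined disjunctively with techniques; the root orders the steps.
step1 = Node("Step 1", refinement="OR",
             children=[Node("Technique 1a"), Node("Technique 1b")],
             countermeasure=defense)
step2 = Node("Step 2", refinement="OR", children=[Node("Technique 2a")])
root = Node("Attack", refinement="SAND", children=[step1, step2])

tree_lines = render(root)
print("\n".join(tree_lines))
```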
ADTrees are obtained with the following steps:
a new ADTree is generated for every attack stored into
for each attack
for each step
The parent defense nodes, resulting from the above steps, are combined to a defense node labelled by “Pattern Composition” with a conjunctive refinement. This root defense node is linked to the attack node labelled by
When an attack step is linked to several security patterns, the second step may produce a large defense sub-tree. This sub-tree can nevertheless be reduced by simplifying logical expressions. In short, if we replace the relations depend and benefit by the operation AND, the relation alternative by OR, and the relations impair and conflict by XOR, we obtain classical logical expressions. These can be reduced with tools, e.g., BExpRed.
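The mapping from inter-pattern relations to logical operators, and the soundness of a reduction, can be checked with a truth table. The sketch below is not BExpRed: it only verifies, by exhaustive evaluation, that a hand-reduced expression is equivalent to the original one. The pattern combination used in the example is hypothetical.

```python
from itertools import product

# Relation -> logical operator, as described in the text.
RELATION_TO_OP = {
    "depend": "AND", "benefit": "AND",
    "alternative": "OR",
    "impair": "XOR", "conflict": "XOR",
}

def evaluate(expr, env):
    """Evaluate an expression: a pattern name or a tuple (op, left, right)."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    l, r = evaluate(left, env), evaluate(right, env)
    if op == "AND":
        return l and r
    if op == "OR":
        return l or r
    if op == "XOR":
        return l != r
    raise ValueError(f"unknown operator: {op}")

def variables(expr):
    """Collect the pattern names used in an expression."""
    if isinstance(expr, str):
        return {expr}
    _, left, right = expr
    return variables(left) | variables(right)

def equivalent(e1, e2):
    """Compare two expressions on every truth assignment of their patterns."""
    names = sorted(variables(e1) | variables(e2))
    return all(
        evaluate(e1, dict(zip(names, values))) == evaluate(e2, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )

# Hypothetical defense sub-tree: (SL AND SP) OR (SL AND AI) ...
original = (
    RELATION_TO_OP["alternative"],
    (RELATION_TO_OP["benefit"], "Secure Logger", "Secure Pipe"),
    (RELATION_TO_OP["benefit"], "Secure Logger", "Audit Interceptor"),
)
# ... reduced by distributivity to SL AND (SP OR AI).
reduced = ("AND", "Secure Logger", ("OR", "Secure Pipe", "Audit Interceptor"))

print(equivalent(original, reduced))
```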
An ADTree resulting from the previous steps represents all the possible scenarios that can lead to the realisation of the attack given in the root node. It provides sequences of attack steps and techniques that have to be executed in the right order to perform the attack successfully. Conversely, it also includes defense nodes, which may be the roots of sub-trees expressing combinations of security patterns. It remains for the designer to choose one combination of patterns for every step at the application design stage.
We implemented the ADTree generation with a tool available in [21]. It takes as input an attack identifier and yields an ADTree, which is stored into an XML file. ADTree files can be modified or updated as the designer wishes with the tool given in [12].
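To give an idea of what storing a tree in an XML file involves, the following sketch serialises a small tree with Python's standard `xml.etree.ElementTree` module and reads it back. The element and attribute names (`node`, `label`, `kind`, `refinement`) form a hypothetical schema chosen for illustration; they are not claimed to match the file format produced by our generator or expected by the tool of [12].

```python
import xml.etree.ElementTree as ET

def to_xml(node):
    """Convert a (label, kind, refinement, children) tuple into an XML element."""
    label, kind, refinement, children = node
    elem = ET.Element("node", {"kind": kind, "refinement": refinement})
    ET.SubElement(elem, "label").text = label
    for child in children:
        elem.append(to_xml(child))
    return elem

# Small illustrative tree: one attack step with two techniques and one defense.
tree = (
    "Explore", "attack", "disjunctive",
    [
        ("Technique A", "attack", "disjunctive", []),
        ("Technique B", "attack", "disjunctive", []),
        ("Pattern Composition", "defense", "conjunctive",
         [
             ("Audit Interceptor", "defense", "disjunctive", []),
             ("Secure Logger", "defense", "disjunctive", []),
         ]),
    ],
)

xml_text = ET.tostring(to_xml(tree), encoding="unicode")

# Read the document back and list the labels in document order.
parsed = ET.fromstring(xml_text)
labels = [element.text for element in parsed.iter("label")]
print(labels)
```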

Fig. 8. ADTree of the attack CAPEC-34.
Table 1. Attack technique descriptions.
Figure 8 illustrates the ADTree obtained for the attack CAPEC-34 and Table 1 lists the techniques labelled in this ADTree. The root of the tree is the main goal of the attacker. Its second and third levels relate to the attack steps. These nodes are sequential conjunctive refinements of the root node. For instance, the step Exploit is achieved if both steps 3.1 and 3.2 are successfully executed in the right order (from left to right). An attack step has a disjunctive refinement of attack nodes labelled by techniques. The step is achieved if one of the attack techniques is applied with success. The lower nodes, labelled by attack steps, are linked to (green square) defense nodes, which illustrate security pattern combinations. We observe that the step 1.1 “Spider” can be prevented by designing the application with both patterns “Audit interceptor” and “Secure logger”. “Audit interceptor” can be used to detect the application crawling and to warn an administrator. The audit logs are secured by means of “Secure logger”, which guarantees that the audit logs cannot be accessed or altered by unauthorised users.
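The reading of the refinements given above (a sequential conjunction succeeds only if its children succeed in order, a disjunction succeeds if one technique succeeds, and a defended step cannot be achieved) can be sketched as a small evaluator. This is an illustrative simplification: the tuple encoding, the function `completed_at`, the labels, and the assumption that a deployed pattern combination blocks its step entirely are all hypothetical.

```python
def completed_at(node, trace, blocked, start=0):
    """Return the index in `trace` at which `node` is achieved, or None.

    node    : (label, refinement, children); leaves have no children
    trace   : ordered list of technique labels executed by the attacker
    blocked : labels of the nodes whose defense is assumed to be in place
    start   : only events from this index onwards are considered
    """
    label, refinement, children = node
    if label in blocked:
        return None
    if not children:
        for i in range(start, len(trace)):
            if trace[i] == label:
                return i
        return None
    if refinement == "SAND":
        # Each child must be achieved after the previous one.
        position, last = start, None
        for child in children:
            last = completed_at(child, trace, blocked, position)
            if last is None:
                return None
            position = last + 1
        return last
    results = [completed_at(c, trace, blocked, start) for c in children]
    if refinement == "OR":
        done = [r for r in results if r is not None]
        return min(done) if done else None
    if refinement == "AND":
        return None if None in results else max(results)
    raise ValueError(f"unknown refinement: {refinement}")

# Hypothetical two-step attack: each step is a disjunction of techniques.
attack = (
    "attack", "SAND",
    [
        ("step 1", "OR", [("t1a", "LEAF", []), ("t1b", "LEAF", [])]),
        ("step 2", "OR", [("t2a", "LEAF", [])]),
    ],
)

in_order = completed_at(attack, ["t1b", "t2a"], blocked=set())
out_of_order = completed_at(attack, ["t2a", "t1b"], blocked=set())
defended = completed_at(attack, ["t1b", "t2a"], blocked={"step 1"})
print(in_order, out_of_order, defended)
```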
We discuss in this section the quality, accuracy and the limitations of our classification.
Classification quality
To assess the quality of this classification we studied the nine criteria proposed by Alvi et al. in [2]. Our classification meets seven of these criteria:
Navigability: our classification (supplemented with ADTrees) satisfies this criterion as it exposes the hierarchical refinements of an attack and, for every attack step, the combinations of patterns that should be integrated in the application model. Besides, the classification provides the relationships among security patterns, which help choose the most appropriate pattern combination;
Determinism: the classification is clearly defined by means of the methodology steps. All these steps justify the soundness of the classification;
Unambiguity/Comprehensibility: as patterns are classified w.r.t. attacks, steps, and security principles, we provide a clear structure of categories. This organisation, which is illustrated by means of ADTrees, makes our classification readable and comprehensible even for novices in security patterns;
Usefulness: we believe the classification can be used in practice since it is based upon a known security pattern catalogue [38] and upon the CAPEC base, which is increasingly employed in industry. Furthermore, the Attack tree formalism is one of the most prominent security formalisms for analysing threats. The ADTree model is supported by several tools, in particular ADTool [12]. Our ADTree generator actually generates XML files taken as inputs by ADTool;
Acceptability: an acceptable classification schema should be structured in a way that it provides help in partitioning the security pattern landscape and becomes generally approved [2]. Our classification partitions security patterns with regard to attacks and security principles. Furthermore, the evaluation given in Section 6 suggests that the classification makes participants more efficient and more confident in their pattern choices without imposing new constraints;
Repeatability: the classification is generic and can be reused. Furthermore, the data-store and the classification can be updated.
Our classification does not yet satisfy two quality criteria called Mutual exclusivity (patterns should be categorised into at most one category) and Completeness (all the existing security patterns are covered). Mutual exclusivity does not hold because a security pattern can be related to several attacks and security principles in the meta-model of Fig. 2. Even though this is not a primary goal of our classification, we could fix this issue by grouping the most concrete attacks into mutually exclusive contexts, as in [5]. To do so, the meta-model of Fig. 2 should be updated with a new entity called Context, linked to an entity Concrete Attack, itself linked to the entity Attack.
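Checking Mutual exclusivity amounts to looking for patterns that appear in more than one category. The sketch below assumes that categories are attacks and uses a three-row excerpt built from associations mentioned in this paper; the function name and the data layout are illustrative.

```python
from collections import defaultdict

# Excerpt of (attack, pattern) associations used as an example.
classification = [
    ("CAPEC-244", "Input Guard"),
    ("CAPEC-66", "Input Guard"),
    ("CAPEC-34", "Secure Logger"),
]

def exclusivity_violations(pairs):
    """Return the patterns categorised under more than one attack."""
    categories = defaultdict(set)
    for attack, pattern in pairs:
        categories[pattern].add(attack)
    return {
        pattern: sorted(attacks)
        for pattern, attacks in categories.items()
        if len(attacks) > 1
    }

violations = exclusivity_violations(classification)
print(violations)
```

Any non-empty result shows that the criterion does not hold for the chosen categories, which is expected here since one pattern can counter several attacks.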
Classification accuracy
We conducted a systematic review of the data integration steps and of the classification to ensure that the security patterns provided to counter an attack are effective and that none is missing among the 26 patterns considered in this paper. Unfortunately, this review had to be done manually, as most of the security concepts considered in the meta-model of Fig. 2 are abstract in nature. In short, the review process was carried out as follows. The first author studied the associations between security patterns and principles (steps 3, 4 and 5). Some relations were corrected and strong points were added during this process. The second author audited the associations between attacks and principles (steps 2 and 5). In particular, the clusters of counter-measures were carefully examined as the clustering technique requires supervision. Step 1, which involves the automatic extraction of information about attacks, was only briefly examined, as the extraction was done from a public base regularly reviewed by thousands of users. Then, we asked two master students to each review half of the pattern classification, i.e., the patterns provided to counter every attack. One of the authors reviewed the complete classification. The other author studied the associations between patterns and security principles and especially checked whether some associations between strong points and principles were missing. When there was a disagreement in answers, we discussed the issues until we reached an agreement.
Furthermore, to validate the accuracy of the classification, we also compared it with the results reported in the two papers dealing with the associations between patterns and attacks/weaknesses [1,34]. In these works, the security pattern intents are manually compared to the summaries of the attacks. As these textual sections are abstract, few relations were found. The largest contribution is provided by Alvi et al., who considered around 20 attacks and manually linked them to 5 patterns. The relations exposed in [1,34] do not reveal any inconsistency with our classification. For instance, the attack “CAPEC-66 SQL Injection” is related to the security patterns “Intercepting Validator” and “Input validation” in [34]. The attacks “CAPEC-244: Cross-Site Scripting via Encoded URI Schemes” and CAPEC-66 are only associated with the pattern “Intercepting Validator” in [1]. For these attacks, our method generates two ADTrees, which provide 4 combinations of 7 patterns for CAPEC-244 and 8 combinations of 9 patterns for CAPEC-66. These ADTrees give equivalent patterns. For instance, the ADTrees exhibit the pattern “Input Guard”, which can be implemented by “Intercepting Validator”. But they also list other security patterns. For CAPEC-244, some of these patterns are alternatives to “Input Guard”, e.g., “Application Firewall”. Other patterns, e.g., “Authentication Enforcer” or “Controlled Object Monitor”, are related to specific countermeasures of the attack CAPEC-244. We believe these patterns, which are not given in the previous classifications, are required to counter the attack with regard to the application context. More generally, we have observed that our classification exposes more pattern combinations per attack; more choice is not always better, though. After inspection, however, we have concluded that more than one or two patterns are generally required to counter attacks.
Limitations
After the review of our classification, we have observed that it presents some limitations, which could lead to future research.
The notion of attack combination is not considered in the paper. Such a combination could be seen as several attacks or as one particular attack. If an attack combination can be identified and documented with its sub-attacks, then it can be integrated in our data-store.
Our ADTree generator does not limit the size of the ADTrees. When an attack has a high level of abstraction, the resulting ADTree may become large because it includes a set of sub-attacks, themselves linked to several patterns. This is a strong limitation since large trees are usually unreadable, which contradicts the purpose of the method.
The classification is not exhaustive: it includes 215 attacks out of 569 (for any kind of application) and 26 security patterns out of 176. It can be completed with new attacks automatically. But the completion of the data-store with new security patterns requires some manual steps. It could be interesting to investigate whether text mining techniques would help partially automate them. The classification exhaustiveness also depends on the available security data. In the ADTree of Fig. 8, all the lowest attack nodes are linked to defense nodes. We sometimes observed that no defenses are provided with other attacks. This can be usually explained by three main reasons:
security databases or pattern catalogues are incomplete (lack of mitigation, countermeasure, etc.). More data are required during the data integration process. In particular, we observed that some countermeasures are missing for some attacks of the CAPEC base;
the attack is relatively new. It is not documented yet or no pattern based solution is available;
security data are missing because we did not consider them in the manual data integration steps. For instance, as the pattern descriptions do not clearly provide strong points, it may be easy to skip one of them.
Several steps require manual interventions, which are prone to errors. These steps may lead to associations among security data that are bound to be controversial. We reviewed and compared our classification with other papers to check whether the selected patterns are appropriate. As these papers provide few associations between patterns and attacks, our validation process is incomplete. Validating every relation of the meta-model of Fig. 2 is a hard problem. It could be partially solved by the use of verification methods. But the writing of formal expressions for modelling the entities and associations of our meta-model is another long and error-prone task that should be addressed.
Finally, the inter-pattern associations are defined with binary relations only, as presented in [37]. These relations could be updated to link several patterns together.
Empirical evaluation
We evaluated our classification to ensure that the previous quality criteria can be met in practice. We empirically studied two scenarios where 24 participants were given the task of choosing security pattern combinations to prevent two attacks, CAPEC-244: Cross-Site Scripting via Encoded URI Schemes and CAPEC-66: SQL Injection, on two vulnerable Web applications, Ropeytasks4
In the first scenario, denoted Part 1, we supplied these documents to the students: some UML sequence diagrams capturing the main functionalities of the application, the CAPEC base, two concrete examples showing how to perform each attack, the catalogue of security patterns given in [38] and the pattern classification proposed in [1]. The catalogue includes 36 patterns, most of which can be used with Web applications. For simplicity, we refer to these documents as basic documents in the remainder of the evaluation. In the second scenario, denoted Part 2, we supplied additional documents for the two attacks, i.e., our classification in the form of tables giving the attack steps, techniques and combinations of security patterns (as in Fig. 6), and two ADTrees generated from the data-store (Fig. 8 is one of them). At the end of each scenario, the students were invited to fill in a form listing these questions:
Q1: Was it difficult to choose security patterns?
Q2: Was it difficult to use the CAPEC documentation (in Part 1)/
Q3: Was it difficult to use the basic pattern documents (in Part 1)/
Q4: What was your time spent for choosing security patterns?
Q5: How confident are you in your pattern choice?
Q6: What are the patterns you have chosen?
This form was devised to evaluate these three criteria:
C1 Comprehensibility: does our classification make the pattern choice less difficult? C2 Efficiency: does our classification help reduce the time needed to choose patterns? C3 Accuracy: are the chosen patterns correct?
From the forms returned by the participants (available in [21]), we extracted the following results. Firstly, Fig. 9 illustrates the percentages of answers to the questions Q1 to Q3. For these, we proposed this four-valued scale: easy, fairly easy, difficult, very difficult. From Question Q4, we collected the time spent by the participants for choosing patterns (in Part 1 and 2 of the experimentation). In summary, response times varied between 15 and 50 minutes for Part 1, and between 5 and 30 minutes for Part 2. We gauged the levels of confidence of the participants in their security pattern choices (Question Q5). The possible answers were for both scenarios: very sure, sure, fairly sure, not sure. The bar charts of Fig. 10 depict the levels of confidence of the participants.

Fig. 9. Response rates for Q1 to Q3.

Fig. 10. Confidence rates (Q5).
We finally analysed the security pattern combinations provided by the participants in Question Q6. We organised these responses into four categories (ordered from the more to the less accurate):
Correct: we considered that any pattern combination that counters the attack is a good solution. Several pattern combinations were accurate. When a participant gives one of these combinations, their response belongs to this category;
Missing: we gather in this category the incomplete pattern combinations without additional patterns;
With these categories, we obtained the bar charts of Fig. 11, which give the number of responses per category and per experiment scenario.

Fig. 11. Accuracy measurement (Q6).
C1: Comprehensibility
Figure 9 shows that 33% of the participants estimated that the pattern choice was easy with our classification and ADTrees (Q1). In contrast, no participant found that the choice was easy when using only the basic pattern documents. The combined rate of “Easy” and “Fairly easy” answers increased by 70.8% between Part 1 and Part 2. With Question Q2, 41.7% of the participants found the CAPEC base “Fairly easy” to use, whereas 87.5% rated our documents (ADTrees) “Easy” or “Fairly easy” to use. Similarly, only 37.5% of the participants found the basic pattern documents “Easy” or “Fairly easy” to read. This rate reaches 87.5% with our classification. Consequently, Fig. 9 shows that our classification and ADTrees make the pattern choice easier and that they are simpler to interpret than the basic pattern documents. In addition, Fig. 10 shows that the confidence of the participants in their responses increased by 20.8%.
C2: Efficiency
The average time spent by the participants for choosing patterns is equal to 32 minutes in the first scenario (Part 1). This time decreases to 15 minutes when the participants employed our classification and ADTrees. Furthermore, no participant took more than 30 minutes to choose patterns in Part 2 (in contrast with 50 minutes for Part 1). Hence, our documents make the participants more efficient.
C3: Accuracy
Figure 11 reveals how complicated it is to read the basic pattern documents. Indeed, no participant gave a correct pattern combination in Part 1. In contrast, when they used our classification and ADTrees, the number of correct responses rises to 15 out of 24 (60%). Furthermore, the category of responses “
Threat to validity
There are many application and system contexts, but this preliminary experimental evaluation is applied to Web applications only. This is a threat to external validity, in the sense that the results about Comprehensibility and Accuracy cannot be generalised to all software systems. This is why we deliberately avoid drawing any general conclusion from the experiments. But this threat is somewhat mitigated by the considered context itself. Web development is indeed a rich field in great demand in the software industry. Web applications also expose a lot of well-known vulnerabilities. Besides, this well-studied application context helped us propose experimentations involving participants having adequate knowledge of software development and security.
This leads to the second threat to validity, concerning the audience. Our evaluation was indeed performed with an audience of students following a block release training. This kind of participant is sometimes considered a source of bias, as any strict process should help them improve their work. But several studies conclude that student experiments are appropriate as evaluation for software engineering approaches, especially when Comprehensibility is a criterion under evaluation [6]. We also believe that we would have achieved the same results with a group of developers from industry, as they often do not have any software security skills. Evaluating our tools with an audience of security experts could be interesting, but this work does not initially target this kind of audience.
Another threat relates to the learning effect, which may happen between the two stages of the evaluation: we indeed applied the same approach to the same participants. We only replaced the Web application in Part 2 (to avoid another bias, we took care to provide an application providing similar functionalities and exposing the same vulnerabilities), and we completed the available security documents with a list of ADTrees. We deliberately chose to apply the same approach in Part 1 and 2 to evaluate the feeling of the participants about the security documents (Q3, Q2) and their confidence in their work (Q5). In other words, changing the audience might have introduced a bias in our analysis of Comprehensibility. But it is indisputable that this kind of experiment may influence our results. We tried to mitigate this threat by changing the question order. Besides, our results suggest that there are significant improvements in Comprehensibility and Accuracy between Part 1 and 2. We believe that these improvements cannot only be explained by a learning effect and that our approach has a real impact here.
There is also a risk as we asked the participants to select security patterns independently of the application design, even though the latter was partly given to them. Hence, our results on Accuracy (depicted in Fig. 11) might include patterns whose integration in the application is not possible. We studied the problem of pattern selection with regard to an existing model in another work [23]. We have shown that the strict analysis of pattern integration in a model is not trivial and depends on several factors, e.g., structural or behavioural integration. We thus believe that this study requires its own kind of experiment. In addition, the task of selecting patterns that can fit in an existing model would require much longer experiment times than those we considered. But let us examine the impact of considering as correct only the patterns that fit in the application design. With regard to Fig. 11, this new viewpoint would only affect the results of the first two columns: we would have fewer pattern combinations in the column Correct, which would be transferred to the second column
We thus believe that the empirical experience reported in this paper provides relevant insights on the benefits of using our pattern classification.
Conclusion
We have proposed a security pattern classification method based on the integration of various security data. The method, composed of 6 manual and automatic steps, generates a data-store and a classification associating attacks, security principles and security patterns in order to help designers choose security pattern combinations to design secure applications. The method also builds ADTrees, which graphically illustrate the classification. As a proof of concept, we implemented these steps and generated a security pattern classification, which includes 215 CAPEC attacks, 66 security principles and 26 security patterns. We evaluated the quality of this classification by means of the criteria exposed in [2] and with an experimentation involving 24 participants. The experimentation suggests that our classification and ADTrees make the pattern choice easier and more accurate, and the participants more efficient.
We also mentioned several limitations, which could lead to future research. We firstly intend to focus on the automation of some of the data integration steps. We will investigate whether some text mining techniques could help partially automate the extraction and integration of security data without bringing ambiguity. Our method does not take into consideration the size of the ADTrees, which might impede Comprehensibility. The ADTree reduction could be a first solution to this problem. But, the literature does not yet provide a generic method for this kind of reduction. Reducing such trees remains a hard problem as the node meaning must be taken into account in the node aggregating process. Another line of future work is to integrate other security concepts in our data-store to provide other kinds of classifications. For example, several researchers studied the relations between security patterns and tactics. Tactics are described as measures or decisions taken to improve quality factors [4,9]. Patterns actually represent some realisations of tactics. In the meta-model of Fig. 2, the notion of security tactic should find its place between principles and strong points. But, another way of integration would be to replace security principles with tactics. This alternative could refine the pattern selection, as the desirable security property might be more precise. But we estimate that the effort required to associate counter-measures, strong points and tactics would be substantial as there are many more tactics than principles. Once more, text mining might help reduce this effort. We will also continue to investigate the use of data integration techniques to simplify some steps of the software life cycle. In particular, we will study whether our generated ADTrees could serve for the test case generation.
