Hybrid adversarial defense: Merging honeypots and traditional security methods

Abstract

Most past work on honeypots has made two assumptions: (i) they assume that the only defensive measure used is a honeypot mechanism, and (ii) they do not consider both rational and subrational adversaries and do not reason with an adversary model when placing honeypots. However, real-world system security officers use a mix of instruments such as traditional defenses (e.g. firewalls, intrusion detection systems), and honeypots form only one portion of the strategy. Moreover, the placement of traditional defenses and honeypots cannot be done independently. In this paper, we consider a Stackelberg-style game situation where the defender models the attacker and uses that model to identify the best placement of traditional defenses and honeypots. We provide a formal definition of undamaged asset value (i.e. the value that is not compromised by the attacker) under a given defensive strategy and show that the problem of finding the best placement so as to maximize undamaged asset value is NP-hard. We propose a greedy algorithm and show via experiments, both on real enterprise networks and on ones generated by the well-known network simulation tool NS-2, that our algorithm quickly computes near optimal placements. As such, our method is both practical and effective.

Keywords

Adversarial defense of enterprise systems, game theoretic models

1. Introduction

There has been a tremendous amount of work on the design and placement of honeypots to protect enterprise networks [5,37,42]. At the same time, there has been a great deal of work on the use of game theoretic models for protecting enterprises [3,17,29,30,39,49].

In this paper, we consider a first step towards what we call hybrid adversarial defense. There are obviously multiple ways in which we can defend an enterprise. Examples include honeypots as mentioned above, but also include more traditional defenses such as multilevel access control, firewalls, and intrusion detection systems. Past efforts on the placement of honeypots within a network have two major drawbacks.

First, they largely assume that honeypots are the only form of defense. This was a reasonable assumption to make in early works. But in the real world, honeypots form one part of an ensemble of defenses. These traditional defenses include firewalls [4], intrusion detection systems [28], and mandatory access control schemes [35]. In this paper, we present a unified framework in which traditional defenses can be seamlessly deployed in conjunction with honeypots.

Second, most past work on honeypot placement [5,37,42] does not model the attacker’s behavior – even though there has been much work in security in general on modeling the behavior of intelligent attackers [17,30,49]. In this paper, we present a mathematical model of an attacker who seeks to maximize the (expected) damage she causes. The model is capable of taking various possible attacker behaviors into account – for instance, it can consider attackers that seek to minimize the likelihood of discovery. Based on this, we identify methods to simultaneously place both honeypots and traditional defenses in the presence of an attacker model.

The framework presented in this paper overcomes these drawbacks of past work and, to our knowledge, is the only work that identifies a mechanism to optimally place both traditional defenses and honeypots in the presence of an attacker model. Of course, others have developed other types of intelligent attacker models in other cybersecurity contexts [17,30,49].

Moreover, the framework allows modeling subrational attackers that do not behave in a fully optimal way, i.e. they do not carry out attacks that maximize their utility, and unskilled or inexperienced defenders that act suboptimally in the detection process, i.e. under- or over-estimate the likelihood of detecting attacker actions. Our framework proves to be robust in providing appropriate defenses even when dealing with subrational attackers and unskilled defenders.

The paper is organized as follows. Section 2 starts off with an analysis of past work in this area. We provide an initial set of definitions about the environment in Section 3 – this section introduces the notion of an enterprise network and a system vulnerability dependency graph from past work [41]. Section 4 defines the important concept of an Attacker Belief Evolution Tree which captures how the attacker’s beliefs about the network change with time and analyzes the attacker’s behavior. Section 5 then develops the technical material explaining how the defender strategy is formulated, i.e. where the defender chooses to simultaneously place traditional defenses and honeypots so as to minimize the maximal expected damage the attacker can cause. We also show that this problem is NP-hard. Section 6 develops an exact algorithm (H_Exact) which solves the optimization problem exactly – but of course, this is a time consuming algorithm. We therefore develop the polynomial-time H_Greedy algorithm. Section 7 reports the results of detailed experiments showing that H_Greedy reflects significant run-time savings compared to H_Exact. In addition, we show that in practice, H_Greedy delivers placements of traditional defenses and honeypots that are near optimal, typically well over $95 %$ of the optimal value. In conclusion, our proposed framework provides theoretically grounded and practically deployable mechanisms for use by system security officers.

2. Related work

The problem of defending enterprise networks is well studied in the literature from a wide variety of perspectives. We will evaluate past work from three perspectives: (i) Does the work consider honeypot placement together with the installation of traditional security software? (ii) Does the work include an attacker model? (iii) Can the model take into account the tendency of the attacker to reduce the probability of discovery? Table 1 shows past work and the criteria they fulfil. The reader can readily see that the only work satisfying all three criteria is the current paper.

Table 1
Comparison with related work

	Hybrid adv. defense	Attacker model	Attacker model w/detection probability
[12,13,36,37,41]	–	–	–
[5,20,23,43]	–	√	–
This paper	√	√	√

One body of work studies automatic patching of vulnerabilities, and when this is not possible, the extreme solution is the deactivation of products containing vulnerabilities (see [41]). [12,13,36] consider the problem of finding plans for patching vulnerabilities, that are tradeoffs between cost and risk, by using Pareto analysis. No honeypots are considered. [13] also defines a problem involving a game theory-based solution which assumes that the attacker does not know the strategy of the defender (in our paper, we assume that the attacker can potentially detect the defender’s strategy through one or more network scan operations, which is consistent with the way attackers maneuver through the network). Moreover, they do not consider deception via the intelligent placement of honeypot nodes.

Much work has been done on honeypot networks. A comprehensive report about existing implementations [37] analyzes and compares the advantages and the weaknesses of several implementations. Many of these are able to emulate the honeypot network on just one machine as opposed to an enterprise network. Though this solution is less expensive than generating real honeypot nodes, the attacker who probes the network can easily detect fake nodes. Moreover, honeypots that look too vulnerable might raise the suspicion of attackers. Therefore there is a need to strategically deploy a honeypot network that carefully considers attacker behavior. One example is related to data leakage, i.e. the distribution of private data to unauthorized entities (a survey can be found in [42]). In [5], fictitious data called “honey tokens” are automatically generated, then verified through a Turing test to determine if a human (in this case an attacker) will be able to recognize them as honey tokens or not.

The use of game theory in cybersecurity is not new [29,41]. For instance, [20,23] study different game models that address strategies for deploying honeypot networks. They define basic honeypot selection games where the defender chooses the properties of the network, plus an extension where the attacker can also probe the network, and a final version where the attacker strategies are represented by an attack graph. These game models make restrictive assumptions such as the perfect rationality of the attacker. Moreover, they assume that honeypots are the only defensive mechanism used. [10] analyzes a game framework to evaluate the identification time of a decoy node w.r.t. a real one. In particular they study the benefit of randomizing the space of node IPs (when the specific application allows this option) on the detection of decoy and real nodes. [38] presents a dynamic model capturing progressive attacks on a computer network. They define a supervisory control problem with imperfect information, modeling the computer network operation by a discrete event system and provide the best policy for the defender. A similar but more general optimal control approach (not only based on a computer network) focused on information security is provided in [44]. For a complete and vast survey about game and network security, see [30].

[43] highlights the importance of honeypot networks not just for the ability to attract the attacker but also to delay her. They define the concept of distraction chains, i.e. sequences of decoy systems used to entice an adversary to explore useless information in order to distract her from the real network. They study the problem of creating the distraction chains and embedding them in the network. As in the preceding case, they do not consider the impact of using traditional defensive software along with honeypots.

Moreover, none of the above works considers in detail how the attacker can react. One important tendency today is to spread out attacks over a long period in order to decrease the likelihood of being discovered. This is typical of Insider Threats [11] and Advanced Persistent Threats [6] – our framework can model the tendency of the attacker to reduce the likelihood of being discovered according to her beliefs.

[27] defines a game-theoretic model of the interactions between nodes in wireless networks. Each node is a player with a reputation score – a node can decide whether to attack a neighbor or not, but also decide whether to increase or decrease the reputation scores of its neighbors depending on whether they are malicious or not. Nodes decrease the reputation of other nodes via a rumor algorithm to communicate this reduction to other nodes. If the reputation of a node goes below a threshold, no other devices will provide services to it. The best policies to govern these interactions are captured via an equilibrium. [45] provides a comprehensive review of research combining radio networks and game theory.

In summary, our framework differs from past works because it tries to defend an enterprise network by:

simultaneously adding honeypot nodes and traditional security software in order to minimize the expected damage;

considering much more realistic attacker behaviors than those considered in past work;

modeling each player’s beliefs on the other – for instance, at each action the attacker maintains her belief about the possibility to be detected by the defender, and her whole behavior will be conditioned by this belief;

considering subrational adversaries and situations where the defenders may have widely different skill levels;

providing an efficient approximate solution to the problem of computing the best defender strategy, in some cases with a fixed approximation ratio of $1 - \frac{1}{e}$ . This property makes our work especially effective in situations of “high dynamism” where attacker strategies evolve rapidly and always-new vulnerabilities and patches appear and will be easily known on the (dark) web.

3. Preliminaries

In this section, we formally define enterprise networks and system vulnerability dependency graphs (SVDGs) – both concepts defined by us previously in [41].

We assume the existence of some universe $S$ of software packages and some set $V$ of known vulnerabilities – we do not consider zero-day vulnerabilities. Though it may seem apparent that known vulnerabilities should be patched, the actual real-world patching behavior of enterprises is astonishingly poor. Different studies report different statistics on the time gap between when a vulnerability is discovered and when it is patched. For instance, Infosecurity Magazine lists this time gap as 100–120 days on average.² ZDNet says that on average, the financial sector takes up to 176 days to apply patches.³ The firm SecurityMetrics says that on average, merchants were vulnerable for 470 days before an attacker compromised their system.⁴

Irrespective of which of these numbers is correct, it is clear that these time gaps for patching are too long. CSO Magazine [16] states that patching is difficult for a multitude of reasons including: (i) too many patches are released every day for security officers to handle the load, (ii) legacy vulnerabilities that existed before a given security officer joined an organization cause problems, (iii) the ability to patch certain code is limited to vendors rather than an enterprise and these vendors are not patching the code, (iv) patching often interrupts the smooth 24/7 operation of an enterprise, leading to delays. As an example, a patch⁵ for the CVE-2017-0147 vulnerability was released several months before the infamous WannaCry ransomware of 2017 unleashed considerable damage in the US and beyond – victims of WannaCry had not patched this vulnerability. In general, there exist many vulnerabilities that are not patched or cannot be patched in enterprise systems.

² https://www.infosecurity-magazine.com/news/companies-average-120-days-patch

³ http://www.zdnet.com/article/financial-sector-takes-176-days-on-average-to-patch-security-vulnerabilities

⁴ http://blog.securitymetrics.com/2015/10/how-long-businesses-vulnerable.html

⁵ https://technet.microsoft.com/en-us/library/security/ms17-010.aspx

A software vulnerability mapping $ν : S \to 2^{V}$ associates a set of vulnerabilities with each piece of software. We use $impact (v)$ to denote the impact of vulnerability $v \in V$ if it is successfully exploited by an attacker. There are several publicly available databases which capture all of these quantities – examples include NIST’s National Vulnerability Database [34] and their Common Vulnerability Scoring System [31], and MITRE’s Common Weakness Scoring System [32].

Definition 1 (Enterprise Network).

An enterprise network is a triple $EN = (N, E, compr)$ where N is a set of enterprise nodes, $E \subseteq N \times N$ is a set of edges, and $compr \subseteq N$ is a set of compromised nodes.

Example 1.
Figure 1(a) shows an example enterprise network which serves as a running example. Here, we have $N = \{n_{1}, n_{2}, n_{3}\}$ , $E = \{(n_{1}, n_{2}), (n_{2}, n_{1}), (n_{1}, n_{3}), (n_{3}, n_{1}), (n_{2}, n_{3}), (n_{3}, n_{2})\}$ , and $compr = \{n_{1}\}$ .
Fig. 1.
A schematic diagram of an enterprise network (a) and a system vulnerability dependency graph (b).

Suppose ${EN}_{1}$ and ${EN}_{2}$ are enterprise networks, with ${EN}_{i} = (N_{i}, E_{i}, {compr}_{i})$ . The union of ${EN}_{1}$ and ${EN}_{2}$ , denoted ${EN}_{1} \cup {EN}_{2}$ , is the enterprise network $(N_{1} \cup N_{2}, E_{1} \cup E_{2}, {compr}_{1} \cup {compr}_{2})$ .
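As an illustration only, Definition 1 and the union operation can be sketched in Python. The class name `EnterpriseNetwork`, the helper `both_ways`, and the second network fragment with node `n4` are hypothetical and do not come from the paper; the first network is the one from Example 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnterpriseNetwork:
    nodes: frozenset   # N
    edges: frozenset   # E, a set of (n, n') pairs
    compr: frozenset   # compromised nodes

    def __post_init__(self):
        # Definition 1 requires E to be a subset of N x N and compr a subset of N.
        if any(a not in self.nodes or b not in self.nodes for a, b in self.edges):
            raise ValueError("every edge must connect two nodes of N")
        if not self.compr <= self.nodes:
            raise ValueError("compr must be a subset of N")

    def union(self, other):
        # Component-wise union of nodes, edges and compromised nodes.
        return EnterpriseNetwork(self.nodes | other.nodes,
                                 self.edges | other.edges,
                                 self.compr | other.compr)


def both_ways(pairs):
    # Hypothetical helper: add the reverse of every listed edge.
    return frozenset(pairs) | frozenset((b, a) for a, b in pairs)


# Network of Example 1.
en1 = EnterpriseNetwork(frozenset({"n1", "n2", "n3"}),
                        both_ways([("n1", "n2"), ("n1", "n3"), ("n2", "n3")]),
                        frozenset({"n1"}))
# Hypothetical second fragment, used only to exercise the union.
en2 = EnterpriseNetwork(frozenset({"n3", "n4"}),
                        both_ways([("n3", "n4")]),
                        frozenset())
merged = en1.union(en2)
```

Using immutable sets keeps each network a value, so a union never alters its operands.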

Attackers frequently attack networks by compromising one node (by exploiting some vulnerability in a piece of code running on that node) and then launching an attack on another vulnerability in a neighboring node (or within the same node). In order to capture this “hopping” behavior, we define the notion of a “system vulnerability dependency graph” (SVDG).
Definition 2 (System Vulnerability Dependency Graph (SVDG)).

Given an enterprise network $EN = (N, E, compr)$ , a system vulnerability dependency graph is a directed graph $G = (S V, E V)$ where:

$S V \subseteq N \times S \times V$ is the set of vertices such that if $(n, s, v) \in S V$ then $v \in ν (s)$ ;

$E V \subseteq S V \times S V$ is the set of edges such that if $((n_{1}, s_{1}, v_{1}), (n_{2}, s_{2}, v_{2})) \in E V$ then $(n_{1}, n_{2}) \in E$ .

Intuitively, a vertex $(n, s, v) \in S V$ means that node n runs software s with vulnerability v, whereas an edge $((n, s, v), (n^{'}, s^{'}, v^{'})) \in E V$ means that once the attacker has exploited v on node n she can then exploit $v^{'}$ on node $n^{'}$ .

As noted above, NIST’s NVD documents known vulnerabilities in software, which allows us to automatically create the nodes of an SVDG. Attack graphs [19] are a variant of SVDGs and there are several tools to automatically create such graphs after scanning networks.⁶

⁶
https://www.open-scap.org/tools/scap-workbench

Given an SVDG $G = (S V, E V)$ , we use $in (G, sv)$ and $out (G, sv)$ to denote the sets $\{{sv}^{'} \mid ({sv}^{'}, sv) \in E V\}$ and $\{{sv}^{'} \mid (sv, {sv}^{'}) \in E V\}$ , respectively.

Example 2.

Fig. 1(b) shows an example system vulnerability dependency graph G built on top of the enterprise network of Fig. 1(a). In this case, we have $in (G, (n_{3}, s_{3}, v_{3})) = \{(n_{2}, s_{2}, v_{5}), (n_{2}, s_{2}, v_{4}), (n_{3}, s_{3}, v_{4})\}$ , and $out (G, (n_{2}, s_{2}, v_{5})) = \{(n_{3}, s_{3}, v_{3}), (n_{3}, s_{3}, v_{2})\}$ . We also assume that three vulnerabilities $v_{2}$ , $v_{3}$ and $v_{4}$ are present in software $s_{3}$ in node $n_{3}$ ; similarly, vulnerabilities $v_{4}$ , $v_{5}$ and $v_{6}$ are present in software $s_{2}$ in node $n_{2}$ .

Suppose $G_{1}$ and $G_{2}$ are SVDGs, with $G_{i} = ({S V}_{i}, {E V}_{i})$ . The union of $G_{1}$ and $G_{2}$ , denoted $G_{1} \cup G_{2}$ , is the SVDG $({S V}_{1} \cup {S V}_{2}, {E V}_{1} \cup {E V}_{2})$ .
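The $in$ and $out$ sets can be sketched in a few lines of Python. The function names and the tuple encoding of vertices are assumptions made for illustration. The edge set below contains only the four edges implied by the two sets stated in Example 2, and Fig. 1(b) may contain further edges that are not reproduced here.

```python
def in_neighbors(ev, sv):
    # in(G, sv): all vertices with an edge into sv.
    return {src for (src, dst) in ev if dst == sv}


def out_neighbors(ev, sv):
    # out(G, sv): all vertices reachable from sv through one edge.
    return {dst for (src, dst) in ev if src == sv}


# Edges recoverable from Example 2; vertices are (node, software, vulnerability).
EV = {
    (("n2", "s2", "v5"), ("n3", "s3", "v3")),
    (("n2", "s2", "v4"), ("n3", "s3", "v3")),
    (("n3", "s3", "v4"), ("n3", "s3", "v3")),
    (("n2", "s2", "v5"), ("n3", "s3", "v2")),
}

into_v3 = in_neighbors(EV, ("n3", "s3", "v3"))
from_v5 = out_neighbors(EV, ("n2", "s2", "v5"))
```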

The defender can modify the enterprise network by adding honeypot nodes, as is the case for node $n_{h}$ in Fig. 2. Obviously, the SVDG associated with an enterprise network $E N$ changes when $E N$ changes.

Fig. 2.

The enterprise network of our running example after the addition of a honeypot node $n_{h}$ and a traditional defensive software on $n_{2}$ (a) and the updated system vulnerability dependency graph (b).

We assume that installing defensive software in a node (Fig. 2) will enable detecting any action of an attacker on that node while not changing the topology of the enterprise network and of the SVDG. We will provide more details about deploying traditional defensive software in Section 5.

Table 2 summarizes the notations used in this paper.

Table 2

Notations used throughout the paper

Notation	Description
$S$	Set of software packages
$V$	Set of vulnerabilities
$ν : S \to 2^{V}$	Software vulnerability mapping
$impact (v)$	Impact of v
$EN = (N, E, compr)$	Enterprise network
$G = (S V, E V)$	SVDG where $S V \subseteq N \times S \times V$ and $E V \subseteq S V \times S V$
${EN}_{T} = (N_{T}, E_{T}, {compr}_{T})$	Entire real enterprise network
$G_{T} = ({S V}_{T}, {E V}_{T})$	Entire real SVDG
$st = (A, EN, G)$	Attacker state
$VA (st)$	Valid attacker actions in state $st$
${st}_{0}$	Initial attacker state
${st}^{'} = tr (st, a)$	Transition from state $st$ due to action a
$cost (a)$	Cost of action a (attacker)
$T = (TV, TE)$	ABET
${tv}_{k}^{l} = ({st}_{k}^{l}, a_{k}^{l}, status, P (A_{k}^{l}), cost (A_{k}^{l}))$	Vertex of the ABET (level/time l, identity k at level l)
$P (A)$	Probability of being detected after the actions in A (attacker)
$cost (A_{k}^{l})$	Total cost of the actions in $A_{k}^{l}$ (attacker)
$P (a \mid st)$	Action selection probability (attacker)
$Q (a)$	Probability of detecting action a (attacker)
$util (a)$	Utility of action a (attacker)
$H$	Set of honeypot nodes
$(h, n) \in AH$	Honeypot node $h \in H$ replicates the connections of node $n \in N_{T}$
$δ = (A H, A T)$	Defensive strategy, where $A H \subseteq AH$ and $A T \subseteq N_{T}$
${cost}_{d}$	Cost of elements of $AH \cup N_{T}$ (defender)
${cost}_{d} (δ)$	Total cost of the actions in δ (defender)
${\hat{c}}_{d}$	Defender budget
$T_{D} = (TV, TE)$	DABET
$P_{d} ({tv}_{k}^{l} \to {tv}_{k^{'}}^{l + 1})$	Transition probability (defender)
$P_{d} (a \mid st)$	Attacker action selection probability (defender)
$Q_{d} (a)$	Probability of detecting action a (defender)
$damage (tv)$	Damage yielded by action $a_{k}^{l}$ where $tv = (\cdot, a_{k}^{l}, \cdot)$
$damage (p)$	$\sum_{i \in [0, z]} damage ({tv}_{i})$
$Exp D (T_{D})$	Expected damage
$T_{a}$	Total asset value
$U_{a} (T_{D})$	Undamaged asset value
$T_{C D} = (TV, TE)$	CDABET

4. Analysis of the attacker

Henceforth, ${EN}_{T} = (N_{T}, E_{T}, {compr}_{T})$ and $G_{T} = ({S V}_{T}, {E V}_{T})$ will denote an arbitrary but fixed enterprise network and SVDG, respectively. We assume that the attacker’s set of actions is $\{scan (n) \mid n \in N_{T}\} \cup \{exploit (n, s, v) \mid (n, s, v) \in {S V}_{T}\}$ .

Observe that the attacker does not initially know the entire enterprise network and SVDG. She uses her actions to iteratively uncover more about them.

An attacker strategy is a sequence of attacker actions (without duplicates, since scanning the same node twice or exploiting the same vulnerability twice leads to no improvement). However, not all attacker strategies are feasible. For this, we first need to define the concept of an attacker’s state.

Definition 3 (Attacker State).

An attacker state is a triple $st = (A, EN, G)$ where A is the sequence of actions taken to reach the state⁷ and $EN$ and G represent the enterprise network and the SVDG known to the attacker, respectively.

⁷ As our notion of attacker state is really about what the attacker knows, we include the history of actions A in it so that the attacker can reason about the effects of her past actions.

The initial state of the attacker is ${st}_{0} = (\emptyset, {EN}_{0}, G_{0})$ – the set of actions the attacker has performed is ∅, and the attacker knows an initial network ${EN}_{0}$ and an initial SVDG $G_{0}$ . In turn, ${EN}_{0} = (N_{0}, E_{0}, {compr}_{0})$ where (i) ${compr}_{0}$ is some initial set of entry points that the defender believes the attacker can use, (ii) $N_{0} = {compr}_{0} \cup \{n \mid (n^{'}, n) \in E_{T}, n^{'} \in {compr}_{0}\}$ contains the nodes in ${compr}_{0}$ together with the nodes directly accessible from them, and (iii) $E_{0} = \{(n, n^{'}) \mid n, n^{'} \in N_{0}, (n, n^{'}) \in E_{T}\}$ consists of the edges of $E_{T}$ between nodes in $N_{0}$ . We initially set $G_{0} = (\emptyset, \emptyset)$ . Starting from this initial state, the attacker can take actions that build up her knowledge of the enterprise network.
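The construction of $N_{0}$ and $E_{0}$ can be sketched as follows. The function name `initial_network` and the three-node chain used as a second test case are hypothetical; the first case uses the network of Example 1.

```python
def initial_network(compr0, e_t):
    # N_0: the entry points plus every node one hop away from an entry point.
    n0 = set(compr0) | {dst for (src, dst) in e_t if src in compr0}
    # E_0: the true edges whose endpoints both lie in N_0.
    e0 = {(a, b) for (a, b) in e_t if a in n0 and b in n0}
    return n0, e0


# Example 1: fully connected three-node network, entry point n1.
E_T = {("n1", "n2"), ("n2", "n1"), ("n1", "n3"),
       ("n3", "n1"), ("n2", "n3"), ("n3", "n2")}
n0_full, e0_full = initial_network({"n1"}, E_T)

# Hypothetical chain n1 -> n2 -> n3: n3 is not visible initially.
n0_chain, e0_chain = initial_network({"n1"}, {("n1", "n2"), ("n2", "n3")})

# Initial attacker state st_0 = (A, EN_0, G_0) with an empty action history.
st0 = ((), (n0_full, e0_full, {"n1"}), (set(), set()))
```

In the running example every node is adjacent to the entry point, so the attacker already sees the whole topology; the chain shows the more typical case where part of the network stays hidden.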

Definition 4 (Valid Attacker Actions).

Given an attacker state $st = (A, (N, E, compr), (S V, E V))$ , the set of valid attacker actions in $st$ , denoted $VA (st)$ , is the union of the following sets:

$\{scan (n) \mid n \in N \setminus compr \land scan (n) \notin A\}$ ;

$\{exploit (n, s, v) \mid (n, s, v) \in S V \land exploit (n, s, v) \notin A \land \exists\, exploit (n^{'}, s^{'}, v^{'}) \in A \text{ s.t. } ((n^{'}, s^{'}, v^{'}), (n, s, v)) \in E V\}$ .

Thus, a $scan (n)$ action is valid if the attacker has not done it before and if she has not already compromised that node. An $exploit (n, s, v)$ action is valid only if she has already exploited an $(n^{'}, s^{'}, v^{'})$ triple that allows her to exploit $(n, s, v)$ . In addition, she should not have used the specific exploit before.
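A literal reading of Definition 4 can be sketched as below. The encoding of actions as tuples (`("scan", n)` and `("exploit", n, s, v)`) and the function name are assumptions, and the membership test for exploit targets is read as membership in the known vertex set $S V$. Under this literal reading no exploit is valid while the history contains no exploit; the sketch does not guess how the first exploit is enabled.

```python
def valid_actions(A, N, compr, SV, EV):
    done = set(A)
    # Scans: known, non-compromised nodes that have not been scanned yet.
    scans = {("scan", n) for n in N - compr if ("scan", n) not in done}
    # Vertices (n, s, v) that were already exploited.
    exploited = {a[1:] for a in A if a[0] == "exploit"}
    # Exploits: unused vertices reachable by one edge from an exploited vertex.
    exploits = {("exploit",) + sv for sv in SV
                if ("exploit",) + sv not in done
                and any((prev, sv) in EV for prev in exploited)}
    return scans | exploits


N = {"n1", "n2", "n3"}
compr = {"n1"}
SV = {("n2", "s2", "v5"), ("n3", "s3", "v3"), ("n3", "s3", "v2")}
EV = {(("n2", "s2", "v5"), ("n3", "s3", "v3")),
      (("n2", "s2", "v5"), ("n3", "s3", "v2"))}

before = valid_actions((), N, compr, SV, EV)
after = valid_actions((("exploit", "n2", "s2", "v5"),), N, compr, SV, EV)
```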

Definition 5 (State Transition after an Action).

Suppose the attacker performs an action a in state $st = (A, (N, E, compr), (S V, E V))$ . This results in the state ${st}^{'} = tr (st, a) = (A^{'}, {EN}^{'}, G^{'})$ where:

if $a = scan (n)$ then $A^{'} = A \cup \{scan (n)\}$ , ${EN}^{'} = (N, E, compr)$ , and $G^{'} = ({S V}^{'}, {E V}^{'})$ where ${S V}^{'} = S V \cup \{(n, s, v) \mid (n, s, v) \in {S V}_{T}\}$ and ${E V}^{'} = \{(sv, {sv}^{'}) \mid sv, {sv}^{'} \in {S V}^{'}, (sv, {sv}^{'}) \in {E V}_{T}\}$ ;

if $a = exploit (n, s, v)$ then $A^{'} = A \cup \{exploit (n, s, v)\}$ , $G^{'} = G$ , and ${EN}^{'} = (N \cup \{n^{'} \mid (n, n^{'}) \in E_{T}\}, E \cup \{(n, n^{'}) \mid (n, n^{'}) \in E_{T}\}, compr)$ .

Simply put, a scan operation does not change the knowledge the attacker has of the enterprise network, but it allows her to augment her knowledge of the true SVDG because she can identify potential vulnerabilities in the scanned node. In contrast, an exploit action allows the attacker to uncover new nodes in the enterprise network, as well as new edges.
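The transition function of Definition 5 can be sketched as follows. The tuple encoding of states and actions is an assumption, and the sketch further assumes that after a scan the known edges are the true SVDG edges between known vertices. The small true network and SVDG used here are hypothetical.

```python
def transition(state, action, E_T, SV_T, EV_T):
    A, (N, E, compr), (SV, EV) = state
    A2 = A + (action,)
    if action[0] == "scan":
        n = action[1]
        # Reveal every vulnerable (node, software, vulnerability) triple on n.
        SV2 = SV | {sv for sv in SV_T if sv[0] == n}
        # Assumption: known edges are the true edges between known vertices.
        EV2 = {(x, y) for (x, y) in EV_T if x in SV2 and y in SV2}
        return (A2, (N, E, compr), (SV2, EV2))
    if action[0] == "exploit":
        n = action[1]
        # Reveal the outgoing network edges of n and their endpoints.
        new_edges = {(a, b) for (a, b) in E_T if a == n}
        N2 = N | {b for (_, b) in new_edges}
        return (A2, (N2, E | new_edges, compr), (SV, EV))
    raise ValueError("unknown action type")


# Hypothetical true network (a chain) and true SVDG.
E_T = {("n1", "n2"), ("n2", "n3")}
SV_T = {("n2", "s2", "v5"), ("n3", "s3", "v3")}
EV_T = {(("n2", "s2", "v5"), ("n3", "s3", "v3"))}

st0 = ((), ({"n1", "n2"}, {("n1", "n2")}, {"n1"}), (set(), set()))
st1 = transition(st0, ("scan", "n2"), E_T, SV_T, EV_T)
st2 = transition(st1, ("exploit", "n2", "s2", "v5"), E_T, SV_T, EV_T)
st3 = transition(st2, ("scan", "n3"), E_T, SV_T, EV_T)
```

The scan leaves the network component untouched and the exploit leaves the SVDG component untouched, which mirrors the two cases of the definition.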

We assume the existence of a function $cost$ that assigns a cost to each attacker action a. This function may encompass the attacker’s financial costs to carry out an action (e.g. if she has to buy an exploit) or capture her risk of being discovered, or some combination of these factors. For instance, the zero-day exploit reseller Zerodium published the cost of many exploits in November 2015. RAND Corporation published a comprehensive book on exploit prices, not just for zero-days, but also for more mundane attacks (e.g., stolen data) [2]. The MITRE CVE list of the National Vulnerability Database contains “exploitability scores” that describe how difficult it is to exploit a vulnerability – it could also be reasonable to assume that an attacker is aware of this and considers that her costs are proportional to this score. There are also other proposals (see, e.g., [46,47]), however we generally prefer CVE as it considers many different factors, such as the type and number of authentication processes, whether a system is accessible from outside or not, etc. It uses a formula that outputs values ranging between 0 and 10 – many researchers adopt these measures. In addition, MITRE keep improving their measurement method to reflect state-of-the-art attack techniques.

Definition 6 (Valid Attacker Strategy).

Given a maximum cost $\hat{c}$ , a sequence of attacker actions ${a_{1}, \dots, a_{m}}$ is a valid attacker strategy if there exists an associated sequence ${{st}_{0}, {st}_{1}, \dots, {st}_{m}}$ of attacker states such that:

$\forall i \in \{1, \dots, m\}$ , $a_{i} \in VA ({st}_{i - 1})$ ;

$\forall i \in \{1, \dots, m\}$ , ${st}_{i} = tr ({st}_{i - 1}, a_{i})$ ;

$\sum_{i = 1}^{m} cost (a_{i}) ⩽ \hat{c}$ ;

$\forall a \in VA ({st}_{m})$ , $\sum_{i = 1}^{m} cost (a_{i}) + cost (a) > \hat{c}$ .

It should be observed that the last condition in Definition 6 requires the attacker to carry out “maximal” attack strategies, in the sense that no further action can be added without exceeding the budget $\hat{c}$ . However, if the attacker is believed to use only up to $k \%$ of her actual budget, we can simply use ${\hat{c}}^{'} = \frac{k}{100} \cdot \hat{c}$ as the maximum cost in the definition.⁸

⁸
Estimating the total budget of an attacker requires the defender to consider adversaries with deep pockets in some cases, such as those involving big companies or defense departments. On the other hand, a small company with credit card data in their network may need to defend against less sophisticated attackers. We suggest that the maximal cost that the attacker is willing to bear should be anti-monotonic in the cost of mounting a defense to secure the enterprise network and anti-monotonic in the size of the adversary. Several such functions can be written down – note that while this is not a perfect solution, today’s system managers are largely not even attempting to estimate attacker costs.
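The four conditions of Definition 6 can be checked mechanically. The sketch below takes the valid-action function, the transition function and the cost function as parameters, so it does not depend on any particular encoding; the toy model used to exercise it (three abstract actions with made-up costs) is purely hypothetical.

```python
def is_valid_strategy(actions, st0, valid_actions, transition, cost, budget):
    state, spent = st0, 0
    for a in actions:
        # Each action must be valid in the state reached so far.
        if a not in valid_actions(state):
            return False
        state = transition(state, a)
        spent += cost(a)
    # The total cost must stay within the budget.
    if spent > budget:
        return False
    # Maximality: every further valid action would exceed the budget.
    return all(spent + cost(a) > budget for a in valid_actions(state))


def scaled_budget(budget, k):
    # Budget of an attacker believed to use only k percent of her budget.
    return budget * k / 100


# Hypothetical toy model: a state is the set of actions already taken.
COST = {"a": 2, "b": 3, "c": 4}
toy_valid = lambda st: set(COST) - st
toy_tr = lambda st, a: st | {a}
check = lambda seq, budget: is_valid_strategy(
    seq, frozenset(), toy_valid, toy_tr, COST.get, budget)
```

With a budget of 5, the sequence `["a", "b"]` is a valid strategy, whereas `["a"]` alone is not, because `"b"` still fits within the budget.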

4.1. Attacker belief evolution

We model the interaction between the attacker and the defender as a Stackelberg game with two players, a leader and a follower. The leader commits to a strategy first, and the follower then optimizes her reward in light of the leader’s choice. In our case, the defender is the leader, and her strategy determines which security resources are allocated on the network. The follower, in our case the attacker, explores the defender’s configured network and exploits vulnerabilities using a randomized greedy approach that locally maximizes the attacker’s utility – this amounts to exploring a structure called the attacker belief evolution tree (ABET, see the remainder of this section). The use of a Stackelberg game also captures the fact that the defender usually has little chance of observing the attacker’s reaction: the defender can only emulate the attacker’s strategies when deciding on her own best strategy, and once that strategy is chosen, it is difficult to observe what the attacker will do. In fact, Stackelberg-style games have been extensively used in both the cybersecurity and physical security literature in such situations. The defender uses a different structure (called DABET, see Section 5) to emulate the attacker’s behavior – this structure also takes into account the beliefs that the defender has about the attacker.

We now introduce the notion of ABET that enumerates all possible valid attacker strategies and represents how her belief evolves with time. This belief may or may not be identical to the real probability of detection, since the attacker does not know how well the defender might detect her. In the definition of ABET below, the superscript l denotes the level of a vertex of the tree (which corresponds to timepoint l) and subscript k differentiates the vertices at a particular level.

Definition 7 (Attacker Belief Evolution Tree (ABET)).

An attacker belief evolution tree is a tree $T = (TV, TE)$ where:

Each vertex ${tv}_{k}^{l} \in TV$ is a tuple $({st}_{k}^{l}, a_{k}^{l}, status, P (A_{k}^{l}), cost (A_{k}^{l}))$ where ${st}_{k}^{l} = (A_{k}^{l}, {EN}_{k}^{l}, G_{k}^{l})$ is the attacker state, $a_{k}^{l}$ is the action that led to that state, $status \in \{D, \neg D\}$ represents whether at least one of the actions in $A_{k}^{l}$ has been detected (D) or not ( $\neg D$ ),⁹ $P (A_{k}^{l})$ is the probability of being detected given all the actions taken by the attacker to reach the current state, and $cost (A_{k}^{l})$ is the total cost of the actions taken by the attacker until time l.

⁹ We assume that the detection of an action leads to immediate removal of the attacker.

The root vertex is $({st}_{0}, \bot, \neg D, 0, 0)$ .

Each edge $({tv}_{k}^{l}, {tv}_{k^{'}}^{l + 1})$ is labeled with the action that leads from ${st}_{k}^{l}$ to ${st}_{k^{'}}^{l + 1}$ .

In an ABET, every vertex with $status = \neg D$ has two children connected through the same action: one with $status = \neg D$ and the other with $status = D$ . More specifically, for every possible action $a \in VA ({st}_{k}^{l})$ , the tree contains two edges that connect $({st}_{k}^{l}, \cdot, \neg D, P (A_{k}^{l}), cost (A_{k}^{l}))$ to $({st}_{k^{'}}^{l + 1}, a, D, P (A_{k^{'}}^{l + 1}), cost (A_{k^{'}}^{l + 1}))$ and to $({st}_{k^{'}}^{l + 1}, a, \neg D, P (A_{k^{'}}^{l + 1}), cost (A_{k^{'}}^{l + 1}))$ , respectively, with $A_{k^{'}}^{l + 1} = A_{k}^{l} \cup \{a\}$ . If any action in $A_{k}^{l}$ is detected, i.e. a vertex has $status = D$ , then the branch of $T$ corresponding to that action does not grow further below the vertex.

Suppose $Q (a)$ represents the attacker’s belief about the probability that action a will be detected by the defender. This probability can, for instance, be derived from the exploitability score and remediation levels defined in the National Vulnerability Database. We recursively define $P (A_{k^{'}}^{l + 1})$ as follows: $P (A_{k^{'}}^{l + 1}) = \begin{cases} 0, & \text{if } l = 0, \\ P (A_{k}^{l}) \oplus Q (a_{k}^{l + 1}), & \text{otherwise.} \end{cases}$ The first condition says that the probability of detecting an action at level 0 in the tree is 0 (as nothing has been done). The second condition says that the probability of detection at level $(l + 1)$ in the ABET is the probability of a disjunction: either the current action ( $a_{k}^{l + 1}$ ) is detected or a previous action was detected. Here, ⊕ is any triangular co-norm [9], a well-known way to combine the probabilities of two events into the probability of their disjunction (or).¹⁰

¹⁰

Many possible triangular co-norms exist in the literature. These include: maximum: $\max (x, y)$ ; Probabilistic sum: $x + y - x \cdot y$ ; Lukasiewicz co-norm: $\min (1, x + y)$ ; Einstein sum: $\frac{x + y}{1 + x \cdot y}$ . Though all results in this paper apply to all t-conorms, we will use $\min (1, x + y)$ in all examples in the paper.
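As an illustration, the following Python sketch implements the four co-norms listed in footnote 10 and folds $Q$ over an action sequence to obtain $P (A)$ . The function names are ours and purely illustrative. The sketch assumes that the empty action sequence at the root has detection belief 0, so that a single action a yields $P ({a}) = Q (a)$ .

```python
from functools import reduce

# The four triangular co-norms mentioned in footnote 10.
# Each maps two values in [0, 1] to a value in [0, 1] and has 0 as identity.
def t_max(x, y):
    return max(x, y)

def t_prob_sum(x, y):
    return x + y - x * y

def t_lukasiewicz(x, y):
    return min(1.0, x + y)

def t_einstein(x, y):
    return (x + y) / (1.0 + x * y)

def detection_belief(q_values, conorm=t_lukasiewicz):
    """Fold the per-action beliefs Q(a) over an action sequence.

    Assumption: the empty sequence (root of the tree) has belief 0.
    """
    return reduce(conorm, q_values, 0.0)
```

For instance, with the Łukasiewicz co-norm two scans with $Q = 0.4$ each give a cumulative belief of 0.8, while the probabilistic sum gives 0.64 for the same sequence.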

Figure 3 shows the ABET for our running example. In the rest of this paper, when clear from the context, we write $exploit (n, s, v)$ as $exploit (n, v)$ or $exploit (v)$ . Observe that each path from the root (i) corresponds to an attacker strategy and (ii) ends either when at least one of the actions taken up to that time has been detected or when the attacker has expended her budget $\hat{c}$ . For example, the root $(S_{0}, ⊥, \neg D, 0, 0)$ represents the fact that the probability of being detected and the cost are both zero because the attacker has not yet done anything. After a scan on $n_{3}$ , there are two possible cases: the scan action is detected, represented by the node $({st}_{0}^{1}, scan (n_{3}), D, P (A_{0}^{1}), cost (A_{0}^{1}))$ , or is not detected, represented by the node $({st}_{0}^{1}, scan (n_{3}), \neg D, P (A_{0}^{1}), cost (A_{0}^{1}))$ . In the first case there is no further action, because detection implies removal from the network.

Fig. 3.

A schematic diagram of the attacker belief evolution tree for our running example. Green (blue) vertices correspond to states where the action sequence does (not) contain a detected action.

4.2. Action selection strategy

Given an attacker state ${st}_{k}^{l}$ , we assume that the probability of selecting action $a \in VA ({st}_{k}^{l})$ by the attacker is $\begin{matrix} P (a | {st}_{k}^{l}) = \frac{util (a)}{\sum_{a^{'} \in VA ({st}_{k}^{l})} util (a^{'})} \end{matrix}$ where $util (a)$ denotes the “utility” of a.

It should be observed that the denominator of this expression looks at the set of all actions that are valid for the attacker to perform in the given state, so the attacker’s probability of taking action a depends upon what actions are feasible in that state and their utility.

Our proposed model allows the security officer to assume different expected attacker behaviors through different definitions of function $util$ . For instance, we could look at the Common Vulnerability Scoring System (CVSS) of the National Vulnerability Database that associates, with each vulnerability, a quantitative impact and exploitability score, each of which is on a 0–10 scale. For instance, if we look at the sample vulnerability with CVE number CVE-2014-0160, it has a CVSS impact score of 2.9 and a CVSS exploitability score of 10. These scores can be easily converted to a utility in many ways. For instance, we might set $util (a)$ to something that is monotonic in both impact and exploitability, e.g. $impact (a) * exploitability (a)$ or $impact (a) + exploitability (a)$ . Alternatively, by just taking $util (a) = impact (a)$ , we can model an attacker who is just driven by the potential impact of her actions. On the other hand, $\begin{matrix} util (a) = \frac{1}{P (A_{k}^{l} \cup {a})} \end{matrix}$ models the case where the attacker seeks to minimize the likelihood of discovery. In the experiments reported in Section 7 we show the results we obtained using the latter two definitions.

Example 3.
Suppose each scan action in our running example has utility 0.3, and each exploit action $exploit (v_{k})$ has utility $k / 10$ . Moreover, suppose the detection probability of each scan action is 0.4, while the detection probability for each exploit action $exploit (v_{k})$ is $(4 - k) / 10$ . If the attacker uses the utility function based on the impact, then at node $n_{1}$ she randomly chooses a scan action in ${scan (n_{2}), scan (n_{h}), scan (n_{3})}$ , since all have the same utility. Assume that she chooses $scan (n_{3})$ and moves to $n_{3}$ . Let s be the attacker state at that point. The probabilities are $P (scan (n_{2}) | s) = P (scan (n_{h}) | s) = 0.3 / 1.5 = 0.2$ , $P (exploit (v_{2}) | s) = 0.2 / 1.5 = 0.13$ , $P (exploit (v_{3}) | s) = 0.3 / 1.5 = 0.2$ , $P (exploit (v_{4}) | s) = 0.4 / 1.5 = 0.27$ . But if the attacker uses the utility function based on the detection probability, then after reaching $n_{3}$ at time $t = 1$ , the overall probability of being detected is $P (A = {scan (n_{3})}) = Q (scan (n_{3})) = 0.4$ . The attacker now has five possible actions, with the following detection probabilities: $P ({scan (n_{3}), scan (n_{2})}) = \min (1, (0.4 + 0.4)) = 0.8$ , $P ({scan (n_{3}), scan (n_{h})}) = \min (1, (0.4 + 0.4)) = 0.8$ , $P ({scan (n_{3}), exploit (v_{2})}) = \min (1, (0.4 + 0.2)) = 0.6$ , $P ({scan (n_{3}), exploit (v_{3})}) = \min (1, (0.4 + 0.1)) = 0.5$ , and $P ({scan (n_{3}), exploit (v_{4})}) = \min (1, (0.4 + 0)) = 0.4$ . So the probabilities of choosing the various actions are $P (scan (n_{2}) | s) = P (scan (n_{h}) | s) = \frac{(1 / 0.8)}{(2 / 0.8) + (1 / 0.6) + (1 / 0.5) + (1 / 0.4)} = 0.144$ , $P (exploit (v_{2}) | s) = 0.192$ , $P (exploit (v_{3}) | s) = 0.231$ , and $P (exploit (v_{4}) | s) = 0.288$ .
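The two computations of Example 3 can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary keys and the helper name `selection_probs` are hypothetical, and the Łukasiewicz co-norm $\min (1, x + y)$ is assumed, as in the other examples of the paper.

```python
def selection_probs(util):
    """P(a | st): normalize utilities over the valid actions of a state."""
    total = sum(util.values())
    return {a: u / total for a, u in util.items()}

# Valid actions at n3 with the impact-based utilities of Example 3.
util_impact = {
    "scan(n2)": 0.3, "scan(nh)": 0.3,
    "exploit(v2)": 0.2, "exploit(v3)": 0.3, "exploit(v4)": 0.4,
}
probs_impact = selection_probs(util_impact)

# Detection-averse attacker: util(a) = 1 / P(A ∪ {a}).
# q holds Q(a); p_so_far is P({scan(n3)}) = 0.4.
q = {
    "scan(n2)": 0.4, "scan(nh)": 0.4,
    "exploit(v2)": 0.2, "exploit(v3)": 0.1, "exploit(v4)": 0.0,
}
p_so_far = 0.4
util_stealth = {a: 1.0 / min(1.0, p_so_far + qa) for a, qa in q.items()}
probs_stealth = selection_probs(util_stealth)

for action in q:
    print(action, round(probs_impact[action], 3), round(probs_stealth[action], 3))
```

Note that the detection-based utility is undefined when $P (A \cup {a}) = 0$ ; in this example the cumulative belief is already 0.4 after the first scan, so the issue does not arise.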

Modeling Subrational Attackers. Note that by appropriately defining function $util$ , we can model subrational attackers that do not behave in a fully optimal way, i.e. they do not carry out attacks that maximize their utility. In the real world, this may happen for many reasons (see, e.g., the work by Tom Schelling [40]). For instance, attackers may not exactly know what the objective function is and hence, even if they act in broad agreement with it, they would not necessarily optimize. Attackers may also not have the technical capabilities needed to act optimally. We experimentally show in Section 7 that our defensive mechanisms are robust with respect to subrational attackers as well.
5. Analysis of the defender

In the real world, the defender can do many different things in order to protect her enterprise from attack. In this paper, we consider the case when the defender can perform two actions:

Adding a honeypot node to the enterprise network, which in turn changes the SVDG. We make the simplifying assumption that any transaction targeting a honeypot node originates from an attacker, who is therefore detected. This is a common assumption: for instance, there is little reason for a legitimate user to access a node named all_passwords or similar [1,7].

Adding traditional defensive software to an existing node, which does not change the SVDG. In this paper, we use this general action as a proxy for many more specific actions such as adding multi-factor authentication, firewalls, intrusion detection systems, and more.

We denote the set of honeypot nodes as $H$ . $AH$ denotes the set of all possible honeypot node setups: a pair $(h, n) \in AH$ represents that honeypot node $h \in H$ replicates the connections of node $n \in N_{T}$ . Moreover, we assume that traditional defensive software ensures that any attacker action on a node protected by such software will be detected.

Definition 8 (Defensive Strategy).

A defensive strategy δ is a pair $(A H, A T)$ where $A H \subseteq AH$ and $A T \subseteq N_{T}$ . Each element of $A H$ (resp. $A T$ ) corresponds to deploying a honeypot (resp. installing defensive software on a node).

We assume the existence of a cost function for the defender, denoted ${cost}_{d}$ , that assigns a cost to each element of $AH \cup N_{T}$ . For instance, the cost for a particular defense could be the cost of that software (plus estimated labor costs involved in setting it up), and likewise, the labor cost involved in developing and deploying a honeypot node. Several techniques to estimate such costs have been proposed in the past (see, e.g., [26,47]) – however, the different attempts lead to different estimations and a standardized method is still a difficult open problem. Our idea is that we can use many estimation approaches at the same time and handle them by simply taking the worst-case scenario. This preserves all our results and does not impact the scalability of our approach.

Definition 9 (Valid Defensive Strategy).

Given a defender budget ${\hat{c}}_{d}$ , a defensive strategy $δ = (A H, A T)$ is valid if ${cost}_{d} (δ) = \sum_{x \in A H \cup A T} {cost}_{d} (x) ⩽ {\hat{c}}_{d}$ .
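Definition 9 amounts to a simple budget check. The sketch below is a minimal illustration under our own assumptions: honeypot setups are represented as pairs `(h, n)`, protected nodes as strings, and the cost values are made up for the example rather than taken from the experiments.

```python
def strategy_cost(honeypot_setups, protected_nodes, cost_d):
    """cost_d(δ): sum of the costs of all elements chosen in δ = (AH, AT)."""
    chosen = set(honeypot_setups) | set(protected_nodes)
    return sum(cost_d[x] for x in chosen)

def is_valid(honeypot_setups, protected_nodes, cost_d, budget):
    """A strategy is valid if its total cost does not exceed the defender budget."""
    return strategy_cost(honeypot_setups, protected_nodes, cost_d) <= budget

# Hypothetical costs: honeypots are more expensive than defensive software.
cost_d = {("h1", "n2"): 0.8, ("h2", "n3"): 0.6, "n1": 0.05, "n2": 0.1}

print(is_valid([("h1", "n2")], ["n1", "n2"], cost_d, budget=1.0))
print(is_valid([("h1", "n2"), ("h2", "n3")], [], cost_d, budget=1.0))
```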

The defender’s belief w.r.t. the ABET is designed similarly to the ABET itself. However, the defender is usually unaware of (i) the transition probability from one node to another, (ii) the total budget of the attacker $\hat{c}$ , and (iii) the attacker’s belief $Q (a)$ of being detected for a particular action a.

Therefore, the defender assumes an infinite budget for the attacker and tries to enumerate all possible actions the attacker can take. This is a worst case situation commonly considered in cybersecurity where it is assumed that the attacker will wreak maximal havoc. In addition, the defender knows the probability that an attacker is detected at a node after a sequence of actions, as this probability depends on the difficulty of exploiting the vulnerability and on the defender’s skills.

Definition 10 (Defender’s Belief on Attacker Belief Evolution Tree (DABET)).

The defender’s belief w.r.t. an attacker belief evolution tree $T$ is a tree $T_{D} = (TV, TE)$ built from $T$ where:

Each vertex of $TV$ is a tuple $({st}_{k}^{l}, a_{k}^{l}, status)$ where ${st}_{k}^{l}$ is the attacker’s state, $a_{k}^{l}$ is the attacker action that leads to the vertex (both are the same as the ones in $T$ ), and $status \in {D, \neg D}$ is the actual detection status.

Every edge $({tv}_{k}^{l}, {tv}_{k^{'}}^{l + 1})$ is labeled with the action that leads from ${tv}_{k}^{l}$ to ${tv}_{k^{'}}^{l + 1}$ .

The transition probability from ${tv}_{k}^{l} = ({st}_{k}^{l}, a_{k}^{l}, status)$ to ${tv}_{k^{'}}^{l + 1} = ({st}_{k^{'}}^{l + 1}, a_{k^{'}}^{l + 1}, {status}^{'})$ , denoted $P_{d} ({tv}_{k}^{l} \to {tv}_{k^{'}}^{l + 1})$ , is defined as follows: $\begin{matrix} P_{d} ({tv}_{k}^{l} \to {tv}_{k^{'}}^{l + 1}) = \{\begin{matrix} P_{d} (a_{k^{'}}^{l + 1} | {st}_{k}^{l}) \cdot (1 - \prod_{a \in A_{k}^{l}} (1 - Q_{d} (a))), & if {status}^{'} = D; \\ P_{d} (a_{k^{'}}^{l + 1} | {st}_{k}^{l}) \cdot \prod_{a \in A_{k}^{l}} (1 - Q_{d} (a)), & otherwise, \end{matrix} \end{matrix}$ where $P_{d} (a | st)$ is the defender’s belief about the attacker action selection probability, $Q_{d} (a)$ is the detection probability of the defender, and $A_{k}^{l}$ is the sequence of actions that lead to ${st}_{k}^{l}$ .
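The case distinction in the transition probability can be sketched in Python as follows. The function name and signature are hypothetical. The sequence of $Q_{d}$ values over which the product is taken is passed in by the caller, so the sketch makes no assumption of its own about which actions it contains.

```python
from math import prod

def transition_prob(p_select, q_d_sequence, detected):
    """P_d(tv -> tv'): selection belief times the (non-)detection factor.

    p_select     -- P_d(a | st), the defender's belief that the attacker picks a
    q_d_sequence -- Q_d values of the actions over which the product ranges
    detected     -- True if the child vertex has status D
    """
    undetected = prod(1.0 - q for q in q_d_sequence)
    factor = (1.0 - undetected) if detected else undetected
    return p_select * factor

# Two sibling vertices reached through the same action always split p_select.
qs = [0.0, 0.4]
print(transition_prob(1.0, qs, detected=True))
print(transition_prob(1.0, qs, detected=False))
```

By construction, the probabilities of the two children reached through the same action (one with $status = D$ , one with $status = \neg D$ ) add up to $P_{d} (a | st)$ .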

When needed, the DABET for a defensive strategy δ is denoted as $T_{D} (δ)$ . Figure 4 shows a partial DABET for our running example.

Modeling Unskilled Defenders. It should be observed that by appropriately defining the function $Q_{d}$ , the model allows taking into account defenders that do not behave in a fully optimal way, i.e. the cases where the defender’s skills and experience have an actual impact on the detection process. We experimentally show in Section 7 that our framework is also capable of effectively supporting defenders with relatively limited skills.

Fig. 4.

A schematic diagram of the DABET for our running example. Green (blue) vertices correspond to states where the action sequence does (not) contain a detected action.

5.1. The defensive strategy problem

The defender constructs a DABET in order to estimate the expected damage. This quantity, which the defender wants to minimize given a certain budget, depends on the actions performed by the attacker, which are in turn constrained by the network that results from applying the defensive strategy.

Definition 11 (Expected damage).

Given a DABET $T_{D}$ , the expected damage $Exp D (T_{D})$ is defined as $\begin{matrix} Exp D (T_{D}) = \sum_{\text{path } p \text{ in } T_{D}} P_{d} (p) \cdot damage (p) \end{matrix}$ where:

we only consider paths starting from the root vertex in $T_{D}$ ;

$damage (tv)$ , where $tv = (\cdot, a_{k}^{l}, \cdot)$ , is the damage yielded by action $a_{k}^{l}$ (that can be defined by considering the impact from NVD or the value of data leaked through the action);

if $p = {{tv}_{0}, \dots, {tv}_{z}}$ , then $P_{d} (p) = \prod_{i \in [0, z - 1]} P_{d} ({tv}_{i} \to {tv}_{i + 1})$ and $damage (p) = \sum_{i \in [0, z]} damage ({tv}_{i})$ .

Example 4.
Consider the partial DABET for our running example shown in Fig. 5 and assume that (i) the damage of a scan is 0 and that of an exploit is 10, and (ii) $Q_{d} (exploit (v_{4})) = 0.4$ , $Q_{d} (scan (n_{2})) = 0$ (note that because of the latter, we omitted the vertex at $l = 1$ with $status = D$ ). There are two possible paths, $p_{1} = {{tv}_{1}, {tv}_{2}, {tv}_{3}}$ and $p_{2} = {{tv}_{1}, {tv}_{2}, {tv}_{4}}$ : along $p_{1}$ the exploit of $v_{4}$ is detected (so no damage is incurred), whereas along $p_{2}$ it is not. We have that $P_{d} (p_{1}) = P_{d} ({tv}_{1} \to {tv}_{2}) \cdot P_{d} ({tv}_{2} \to {tv}_{3}) = 0.4$ and $P_{d} (p_{2}) = P_{d} ({tv}_{1} \to {tv}_{2}) \cdot P_{d} ({tv}_{2} \to {tv}_{4}) = 0.6$ . Therefore, $Exp D (T_{D}) = P_{d} (p_{1}) \cdot damage (p_{1}) + P_{d} (p_{2}) \cdot damage (p_{2}) = 0.4 \cdot 0 + 0.6 \cdot 10 = 6$ .
Fig. 5.
A fragment of the enterprise network of our running example (a) and its corresponding DABET (b).

Definition 12 (Undamaged Asset Value).

Suppose the total asset value of an enterprise is $T_{a}$ and δ is a defensive strategy. The undamaged asset value provided by δ is: $\begin{matrix} U_{a} (δ) = T_{a} - Exp D (T_{D} (δ)) . \end{matrix}$
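Definitions 11 and 12 can be checked numerically on the two paths of Example 4. In the sketch below each path is given explicitly as a pair (edge probabilities, per-vertex damages). This representation, the function names, and the total asset value of 100 are our own assumptions for illustration. Following the numbers of Example 4, a detected exploit contributes no damage.

```python
from math import prod

def expected_damage(paths):
    """ExpD: sum over root paths of P_d(p) * damage(p).

    Each path is (edge_probs, vertex_damages): P_d(p) is the product of the
    edge probabilities and damage(p) the sum of the vertex damages.
    """
    return sum(prod(edge_probs) * sum(damages) for edge_probs, damages in paths)

def undamaged_asset_value(total_asset_value, paths):
    """U_a = T_a - ExpD for the DABET described by `paths`."""
    return total_asset_value - expected_damage(paths)

# Example 4: the scan is never detected (Q_d = 0), the exploit is detected w.p. 0.4.
p1 = ([1.0, 0.4], [0, 0, 0])    # exploit detected: no damage
p2 = ([1.0, 0.6], [0, 0, 10])   # exploit undetected: damage 10

print(expected_damage([p1, p2]))
print(undamaged_asset_value(100.0, [p1, p2]))  # T_a = 100 is a made-up value
```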

It should be observed that evaluating $T_{a}$ requires the defender to estimate the value of her enterprise’s data and/or the value of “taking down” part of the enterprise network. The estimation of costs/values in a standardized manner is an important research problem by itself and is outside the scope of this paper. Several works in the literature propose ways to define asset criticality [8,14,21,22]. Another method, used by the US Department of Defense [24], defines the criticality of an asset in the DoD’s enterprise network by using the hierarchy of relations present in the DoD.

$U_{a} (δ)$ is obviously an expected value that the defender wishes to maximize.

Definition 13 (Defensive Strategy Problem).

Given a defender budget ${\hat{c}}_{d}$ , find the valid defensive strategy δ that maximizes $U_{a} (δ)$ .

Theorem 1.
The defensive strategy problem is in EXPTIME and NP-hard.
Proof.
The problem is in EXPTIME because all the defensive strategies can be enumerated in exponential time and, for each strategy δ, it suffices to verify whether it satisfies the constraints and then compute $U_{a} (δ)$ . We prove NP-hardness by reducing the NP-hard Knapsack problem [15] to the decision problem “Given a defender budget ${\hat{c}}_{d}$ and a threshold $u_{a}$ , does there exist a valid defensive strategy δ such that $U_{a} (δ) ⩾ u_{a}$ ?”. Given two real numbers C and V, a set of objects $O = {o_{1}, \dots, o_{p}}$ , and for each object $o_{i}$ , a cost $c_{i}$ and a value $v_{i}$ , the Knapsack problem consists in checking whether there exists a subset $O^{'} \subseteq O$ such that $\sum_{o_{i} \in O^{'}} v_{i} ⩾ V$ and $\sum_{o_{i} \in O^{'}} c_{i} ⩽ C$ . In our reduction we assume that $AH = \emptyset$ and that the network contains exactly p nodes that are all disconnected and potentially accessible to the attacker. Each node $n_{i}$ is associated to object $o_{i}$ . The damage associated with each node $n_{i}$ is equal to $v_{i}$ and the cost to protect $n_{i}$ is equal to $c_{i}$ . Moreover, we set ${\hat{c}}_{d} = C$ and $u_{a} = V$ . In addition, we assume that the utilities for the attacker are all the same, $\hat{c} = \infty$ , and there is no possibility that the attacker is detected. According to these assumptions, the paths of the attacker are uniformly distributed, so the resulting undamaged asset value is the sum of the damages associated with all the protected nodes. This is exactly equivalent to the constraint $\sum_{o_{i} \in O^{'}} v_{i} ⩾ V$ of the Knapsack problem. Moreover, the cost constraint for valid defender strategies is equivalent to the constraint $\sum_{o_{i} \in O^{'}} c_{i} ⩽ C$ of the Knapsack problem. □

Because a DABET can be huge, we introduce the notion of a compacted DABET (CDABET) $T_{C D}$ that is a tree built from a DABET $T_{D}$ where all the descendants of each vertex $tv$ related to an action over a honeypot node are merged together along with $tv$ without losing information in terms of expected damage – in other words, the compaction ensures that $Exp D (T_{D}) = Exp D (T_{C D})$ . More formally, a compaction operation merges a set M of vertices with one of the vertices in ${{tv}_{p} | \forall {tv}_{m} \in M, {tv}_{p} \text{ is an ancestor of } {tv}_{m}}$ if the latter vertex is related to an action over a honeypot node.
Definition 14 (Compacted DABET).

Given a DABET $T_{D}$ , the corresponding compacted DABET is the maximally compacted tree $T_{C D}$ that can be built from $T_{D}$ .

We use $T_{C D} (δ)$ to denote the CDABET of a defensive strategy δ. In the partial DABET from our running example shown in Fig. 6 we can merge ${tv}_{5}$ , ${tv}_{8}$ , ${tv}_{7}$ , ${tv}_{9}$ , and ${tv}_{10}$ into one leaf vertex because there is no damage after ${tv}_{5}$ .

Fig. 6.

An example of DABET compaction.

5.2. Properties

In this section we show some properties that our objective function and the set of valid defensive strategies enjoy. We first show that the $U_{a}$ function increases monotonically with the size of our defensive strategy δ.

Proposition 1.
$U_{a} (δ)$ is monotonic with respect to δ.

To see why the above result is true, assume that $δ_{1} = (A H_{1}, A T_{1})$ and $δ_{2} = (A H_{2}, A T_{2})$ are two defensive strategies with $δ_{1} \subseteq δ_{2}$ (i.e. $A H_{1} \subseteq A H_{2}$ and $A T_{1} \subseteq A T_{2}$ ). This means there are more nodes protected in $T_{C D} (δ_{2})$ than in $T_{C D} (δ_{1})$ , so $U_{a} (δ_{1}) ⩽ U_{a} (δ_{2})$ . Observe also that, given two strategies $δ_{1}$ and $δ_{2}$ with $δ_{1} \subset δ_{2}$ , $T_{C D} (δ_{2})$ has more compacted vertices than $T_{C D} (δ_{1})$ , whereas the set of non-compacted vertices is exactly the same.

The next two results provide important submodularity properties both when traditional defensive software is added and when honeypot nodes are added to a defensive strategy.
Theorem 2.
$U_{a} (δ)$ is submodular with respect to adding defensive software to δ.
Proof.
Assume that $δ_{1} = (A H, A T_{1})$ and $δ_{2} = (A H, A T_{2})$ are two defensive strategies with $δ_{1} \subseteq δ_{2}$ , and let $δ_{i}^{'} = (A H, A T_{i} \cup {n})$ with $n \in N_{T} ∖ A T_{2}$ . In order to be submodular, the undamaged asset value must satisfy the inequality $T_{a} - Exp D (T_{C D} (δ_{1}^{'})) - T_{a} + Exp D (T_{C D} (δ_{1})) ⩾ T_{a} - Exp D (T_{C D} (δ_{2}^{'})) - T_{a} + Exp D (T_{C D} (δ_{2}))$ , or equivalently, $Exp D (T_{C D} (δ_{1})) - Exp D (T_{C D} (δ_{1}^{'})) ⩾ Exp D (T_{C D} (δ_{2})) - Exp D (T_{C D} (δ_{2}^{'}))$ . Let $tv$ be the vertex with $status = \neg D$ whose action is related to node n (observe that vertices with $status = D$ and compacted vertices can be ignored because there is no associated damage). After installing a defensive software on n, all attacks that reach $tv$ will be detected. Now let $P (δ, tv)$ be the probability that the attacker will reach $tv$ without being detected under δ. Since $δ_{1} \subseteq δ_{2}$ , $P (δ_{1}, tv) ⩾ P (δ_{2}, tv)$ , i.e. more attack attempts will reach $tv$ under $δ_{1}$ than under $δ_{2}$ – for the same reason we have $P (δ_{1}^{'}, tv) ⩾ P (δ_{2}^{'}, tv)$ as well. The statement follows. □
Theorem 3.
$U_{a} (δ)$ is submodular with respect to adding honeypot nodes to δ.
Proof.
Assume that $δ_{1} = (A H_{1}, A T)$ and $δ_{2} = (A H_{2}, A T)$ are two defensive strategies with $δ_{1} \subseteq δ_{2}$ , and let $δ_{i}^{'} = (A H_{i} \cup {h}, A T)$ with $h \in AH ∖ A H_{2}$ . Let $tv = ({st}_{k}^{l}, \cdot, \neg D)$ be the vertex in the CDABET whose $VA ({st}_{k}^{l})$ becomes larger after adding h (i.e. the attacker has more options for her next action). As in the proof of Theorem 2, it holds that $Exp D (T_{C D} (δ_{1})) - Exp D (T_{C D} (δ_{1}^{'})) ⩾ Exp D (T_{C D} (δ_{2})) - Exp D (T_{C D} (δ_{2}^{'}))$ . Now let $P (δ, tv)$ be the probability that the attacker will reach $tv$ without being detected under δ. Again, we have that $P (δ_{1}, tv) ⩾ P (δ_{2}, tv)$ , i.e. more attack attempts will reach $tv$ under $δ_{1}$ than under $δ_{2}$ . Since $VA ({st}_{k}^{l})$ contains more actions after adding h and $δ_{2}^{'}$ subsumes $δ_{1}^{'}$ , if $a_{h}$ is an action related to honeypot node h, we can derive that $P (a_{h} | {st}_{k}^{l})$ is higher under $δ_{1}$ than under $δ_{2}$ . This implies that more attacks will reach $tv$ and be captured in h after adding it to $δ_{1}$ . □

The following result shows that, under the assumption of uniform costs, the set of all valid defensive strategies forms a matroid.
Proposition 2.
Assume that, for a fixed cost c, it is the case that $\forall x \in AH \cup N_{T}$ , ${cost}_{d} (x) = c$ . Then, the pair $(AH \cup N_{T}, I)$ , where $I$ is the set of all subsets of $AH \cup N_{T}$ that correspond to valid defensive strategies, is a matroid.

To see why the above result is true, observe that (i) deploying nothing is a valid defensive strategy, thus $\emptyset \in I$ ; (ii) a subset of a valid defensive strategy is valid as well, so for each $A^{'} \subset A \subset (AH \cup N_{T})$ , if $A \in I$ , then $A^{'} \in I$ ; (iii) if $A \in I$ , $A^{'} \in I$ , and $| A | > | A^{'} |$ , then there exists an element $a \in A ∖ A^{'}$ such that $A^{'} \cup {a}$ is also a valid defensive strategy. Property (iii) holds because, under uniform costs with $c > 0$ , a set is valid exactly when it contains at most $\lfloor {\hat{c}}_{d} / c \rfloor$ elements, and $| A^{'} \cup {a} | ⩽ | A |$ .

The submodularity results in this section also allow us to use a famous result [33] that asserts that a simple greedy algorithm can be used to optimize a submodular function on a matroid with an approximation guarantee. More details will be provided in the next section.

As a final remark, we recall that the DABET depends on the defender strategy, and in the defender simulation involving the DABET the attacker is doing her best to maximize her utility. The attacker revisits her actions in the light of the considered defenses, but cannot reason on her tree to find the best path because we assume that she does not know the structure of the network from the beginning. In order to know the structure of the network, the attacker needs to explore it, and the goal of the defender is to make this exploration (and the following exploitation) as difficult as possible. In this context, understanding that a node is a honeypot does not allow the attacker to recognize other honeypots, so it is difficult to model the benefit of this for the attacker. A possible extension would be to consider the possibility that the attacker runs a test to verify whether a host is real or not – then, if the test is successful, the attacker will not consider any of the other vulnerabilities on that host. Of course, this kind of test will have a cost for the attacker. Another simple procedure (that is a worst case scenario for the defender) is that the attacker, after exploiting a certain number of vulnerabilities on a fake host, “automatically” recognizes that the host is fake.
6. Finding good defensive strategies

In this section, we present two methods. The first, H_Exact, finds an optimal solution to the defensive strategy problem. However, since finding an optimal defensive strategy is NP-hard, H_Exact takes exponential time in the worst case. The second, H_Greedy, applies a greedy approach that efficiently finds possibly suboptimal solutions.

6.1. Exact algorithm

We propose the H_Exact algorithm that uses a branch-and-bound approach to compute the optimal solution. The pseudo-code is shown in Algorithm 1. The algorithm uses the notion of sole contribution to undamaged asset value provided by strategy δ, denoted $scu (δ)$ , which is defined as $scu (δ) = U_{a} (δ) - U_{a} (\emptyset)$ .¹¹ Moreover, $U$ denotes the set $AH \cup N_{T}$ and $ext (δ = (A H, A T))$ denotes the set $U ∖ (A H \cup A T)$ .

¹¹ We sometimes abuse notation and write $δ = \emptyset$ in place of $δ = (\emptyset, \emptyset)$ .

Algorithm 1

Branch-and-bound algorithm

Initially, all defender actions are sorted in descending order of $scu$ values – in the hope that the algorithm will quickly converge. Lines 13 and 14 in the branch-and-bound step (procedure BBS) are the branching steps, corresponding to the choices of skipping the $(i + 1)$ th action (line 13) or including it (line 14). For the bounding part, on line 7 we compare the current best undamaged asset value $best$ with an upper bound on $U_{a} (δ)$ , denoted $UB (δ)$ . If this upper bound is not promising, we prune δ immediately. The bound is computed as follows: $\begin{matrix} UB (δ) = U_{a} (\emptyset) + \sum_{x \in δ} scu (x) + \frac{{\hat{c}}_{d} - {cost}_{d} (δ)}{min {{cost}_{d} (x) | x \in ext (δ)}} \cdot max {scu (x) | x \in ext (δ)} . \end{matrix}$

In the above formula, $\frac{{\hat{c}}_{d} - {cost}_{d} (δ)}{min {{cost}_{d} (x) | x \in ext (δ)}}$ is the maximum number of defender actions that can still be added to δ, whereas $max {scu (x) | x \in ext (δ)}$ is the best $scu$ among all remaining actions.

Lemma 1.

Given two valid defensive strategies $δ_{1}$ and $δ_{2}$ , it holds that $scu (δ_{1} \cup δ_{2}) ⩽ scu (δ_{1}) + scu (δ_{2})$ .

Proof.

Since $U_{a}$ is submodular, it also holds that $U_{a} (δ_{1}) + U_{a} (δ_{2}) ⩾ U_{a} (δ_{1} \cup δ_{2}) + U_{a} (δ_{1} \cap δ_{2})$ . By subtracting $U_{a} (\emptyset)$ from each term of the above inequality, we obtain $U_{a} (δ_{1}) - U_{a} (\emptyset) + U_{a} (δ_{2}) - U_{a} (\emptyset) ⩾ U_{a} (δ_{1} \cup δ_{2}) - U_{a} (\emptyset) + U_{a} (δ_{1} \cap δ_{2}) - U_{a} (\emptyset)$ , that is, $scu (δ_{1}) + scu (δ_{2}) ⩾ scu (δ_{1} \cup δ_{2}) + scu (δ_{1} \cap δ_{2})$ . Since $scu (δ_{1} \cap δ_{2}) ⩾ 0$ by the monotonicity of $U_{a}$ (Proposition 1), the statement follows. □

Theorem 4.

The H_Exact algorithm is correct.

Proof.

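Correctness of the branching part is guaranteed by the fact that, for each defender action in $U$ , we consider both skipping (line 13) and including it (line 14). This ensures that the algorithm will not miss any possible solution. To show that the bounding part is correct as well, we prove that $UB (δ)$ returns a correct upper bound on $U_{a} (δ)$ . From Lemma 1 we know that $scu (δ) ⩽ \sum_{x \in δ} scu (x)$ , or equivalently, $U_{a} (δ) - U_{a} (\emptyset) ⩽ \sum_{x \in δ} scu (x)$ . Thus, we obtain $U_{a} (δ) ⩽ U_{a} (\emptyset) + \sum_{x \in δ} scu (x)$ , and as a consequence $U_{a} (δ) ⩽ UB (δ)$ . The same argument applies to any valid extension $δ \cup E$ with $E \subseteq ext (δ)$ : E contains at most $\frac{{\hat{c}}_{d} - {cost}_{d} (δ)}{min {{cost}_{d} (x) | x \in ext (δ)}}$ actions, each with $scu$ at most $max {scu (x) | x \in ext (δ)}$ , so $U_{a} (δ \cup E) ⩽ UB (δ)$ and pruning δ never discards a better solution. □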

Given the set $U$ of all possible defender actions, the theoretical worst-case time complexity of the H_Exact algorithm is exponential in the cardinality of $U$ – this is because the algorithm, in the worst case, explores a complete binary search tree of depth $| U |$ . This is expected, given our result about the NP-hardness of computing the defender’s strategy. In practice, H_Exact is an opportunistic algorithm whose practical runtime cannot be described via a simple closed formula – its runtime actually depends on how many branches are pruned in the bounding part.
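To make the branching and bounding steps concrete, the following Python sketch implements a branch-and-bound search with the bound $UB (δ)$ defined above. It is not a transcription of Algorithm 1 (so the line numbers mentioned in the text do not apply to it). The function `h_exact`, its signature, and the oracle `u_a`, which maps a set of defender actions to its undamaged asset value, are our own assumptions. The sketch further assumes that all costs are strictly positive and that `u_a` is monotone and submodular, which is what makes the bound safe.

```python
def h_exact(universe, cost, budget, u_a):
    """Branch-and-bound over subsets of `universe` within `budget`.

    u_a: callable taking a frozenset of defender actions, returning U_a.
    Returns (best strategy as a frozenset, its U_a value).
    """
    base = u_a(frozenset())
    # Sole contribution of each single action, used for sorting and bounding.
    scu = {x: u_a(frozenset([x])) - base for x in universe}
    order = sorted(universe, key=lambda x: scu[x], reverse=True)
    best = {"value": base, "delta": frozenset()}

    def upper_bound(delta, spent):
        bound = base + sum(scu[x] for x in delta)
        ext = [x for x in order if x not in delta]
        if ext:
            slots = (budget - spent) / min(cost[x] for x in ext)
            bound += slots * max(scu[x] for x in ext)
        return bound

    def bbs(i, delta, spent):
        value = u_a(delta)
        if value > best["value"]:
            best["value"], best["delta"] = value, delta
        # Bounding: stop if no extension of delta can beat the incumbent.
        if i == len(order) or upper_bound(delta, spent) <= best["value"]:
            return
        x = order[i]
        bbs(i + 1, delta, spent)                         # branch: skip x
        if spent + cost[x] <= budget:
            bbs(i + 1, delta | {x}, spent + cost[x])     # branch: include x

    bbs(0, frozenset(), 0.0)
    return best["delta"], best["value"]
```

Replacing `upper_bound(delta, spent)` with `0.95 * upper_bound(delta, spent)` in the bounding test prunes more aggressively at the price of losing the optimality guarantee.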

6.2. Greedy (inexact) algorithm

Our H_Greedy algorithm leverages our submodularity results (Theorems 2 and 3) to compute a suboptimal solution using a greedy hill-climbing approach, i.e. we start with the empty set and, at each iteration, we choose the defender action that provides the maximum gain in $scu$ value. The pseudo-code is shown in Algorithm 2.

Algorithm 2

Greedy algorithm

Proposition 3.

The H_Greedy algorithm produces a valid defensive strategy.

It should also be observed that the worst-case time complexity of the H_Greedy algorithm is quadratic in the cardinality of $U$ . In fact, the algorithm executes the while loop $| U |$ times in the worst case and, at each iteration, it needs to find the best local choice – that means it verifies $| U |$ possibilities.
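The hill-climbing loop can be sketched in Python as follows. This is an illustration under our own assumptions rather than a transcription of Algorithm 2: the names `h_greedy` and `u_a` are hypothetical, only actions that still fit in the budget are considered at each iteration, and the loop stops when no affordable action yields a positive gain.

```python
def h_greedy(universe, cost, budget, u_a):
    """Greedy hill climbing: repeatedly add the affordable action with max gain.

    u_a: callable taking a frozenset of defender actions, returning U_a.
    Returns (strategy as a frozenset, its U_a value).
    """
    delta, spent = frozenset(), 0.0
    current = u_a(delta)
    remaining = list(universe)  # a list keeps tie-breaking deterministic
    while True:
        affordable = [x for x in remaining if spent + cost[x] <= budget]
        if not affordable:
            break
        # Marginal gain of each candidate (at most |U| evaluations per round).
        gains = {x: u_a(delta | {x}) - current for x in affordable}
        choice = max(affordable, key=lambda x: gains[x])
        if gains[choice] <= 0:
            break  # assumption: stop when nothing improves the value
        delta = delta | {choice}
        spent += cost[choice]
        current += gains[choice]
        remaining.remove(choice)
    return delta, current
```

Each round evaluates `u_a` once per remaining candidate and there are at most $| U |$ rounds, which matches the quadratic number of evaluations discussed above.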

In addition, as discussed in Section 5.2, in the cases where the assumption about uniform costs holds, we can immediately apply a result in [33] to ensure that the hill-climbing approach used by H_Greedy provides the guarantee stated below.

Proposition 4.

Assume that, for a fixed cost c, it holds that $\forall x \in AH \cup N_{T}$ , ${cost}_{d} (x) = c$ . Then, the H_Greedy algorithm approximates the optimum to within a factor of $(1 - \frac{1}{e})$ .

The experimental results described in Section 7 will validate our assertion that the H_Greedy algorithm is able to quickly compute high-quality solutions in general.

Table 3

Networks used in the experiments

Network	$| N |$	$| E |$	$| S V |$	$| E V |$
Cauldron	39	157	198	2,636
NS-2 (medium)	100	177	1,270	30,759
Synthetic (medium)	118	2,343	624	11,754
Synthetic (large)	217	3,943	1,185	23,269
NS-2 (very large)	600	1,228	6,291	131,442

Table 4

Parameter values used in the experiments

Parameter	Value(s)
$impact (v) \in [0, 10]$	Extracted directly from NVD
$Q (exploit (n, s, v))$	Ranges from $0 \%$ to $0.1 \%$ of $diff (v)$
$Q (scan (n))$	Fixed to $0.05 \%$
$util (a)$ (option U1)	$impact (a)$
$util (a)$ (option U2)	$\frac{1}{P (A_{k}^{l} \cup {a})}$
$\hat{c}$	Fixed to ∞
${cost}_{d} (x)$	Ranges from 0.5 to 1 if $x \in AH$ and from 0.05 to 0.1 if $x \in N_{T}$
$Q_{d} (exploit (n, s, v))$	Ranges from $0 \%$ to $0.1 \%$ of $diff (v)$
$Q_{d} (scan (n))$	Fixed to $0.05 \%$
$P_{d} (a | st)$	Defined as $P (a | st)$ , but using $Q_{d} (a)$ and the U1 and U2 options for $util (a)$
$damage (exploit (n, s, v))$	$impact (v)$
$damage (scan (n))$	Fixed to 0
${\hat{c}}_{d}$	Ranges from 1 to 9

The use of submodularity to guarantee the approximation ratio of Proposition 4 is well known, and other works do the same in many different domains, ranging from water distribution network design [25] to vaccine distribution [48]. However, we do not fully rely on submodularity in our proposed algorithms. In fact, (i) the proposed H_Exact algorithm is able to provide an exact optimal solution, (ii) its H_Exact95 variant, which will be introduced in the experimental assessment, proves useful in providing good quality results with even faster execution times, and (iii) the H_Greedy algorithm works very well even with non-uniform costs.

Fig. 7.

Results on the Cauldron dataset.

Fig. 8.

Results on the NS-2 (medium) dataset.

As a final remark, we point out that the actual short/long term impact of defensive strategies, their cost, and the need to re-compute them periodically are very important issues, given the extremely rapid growth of the number of vulnerabilities and patches (see, e.g., the NVD data about the number of vulnerabilities¹²). We operate in a highly dynamic scenario, and the defensive strategies must change with it. The ideal solution would be to patch everything (most network penetration attacks are due to unpatched known vulnerabilities). Unfortunately, even the most skilled security team struggles to keep up with this pace. The costs are often mainly connected to the patching operations – configuration of honeypots is generally a less relevant cost. Consequently, we believe that, in principle, the re-computation of strategies should be done each time resources are available and new vulnerabilities or patches show up – as the experiments will show, our proposed approximation procedure is very efficient and highly scalable. In addition, it often happens that either a vulnerability exists but is not reported in NVD yet, or the patch is not available yet – for such cases, we delegate the protection of our network to our deception principle, by adding temporary honeypots. An interesting and promising direction for future work is to use the historical evolution of the vulnerabilities and patches reported in NVD (by using, e.g., temporal data mining or sequence prediction techniques) to predict their future evolution. This could provide our system with a form of “forecasting” capability, and hopefully free up some time for a better allocation of the available resources.

¹² https://nvd.nist.gov/general/visualizations/vulnerability-visualizations/cvss-severity-distribution-over-time

7. Experimental assessment

In this section we describe the results of experiments we performed to evaluate the performance of H_Exact and H_Greedy in terms of both processing time and quality of the defensive strategies returned. All experiments were performed on a cluster of 64 Xeon 2.4 GHz machines with 24 GB RAM running RedHat Linux.

Networks. We used 5 different enterprise networks, whose properties are shown in Table 3. “Cauldron” is a real network with the vulnerability information associated with each node [19]. The “NS-2” networks were generated by one of the most widely-used open-source network simulators [18]. In order to include vulnerability information in these networks, we randomly picked vulnerabilities from the Cauldron network and matched them onto the NS-2 networks in a one-to-many fashion. Finally, the “Synthetic” networks were generated by extracting and replicating subnetworks from the Cauldron network.

Parameters. Table 4 shows how we chose parameter values. In the table, $diff(v) \in [0, 10]$ represents the difficulty of exploiting vulnerability v; this quantity was extracted directly from the National Vulnerability Database. When defining $Q(exploit(n, s, v))$, we took into account the fact that, in real-world scenarios, an attacker’s action is not easy to detect without honeypot nodes or defensive software. We therefore expect $Q(exploit(n, s, v))$ to be very low. For ${cost}_{d}(x)$, we assumed that the cost of deploying a honeypot node is much higher than that of installing defensive software, as the former requires developing a credible honeypot.

Fig. 9.

Results on the synthetic (medium) dataset.

Fig. 10.

Results on the synthetic (large) dataset.

Algorithms. In addition to the H_Exact and H_Greedy algorithms, we experimented with a variant of the H_Exact algorithm named H_Exact95 that underestimates upper bounds by 5% (i.e. we replaced $UB(δ)$ with $0.95 \cdot UB(δ)$ in line 7 of H_Exact). We introduced this variant in order to verify whether, by relaxing the optimality property ensured by H_Exact but still using a branch-and-bound approach, we can obtain significantly better processing times without losing much in terms of quality of the results. Moreover, we used Monte Carlo simulation in order to reduce the overhead due to the computation of $U_{a}(δ)$. By fixing the number of iterations to 500K (where at each iteration we choose the defender action according to $P(a | st)$ and simulate the detection according to $P(A)$), we were able to accurately estimate the actual values of $U_{a}(δ)$.
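The effect of deflating the upper bound can be illustrated on a toy problem. The sketch below is not H_Exact: it assumes a maximization objective, a hypothetical budgeted item-selection task (`values`, `costs`, `budget` are illustrative names), and a fractional-relaxation bound. It only shows how replacing the bound with `relax` times the bound makes the pruning test more aggressive.

```python
from typing import Sequence


def branch_and_bound(values: Sequence[float], costs: Sequence[float],
                     budget: float, relax: float = 1.0) -> float:
    """Maximise the total value of chosen items within the budget.

    A branch is pruned when relax * upper_bound <= incumbent.
    relax = 1.0 gives an exact search; relax = 0.95 mimics the idea of
    underestimating upper bounds by 5% (toy setting, positive costs assumed).
    """
    n = len(values)
    # Items sorted by value density, as required by the fractional bound.
    order = sorted(range(n), key=lambda i: values[i] / costs[i], reverse=True)
    best = 0.0

    def upper_bound(k: int, value: float, remaining: float) -> float:
        # Fractional relaxation over the items not yet decided.
        bound = value
        for i in order[k:]:
            if costs[i] <= remaining:
                remaining -= costs[i]
                bound += values[i]
            else:
                bound += values[i] * remaining / costs[i]
                break
        return bound

    def visit(k: int, value: float, remaining: float) -> None:
        nonlocal best
        best = max(best, value)
        if k == n:
            return
        if relax * upper_bound(k, value, remaining) <= best:
            return  # (possibly deflated) bound cannot beat the incumbent
        i = order[k]
        if costs[i] <= remaining:
            visit(k + 1, value + values[i], remaining - costs[i])  # take item i
        visit(k + 1, value, remaining)                             # skip item i

    visit(0, 0.0, budget)
    return best


exact = branch_and_bound([60, 100, 120], [10, 20, 30], budget=50)
relaxed = branch_and_bound([60, 100, 120], [10, 20, 30], budget=50, relax=0.95)
```

In this toy setting with non-negative values, any pruned branch has a true value of at most the incumbent divided by `relax`, so the returned value is at least `relax` times the optimum. Whether the same argument carries over to H_Exact95 depends on the details of H_Exact, which are not reproduced here.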

We measured the run-times and quality obtained with all combinations of networks and algorithms using the U1 and U2 options for $util (a)$ and $P_{d} (a | st)$ described in Table 4. The results are reported in Figs 7–11.
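The Monte Carlo estimation used in these experiments can be sketched in a simplified form. The example below makes strong assumptions that are not in the paper: the attacker follows one fixed sequence of actions (rather than sampling actions from a distribution), each undetected action always succeeds, and `values` and `detect_probs` are hypothetical inputs. It keeps only the rule that a detected action removes the attacker immediately, and compares the sampled estimate with the closed-form expectation for this toy case.

```python
import random
from typing import Sequence


def simulate_episode(values: Sequence[float], detect_probs: Sequence[float],
                     rng: random.Random) -> float:
    """Walk a fixed attack path and stop at the first detected action."""
    damage = 0.0
    for value, q in zip(values, detect_probs):
        if rng.random() < q:  # action detected: the attacker is removed
            break
        damage += value       # undetected action compromises this value
    return damage


def monte_carlo_damage(values: Sequence[float], detect_probs: Sequence[float],
                       iterations: int, seed: int = 0) -> float:
    """Average damage over `iterations` independent simulated episodes."""
    rng = random.Random(seed)
    total = sum(simulate_episode(values, detect_probs, rng)
                for _ in range(iterations))
    return total / iterations


def exact_damage(values: Sequence[float],
                 detect_probs: Sequence[float]) -> float:
    """Closed-form expectation for the same toy model."""
    expected, survive = 0.0, 1.0
    for value, q in zip(values, detect_probs):
        survive *= 1.0 - q           # probability of reaching past this step
        expected += survive * value
    return expected


values, detect_probs = [5.0, 3.0, 8.0], [0.1, 0.2, 0.3]
estimate = monte_carlo_damage(values, detect_probs, iterations=100_000, seed=1)
reference = exact_damage(values, detect_probs)
```

On a toy path like this the closed form is available, so sampling is unnecessary; the sampled estimator becomes useful when the number of possible attacker trajectories is too large to enumerate.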

Fig. 11.

Results on the NS-2 (very large) dataset.

Fig. 12.

Average performance improvement provided by the H_Greedy algorithm.

Fig. 13.

Average relative quality.

Fig. 14.

Percentage of damage on the synthetic (large) and NS-2 (very large) datasets.

Fig. 15.

Average relative value of $U_{a} (δ)$ .

Run-time. As the results show, H_Greedy is much faster than both H_Exact and H_Exact95. H_Exact timed out (timeouts were fixed at 3 weeks) in 12 out of 18 cases on the synthetic (medium) network (Fig. 9), and in all cases on the synthetic (large) and NS-2 (very large) networks (Figs 10 and 11). Overall, the average performance improvements provided by H_Greedy (reported in Fig. 12) are substantial: they range from 35.8% to 84.2% with respect to H_Exact95 and from 37.6% to 99.2% with respect to H_Exact (even without counting the times H_Exact did not finish).

Accuracy. H_Greedy produces very high quality results in the vast majority of cases. The relative quality of its results, averaged over all defender budgets (reported in Fig. 13), ranges from 93.3% to 98.7% of optimal. As expected, H_Exact95 provides higher average quality, at the cost of higher average processing times.

Absolute Damage. In addition to relative quality, we also analyzed the percentage of damage (w.r.t. the maximal possible damage) caused by the attacker when the defender uses a suboptimal method (Fig. 14). H_Exact95 shows lower damage than H_Greedy in most cases. One interesting observation is that attackers following option U2 cause more damage than those following U1 in most cases (there are very few exceptions). For instance, U2 obtains higher damage than U1 on the NS-2 (very large) network after a fluctuation until ${\hat{c}}_{d} = 4$. This is because attackers with U2 can survive longer than attackers with U1.

Subrational Attackers. Next, we assessed the sensitivity of our framework when the attacker shows a subrational approach by not always choosing higher utility vulnerabilities. We introduced an “attacker factor” ranging from $-20\%$ to $+20\%$ in the definition of $util(a)$ (option U1). Intuitively, if the attacker factor is $+20\%$, this means that the attacker overestimates the utility of her actions by $20\%$. Fig. 15 (top) reports the results (averaged over the different datasets) obtained with ${\hat{c}}_{d} = 2$. In particular, the figure shows the ratio between the actual value of $U_{a}(δ)$ obtained in each case and the best value obtained over all cases. Again, the results are very encouraging. The differences between the case where the factor is set to zero (i.e. fully rational behavior) and the other cases are very limited, which means that our framework is robust with respect to a variety of possible behaviors the attacker might show.

Unskilled Defenders. We also considered those cases where the defender’s skills and experience have an actual impact on the detection process. To this aim, we introduced a “defender factor” ranging from $-20\%$ to $+20\%$ in the definition of $Q_{d}(exploit(n, s, v))$. Intuitively, if the defender factor is $+20\%$, this means that the defender’s skills allow her to detect the action with a probability that is $20\%$ higher than the one defined in Table 4. The results with option U2 and ${\hat{c}}_{d} = 2$ are reported in Fig. 15 (bottom). The differences between the $+20\%$ case (maximum defender skills) and the other cases are very small, which tells us that our framework is also capable of effectively supporting defenders with relatively limited skills.

Honeypots vs. Traditional Defensive Software vs. Joint Optimization. Finally, we address two questions: (1) In which cases do honeypots or traditional defensive software (TDS) outperform the other? (2) To what extent does the proposed joint optimization outperform an optimization based on just one of the two methods?

Table 5 shows the average degree of honeypots and of nodes where TDS is installed under the optimal strategies (together with the average degree of their neighbors) with U2 and ${\hat{c}}_{d} = 4$ (other options provide similar results).

The results show that honeypots generally outperform traditional defenses on higher-degree nodes. In fact, a honeypot can protect all its connected neighbors, and thus a higher-degree honeypot can provide higher cost-efficiency. Moreover, honeypots’ neighbors have lower degrees than the neighbors of the nodes with TDS. This implies a higher influx of attackers through the links between a honeypot and its neighbors, as attackers are more likely to choose the honeypot as their next target. In general, these results clearly suggest preferred choices for both available options: honeypots should be located on nodes that have high degrees and whose neighbors have low degrees.
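The two statistics discussed here can be computed directly from a network's adjacency structure. The following sketch is illustrative only: the adjacency-dictionary representation, the function name `degree_profile`, and the choice to pool neighbor degrees over all (selected node, neighbor) pairs are assumptions, since the exact averaging used for Table 5 is not specified.

```python
from statistics import mean
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of adjacent nodes (undirected)


def degree_profile(graph: Graph, selected: Iterable[str]) -> Tuple[float, float]:
    """Return (average degree of the selected nodes,
               average degree of their neighbors).

    Neighbor degrees are pooled over every (selected node, neighbor) pair,
    so a neighbor shared by two selected nodes is counted twice.
    Raises statistics.StatisticsError if no selected node has a neighbor.
    """
    nodes = list(selected)
    own_degree = mean(len(graph[n]) for n in nodes)
    neighbor_degree = mean(len(graph[m]) for n in nodes for m in graph[n])
    return own_degree, neighbor_degree


# Toy network: a hub "h" with three leaves, two of which are also linked.
toy: Graph = {
    "h": {"a", "b", "c"},
    "a": {"h", "b"},
    "b": {"h", "a"},
    "c": {"h"},
}
hub_degree, hub_neighbor_degree = degree_profile(toy, ["h"])
leaf_degree, leaf_neighbor_degree = degree_profile(toy, ["c"])
```

In the toy network, the hub has a high degree and low-degree neighbors, which is the profile suggested above for honeypot placement, whereas the leaf `"c"` has the opposite profile.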

Regarding the second question, we start by observing that the proposed joint optimization scheme theoretically subsumes both single optimization schemes, so our framework cannot perform worse. Table 6 shows the relative quality of optimal single optimization strategies w.r.t. optimal joint optimization ones with U2 and ${\hat{c}}_{d} = 4$ (other options provide similar results).

The results show that, in the majority of the cases analyzed, joint optimization yields significantly better results – on average, it improves quality by around 22%.
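The aggregation behind the 22% figure is not spelled out. One reading that reproduces it, offered here only as an assumption, is to average the ten relative-quality values of Table 6 and express the joint optimum (100%) relative to that average. The variable names below are illustrative.

```python
# Relative quality (in % of the joint optimum) copied from Table 6.
honeypot_only = {
    "Cauldron": 92,
    "NS-2 (medium)": 51,
    "Synthetic (medium)": 64,
    "Synthetic (large)": 81,
    "NS-2 (very large)": 79,
}
tds_only = {
    "Cauldron": 98,
    "NS-2 (medium)": 93,
    "Synthetic (medium)": 83,
    "Synthetic (large)": 87,
    "NS-2 (very large)": 89,
}

scores = list(honeypot_only.values()) + list(tds_only.values())

# Average quality of a single-method strategy across networks and methods.
avg_single = sum(scores) / len(scores)

# Improvement of the joint optimum (100%) over that average (assumed reading).
improvement = 100.0 / avg_single - 1.0

print(f"average single-method quality: {avg_single:.1f}%")
print(f"joint optimization improvement: {improvement:.1%}")
```

Under this reading, the average single-method quality is 81.7% and the improvement is about 22.4%. Per-method averages (73.4% for honeypot-only and 90.0% for TDS-only) show that most of the gap comes from the honeypot-only strategies.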

8. Conclusions

Though there has been much work on honeypots (how to design them and where to place them), there has been little or no work on (i) how to simultaneously place both honeypots and traditional cyber defensive software (e.g. intrusion detection systems, firewalls) (ii) in the presence of a rational or subrational adversary. In this paper, we present a game-theoretic framework that addresses both these problems. We first model an attacker who can scan network nodes and exploit vulnerabilities, and show how her beliefs about the network can be represented via a novel Attacker Belief Evolution Tree (ABET). We then formally define a Defender ABET (DABET) for the defender to reason about the attacker’s beliefs and behavior, and use this to characterize the defender’s optimal strategy and prove an NP-hardness result. We propose the H_Exact algorithm to compute the optimal defensive strategy, which, as expected, takes exponential time. We then introduce the polynomial-time H_Greedy algorithm. We show that H_Greedy provides much better run-time than H_Exact, with a performance improvement ranging from over 37% to over 99% (even if we ignore the numerous cases when H_Exact does not finish), while generating solutions whose quality ranges from over 93% to over 98% of optimal. We further show that our system is robust in the presence of a subrational adversary and/or a defender with limited skills.

Table 5

Average degree of honeypots and nodes with TDS (average degree of their neighbors in parentheses)

Network	Honeypots	Nodes with TDS
Cauldron	14.0 (17.0)	14.0 (17.0)
NS-2 (medium)	3.4 (4.3)	3.3 (4.6)
Synthetic (medium)	20.3 (18.6)	17.5 (19.2)
Synthetic (large)	17.7 (16.6)	17.7 (17.2)
NS-2 (very large)	5.6 (5.3)	4.5 (5.0)

Table 6

Relative quality of single optimization strategies

Network	Honeypot-only	TDS-only
Cauldron	92%	98%
NS-2 (medium)	51%	93%
Synthetic (medium)	64%	83%
Synthetic (large)	81%	87%
NS-2 (very large)	79%	89%

Footnotes

Acknowledgments

This work was partially funded by the Army Research Office under grants W911NF-13-1-0421, W911NF-15-1-0576, and W911NF-14-1-0358, by the Office of Naval Research under grants N00014-15-1-2007, N00014-16-1-2896, and N00014-15-1-2742, by the National Science Foundation under grant IIP-1266147, and by the Ramanujan Faculty Fellowship.

References

1. F.H. Abbasi, R.J. Harris, Moretti, Haider and Anwar, Classification of malicious network streams using honeynets, in: GLOBECOM, 2012.

2. Ablon, M.C. Libicki and A.A. Golay, Markets for Cybercrime Tools and Stolen Data: Hackers’ Bazaar, RAND Corporation, 2014.

3. Aggarwal, Maqbool, Grover, Pammi, Singh and Dutt, Cyber security: A game-theoretic analysis of defender and attacker strategies in defacing-website games, in: CyberSA, 2015.

4. E.S. Al-Shaer and H.H. Hamed, Discovery of policy anomalies in distributed firewalls, in: INFOCOM, 2004.

5. Bercovitch, Renford, Hasson, Shabtai, Rokach and Elovici, HoneyGen: An automated honeytokens generator, in: ISI, 2011.

6. Brewer, Advanced persistent threats: Minimising the damage, Network Security 2014(4) (2014), 5–9. doi:10.1016/S1353-4858(14)70040-6.

7. C.-M. Chen, S.-T. Cheng and R.-Y. Zeng, A proactive approach to intrusion detection and malware collection, Security and Communication Networks 6(7) (2013), 844–853. doi:10.1002/sec.619.

8. K.H. Chung and Jo, The impact of security analysts’ monitoring and marketing roles on the market value of firms, Journal of Financial and Quantitative Analysis 31(4) (1996), 493–512. doi:10.2307/2331357.

9. Cintula, Esteva, Gispert, Godo, Montagna and Noguera, Distinguished algebraic semantics for t-norm based fuzzy logics: Methods and algebraic equivalencies, Annals of Pure and Applied Logic 160(1) (2009), 53–81. doi:10.1016/j.apal.2009.01.012.

10. Clark, Sun, Bushnell and Poovendran, A game-theoretic approach to IP address randomization in decoy-based cyber defense, in: GameSec, 2015.

11. W.R. Claycomb, Detecting insider threats: Who is winning the game?, in: International Workshop on Managing Insider Security Threats, 2015.

12. Dewri, Poolsappasit, Ray and Whitley, Optimal security hardening using multi-objective optimization on attack tree models of networks, in: CCS, 2007.

13. Dewri, Ray, Poolsappasit and Whitley, Optimal security hardening on attack tree models of networks: A cost-benefit analysis, Int. J. of Information Security 11(3) (2012), 167–188. doi:10.1007/s10207-012-0160-y.

14. Dhillon and Torkzadeh, Value-focused assessment of information system security in organizations, Information Systems Journal 16(3) (2006), 293–314. doi:10.1111/j.1365-2575.2006.00219.x.

15. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1979.

16. Grimes, Why patching is still a problem – and how to fix it, CSO Magazine (2016). http://www.csoonline.com/article/3025807/data-protection/why-patching-is-still-a-problem-and-how-to-fix-it.html.

17. Han, Marina, Debbah and Hjørungnes, Physical layer security game: How to date a girl with her boyfriend on the same table, in: GameNets, 2009.

18. Issariyakul and Hossain, Introduction to Network Simulator NS2, Springer Publishing Company, Incorporated, 2008.

19. Jajodia, Noel, Kalapa, Albanese and Williams, Cauldron: Mission-centric cyber situational awareness with defense in depth, in: MILCOM, 2011.

20. Jajodia, Shakarian, V.S. Subrahmanian, Swarup and Wang (eds), Cyber Warfare – Building the Scientific Foundation, Advances in Information Security, Vol. 56, Springer, 2015.

21. R.L. Keeney, Value-Focused Thinking: A Path to Creative Decisionmaking, Harvard University Press, 1996.

22. R.L. Keeney, Value-focused thinking: Identifying decision opportunities and creating alternatives, European Journal of Operational Research 92(3) (1996), 537–549. doi:10.1016/0377-2217(96)00004-5.

23. Kiekintveld, Lisý and Píbil, Game-theoretic foundations for the strategic use of honeypots in network security, in: Cyber Warfare – Building the Scientific Foundation, 2015, pp. 81–101.

24. Kim and M.H. Kang, Determining Asset Criticality for Cyber Defense, 2011, www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA550373.

25. Krause, Leskovec, Guestrin, VanBriesen and Faloutsos, Efficient sensor placement optimization for securing large water distribution networks, Journal of Water Resources Planning and Management 134(6) (2008), 516–526. doi:10.1061/(ASCE)0733-9496(2008)134:6(516).

26. R.P. Lippmann, J.F. Riordan, T.H. Yu and K.K. Watson, Continuous Security Metrics for Prevalent Network Threats: Introduction and First Four Metrics, Technical Report, MIT Lincoln Laboratory, 2012.

27. K.J.R. Liu and Wang, Cognitive Radio Networking and Security: A Game-Theoretic View, Cambridge University Press, New York, NY, USA, 2010. doi:10.1017/CBO9780511778773.

28. T.F. Lunt, A survey of intrusion detection techniques, Computers & Security 12(4) (1993), 405–418. doi:10.1016/0167-4048(93)90029-5.

29. K.-w. Lye and J.M. Wing, Game strategies in network security, Int. J. of Information Security 4(1–2) (2005), 71–86. doi:10.1007/s10207-004-0060-x.

30. M.H. Manshaei, Zhu, Alpcan, Başar and J.-P. Hubaux, Game theory meets network security and privacy, ACM Comput. Surv. 45(3) (2013), 25.

31. Mell, Bergeron and Henning, Creating a Patch and Vulnerability Management Program, NIST Sp. Publ. 800-40, Version 2.0, 2005.

32. MITRE, Common Weakness Scoring System (CWSS™), 2016, http://cwe.mitre.org/cwss.

33. G.L. Nemhauser, L.A. Wolsey and M.L. Fisher, An analysis of approximations for maximizing submodular set functions – I, Math. Program. 14(1) (1978), 265–294. doi:10.1007/BF01588971.

34. NIST, National Vulnerability Database, 2016, http://nvd.nist.gov.

35. Osborn, Sandhu and Munawer, Configuring role-based access control to enforce mandatory and discretionary access control policies, ACM Transactions on Information and System Security 3(2) (2000), 85–106. doi:10.1145/354876.354878.

36. Poolsappasit, Dewri and Ray, Dynamic security risk management using Bayesian attack graphs, IEEE Trans. Dependable Secur. Comput. 9(1) (2012), 61–74. doi:10.1109/TDSC.2011.34.

37. Pouget and Dacier, Honeypot, Honeynet: A comparative survey, in: Institut Eurecom, 2003.

38. Rasouli, Miehling and Teneketzis, A supervisory control approach to dynamic cyber-security, in: GameSec, 2014.

39. Raya, M.H. Manshaei, Félegyhazi and J.-P. Hubaux, Revocation games in ephemeral networks, in: CCS, 2008.

40. Schelling, The Strategy of Conflict, Harvard University Press, 1992.

41. Serra, Jajodia, Pugliese, Rullo and V.S. Subrahmanian, Pareto-optimal adversarial defense of enterprise systems, ACM Trans. Inf. Syst. Secur. 17(3) (2015), 11. doi:10.1145/2699907.

42. Shabtai, Elovici and Rokach, Data leakage detection/prevention solutions, in: A Survey of Data Leakage Detection and Prevention Solutions, Springer, 2012, pp. 17–37. doi:10.1007/978-1-4614-2053-8_4.

43. Shakarian, Paulo, Albanese and Jajodia, Keeping intruders at large: A graph-theoretic approach to reducing the probability of successful network intrusions, in: SECRYPT, 2014.

44. G.F. Stocco and Cybenko, Exploiting adversary’s risk profiles in imperfect information security games, in: GameSec, 2011.

45. Xiao, Chen, W.S. Lin and K.J.R. Liu, Indirect reciprocity security game for large-scale wireless networks, Trans. Info. For. Sec. 7(4) (2012), 1368–1380. doi:10.1109/TIFS.2012.2202228.

46. Yan, Kucuk, Slocum and D.C. Last, A Bayesian cognitive approach to quantifying software exploitability based on reachability testing, in: ISC, 2016.

47. Younis, Y.K. Malaiya and Ray, Assessing vulnerability exploitability risk using software properties, Software Quality Journal 24(1) (2016), 159–202. doi:10.1007/s11219-015-9274-6.

48. Zhang and B.A. Prakash, DAVA: Distributing vaccines over networks under prior information, in: ICDM, 2014.

49. Zhu, Li, Han and Basar, A stochastic game model for jamming in multi-channel cognitive radio systems, in: ICC, 2010.


Table 1

Comparison with related work

Work	Hybrid adv. defense	Attacker model	Attacker model w/detection probability
[12,13,36,37,41]	–	–	–
[5,20,23,43]	–	√	–
This paper	√	√	√

²
https://www.infosecurity-magazine.com/news/companies-average-120-days-patch

⁶
https://www.open-scap.org/tools/scap-workbench

⁷
As our notion of attacker state is really about what the attacker knows, we include the history of actions A in it so that the attacker can reason about the effects of her past actions.

⁹
We assume that the detection of an action leads to immediate removal of the attacker.

¹¹
We sometimes abuse notation and write $δ = \emptyset$ in place of $δ = (\emptyset, \emptyset)$ .