Extensive–form games with heterogeneous populations: solution concepts, equilibria characterization, learning dynamics

Abstract

Adopting the Nash equilibrium (NE) in real–world settings is often impractical because its assumptions are too restrictive. Game theory and artificial intelligence provide alternative (relaxed) solution concepts. When knowledge about opponents’ utilities and types is not available, the appropriate solution concept for extensive–form games is the self–confirming equilibrium (SCE), which relaxes NE by allowing agents to have wrong beliefs off the equilibrium path. In this paper, we provide the first computational and learning study of situations in which a two–agent extensive–form game is played by heterogeneous populations of individuals that repeatedly match (e.g., eBay): we extend the SCE concept, we study the equilibrium computation problem, and we study how these equilibria affect learning dynamics. We show that SCEs are crucial for characterizing both stable states of learning dynamics and the dynamics themselves.

Keywords

Artificial Intelligence; Multi–agent systems; Algorithms; Economics; Game Theory (cooperative and non-cooperative)

1 Introduction

The study of strategic interactions has recently received a lot of attention in artificial intelligence. The design of rational agents is usually pursued by exploiting models from microeconomics [20] and algorithmic tools to find strategies [20]. The central solution concept is the Nash equilibrium (NE). Although in principle NE can be applied to an enormous range of situations to prescribe strategies to agents, it presents two drawbacks that make its prescriptive power unsatisfactory. The first drawback concerns its epistemic requirements (e.g., common and complete information over payoffs), which are rarely met in real–world situations. The second drawback is the multiplicity of equilibria: with multiple equilibria, agents cannot coordinate on which equilibrium to play unless they are somehow correlated, but in this case a correlated equilibrium should be played. Even when NE is used as a descriptive tool to study the stable states of learning agents, some problems arise: learning agents can have non–NE stable states, and therefore NE may have unsatisfactory descriptive power [7].

The aforementioned drawbacks of NE pushed researchers to design alternative (relaxed and/or non–equilibrium) solution concepts that also take learning dynamics into account. For example, the CURB set [2] is a non–equilibrium concept defined as a set of strategies that contains the best responses to any mixture over itself, so that any best–response dynamics (even a best–response dynamics of mixed strategies) will stay within it [11]. With extensive–form games, relaxing the epistemic requirements of NE, it is possible to have stable (equilibrium) states that are not NEs [7]. These equilibria, called self–confirming equilibria (SCEs), require that each agent plays best–response strategies to her beliefs, but the beliefs can be wrong off the equilibrium path (while they are confirmed on it). This concept is perfectly suitable for learning agents: if agents could explore the entire strategy profile space, they would have correct beliefs everywhere on the game tree and they would play a subgame perfect equilibrium (SPE) or a sequential equilibrium (SE); in practice, however, learning agents cannot explore the whole space, so they can have wrong beliefs over some portion of the tree, and the strategy profile they play is an SCE.

To the best of our knowledge, no work investigates how to compute SCEs and how these equilibria affect learning dynamics. The only work dealing with SCEs [10] studies two–agent games, but it does not address the problem of finding equilibria in challenging situations [17] where heterogeneous finite or infinite populations of individuals repeatedly match and play (e.g., a population of buyers and a population of sellers in eBay, each individual with potentially different beliefs and strategies), and it does not characterize the learning dynamics in terms of SCEs. This last issue is of great importance. Indeed, in many practical situations the study of the learning dynamics may be more important than the study of the equilibrium points: when learning dynamics are long, the agents’ overall performance (in terms of expected utility) is strongly conditioned by their performance during the learning dynamics, which therefore cannot be neglected. Our aim is to provide models and tools to forecast these dynamics and to allow, in the future, a designer to exploit this information in the design of game mechanisms, e.g., by moving equilibrium points to speed up the learning transient or, in the case of auctions (in particular when no dominant bidding strategy exists, as for instance in continuous double auctions [3]), to push the dynamics towards outcomes that are better for an auctioneer.

The main original contributions provided in this paper are as follows.

Solution concepts. We design some solution concepts extending the SCE to capture situations where agents may be of different (Bayesian) types and individuals.

Equilibrium computation. With two agents, we formulate the problems of finding a solution with finite and infinite populations as two mixed–integer linear programs (MILPs). Furthermore, we show that the problem of verifying whether or not a solution is an SCE extension is in $P$ even when the input of the verification problem specifies only the strategies and not the beliefs (this is common when we observe learning agents and we ask whether or not the observed strategies are a solution for some given solution concept), while searching for an SCE is $PPAD$–complete.

Equilibrium characterization. We study the relationships between different SCE extensions as the number of individuals and types change and we discuss how the equilibria can be enumerated.

Learning dynamics characterization. We study the replicator dynamics with multiple individuals and we show how SCE extensions affect learning dynamics in games with perfect information: they are reachable states only when learning agents are purely greedy, otherwise they are saddle points attracting and then repelling the learning dynamics.

2 Extensive–form games and equilibrium computation

2.1 Game definition and strategies

A perfect–information extensive–form game [8] is a tuple (N, A, V, T, ι, ρ, χ, u), where: N is the set of agents (i ∈ N denotes a generic agent), A is the set of actions (A_i ⊆ A denotes the set of actions of agent i and a ∈ A denotes a generic action), V is the set of decision nodes (V_i ⊆ V denotes the set of decision nodes of i), T is the set of terminal nodes (w ∈ V ∪ T denotes a generic node and w₀ is the root node), ι : V → N returns the agent that acts at a given decision node, ρ : V → ℘ (A) returns the actions available to agent ι (w) at w, χ : V × A → V ∪ T assigns the next (decision or terminal) node to each pair 〈w, a〉 where a is available at w, and u = (u₁, …, u_|N|) is the set of agents’ utility functions $u_{i} : T \to ℝ$. Games with imperfect information extend those with perfect information, allowing one to capture situations in which some agents cannot observe some actions undertaken by other agents. We denote by V_i,h the h–th information set of agent i. An information set is a set of decision nodes such that when an agent plays at one of its nodes she cannot distinguish the node in which she is playing. For the sake of simplicity, we assume that every information set has a different index h, thus we can uniquely identify an information set by h. Furthermore, since the available actions at all nodes w belonging to the same information set h are the same, with abuse of notation, we write ρ (h) in place of ρ (w) with w ∈ V_i,h. An imperfect–information game is a tuple (N, A, V, T, ι, ρ, χ, u, H) where (N, A, V, T, ι, ρ, χ, u) is a perfect–information game and H = (H₁, …, H_|N|) induces a partition V_i = ⋃ _{h∈H_i}V_i,h such that for all w, w′ ∈ V_i,h we have ρ (w) = ρ (w′). We focus on games with perfect recall, where each agent recalls all her own previous actions and those of the opponents [8].

A behavioral strategy σ_i is represented as a probability distribution σ_i,a over the actions a ∈ A_i available at each information set h. Working with behavioral strategies is computationally hard, requiring highly non–linear optimization tools. With two agents, the sequence form [12] avoids the need for non–linear programming. A sequence q ∈ Q_i is a set of consecutive actions a ∈ A_i, where Q_i ⊆ Q is the set of sequences of agent i and Q is the set of all the sequences. A sequence can be terminal, if, combined with some opponents’ sequence, it leads to a terminal node ( $\bar{Q}$ denotes the subset of terminal sequences), or non–terminal, if it leads to no terminal node whatever the opponents’ sequence. The initial sequence of every agent, denoted by q₀, is called the empty sequence and, given a sequence q ∈ Q_i leading to some information set h ∈ H_i, we say that q′ extends q (denoted by q′ = q|a′) if the last action of q′ (denoted by a (q′) = a′) is some action a′ ∈ ρ (w) with w ∈ V_i,h. We denote a sequence–form strategy profile by a vector x = [x₁, …, x_|N|] where x_i is the strategy of agent i and we denote by x_i (q) the probability associated with sequence q ∈ Q_i. Well–defined sequence–form strategies are such that, for every information set h ∈ H_i, the probability x_i (q) assigned to the sequence q leading to h is equal to the sum of the probabilities x_i (q′), for all the sequences q′ that extend q at h. The agent i’s utility is represented as a sparse matrix U_i.
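The well–definedness condition on sequence–form strategies can be checked mechanically. The following Python sketch is only an illustration: the data layout (sequences as tuples of action labels, information sets as a dictionary mapping h to the pair of leading sequence and available actions), the function name `is_well_defined`, and the toy game are assumptions of ours and do not come from the formulation above. The `root_mass` parameter is included because, in the population setting discussed later, the empty sequence of an individual may carry a mass different from 1.

```python
from math import isclose

def is_well_defined(x, info_sets, root_mass=1.0, tol=1e-9):
    """Check the sequence-form constraints for one agent.

    x         : dict mapping a sequence (tuple of actions) to its probability
    info_sets : dict mapping h to (q, actions), where q is the sequence
                leading to h and actions are the actions available at h
    """
    # The empty sequence carries the whole probability mass.
    if not isclose(x.get((), 0.0), root_mass, abs_tol=tol):
        return False
    # Probabilities cannot be negative.
    if any(p < -tol for p in x.values()):
        return False
    # At every information set, x(q) equals the sum of x(q|a) over a.
    for q, actions in info_sets.values():
        extended = sum(x.get(q + (a,), 0.0) for a in actions)
        if not isclose(x.get(q, 0.0), extended, abs_tol=tol):
            return False
    return True

# Hypothetical agent with two information sets: h1 at the root and
# h2 reached after playing L.
INFO_SETS = {"h1": ((), ["L", "R"]), "h2": (("L",), ["l", "r"])}

x_ok = {(): 1.0, ("L",): 0.6, ("R",): 0.4, ("L", "l"): 0.6, ("L", "r"): 0.0}
x_bad = {(): 1.0, ("L",): 0.6, ("R",): 0.5, ("L", "l"): 0.6, ("L", "r"): 0.0}

print(is_well_defined(x_ok, INFO_SETS))
print(is_well_defined(x_bad, INFO_SETS))
```

The number of checks is linear in the number of information sets, which is what makes the sequence form convenient for the programs discussed in the following sections.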

2.2 Solution concepts and computation

It is well known that the NE concept is not satisfactory for extensive–form games, allowing agents to play non–credible threats. The concept of SPE refines the concept of NE, constraining a strategy profile to be an NE in every subgame [8], where a subgame is a portion of the game tree defined as follows: it has a root and for every node w ∈ V_i,h belonging to the subgame the whole information set V_i,h belongs to the subgame. An SPE can be easily found by applying backward induction [8]. The concept of SPE is not satisfactory with imperfect information. In this case, SE [13] is used. An SE is a pair $(σ, \hat{σ})$, where σ are the agents’ strategies and $\hat{σ}$ are the agents’ beliefs over the opponents’ strategies, such that σ are sequentially rational with respect to $\hat{σ}$ and $\hat{σ}$ are consistent with σ.

The concept of SCE relaxes NE, capturing settings where opponents’ utilities and types are unknown. Similarly to an SE, an SCE is a pair $(σ, \hat{σ})$ where σ are best responses to $\hat{σ}$. Differently from an SE, $\hat{σ}$ can be wrong off the equilibrium path (while they must be correct, i.e., confirmed, on it). Thus, every NE is an SCE, while an SCE may not be an NE.

Now, we briefly survey the main computational results. Finding an NE is $PPAD$–complete [5]. The problem of finding an NE can be formulated as a linear–complementarity problem (LCP) and solved by employing the Lemke algorithm [14], a generalization of the algorithm presented in [15]. Other algorithms are based on support enumeration [18] and MILP [19]. The problem of verifying whether a strategy profile, both in sequence form and agent form, is an NE can be easily solved in polynomial time by checking whether or not the constraints are satisfied. Finding an SE is in $PPAD$ and can be achieved by a variation of Lemke’s algorithm [16]. Verifying whether or not a given solution is an SE is in $P$ [9]. The only result on the computation of SCEs shows that finding a basic SCE can be formulated as an MILP [10].

3 Extensive–form games with populations

3.1 Game model

We focus on two–agent extensive–form games that are repeatedly played by different individuals as described in [7]. More precisely, for each agent (representing a role), there is a population of individuals and, at each repetition of the game, one individual is drawn from each population and the two drawn individuals are matched and then play the game. At each repetition, we may have individuals different from those of the previous repetitions. A common example is a market in which bilateral negotiations are carried out: there are two agents/roles (i.e., buyer and seller), but different buyers and different sellers can match. Other economic examples are given by auctions.

The game model proposed in [7] presents the following two main limitations that we remove in this paper.

Infinite populations: the number of individuals of all the populations is infinite. While this model can be a fine approximation with large populations, it is not when the populations contain few individuals.

Identical individuals: all the individuals of a population have the same preferences. This is unrealistic given that individuals can have different preferences.

In our model, we allow populations to be finite or infinite and to be heterogeneous, including individuals of different types, where types differ in their preferences. We denote by Θ = (Θ₁, …, Θ_|N|) the set of all the types (θ denotes a generic type), where Θ_i is the set of types of agent i. We assume that the number of types is finite. We denote by Θ_–i = × _j≠i Θ_j the set of all the possible profiles of types of all the agents except agent i. Utility functions are defined also over types (in addition to the strategies of the agents). When types are non–interdependent, we have u_i = u_i (θ, σ_i, σ_–i) where θ ∈ Θ_i; instead when types are interdependent we have u_i = u_i (θ, θ′, σ_i, σ_–i) where θ ∈ Θ_i and θ′ ∈ Θ_–i. For each type θ ∈ Θ_i, there is a (possibly infinite) population of individuals Λ_θ (we denote by λ a generic individual and by Λ_i the set of all the individuals of agent i). Each type θ ∈ Θ_i is associated with a probability ω_i,θ with which an individual of the pertinent population is drawn. Obviously, ∑_{θ∈Θ_i}ω_i,θ = 1. Similarly, each individual λ ∈ Λ_θ is associated with a probability ω_i,θ,λ with which the individual is drawn such that ∑_{λ∈Λ_θ}ω_i,θ,λ = ω_i,θ. For the sake of presentation, we assume that each individual has a different index λ and therefore we can refer to an individual λ ∈ Λ_θ by using λ in place of 〈i, θ, λ〉. Similarly, we assume each type has a different θ, therefore we can refer to a type θ ∈ Θ_i by using θ in place of 〈i, θ〉.
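To make the matching process concrete, the following Python sketch draws one individual per agent according to the probabilities ω_i,θ,λ and recovers ω_i,θ as their sum over the individuals of a type. It is a minimal illustration under assumptions of ours: the dictionary `OMEGA`, the labels, and all numeric values are hypothetical, and only finite populations are covered.

```python
import random
from math import isclose

# Hypothetical finite populations: agent -> type -> individual -> omega.
OMEGA = {
    1: {
        "theta_1.1": {"lambda_1.1.1": 0.2, "lambda_1.1.2": 0.3},
        "theta_1.2": {"lambda_1.2.1": 0.5},
    },
    2: {
        "theta_2.1": {"lambda_2.1.1": 1.0},
    },
}

def type_probabilities(agent):
    """omega_{i,theta}, obtained as the sum of omega_{i,theta,lambda}."""
    return {theta: sum(inds.values()) for theta, inds in OMEGA[agent].items()}

def draw_individual(agent, rng):
    """Draw a (type, individual) pair of the given agent."""
    pairs = [(theta, lam, w)
             for theta, inds in OMEGA[agent].items()
             for lam, w in inds.items()]
    weights = [w for _, _, w in pairs]
    theta, lam, _ = rng.choices(pairs, weights=weights, k=1)[0]
    return theta, lam

def draw_match(rng):
    """One repetition of the game: one individual is drawn per agent."""
    return {agent: draw_individual(agent, rng) for agent in OMEGA}

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
print(type_probabilities(1))
print(draw_match(rng))
```

Drawing the pair 〈θ, λ〉 directly with weight ω_i,θ,λ is equivalent to first drawing the type with probability ω_i,θ and then an individual of that type with the conditional probability ω_i,θ,λ / ω_i,θ.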

Different individuals may adopt different strategies. σ_λ denotes the strategy of individual λ. The aggregate strategy of the individuals of type θ ∈ Θ_i is σ_θ = ∑_{λ∈Λ_θ}σ_λ · ω_λ, and the one of agent i is σ_i = ∑_{θ∈Θ_i}σ_θ · ω_θ.
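As a numerical illustration of aggregation, the following Python sketch computes the aggregate strategy of an agent at a single information set. It assumes that ω_λ is the probability with which individual λ is drawn (so that the ω_λ of one agent sum to 1) and therefore sums ω_λ · σ_λ directly over all the individuals of the agent. The function name `aggregate`, the labels, and the numbers are hypothetical.

```python
def aggregate(strategies, weights):
    """Weighted sum of per-individual action distributions.

    strategies : dict individual -> dict action -> probability
    weights    : dict individual -> draw probability omega_lambda
    """
    total = {}
    for lam, sigma in strategies.items():
        for action, p in sigma.items():
            total[action] = total.get(action, 0.0) + weights[lam] * p
    return total

# Hypothetical agent with four individuals choosing between a and b.
weights = {"l111": 0.25, "l112": 0.25, "l121": 0.3, "l131": 0.2}
strategies = {
    "l111": {"a": 1.0, "b": 0.0},
    "l112": {"a": 0.0, "b": 1.0},
    "l121": {"a": 0.5, "b": 0.5},
    "l131": {"a": 1.0, "b": 0.0},
}

sigma_1 = aggregate(strategies, weights)
print(sigma_1)  # roughly 0.6 on a and 0.4 on b, up to floating-point rounding
```

Two individuals playing opposite pure strategies (l111 and l112 here) contribute to the aggregate exactly as a single individual mixing between a and b would, which is why an opponent who only observes play cannot tell the two situations apart.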

As in [7], we assume that agents have no information about the opponents. Specifically, we assume:

each individual has no information about the utility functions of the other agents;

each individual has no information about the individuals of the other agents;

when utilities are interdependent, each individual knows the types of the opponents, but she does not know the pertinent probabilities and utilities;

when utilities are not interdependent, no assumption about the knowledge of the opponents’ types is made (this is because it can be proved, with a simple extension of [6], that knowing or not the opponents’ types (without knowing the utilities and probabilities) leads to the same set of equilibria).

Customarily, a game with types is called Bayesian. Each individual forms a belief over the opponents and adjusts it during the play. More precisely, when utilities are not interdependent, each individual λ of agent i has a (potentially different) belief ${\hat{σ}}_{λ}^{j}$ over the aggregate strategy of agent j for every j ≠ i. This is because each individual λ does not know the types and the individuals of the opponents and therefore she cannot form any belief over the single individual or type of the opponents. When utilities are interdependent, each individual λ of agent i has a (potentially different) belief ${\hat{σ}}_{λ}^{θ}$ over the aggregate strategy of type θ ∈ Θ_j for every j ≠ i and a (potentially different) belief ${\hat{ω}}_{λ}^{θ}$ over the pertinent probability ω_θ (also in this case individuals have no beliefs over the single individuals of the opponents).

Example 3.1. Agent 1 is of three types θ_1.1, θ_1.2, θ_1.3, each with probability ω_1.1, ω_1.2, ω_1.3. Two individuals λ_1.1.1, λ_1.1.2 are of type θ_1.1 (with ω_1.1.1 + ω_1.1.2 = ω_1.1), while only one individual λ_1.2.1 is of type θ_1.2 (with ω_1.2.1 = ω_1.2) and only one individual λ_1.3.1 is of type θ_1.3 (with ω_1.3.1 = ω_1.3). Notice that without interdependency the separation between types and individuals is not necessary and all the information can be coded by individuals (and not by types as discussed in Section 4), whereas with interdependency the separation is necessary, utilities being defined on types.

3.2 Solution concepts

Different solution concepts extending the SCE can be provided according to each specific situation, see Tab. 1.

The basic solution concept, introduced in [7], is the unitary SCE (USCE) that captures situations with a unique type per agent and a unique individual per type. A USCE constrains the agent’s strategy to be best response to the belief over opponent’s strategy and constrains beliefs to be correct (w.r.t. the strategies) on the equilibrium path.

Definition 3.2. A USCE is a pair $(σ, \hat{σ})$ such that:

each agent i has a single type θ and a single individual λ;

for every agent i, σ_λ is a best response to ${\hat{σ}}_{λ}^{- i}$ , where λ is the agent i’s individual;

for every agent i, ${\hat{σ}}_{λ}^{- i}$ is equal to σ_λ′ on the equilibrium path, where λ and λ′ are the individuals of agent i and agent –i, respectively.
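The conditions of Definition 3.2 can be checked directly on a small tree. The Python sketch below is an illustration under assumptions of ours: the game (agent 1 chooses Out or In, and after In agent 2 chooses l or r), its payoffs, and all names are hypothetical; strategies and beliefs are behavioral; best response is taken ex ante against the belief; and "on the equilibrium path" is read as "at opponent nodes reached with positive probability under σ".

```python
from math import isclose

# Hypothetical perfect-information game, not taken from the paper.
TREE = {
    "root": ("decision", 1, {"Out": "t_out", "In": "n2"}),
    "n2": ("decision", 2, {"l": "t_l", "r": "t_r"}),
    "t_out": ("terminal", (2.0, 2.0)),
    "t_l": ("terminal", (3.0, 1.0)),
    "t_r": ("terminal", (0.0, 0.0)),
}

def expected_utility(node, i, own, belief):
    """Utility of agent i playing `own` against her belief."""
    kind, *rest = TREE[node]
    if kind == "terminal":
        return rest[0][i - 1]
    player, children = rest
    dist = own[node] if player == i else belief[node]
    return sum(p * expected_utility(children[a], i, own, belief)
               for a, p in dist.items())

def best_value(node, i, belief):
    """Best utility agent i can obtain against her belief."""
    kind, *rest = TREE[node]
    if kind == "terminal":
        return rest[0][i - 1]
    player, children = rest
    values = {a: best_value(c, i, belief) for a, c in children.items()}
    if player == i:
        return max(values.values())
    return sum(belief[node][a] * v for a, v in values.items())

def reach_probabilities(sigma, node="root", prob=1.0, out=None):
    """Probability of reaching each node under the actual strategies."""
    out = {} if out is None else out
    out[node] = prob
    kind, *rest = TREE[node]
    if kind == "decision":
        for a, child in rest[1].items():
            reach_probabilities(sigma, child, prob * sigma[node][a], out)
    return out

def is_usce(sigma, beliefs, tol=1e-9):
    reach = reach_probabilities(sigma)
    for i in (1, 2):
        belief = beliefs[i]
        # Strategy of agent i is a best response to her belief.
        if not isclose(expected_utility("root", i, sigma, belief),
                       best_value("root", i, belief), abs_tol=tol):
            return False
        # Belief of agent i is correct at opponent nodes that are reached.
        for node, (kind, *rest) in TREE.items():
            if kind == "decision" and rest[0] != i and reach[node] > tol:
                if any(not isclose(belief[node][a], sigma[node][a], abs_tol=tol)
                       for a in sigma[node]):
                    return False
    return True

SIGMA = {"root": {"Out": 1.0, "In": 0.0}, "n2": {"l": 1.0, "r": 0.0}}
BELIEFS = {
    1: {"n2": {"l": 0.0, "r": 1.0}},       # wrong, but n2 is off the path
    2: {"root": {"Out": 1.0, "In": 0.0}},  # correct on the path
}
print(is_usce(SIGMA, BELIEFS))
```

In this toy game the pair passes the check even though the strategy profile (Out, l) is not an NE: against l, agent 1 would gain by deviating to In. Her wrong belief that agent 2 plays r is never contradicted, because the node of agent 2 is not reached.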

Upon the USCE concept, we build the solution concepts for other (more complex) situations. We consider the situation with one type per agent and multiple finite individuals: the solution concept is finite heterogeneous SCE (FHSCE).

Definition 3.3. An FHSCE is a pair $(σ, \hat{σ})$ such that:

each agent i has a single type θ and a finite number of individuals λ;

for every agent i, σ_λ is a best response to ${\hat{σ}}_{λ}^{- i}$ , where λ is an agent i’s individual;

for every agent i, ${\hat{σ}}_{λ}^{- i}$ is equal to σ_–i on the equilibrium path identified by σ_λ and σ_–i, where λ is an individual of agent i;

for every agent i, σ_i is the aggregate strategy of all the agent i’s individuals.

Notice that the constraints over the beliefs of an individual λ and the equilibrium path she observes depend on σ_λ. Thus, different individuals may have different strategies, each supported by different beliefs.

When there is an infinite number of individuals, the above definition is not operative, requiring one to specify an infinite number of strategies and beliefs. In this case, according to [7], we can work at the level of the single action a and we can state that a ∈ A_i could be played by agent i if there is a belief (confirmed on the equilibrium path) such that this action is a best response. As a result, for every action a ∈ A_i we need to define a belief ${\hat{σ}}_{i, a}^{- i}$ . As done above, we use ${\hat{σ}}_{a}^{- i}$ in place of ${\hat{σ}}_{i, a}^{- i}$ , under the assumption that each action a has a different label. The solution concept is the infinite heterogeneous SCE (IHSCE).

Definition 3.4. An IHSCE is a pair $(σ, \hat{σ})$ such that:

each agent i has a single type θ and an infinite number of individuals λ;

for every agent i, σ_i (a) > 0 with a ∈ A_i if a is a best response to some belief ${\hat{σ}}_{a}^{- i}$;

for every action a ∈ A_i and agent i, ${\hat{σ}}_{a}^{- i}$ equals σ_–i on the equilibrium path identified by (σ_a, σ_–i).

We focus on Bayesian games. We report only the definition of the Bayesian USCE concept (BUSCE) because the redefinitions of FHSCE and IHSCE with Bayesian games (BFHSCE and BIHSCE, respectively) are similar.

Definition 3.5. A BUSCE is a pair $(σ, \hat{σ})$ such that:

each agent i has a finite number of types θ and a single individual λ per type;

for every agent i, σ_λ is a best response to ${\hat{σ}}_{λ}^{- i}$ , where λ is the agent i’s individual and ${\hat{σ}}_{λ}^{- i}$ is the aggregate belief of λ over agent –i defined as $\sum_{θ \in Θ_{- i}} {\hat{ω}}_{θ} {\hat{σ}}_{λ}^{θ}$ ;

for every agent i, ${\hat{σ}}_{λ}^{- i}$ is equal to σ_–i on the equilibrium path, where λ is an individual of agent i and σ_–i is the aggregate strategy of agent –i.

Finally, we remark that every game admits at least one SCE per extension, since the SCE concept relaxes the NE concept and every game admits at least an NE.

3.3 Computational results

We study both the problem to compute an equilibrium according to the solution concepts defined above and to verify whether a given solution is one of such solution concepts.

We initially consider the equilibrium computation problem. The solution concept definitions provided in the previous section are based on behavioral strategies, but it is known that behavioral strategies pose severe computational problems [20]. However, since those constraints are defined exclusively on the equilibrium path, it is possible to redefine them in terms of sequence–form strategies without perturbations (as instead it is needed for SEs to capture the off–the–equilibrium–path behavior). As a result, we have x_λ, ${\hat{x}}_{λ}^{- i}$ , ${\hat{x}}_{λ}^{θ^{'}}$ for SCEs with finite populations, and x_θ, x_θ,q, ${\hat{x}}_{θ, q}^{- i}$ , ${\hat{x}}_{θ, q}^{θ^{'}}$ for SCEs with infinite populations.

A specific sample of SCE extensions can be computed by searching for an NE and thus it can be found by using the Lemke algorithm applied to the sequence form (in the case all the individuals have correct beliefs, all the individuals of the same type behave in the same way and therefore the game is essentially a Bayesian two–agent game). Hence, the problem of computing an SCE extension with two agents is $PPAD$–complete. However, the computation of a specific sample is not interesting when dealing with learning agents that can potentially achieve any equilibrium; the aim is instead the derivation of the equilibrium conditions and the enumeration of all the equilibria. Indeed, knowing all the equilibria and their stability properties in terms of attractors, saddle points, and repellers, we can have a complete picture of all the possible learning dynamics of the agents.

The constraint expressing that a belief is confirmed on the equilibrium path is not linear, nor is it expressible as a linear complementarity problem (LCP). Instead, it can be formulated as a non–linear complementarity problem (NLCP). Indeed, the constraint over the belief on a sequence (or, equivalently, an action) must be active only if such sequence (or action) is on the equilibrium path. More precisely, sequence q|a ∈ Q_–i is on the equilibrium path if both q|a is played with strictly positive probability and there is at least one sequence q′ ∈ Q_i leading to h ∈ H_–i, where a ∈ ρ (h), that is played with strictly positive probability. Given sequence q|a ∈ Q_–i, we denote by f (q|a) ⊆ Q_i the set of sequences leading to h ∈ H_–i such that a ∈ ρ (h). A possible way to code the above constraint with USCEs is: $x_{λ} (q^{'}) \cdot x_{λ^{'}} (q | a) \cdot ({\hat{x}}_{λ}^{- i} (q | a) - x_{λ^{'}} (q | a)) = 0$ for every q′ ∈ f (q|a), where λ is an individual of agent i and λ′ is an individual of agent –i (notice that this constraint is not expressed as an NLCP, but it can be easily cast to such a form). These non–linear constraints can be linearized by using binary variables.

We define the binary variables as s_λ (q) such that s_λ (q) =1 when q ∈ Q_i is played with positive probability by λ. Similarly, we use binary variables s_θ (q) such that s_θ (q) =1 when q is played with positive probability by at least one individual of θ, and we use binary variables s_i (q) such that s_i (q) =1 when q is played with positive probability by at least one individual of i. The previous non–linear constraint can be formulated as a pair of constraints: ${\hat{x}}_{λ}^{- i} (q | a) - x_{λ^{'}} (q | a) \leq M \cdot (2 - s_{λ} (q^{'}) - s_{λ^{'}} (q | a))$ and $x_{λ^{'}} (q | a) - {\hat{x}}_{λ}^{- i} (q | a) \leq M \cdot (2 - s_{λ} (q^{'}) - s_{λ^{'}} (q | a))$ for every q′ ∈ f (q|a), where M is an arbitrarily large constant. The problem to compute a BFHSCE when types are not interdependent can be formulated as the MILP (1)–(12), while the program for an FHSCE is a trivial simplification obtained by setting |Θ_i|=1 for every i.
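The equivalence between the product constraint and its big–M linearization can be checked numerically on a grid of values. The Python sketch below is an illustration only: the function names are ours, and the binary variables are fixed to the least restrictive values allowed by s ≥ x (s = 1 exactly when the corresponding probability is positive), whereas in the MILP the solver chooses them.

```python
from itertools import product

def product_form_holds(x_own, x_opp, belief, tol=1e-9):
    """x(q') * x'(q|a) * (belief(q|a) - x'(q|a)) = 0."""
    return abs(x_own * x_opp * (belief - x_opp)) <= tol

def big_m_form_holds(x_own, x_opp, belief, M=10.0, tol=1e-9):
    """Pair of linear constraints with binary indicators and constant M."""
    s_own = 1 if x_own > tol else 0  # least restrictive choice with s >= x
    s_opp = 1 if x_opp > tol else 0
    slack = M * (2 - s_own - s_opp)
    return (belief - x_opp <= slack + tol) and (x_opp - belief <= slack + tol)

grid = [0.0, 0.3, 0.7, 1.0]
agree = all(
    product_form_holds(xo, xp, b) == big_m_form_holds(xo, xp, b)
    for xo, xp, b in product(grid, repeat=3)
)
print(agree)
```

When both indicators equal 1 the slack vanishes and the two inequalities force the belief to coincide with the opponent's probability; when at least one indicator equals 0 the slack is at least M and the belief is unconstrained. Since all the quantities in this pair of constraints lie in [0, 1], any M ≥ 1 is large enough for it.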

Here, constraints (1) and (2) force v_λ (h) at each h of individual λ to be equal to the expected (w.r.t. the individual’s beliefs ${\hat{x}}_{λ}^{- i}$ ) utility of the best sequence available at h (similar constraints are used in the MILP formulation for NE [19]); constraints (3) and (4) force the belief ${\hat{x}}_{λ}^{- i} (q)$ of individual λ over the aggregate opponent’s strategy x_–i to be correct on the equilibrium path (notice that, when 2 – s_λ (f (q)) – s_–i (q) =0, q is on the equilibrium path); constraints (5) and (6) ensure that x_λ is a well–defined strategy; constraints (7) and (8) ensure that ${\hat{x}}_{λ}^{- i}$ is a well–defined belief; constraints (9) ensure that s_λ (q) =1 if q is played with positive probability by individual λ; constraints (10) ensure that s_i (q) is strictly positive if at least one individual of i plays q with positive probability; constraints (11) and (12) define the domains of the variables. Differently from the case of NE or its refinements, both strategies and beliefs are variables.

Notice that the above program (1)–(12) is mixed–integer linear independently of the number of individuals per type. This is because each individual plays the best response to her beliefs, her beliefs are equal to the aggregate opponent’s strategy on the equilibrium path, and each agent aggregate strategy is given by the sum of the single individual strategies. In this perspective, the number of individuals behaves as the number of types. However, as we show in the next section, types and individuals have radically different effects on the set of equilibria.

The MILP for computing a BIHSCE is more involved. For each pair θ, q, we need a strategy x_θ,q (·) defined over the whole game and the corresponding values v_θ,q (·) per information set, and we need a belief ${\hat{x}}_{θ, q}^{- i} (\cdot)$ over the aggregate opponent’s strategy. The MILP for computing a BIHSCE with non–interdependent types is program (13)–(26), while the program for IHSCE can be obtained with |Θ_i|=1 for every i.

Here, constraints (13) and (14) force v_θ,q (h) at each h of type θ related to q to be equal to the expected utility of the best sequence available at h; constraints (15) and (16) ensure that ${\hat{x}}_{θ, q}^{- i} (\cdot)$ are correct on the equilibrium path identified by x_θ,q (·) and x_–i (·); constraints (17) and (18) guarantee that x_θ,q (·) is well defined and not played with a probability larger than ω_θ; constraints (19) and (20) do the same for strategy x_θ (·); constraints (21) and (22) ensure that ${\hat{x}}_{θ, q}^{- i} (\cdot)$ are well defined; constraints (23) force s_θ (q) =1 if there is a belief of type θ such that q is a best response; constraints (24) force s_i (q) =1 if at least one type of i plays q with positive probability; constraints (25) and (26) define the domain of the variables.

We focus on the problem to verify whether a given solution is an SCE extension.

Proposition 3.6 The problem to verify whether or not a solution $(σ, \hat{σ})$ (even expressed in sequence form) is one of the above SCE extensions is in $P$ .

MILP (1)–(12) for computing a BFHSCE with non–interdependent types:

$v_{λ} (h) \geq \sum_{h^{'} \in H_{i} (q | a)} v_{λ} (h^{'}) + \sum_{q^{'} \in Q_{- i}} u_{i} (θ, q | a, q^{'}) \cdot {\hat{x}}_{λ}^{- i} (q^{'}) \quad \forall i \in N, q \in Q_{i}, h \in H_{i} (q), a \in ρ (h), θ \in Θ_{i}, λ \in Λ_{θ}$ (1)

$v_{λ} (h) \leq \sum_{h^{'} \in H_{i} (q | a)} v_{λ} (h^{'}) + \sum_{q^{'} \in Q_{- i}} u_{i} (θ, q | a, q^{'}) \cdot {\hat{x}}_{λ}^{- i} (q^{'}) + M (1 - s_{λ} (q | a)) \quad \forall i \in N, q \in Q_{i}, h \in H_{i} (q), a \in ρ (h), θ \in Θ_{i}, λ \in Λ_{θ}$ (2)

${\hat{x}}_{λ}^{- i} (q | a) \geq \sum_{λ^{'} \in Λ_{θ^{'}}, θ^{'} \in Θ_{- i}} x_{λ^{'}} (q | a) - M \cdot (2 - s_{λ} (q^{'}) - s_{- i} (q | a)) \quad \forall i \in N, q \in Q_{- i}, q^{'} \in f (q | a), λ \in Λ_{θ}, θ \in Θ_{i}$ (3)

${\hat{x}}_{λ}^{- i} (q | a) \leq \sum_{λ^{'} \in Λ_{θ^{'}}, θ^{'} \in Θ_{- i}} x_{λ^{'}} (q | a) + M \cdot (2 - s_{λ} (q^{'}) - s_{- i} (q | a)) \quad \forall i \in N, q \in Q_{- i}, q^{'} \in f (q | a), λ \in Λ_{θ}, θ \in Θ_{i}$ (4)

$x_{λ} (q_{0}) = ω_{λ} \quad \forall i \in N, λ \in Λ_{θ}, θ \in Θ_{i}$ (5)

$x_{λ} (q) = \sum_{a \in ρ (h)} x_{λ} (q | a) \quad \forall i \in N, λ \in Λ_{θ}, q \in Q, h \in H_{i, q}, θ \in Θ_{i}$ (6)

${\hat{x}}_{λ}^{- i} (q_{0}) = 1 \quad \forall i \in N, λ \in Λ_{θ}, θ \in Θ_{i}$ (7)

${\hat{x}}_{λ}^{- i} (q) = \sum_{a \in ρ (h)} {\hat{x}}_{λ}^{- i} (q | a) \quad \forall i \in N, λ \in Λ_{θ}, q \in Q, h \in H_{i, q}, θ \in Θ_{i}$ (8)

$s_{λ} (q) \geq x_{λ} (q) \quad \forall i \in N, λ \in Λ_{θ}, θ \in Θ_{i}, q \in Q_{i}$ (9)

$s_{i} (q) \geq s_{λ} (q) \quad \forall i \in N, λ \in Λ_{θ}, q \in Q_{i}, θ \in Θ_{i}$ (10)

$s_{i} (q), s_{λ} (q) \in \{0, 1\} \quad \forall i \in N, q \in Q_{i}, θ \in Θ_{i}, λ \in Λ_{θ}$ (11)

$x_{λ} (q), {\hat{x}}_{λ}^{- i} (q) \geq 0 \quad \forall i \in N, q \in Q_{i}, θ \in Θ_{i}, λ \in Λ_{θ}$ (12)

MILP (13)–(26) for computing a BIHSCE with non–interdependent types:

$v_{θ, q | a} (h) \geq \sum_{h^{'} \in H_{i} (q | a)} v_{θ, q | a} (h^{'}) + \sum_{q^{'} \in Q_{- i}} u_{i} (θ, q | a, q^{'}) \cdot {\hat{x}}_{θ, q | a}^{- i} (q^{'}) \quad \forall i \in N, q \in Q_{i}, h \in H_{i} (q), θ \in Θ_{i}, a \in ρ (h)$ (13)

$v_{θ, q | a} (h) \leq \sum_{h^{'} \in H_{i} (q | a)} v_{θ, q | a} (h^{'}) + \sum_{q^{'} \in Q_{- i}} u_{i} (θ, q | a, q^{'}) \cdot {\hat{x}}_{θ, q | a}^{- i} (q^{'}) + M (1 - s_{θ, q | a} (q | a)) \quad \forall i \in N, q \in Q_{i}, h \in H_{i} (q), θ \in Θ_{i}, a \in ρ (h)$ (14)

${\hat{x}}_{θ, q}^{- i} (q^{'} | a) \geq \sum_{θ^{'} \in Θ_{- i}} x_{θ^{'}} (q^{'} | a) - M \cdot (2 - s_{θ, q} (q^{″}) - s_{- i} (q^{'} | a)) \quad \forall i \in N, q \in Q_{i}, q^{″} \in f (q^{'} | a), q^{'} \in Q_{- i}, θ \in Θ_{i}$ (15)

${\hat{x}}_{θ, q}^{- i} (q^{'} | a) \leq \sum_{θ^{'} \in Θ_{- i}} x_{θ^{'}} (q^{'} | a) + M \cdot (2 - s_{θ, q} (q^{″}) - s_{- i} (q^{'} | a)) \quad \forall i \in N, q \in Q_{i}, q^{″} \in f (q^{'} | a), q^{'} \in Q_{- i}, θ \in Θ_{i}$ (16)

$x_{θ, q^{'}} (q_{0}) = ω_{θ} \quad \forall i \in N, θ \in Θ_{i}, q^{'} \in Q_{i}$ (17)

$x_{θ, q^{'}} (q) = \sum_{a \in ρ (h)} x_{θ, q^{'}} (q | a) \quad \forall i \in N, θ \in Θ_{i}, q, q^{'} \in Q_{i}, h \in H_{i, q}$ (18)

$x_{θ} (q_{0}) = ω_{θ} \quad \forall i \in N, θ \in Θ_{i}$ (19)

$x_{θ} (q) = \sum_{a \in ρ (h)} x_{θ} (q | a) \quad \forall i \in N, θ \in Θ_{i}, q \in Q_{i}, h \in H_{i, q}$ (20)

${\hat{x}}_{θ, q^{'}}^{- i} (q_{0}) = 1 \quad \forall i \in N, θ \in Θ_{i}$ (21)

${\hat{x}}_{θ, q^{'}}^{- i} (q) = \sum_{a \in ρ (h)} {\hat{x}}_{θ, q^{'}}^{- i} (q | a) \quad \forall i \in N, θ \in Θ_{i}, q \in Q_{i}, h \in H_{i, q}$ (22)

$s_{θ, q^{'}} (q) \geq x_{θ, q^{'}} (q) \quad \forall i \in N, θ \in Θ_{i}, q, q^{'} \in Q_{i}$ (23)

$s_{i} (q) \geq s_{θ, q^{'}} (q) \quad \forall i \in N, θ \in Θ_{i}, q, q^{'} \in Q_{i}$ (24)

$s_{i} (q), s_{θ, q^{'}} (q) \in \{0, 1\} \quad \forall i \in N, q, q^{'} \in Q_{i}, θ \in Θ_{i}$ (25)

$x_{θ} (q), x_{θ, q^{'}} (q), {\hat{x}}_{θ, q^{'}}^{- i} (q) \geq 0 \quad \forall i \in N, q, q^{'} \in Q_{i}, θ \in Θ_{i}$ (26)

Proof. Trivially, the verification problem requires one to check whether or not the pertinent constraints above are satisfied. Since the number of those constraints is polynomial in the size of the game, the verification problem can be solved in polynomial time. □

More interesting is the verification problem when the input solution is only partially specified, the beliefs being omitted. This problem arises when we ask whether learning dynamics, e.g., when approaching some attractor, are converging to a given solution concept.

Theorem 3.7. The problem of verifying whether or not there is a solution $(σ, \hat{σ})$ with a given strategy profile σ (even expressed in sequence form) that is one of the above SCE extensions is in $P$.

Proof. We focus on FHSCE (the proof for the other SCE extensions is similar). When the input specifies only the strategies, the verification problem amounts to searching, for each individual λ, for a belief that is confirmed on the equilibrium path and such that the strategy of λ is a best response to it. This problem is an overconstrained version (with a linear number of additional linear constraints) of the problem of verifying whether or not a strategy is a never–best response. Since the latter can be formulated as a linear program and the additional constraints are linear, the resulting program is linear and therefore solvable in polynomial time.□

In summary, these results show that finding an SCE extension is (in the worst case) as hard as finding an SE [16], whereas verifying an SCE extension is easier than verifying an SE [9].

3.4 Equilibrium characterization and enumeration

We characterize the relationships between different SCE extensions. For clarity, we use the following example.

Example 3.8. In the game in Fig. 1, there are two agents, each with two types. Fig. 2.a and Fig. 2.b report the equilibria in the space of the agents’ utilities when the only types are θ_1.1, θ_2.1: A, B, C, D, and all the points between B and C are USCEs, while A is the unique SPE. Fig. 2.c and Fig. 2.d report the equilibria when the only types are θ_1.2, θ_2.1: A, B, C, and all the points between B and C are USCEs, while A and all the points between D and C are SPEs.

While it is known that with two agents USCEs and NEs are the same in terms of expected utility (see [9]), the presence of individuals makes HSCEs (both FHSCEs and IHSCEs) and NEs dramatically different. We state the following proposition, which is proved by the counterexample below.

Proposition 3.9. Types and individuals have different expressivity.

Example 3.10. Different types can act in different ways only if their utilities are different, whereas different individuals can act in different ways even with the same utility, leading to new equilibria, as shown next. In Fig. 2 the light gray polytopes contain all the IHSCEs. In Fig. 2.b and in Fig. 2.d, we report the FHSCEs with different combinations of individuals per agent (in this case U₁ and U₂ are the aggregate expected utilities of agent 1 and 2, respectively). With ■ we mark FHSCEs (non–USCE) when agent 1 has two individuals (with ω_λ = .5) and agent 2 has one individual. With ▴ we mark FHSCEs (non–USCE) when agent 1 has one individual and agent 2 has two individuals (with ω_λ = .5). When both agents have two individuals (with ω_λ = .5), new FHSCEs arise, marked by ♦.

Thus, the presence of individuals introduces new equilibria and, in some cases, a continuum of equilibria. We now derive some properties relating the solution concepts, which we use below for the equilibrium enumeration problem. We first provide the following definition.

Definition 3.11. The convex combination of two USCEs (σ₁, σ₂), $(σ_{1}^{'}, σ_{2}^{'})$ is defined as $(α \cdot σ_{1} + (1 - α) \cdot σ_{1}^{'}, α \cdot σ_{2} + (1 - α) \cdot σ_{2}^{'})$ with α ∈ [0, 1].

We can show that FHSCEs and IHSCEs are convex combinations of different USCEs. We state the following theorem for the IHSCE concept; the same result can be derived for the FHSCE concept.

Theorem 3.12. Every IHSCE is equivalent to a probability distribution over a set of USCEs, and vice–versa.

Proof. Given an IHSCE (x₁, x₂), we partition the set of terminal sequences $q \in {\bar{Q}}_{i}$ such that x_i (q) > 0 by using the following equivalence relation: $[q] = \{q^{'} \in {\bar{Q}}_{i} \mid {\hat{x}}_{q^{'}}^{- i} = {\hat{x}}_{q}^{- i}\}$. For each equivalence class, we normalize the probability x_i so that $x_{i}^{*} (q_{0}) = 1$, where $x_{i}^{*}$ is the normalized probability and q₀ is the empty sequence. We obtain $x_{i}^{*} (q) = \frac{x_{i} (q)}{\sum_{q^{'} \in [q]} x_{i} (q^{'})}$ for every q ∈ [q]. Each equivalence class is a USCE because: (a) each q ∈ [q] is a best response given ${\hat{x}}_{q}^{- i}$ (by the IHSCE definition) and (b) the belief ${\hat{x}}_{q}^{- i}$ is the same for all q ∈ [q] (by the partition rule). We define a probability distribution over the set of classes [q] such that Pr ([q]) = ∑_q′∈[q]x_i (q′). Notice that $x_{i} (q) = x_{i}^{*} (q) \cdot \Pr ([q])$ for every $q \in {\bar{Q}}_{i}$.□
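The partition used in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decompose`, the dictionary encoding of x_i and of the beliefs, and the three terminal sequences "L", "M", "R" are all hypothetical and do not come from the game in Fig. 1. Beliefs are encoded as hashable tuples so that sequences sharing the same belief fall into the same class.

```python
from fractions import Fraction


def decompose(x, belief):
    """Group terminal sequences by belief and normalize within each class.

    x:      dict mapping a terminal sequence q to its probability x_i(q)
    belief: dict mapping q to a hashable encoding of the belief held when playing q
    Returns a list of pairs (Pr([q]), x_i^* restricted to the class [q]).
    """
    classes = {}
    for q, p in x.items():
        if p > 0:  # only sequences played with positive probability are partitioned
            classes.setdefault(belief[q], {})[q] = p
    parts = []
    for members in classes.values():
        mass = sum(members.values())  # Pr([q]) = sum of x_i(q') over the class
        parts.append((mass, {q: p / mass for q, p in members.items()}))
    return parts


# Hypothetical data: "L" and "M" share one belief, "R" has another.
x = {"L": Fraction(1, 4), "M": Fraction(1, 4), "R": Fraction(1, 2)}
belief = {"L": (1, 0), "M": (1, 0), "R": (0, 1)}
parts = decompose(x, belief)

# The identity x_i(q) = x_i^*(q) * Pr([q]) from the end of the proof.
for mass, normalized in parts:
    for q, p_star in normalized.items():
        assert p_star * mass == x[q]
```

Exact fractions are used so that the final identity can be checked with equality rather than with a tolerance.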

With infinite populations we can obtain every strategy profile (∑_kα_1,k · σ_1,k, ∑_kα_2,k · σ_2,k) with ∑_kα_i,k = 1 and α_i,k ≥ 0, where σ_i,k is the strategy of agent i in the k–th USCE. Therefore, once we have all the USCEs, we can derive all the IHSCEs simply as convex combinations of USCEs. In the finite case, the set of possible FHSCEs is limited to a finite set of strategies, which are obtained by constraining the coefficients of the convex combination as follows: α_i,k = ∑_{σ_λ=σ_i,k}ω_λ.
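The constraint on the coefficients in the finite case can be made concrete with a small sketch. Assuming each individual λ of agent i adopts the strategy of exactly one USCE, the coefficient α_i,k is the total weight ω_λ of the individuals assigned to the k–th USCE. The function name `feasible_coefficients` and the example weights are hypothetical.

```python
from fractions import Fraction
from itertools import product


def feasible_coefficients(weights, num_usces):
    """Enumerate the coefficient vectors (alpha_{i,1}, ..., alpha_{i,K}) of one agent.

    weights:   list of omega_lambda, one per individual of the agent
    num_usces: number K of USCEs whose strategies an individual may adopt
    Each individual is assigned to one USCE; alpha_k sums the weights assigned to k.
    """
    vectors = set()
    for assignment in product(range(num_usces), repeat=len(weights)):
        alpha = [Fraction(0)] * num_usces
        for w, k in zip(weights, assignment):
            alpha[k] += w
        vectors.add(tuple(alpha))
    return sorted(vectors)


# Two individuals with omega_lambda = 1/2 each and two USCEs.
half = Fraction(1, 2)
alphas = feasible_coefficients([half, half], 2)
```

With two equally weighted individuals and two USCEs, only the coefficient vectors (0, 1), (1/2, 1/2) and (1, 0) are reachable, whereas an infinite population could realize any α ∈ [0, 1].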

We state the following theorem for the FHSCE concept, which is proved by the counterexample below.

Theorem 3.13. Given a finite set of individuals, a convex combination of USCEs that is feasible w.r.t. the probabilities ω_λ may not be an FHSCE.

Example 3.14. In Fig. 2.b, some possible convex combinations, e.g., the ones between A and C, do not correspond to any FHSCE. In Fig. 2.d, the unique two convex combinations that correspond to FHSCEs are between A and C and between B and D.

Therefore, the derivation of the FHSCEs cannot be performed on the basis of only the USCEs.

We use the above results for the equilibrium enumeration problem. The enumeration of the USCEs can be performed by enumerating all the basic solutions of the corresponding mathematical program, i.e., by exhaustively visiting the branch–and–bound tree generated by the MILP solver. The size of the tree is O (2^{|Q₁|+|Q₂|}). The problem is $\# P$ and, in the case of degeneracy, the same approach adopted in [1] can be employed here. By enumerating the USCEs, we also enumerate all the IHSCEs, given that these can be derived as convex combinations of USCEs. The enumeration of FHSCEs can be accomplished similarly: enumerate all the USCEs, compute all the possible finite (with the given ω) convex combinations, and then verify whether each convex combination is an FHSCE. Denoting with #USCEs the number of USCEs, this approach has complexity O (2^{|Q₁|+|Q₂|} + 2^{log₂(#USCEs)·(|Λ₁|+|Λ₂|)}). When log₂ (#USCEs) is smaller than |Q₁| and |Q₂|, this is much faster than enumerating the basic solutions of the mathematical program for FHSCE, which has complexity O (2^{|Q₁|·|Λ₁|+|Q₂|·|Λ₂|}).
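The second term of the first bound can be read as a count of candidate combinations: each of the |Λ₁| + |Λ₂| individuals is assigned the strategy of one of the #USCEs equilibria. The identity below only rewrites the exponent. The numbers in the example are hypothetical and are not taken from the case study.

```latex
2^{\log_2(\#\mathrm{USCEs}) \cdot (|\Lambda_1| + |\Lambda_2|)}
  = (\#\mathrm{USCEs})^{|\Lambda_1| + |\Lambda_2|},
\qquad \text{e.g. } \#\mathrm{USCEs} = 4,\ |\Lambda_1| = |\Lambda_2| = 2
  \;\Rightarrow\; 4^{4} = 256 \text{ candidates to verify.}
```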

4 SCE and Learning dynamics

In order to formally study the dynamics of learning agents when applied to an extensive–form game, we focus on the replicator dynamics with mutation simulating the Q–learning algorithm [21]. The first issue we study concerns the appropriate form of the replicator dynamics when individuals are present.

The literature provides replicator dynamics only for normal–form games. For this reason, we first translate the extensive–form game into normal form and then apply the replicator equations [4]. Given that each individual can learn independently of the others, we obtain the following replicator dynamics (for reasons of space we omit the mutation term):

$\begin{matrix} {\dot{σ}}_{λ} (a) = σ_{λ} (a) \cdot [\sum_{a_{2} \in A_{2}} u_{1} (a, a_{2}) \cdot σ_{2} (a_{2}) \\ - \sum_{a_{1} \in A_{1}} \sum_{a_{2} \in A_{2}} u_{1} (a_{1}, a_{2}) \cdot σ_{λ} (a_{1}) \cdot σ_{2} (a_{2})] \\ \forall λ \in Λ_{1}, a \in A_{1} \end{matrix}$ (27)

$\begin{matrix} {\dot{σ}}_{λ} (a) = σ_{λ} (a) \cdot [\sum_{a_{1} \in A_{1}} u_{2} (a_{1}, a) \cdot σ_{1} (a_{1}) \\ - \sum_{a_{1} \in A_{1}} \sum_{a_{2} \in A_{2}} u_{2} (a_{1}, a_{2}) \cdot σ_{1} (a_{1}) \cdot σ_{λ} (a_{2})] \\ \forall λ \in Λ_{2}, \forall a \in A_{2} \end{matrix}$ (28)

It can be observed that each FHSCE is a rest point of (27)–(28), since the derivatives over all the actions of each individual are zero. Interestingly, when the individuals do not learn independently of each other, but all the individuals of the same agent behave in the same way, FHSCEs may not be rest points. We show this in the following proposition.

Proposition 4.1. The replicator dynamics

$\begin{matrix} {\dot{σ}}_{1} (a) = σ_{1} (a) \cdot [\sum_{a_{2} \in A_{2}} u_{1} (a, a_{2}) \cdot σ_{2} (a_{2}) \\ - \sum_{a_{1} \in A_{1}} \sum_{a_{2} \in A_{2}} u_{1} (a_{1}, a_{2}) \cdot σ_{1} (a_{1}) \cdot σ_{2} (a_{2})] \\ \forall a \in A_{1} \end{matrix}$ (29)

$\begin{matrix} {\dot{σ}}_{2} (a) = σ_{2} (a) \cdot [\sum_{a_{1} \in A_{1}} u_{2} (a_{1}, a) \cdot σ_{1} (a_{1}) \\ - \sum_{a_{1} \in A_{1}} \sum_{a_{2} \in A_{2}} u_{2} (a_{1}, a_{2}) \cdot σ_{1} (a_{1}) \cdot σ_{2} (a_{2})] \\ \forall a \in A_{2} \end{matrix}$ (30)

is not equivalent to the replicator dynamics (27)–(28).

Proof. Consider the game described in Fig. 1 where the types are θ_1.1, θ_2.1. Suppose that there is only one individual for agent 1 and four individuals for agent 2. The strategies of the individuals are

$\begin{matrix} σ_{1} & = & (σ_{1} (L_{1}) = σ_{1} (R_{2}) = σ_{1} (L_{3}) = σ_{1} (R_{4}) \\ = σ_{1} (L_{5}) = 1) \\ σ_{2.1} & = & (σ_{2.1} (l_{1}) = σ_{2.1} (r_{1}) = 1) \\ σ_{2.2} & = & (σ_{2.2} (l_{1}) = σ_{2.2} (r_{2}) = 1) \\ σ_{2.3} & = & (σ_{2.3} (l_{2}) = σ_{2.3} (r_{1}) = 1) \\ σ_{2.4} & = & (σ_{2.4} (l_{2}) = σ_{2.4} (r_{2}) = 1) \end{matrix}$

where σ_x.y denotes the strategy of the y–th individual of agent x. Now suppose that the individuals of agent 2 are drawn from a uniform distribution; the aggregate strategy is then

$σ_{2} = (σ_{2} (l_{1}) = σ_{2} (l_{2}) = σ_{2} (r_{1}) = σ_{2} (r_{2}) = \frac{1}{2})$ (31)

It is easy to show that the strategy of each individual of agent 2 is a best response w.r.t. some beliefs about the opponent and that the strategy of agent 1 is a best response to (31). So (σ₁, σ₂) is an FHSCE. However, the strategy profile (σ₁, σ₂) is not a rest point of (29)–(30). □
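The gap between the two dynamics can also be checked numerically on a toy normal–form game. The following sketch is an assumption–laden illustration, not the game of Fig. 1: the 2×2 payoff matrices `U1` and `U2`, the weights, and all function names are hypothetical, and the mutation term is omitted as in (27)–(30). It only shows that a profile in which every individual plays a pure strategy is a rest point of the per–individual dynamics while the corresponding aggregate profile is not a rest point of the aggregate dynamics. It does not verify that the profile is an FHSCE.

```python
def action_payoffs_1(U1, s2):
    """Expected payoff of each action of agent 1 against the strategy s2."""
    return [sum(U1[a][b] * s2[b] for b in range(len(s2))) for a in range(len(U1))]


def action_payoffs_2(U2, s1):
    """Expected payoff of each action of agent 2 against the strategy s1."""
    return [sum(U2[a][b] * s1[a] for a in range(len(s1))) for b in range(len(U2[0]))]


def aggregate(strategies, weights):
    """Weighted average of the individuals' strategies."""
    n = len(strategies[0])
    return [sum(w * s[a] for s, w in zip(strategies, weights)) for a in range(n)]


def replicator(s, f):
    """Replicator right-hand side: s(a) * (f(a) - average payoff under s)."""
    avg = sum(p * v for p, v in zip(s, f))
    return [p * (v - avg) for p, v in zip(s, f)]


def individual_rhs(U1, U2, S1, w1, S2, w2):
    """Per-individual dynamics in the spirit of (27)-(28)."""
    f1 = action_payoffs_1(U1, aggregate(S2, w2))
    f2 = action_payoffs_2(U2, aggregate(S1, w1))
    return [replicator(s, f1) for s in S1], [replicator(s, f2) for s in S2]


def aggregate_rhs(U1, U2, S1, w1, S2, w2):
    """Aggregate dynamics in the spirit of (29)-(30)."""
    s1, s2 = aggregate(S1, w1), aggregate(S2, w2)
    return replicator(s1, action_payoffs_1(U1, s2)), replicator(s2, action_payoffs_2(U2, s1))


# Hypothetical game: U1[a1][a2] and U2[a1][a2] are the payoffs of agents 1 and 2.
U1 = [[1.0, 0.0], [0.0, 1.0]]
U2 = [[2.0, 0.0], [0.0, 1.0]]
S1, w1 = [[1.0, 0.0]], [1.0]                    # one individual for agent 1
S2, w2 = [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]   # two pure individuals for agent 2

ind = individual_rhs(U1, U2, S1, w1, S2, w2)
agg = aggregate_rhs(U1, U2, S1, w1, S2, w2)
```

In this toy instance every entry of `ind` is zero, because a pure strategy is always a rest point of its own replicator equation, whereas the second component of `agg` is nonzero: the aggregate dynamics push the mixed aggregate strategy of agent 2 towards its better action.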

Given each SCE found as described in the previous section, it is possible to study its stability by linearizing (27)–(28) by means of the Jacobian matrix as follows:

$\begin{matrix} J = [\begin{matrix} \frac{\partial {\dot{σ}}_{λ_{1}}}{\partial σ_{λ_{1}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{1}}}{\partial σ_{λ_{n}}} & \frac{\partial {\dot{σ}}_{λ_{1}}}{\partial σ_{λ_{1}^{'}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{1}}}{\partial σ_{λ_{m}^{'}}} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ \frac{\partial {\dot{σ}}_{λ_{n}}}{\partial σ_{λ_{1}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{n}}}{\partial σ_{λ_{n}}} & \frac{\partial {\dot{σ}}_{λ_{n}}}{\partial σ_{λ_{1}^{'}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{n}}}{\partial σ_{λ_{m}^{'}}} \\ \frac{\partial {\dot{σ}}_{λ_{1}^{'}}}{\partial σ_{λ_{1}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{1}^{'}}}{\partial σ_{λ_{n}}} & \frac{\partial {\dot{σ}}_{λ_{1}^{'}}}{\partial σ_{λ_{1}^{'}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{1}^{'}}}{\partial σ_{λ_{m}^{'}}} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ \frac{\partial {\dot{σ}}_{λ_{m}^{'}}}{\partial σ_{λ_{1}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{m}^{'}}}{\partial σ_{λ_{n}}} & \frac{\partial {\dot{σ}}_{λ_{m}^{'}}}{\partial σ_{λ_{1}^{'}}} & \dots & \frac{\partial {\dot{σ}}_{λ_{m}^{'}}}{\partial σ_{λ_{m}^{'}}} \end{matrix}] & \begin{matrix} λ_{1}, \dots, λ_{n} \in Λ_{1}, \\ λ_{1}^{'}, \dots, λ_{m}^{'} \in Λ_{2} \end{matrix} \end{matrix}$

where

$\begin{matrix} \frac{\partial {\dot{σ}}_{λ_{i}}}{\partial σ_{λ_{j}}} = \left\{ \begin{matrix} [\begin{matrix} \frac{\partial {\dot{σ}}_{λ_{i}} (a_{1})}{\partial σ_{λ_{j}} (a_{1})} & \dots & \frac{\partial {\dot{σ}}_{λ_{i}} (a_{1})}{\partial σ_{λ_{j}} (a_{n})} \\ ⋮ & & ⋮ \\ \frac{\partial {\dot{σ}}_{λ_{i}} (a_{m})}{\partial σ_{λ_{j}} (a_{1})} & \dots & \frac{\partial {\dot{σ}}_{λ_{i}} (a_{m})}{\partial σ_{λ_{j}} (a_{n})} \end{matrix}] & \begin{matrix} \text{if } λ_{i}, λ_{j} \in Λ_{k} \text{ and } i = j, \text{ or} \\ \text{if } λ_{i} \in Λ_{k}, λ_{j} \in Λ_{k^{'}}, k \neq k^{'} \end{matrix} \\ 0 & \text{if } λ_{i}, λ_{j} \in Λ_{k} \text{ and } i \neq j \end{matrix} \right. \end{matrix}$

From the Jacobian we can compute the eigenvalues of the linearized system: if all the eigenvalues have negative real part, the rest point is an attractor; if there are eigenvalues with positive and with negative real part, it is a saddle; and if all of them have positive real part, it is a repeller. Given the eigenvalues, we can compute the eigenvectors, which represent the directions of the dynamics of the system close to the rest point.
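The classification rule can be written down as a short helper. This is a minimal sketch under stated assumptions: the function name `classify_rest_point`, the tolerance `tol`, and the eigenvalue lists in the example are hypothetical. The classification uses the real parts of the eigenvalues, and eigenvalues that are numerically zero are reported separately, since the linearization alone does not determine the behavior along those directions.

```python
def classify_rest_point(eigenvalues, tol=1e-9):
    """Classify a rest point from the eigenvalues of the Jacobian.

    Returns (label, has_zero), where label is "attractor", "saddle",
    "repeller" or "undetermined", and has_zero tells whether some
    eigenvalue has a real part within tol of zero.
    """
    reals = [complex(ev).real for ev in eigenvalues]
    has_pos = any(r > tol for r in reals)
    has_neg = any(r < -tol for r in reals)
    has_zero = any(abs(r) <= tol for r in reals)
    if has_pos and has_neg:
        label = "saddle"
    elif has_neg:
        label = "attractor"
    elif has_pos:
        label = "repeller"
    else:
        label = "undetermined"  # all eigenvalues are (numerically) zero
    return label, has_zero


# Hypothetical spectra, not taken from the paper's tables.
stable = classify_rest_point([-1.0, -0.5, 0.0])
saddle = classify_rest_point([-1.0, 2.0])
unstable = classify_rest_point([1.0, 3 + 2j])
```

Returning the zero flag separately keeps the three labels of the text while signalling when some directions are not decided by the linearized system.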

5 A simple case study

We apply the tools described in the previous section to the game described in Fig. 1, where the types are θ_1.1, θ_2.1. First we compute all the USCEs of the game using (1)–(12); then we study their stability by computing the eigenvalues and eigenvectors of the linearization of (27)–(28), from which we can forecast the learning dynamics of the system.

Consider Fig. 2.a (one type/one individual), which displays a learning trajectory projected onto the utility space. Notice that such a representation may hide some information, since we are interested in studying the stability properties of strategy profiles that may map onto the same point of the utility space. An example is reported in Tab. 2, where x_|y| denotes y eigenvalues of value x. The strategy profile σ = (R₁R₂L₃R₄L₅, l₁r₂) is the unique SPE; all its eigenvalues are negative, except for some eigenvalues equal to zero due to the normal–form replication of payoffs. A is a stable state, whereas the other USCEs, B, C and D, are saddle points because they also have positive eigenvalues. From the study of the eigenvectors (omitted for reasons of space), it is possible to observe that the dynamics starting close to A are initially attracted by D, then repulsed and attracted by B, and, finally, repulsed and attracted by A.

Consider Fig. 2.c (one type/one individual). The trajectory that starts outside the convex hull of the USCEs quickly moves into that region and, after being attracted by the saddle B, converges to A. Depending on the initial strategy, the learning system converges to an SPE (either to A or to one of the points between D and C).

When multiple individuals and types are considered, the learning dynamics are also influenced by the presence of FHSCEs. Like non–SPE USCEs, these equilibria are rest points that, in general, have both positive and negative eigenvalues, so they are saddle points. They can be reached only by nullifying the probability with which some individual plays some sequences. All these equilibria cause only slight changes to the trajectory produced in the single–type single–individual case, but they considerably slow down the convergence of the system.

In Fig. 2.b (one type/two individuals), two trajectories converging to FHSCEs (different from the SPE) are shown. Both trajectories start close to a USCE placed on the B–C line and end in equilibria that are not SPEs. Examples of the eigenvalues of some of these equilibria are reported in Tab. 3, where individuals are drawn from a uniform distribution over Λ₁. As previously said, this behavior has been obtained by forcing to zero the probabilities of some sequences. The white trajectory starts moving due to the attraction of the FHSCE placed between C and D, and then it converges to D. The dynamics of the black trajectory are more complex: it moves between FHSCEs and USCEs until it converges to the FHSCE between A and B.

6 Conclusions

In this paper, we studied extensive–form games in which heterogeneous finite or infinite populations of individuals repeatedly play without common information. We extended the concept of SCE to this setting, we provided two mathematical programming formulations to compute an equilibrium, and we discussed the problem of verifying whether a given solution is an equilibrium. Furthermore, we discussed how all the equilibria can be enumerated and how their dynamical properties (i.e., attractor, saddle, repeller) can be evaluated. Finally, we applied all these instruments to a simple case study, showing that the learning dynamics are dramatically conditioned by SCEs and therefore that our tools play a prominent role in the study of the learning dynamics.

The most interesting direction for future work is the adoption of our tools to design game mechanisms. A designer could remove equilibria to change or remove learning dynamics, e.g., by providing small utility, or could change the dynamical properties of some equilibrium, strengthening its attracting power to speed up the dynamics towards it or weakening its attracting power to make the dynamics move farther from it. Another direction is the extension of our tools to settings with more than two agents, where USCEs and NEs can be radically different.

References

1. Avis, Rosenberg G.D., Savani and von Stengel, Enumeration of Nash equilibria for two-player games, Econ Theory, 42(1) (2010), 9–37.

2. Benisch, Davis G.B. and Sandholm, Algorithms for closed under rational behavior (CURB) sets, J Artif Intell Res, 38 (2010), 513–534.

3. Cai, Niu and Parsons, Using evolutionary game-theory to analyse the performance of trading strategies in a continuous double auction market, Adaptive Agents and Multi-Agent Systems III, Adaptation and Multi-Agent Learning, (2008), 44–59.

4. Cressman, Evolutionary dynamics and extensive form games, MIT Press, Cambridge, Mass., 2003.

5. Daskalakis, Goldberg and Papadimitriou, The complexity of computing a Nash equilibrium, In STOC, (2006), 71–78.

6. Dekel, Fudenberg and Levine D.K., Learning to play Bayesian games, Game Econ Behav, 46 (2004), 282–303.

7. Fudenberg and Levine D.K., Self-confirming equilibrium, Econometrica, 61(3) (1993), 523–545.

8. Fudenberg and Tirole, Game Theory, The MIT Press, 1991.

9. Gatti and Panozzo, New results on the verification of Nash refinements for extensive-form games, In AAMAS, (2012), 813–820.

10. Gatti, Panozzo and Ceppi, Computing a self-confirming equilibrium in two-player extensive-form games, In AAMAS, (2011), 981–988.

11. Hurkens, Learning by forgetful players, Game Econ Behav, 11 (1995), 304–329.

12. Koller, Megiddo and von Stengel, Efficient computation of equilibria for extensive two-person games, Game Econ Behav, 14 (1996), 247–259.

13. Kreps D.R. and Wilson, Sequential equilibria, Econometrica, 50(4) (1982), 863–894.

14. Lemke C.E., Some pivot schemes for the linear complementarity problem, Math Program Stud, 7 (1978), 15–35.

15. Lemke C.E. and Howson J.J.T., Equilibrium points of bimatrix games, SIAM J Appl Math, 12(2) (1964), 413–423.

16. Miltersen P.B. and Sørensen T.B., Computing sequential equilibria for two-player games, In SODA, (2006), 107–116.

17. Osepayshvili, Wellman M.P., Reeves D.M. and MacKie-Mason J.K., Self-confirming price prediction for bidding in simultaneous ascending auctions, In UAI, (2005), 441–449.

18. Porter, Nudelman and Shoham, Simple search methods for finding a Nash equilibrium, In AAAI, (2004), 664–669.

19. Sandholm, Gilpin and Conitzer, Mixed-integer programming methods for finding Nash equilibria, In AAAI, (2005), 495–501.

20. Shoham and Leyton-Brown, Multiagent Systems: Algorithmic, Game Theoretic and Logical Foundations, Cambridge University Press, (2008).

21. Tuyls, Jan’t Hoen and Vanschoenwinkel, An evolutionary dynamical analysis of multi-agent learning in iterated games, Auton Agent Multi-AG, 12(1) (2006), 115–153.