Abstract
The adoption of Nash equilibrium (NE) in real–world settings is often impractical due to its too restrictive assumptions. Game theory and artificial intelligence provide alternative (relaxed) solution concepts. When knowledge about opponents’ utilities and types are not available, the appropriate solution concept for extensive–form games is the self–confirming equilibrium (SCE), which relaxes NE allowing agents to have wrong beliefs off the equilibrium path. In this paper, we provide the first computational and learning study of the situations in which a two–agent extensive–form game is played by heterogeneous populations of individuals that repeatedly match (e.g., eBay): we extend the SCE concept, we study the equilibrium computation problem, and we study how these equilibria affect learning dynamics. We show that SCEs are crucial for characterizing both stable states of learning dynamics and the dynamics themselves.
Keywords
Introduction
The study of strategic interactions has recently deserved a lot of attention in artificial intelligence. The design of rational agents is usually pursued by exploiting models from microeconomics [20] and by algorithmic tools to find strategies [20]. The central solution concept is Nash equilibrium (NE). Although in principle NE can be applied to an enormous range of situations to prescribe strategies to agents, it presents two drawbacks that make its prescriptive power unsatisfactory. The first drawback concerns its epistemic requirements (e.g., common and complete information over payoffs) that are rarely met in real–world situations. The second drawback is the multiplicity of equilibria: with multiple equilibria, agents cannot coordinate on which equilibrium to play unless they are somehow correlated, but in this case a correlated equilibrium should be played. Even when NE is used as descriptive tool to study what are stable states of learning agents, some problems arise, learning agents having non–NE stable states, and therefore NE may have a unsatisfactory descriptive power [7].
The aforementioned drawbacks of NE pushed researchers to design alternative (relaxed and/or non–equilibrium) solution concepts taking also into account learning dynamics, e.g., the CURB set [2] curbsandholm is a non–equilibrium concept defined as a set of strategies that contains the best responses to any mixture over itself and any best–response dynamics (even a best–response dynamics of mixed strategies) will stay within it [11]. With extensive–form games, relaxing the epistemic requirements of NE, it is possible to have stable (equilibrium) states that are not NEs [7]. These equilibria, called self–confirming equilibria (SCEs), require that each agent plays best response strategies to her beliefs, but the beliefs can be wrong off the equilibrium path (while they are confirmed on). This concept is perfectly suitable for learning agents: if agents can entirely explore the strategy profile space, they would have correct beliefs everywhere on the game tree and they would play a subgame perfect equilibrium (SPE) or a sequential equilibrium (SE), but, in practice, learning agents cannot explore the whole space and therefore they can have wrong beliefs over some portion of the tree and the strategy profile they play is an SCE.
To the best of our knowledge, no work investigates how to compute SCEs and how these equilibria affect learning dynamics. The only work dealing with SCEs [10] studies two–agent games, but it does not address the problem to find equilibria in challenging situations [17] where heterogeneous finite or infinite populations of individuals repeatedly match and play (e.g., a population of buyers and a population of sellers in eBay, each individual with potentially different beliefs and strategies) and does not characterize the learning dynamics in terms of SCEs. This last issue is of extraordinary prominence. Indeed in many practical situations the study of the learning dynamics may be more important than the study of the equilibrium points: when learning dynamics are long, the overall agents’ performance (in terms of expected utility) is strongly conditioned by their performance during the learning dynamics and these cannot be neglected. Our aim is to provide models and tools to forecast these dynamics and allow, in future, a designer to exploit this information in the design of game mechanisms, e.g., by moving equilibrium points to speed up the learning transitory or, in the case of auctions (in particular when no dominant bidding strategy exists, as for instance in continuous double auctions [3]), to push the dynamics towards outcomes that are better for an auctioneer.
The main original contributions provided in this paper are as follow. Solution concepts. We design some solution concepts extending the SCE to capture situations where agents may be of different (Bayesian) types and individuals. Equilibrium computation. With two agents, we formulate the problems to find a solution with finite and infinite populations as two Mixed–Integer Linear mathematical Programs (MILP). Furthermore, we show that the problem to verify whether or not a solution is an SCE extension is in even when the input of the verification problem specifies only the strategies and not the beliefs (this is common when we observe learning agents and we ask whether or not the observed strategies are a solution for some given solution concept), while searching for an SCE is –complete. Equilibrium characterization. We study the relationships between different SCE extensions as the number of individuals and types change and we discuss how the equilibria can be enumerated. Learning dynamics characterization. We study the replicator dynamics with multiple individuals and we show how SCE extensions affect learning dynamics in games with perfect information: they are reachable states only when learning agents are purely greedy, otherwise they are saddle points attracting and then repelling the learning dynamics.
Extensive–form games and equilibrium computation
Game definition and strategies
A perfect–information extensive–form game [8] is a tuple (N, A, V, T, ι, ρ, χ, u), where: N is the set of agents (i ∈ N denotes a generic agent), A is the set of actions (A i ⊆ A denotes the set of actions of agent i and a ∈ A denotes a generic action), V is the set of decision nodes (V i ⊆ V denotes the set of decision nodes of i), T is the set of terminal nodes (w ∈ V ∪ T denotes a generic node and w0 is root node), ι : V → N returns the agent that acts at a given decision node, ρ : V → ℘ (A) returns the actions available to agent ι (w) at w, χ : V × A → V ∪ T assigns the next (decision or terminal) node to each pair 〈w, a〉 where a is available at w, and u = (u1, …, u|N|) is the set of agents’ utility functions . Games with imperfect information extend those with perfect information, allowing one to capture situations in which some agents cannot observe some actions undertaken by other agents. We denote by Vi,h the h–th information set of agent i. An information set is a set of decision nodes such that when an agent plays at one of its nodes she cannot distinguish the node in which she is playing. For the sake of simplicity, we assume that every information set has a different index h, thus we can univocally identify an information set by h. Furthermore, since the available actions at all nodes w belonging to the same information set h are the same, with abuse of notation, we write ρ (h) in place of ρ (w) with w ∈ Vi,h. An imperfect–information game is a tuple (N, A, V, T, ι, ρ, χ, u, H) where (N, A, V, T, ι, ρ, χ, u) is a perfect–information game and H = (H1, …, H|N|) induces a partition V i = ⋃ h∈H i Vi,h such that for all w, w′ ∈ Vi,h we have ρ (w) = ρ (w′). We focus on games with perfect recall where each agent recalls all the own previous actions and the ones of the opponents [8].
A behavioral strategy σ
i
is represented as a probability distribution σi,a over the actions a ∈ A
i
available at each information set h. Working with behavioral strategies is computationally hard, requiring highly non–linear optimization tools. With two agents the sequence form [12] allows one not to resort to non–linear programming. A sequence q ∈ Q
i
is a set of consecutive actions a ∈ A
i
, where Q
i
⊆ Q is the set of sequences of agent i and Q is the set of all the sequences. A sequence can be terminal, if, combined with some opponents’ sequence, it leads to a terminal node ( denotes the subset of terminal sequences), or non–terminal, if it cannot lead to any terminal node for every opponents’ sequence. The initial sequence of every agent, denoted by q0, is said empty sequence and, given sequence q ∈ Q
i
leading to some information set h ∈ H
i
, we say that q′ extends q (denoted by q′ = q|a′) if the last action of q′ (denoted by a (q′) = a′) is some action a′ ∈ ρ (w) with w ∈ Vi,h. We denote a sequence–form strategy profile by a vector
Solution concepts and computation
It is well known that the NE concept is not satisfactory for extensive–form games, allowing agents to play non–credible threats. The concept of SPE refines the concept of NE, constraining a strategy profile to be an NE in every subgame [8], where a subgame is a portion of the game tree defined as follows: it has a root and for every node w ∈ Vi,h belonging to thesubgame the whole information set Vi,h belongs to the subgame. An SPE can be easily found by applying backward induction [8]. The concept of SPE is not satisfactory with imperfect information. In this case, SE [13] is used. An SE is a pair , where σ are the agents’ strategies and are the agents’ beliefs over the opponents’ strategies, such that σ are sequentially rational to and are consistent to σ.
The concept of SCE relaxes NE, capturing settings where opponents’ utilities and types are unknown. Similarly to an SE, an SCE is a pair where σ are best responses to . Differently, can be wrong off the equilibrium path (instead they must be correct/confirmed on). Thus, every NE is an SCE, while an SCE may be not an NE.
Now, we briefly survey the main computational results. Finding an NE is –complete [5]. The problem to search for an NE is formulated as a linear–complementarity problem (LCP) and solved by employing the Lemke algorithm [14], a generalization of the algorithm presented in [15]. Other algorithms are based on support enumeration [18] and MILP [19]. The problem to verify whether a strategy profile, both in sequence form and agent form, is an NE can be easily solved in polynomial time by checking whether or not the constraints are satisfied. Finding an SE in and can be achieved by a variation of the Lemke’s algorithm [16]. Verifying whether or not a given solution is an SE is in [9]. The unique result on the computation of SCEs shows that finding a basic SCE can be formulated as an MILP [10].
Extensive–form games with populations
Game model
We focus on two–agent extensive–form games that are repeatedly played by different individuals as described in [7]. More precisely, for each agent (representing a role), there is a population of individuals and, at each repetition of the game, one individual is drawn from each population and the two drawn individuals are matched and then play the game. At each repetition, we may have individuals different from those of the previous repetitions. A common example is a market in which bilateral negotiations are carried out: there are two agents/roles (i.e., buyer and seller), but different buyers and different sellers can match. Other economic examples are given by auctions.
The game model proposed in [7] presents the following two main limitations that we remove in this paper. Infinite populations: the number of individuals of all the populations is infinite. While this model can be a fine approximation with large populations, it is not when the populations contain few individuals. Identical individuals: all the individuals of a population have the same preferences. This is unrealistic given that individuals can have different preferences.
In our model, we allow populations to be finite or infinite and to be heterogeneous including individuals of different types, where types differentiate for their preferences. We denote by Θ = (Θ1, …, Θ|N|) the set of all the types (θ denotes a generic type), where Θ i is the set of types of agent i. We assume that the number of types is finite. We denote by Θ–i = × j≠i Θ j the set of all the possible profiles of types of all the agents except agent i. Utility functions are defined also over types (in addition to the strategies of the agents). When types are non–interdependent, we have u i = u i (θ, σ i , σ–i) where θ ∈ Θ i ; instead when types are interdependent we have u i = u i (θ, θ′, σ i , σ–i) where θ ∈ Θ i and θ′ ∈ Θ–i. For each type θ ∈ Θ i , there is a (possibly infinite) population of individuals Λ θ (we denote by λ a generic individual and by Λ i the set of all the individuals of agent i). Each type θ ∈ Θ i is associated with a probability ωi,θ with which an individual of the pertinent population is drawn. Obviously, ∑θ∈Θ i ωi,θ = 1.Similarly, each individual λ ∈ Λ θ is associated with a probability ωi,θ,λ with which the individual is drawn such that ∑λ∈Λ θ ωi,θ,λ = ωi,θ. For the sake of presentation, we assume that each individual has a different index λ and therefore we can refer to an individual λ ∈ Λ θ by using λ in place of 〈i, θ, λ〉. Similarly, we assume each type has a different θ, therefore we can refer to a type θ ∈ Θ i by using θ in place of 〈i, θ〉.
Different individuals may adopt different strategies. σ λ denotes the strategy of individual λ. The aggregate strategy of the individuals of type θ ∈ Θ i is σ θ = ∑λ∈Λ θ σ λ · ω λ , and the one of agent i is σ i = ∑θ∈Θ i σ θ · ω θ .
As in [7], we assume that agents have no information about the opponents. Specifically, we assume: each individual has no information about the utility functions of the other agents; each individual has no information about the individuals of the other agents; when utilities are interdependent, each individual knows the types of the opponents, but she does not know the pertinent probabilities and utilities; when utilities are not interdependent, no assumption about the knowledge of the opponents’ types is made (this is because it can be proved, with a simple extension of [6], that knowing or not the opponents’ types (without knowing the utilities and probabilities) leads to the same set of equilibria).
Customarily, a game with types is said Bayesian. Each individual forms a belief over the opponents and adjusts it during the play. More precisely, when utilities are not interdependent, each individual λ of agent i has a (potentially different) belief over the aggregate strategy of agent j for every j ≠ i. This is because each individual λ does not know the types and the individuals of the opponents and therefore she cannot form any belief over the single individual or type of the opponents. When utilities are interdependent, each individual λ of agent i has a (potentially different) belief over the aggregate strategy of type θ ∈ Θ j for every j ≠ i and a (potentially different) belief over the pertinent probability ω θ (also in this case individuals have not beliefs over the single individuals of the opponents).
Solution concepts
Different solution concepts extending the SCE can be provided according to each specific situation, see Tab. 1.
The basic solution concept, introduced in [7], is the unitary SCE (USCE) that captures situations with a unique type per agent and a unique individual per type. A USCE constrains the agent’s strategy to be best response to the belief over opponent’s strategy and constrains beliefs to be correct (w.r.t. the strategies) on the equilibrium path.
each agent i has single type θ and single individual λ; for every agent i, σ
λ
is a best response to , where λ is the agent i’s individual; for every agent i, is equal to σλ′ on the equilibrium path, where λ and λ′ are the individuals of agent i and agent –i, respectively.
Upon the USCE concept, we build the solution concepts for other (more complex) situations. We consider the situation with one type per agent and multiple finite individuals: the solution concept is finite heterogeneous SCE (FHSCE).
each agent i has a single type θ and a finite number of individuals λ; for every agent i, σ
λ
is a best response to , where λ is an agent i’s individual; for every agent i, is equal to σ–i on the equilibrium path identified by σ
λ
and σ–i, where λ is an individual of agent i; for every agent i, σ
i
is the aggregate strategy of all the agent i’s individuals.
Notice that the constraints over the beliefs of an individual λ and the equilibrium path she observes depend on σ λ . Thus, different individuals may have different strategies, each supported by different beliefs.
When there is an infinite number of individuals, the above definition is not operative, requiring one to specify an infinite number of strategies and beliefs. In this case, according to [7], we can work at the level of the single action a and we can state that a ∈ A i could be played by agent i if there is a belief (confirmed on the equilibrium path) such that this action is a best response. As a result, for every action a ∈ A i we need to define a belief . As done above, we use in place of , under the assumption that each action a has a different label. The solution concept is the infinite heterogeneous SCE (IHSCE).
each agent i has a single type θ and an infinite number of individuals λ; for every agent i, σ
i
(a) >0 with a ∈ A
i
if a is best response to some belief ; for every action a ∈ A
i
and agent i, equals σ–i on the equilibrium path identified by (σ
a
, σ–i).
We focus on Bayesian games. We report only the definition of the Bayesian USCE concept (BUSCE) because the redefinitions of FHSCE and IHSCE with Bayesian games (BFHSCE and BIHSCE, respectively) are similar.
each agent i has a finite number of types θ and a single individual λ per type; for every agent i, σ
λ
is a best response to , where λ is the agent i’s individual and is the aggregate belief of λ over agent –i defined as ; for every agent i, is equal to σ–i on the equilibrium path, where λ is an individual of agent i and σ–i is the aggregate strategy of agent –i.
Finally, we remark that every game admits at least one SCE per extension, since the SCE concept relaxes the NE concept and every game admits at least an NE.
Computational results
We study both the problem to compute an equilibrium according to the solution concepts defined above and to verify whether a given solution is one of such solution concepts.
We initially consider the equilibrium computation problem. The solution concept definitions provided in the previous section are based on behavioral strategies, but it is known that behavioral strategies pose severe computational problems [20]. However, since those constraints are defined exclusively on the equilibrium path, it is possible to redefine them in terms of sequence–form strategies without perturbations (as instead it is needed for SEs to capture the off–the–equilibrium–path behavior). As a result, we have x λ , , for SCEs with finite populations, and x θ , xθ,q, , for SCEs with infinite populations.
A specific sample of SCE extensions can be computed by searching for an NE and thus it can be solved by using the Lemke algorithm applied to the sequence form (in the case all the individuals have correct beliefs, all the individuals of the same type behave in the same way and therefore the game is essentially a Bayesian two–agent game). Hence, the problem to compute an SCE extension with two agents is –complete. However, the computation of a specific sample is not interesting when dealing with learning agents that can potentially achieve any equilibrium, while the aim is the derivation of the equilibrium conditions and the enumeration of all the equilibria. Indeed, knowing all the equilibria and their stability properties in terms of attractors, saddle points, and repellers, we can have a complete picture about all the possible learning dynamics of the agents.
The constraint expressing that a belief is confirmed on the equilibrium path is not linear and neither expressible as a linear complementarity constraint problem (LCP). Instead, it can be formulate as a non–linear complementary constraint problem (NLCP). Indeed, the constraint over the belief on a sequence (or an action indifferently) must be active only if such sequence (or action) is on the equilibrium path. More precisely, sequence q|a ∈ Q–i is on the equilibrium path if both q|a is played with strictly positive probability and there is at least a sequence q′ ∈ Q i leading to h ∈ H–i, where a ∈ ρ (h), that is played with strictly positive probability. Given sequence q|a ∈ Q–i, we denote by f (q|a) ⊆ Q i the set of sequences leading to h ∈ H–i such that a ∈ ρ (h). A possible way to code the above constraint with USCEs is: for every q′ ∈ f (q|a), where λ is an individual of agent i and λ′ is an individual of agent –i (notice that this constraint is not expressed as a NLCP, but it can be easily cast to such form). These non–linear constraints can be linearized by using binary variables.
We define the binary variables as s λ (q) such that s λ (q) =1 when q ∈ Q i is played with positive probability by λ. Similarly, we use binary variables s θ (q) such that s θ (q) =1 when q is played with positive probability by at least one individual of θ, and we use binary variables s i (q) such that s i (q) =1 when q is played with positive probability by at least one individual of i. The previous non–linear constraint can be formulated as a pair of constraints: and for every q′ ∈ f (q|a), where M is an arbitrarily large constant. The problem to compute a BFHSCE when types are not interdependent can be formulated as the MILP (1)–(12), while the program for an FHSCE is a trivial simplification obtained by setting |Θ i |=1 for every i.
Here, constraints (1) and (2) force v λ (h) at each h of individual λ to be equal to the expected (w.r.t. the individual’s beliefs ) utility of the best sequence available at h (similar constraints are used in MILP Nash formulation [19]); constraints (3) and (4) force the belief of individual λ over the aggregate opponent’s strategy x–i to be correct on the equilibrium path (notice that, when 2 – s λ (f (q)) – s–i (q) =0, q is on the equilibrium path); constraints (5) and (6) assure that x λ is a well defined strategy; constraints (7) and (8) assure that is a well–defined belief; constraints (9) assure that s λ (q) =1 if q are played with positive probability by individual λ; constraints (10) assure that s i (q) are strictly positive if at least one individual of i plays q with positive probability; constraints (11) and (12) define the domains of the variables. Differently from the case of NE or its refinements, both strategies and beliefs are variables.
Notice that the above program (1)–(12) is mixed–integer linear independently of the number of individuals per type. This is because each individual plays the best response to her beliefs, her beliefs are equal to the aggregate opponent’s strategy on the equilibrium path, and each agent aggregate strategy is given by the sum of the single individual strategies. In this perspective, the number of individuals behaves as the number of types. However, as we show in the next section, types and individuals have radically different effects on the set of equilibria.
The MILP for computing a BIHSCE is more involved. For each pair θ, q, we need a strategy xθ,q (·) defined over the whole game and the corresponding values vθ,q (·) per information set, and we need a belief over the aggregate opponent’s strategy. The MILP for computing a BIHSCE with non–interdependent types is program (13)–(26), while the program for IHSCE can be obtained with |Θ i |=1 for every i.
Here, constraints (13) and (14) force vθ,q (h) at each h of type θ related to q to be equal to the expected utility of the best sequence available at h; constraints (15) and (16) ensure that are correct on the equilibrium path identified by xθ,q (·) and x–i (·); constraints (17) and (18) guarantee that xθ,q (·) is well defined and not played with a probability larger than ω θ ; constraints (19) and (20) make the same with strategy x θ (·); constraints (21) and (22) ensure that are well defined; constraints (23) force s θ (q) =1 if there is a belief of type θ such that q is a best response; constraints (24) force s i (q) =1 if at least a type of i plays q with positive probability; constraints (25) and (26) define the domain of the variables.
We focus on the problem to verify whether a given solution is an SCE extension.
More interesting is the verification problem when the input solution is partially specified, the beliefs being omitted. This problem is common when we question whether learning dynamics, e.g. approaching some attractor, are converging to some solution concepts.
Summarily, these results show that finding an SCE extension is (in the worst case) as hard as finding an SE [16] and that verifying an SCE extension is easier than an SE [9].
We characterize the relationships between different SCE extensions. For clarity, we use the following example.
While it is known that with two agents USCEs and NEs are the same in terms of expected utility, see [9], the presence of individuals makes HSCEs (both FHSCEs and IHSCEs) and NEs dramatically different. We state the following, whose proof is given as counterexample below.
Thus, the presence of individuals introduces new equilibria and in some case a continuum of equilibria. Now, we derive some interesting properties among the solution concepts that we use below for the equilibrium enumeration problem. We initially provide the following definition.
We can show that FHSCEs and IHSCEs are convex combinations of different USCEs. We state the following theorem for the IHSCE concept, the same result can be derived for the FHSCE concept.
With infinite populations we can obtain every strategy profile (∑ k α1,k · σ1,k, ∑ k α2,k · σ2,k) with ∑ k αi,k = 1 and αi,k ≥ 0, and where σi,k is the strategy of agent i in the k–th USCE. Therefore, once we have all the USCEs, we can derive all the IHSCEs simply as convex combinations of USCEs. In the finite case, the set of possible FHSCEs is limited to a finite set of strategies, that are obtained by constraining the coefficients of the convex combination as follows: αi,k = ∑σ λ =σi,kω λ .
We state the following theorem for the FHSCE concept, whose proof is given as counterexample below.
Therefore, the derivation of the FHSCEs cannot be performed on the basis of only the USCEs.
We use the above results for the equilibrium enumeration problem. The enumeration of the USCEs can be performed by enumerating all the basic solutions of the corresponding mathematical program. This can be performed by exhaustively visiting the branch–and–bound tree generated by the MILP solver. The size of the tree is O (2|Q1|+|Q2|). The problem is and, in the case of degeneracy, the same approach adopted in [1] can be employed here. By the enumeration of the USCEs, we are enumerating all the IHSCEs, given that these can be derived as convex combinations of USCEs. The enumeration of FHSCEs can be accomplished similarly: enumerating all the USCEs, computing all the possible finite (with the given ω) convex combinations, and then verifying whether each convex combination is an FHSCE. Denoting with # USCEs the number of USCEs, this approach has a complexity O (2|Q1|+|Q2| + 2log2(#USCEs)·(|Λ1|+|Λ2|)) much faster than enumerating the basic solutions of the mathematical program for FHSCE when log 2 (# USCEs) is smaller than |Q1| and |Q2|, this last problem having complexity O (2|Q1|·|Λ1|+|Q2|·|Λ2|).
SCE and Learning dynamics
In order to formally study the dynamics of learning agents when applied to an extensive–form game, we focus on the replicator dynamics with mutation simulating the Q–learning algorithm [21]. The first issue we study concerns the appropriate form of the replicator dynamics when individuals are present.
The literature provides replicators dynamics only for normal–form games. For this reason we first translate the extensive–form game in normal form and then we apply the replicator equations [4]. Given that each individual can learn independently of the others we obtain the following replicator dynamics (for reasons of space we omit the mutation term):
where
It is easy to show that the strategy of each individual of agent 2 is the best response w.r.t. some beliefs about the opponent and the strategy of agent 1 is the best response to (31). So (
Given each SCE found as described in the previous section, it is possible to study its stability by linearizing (27)–(28) by means of the Jacobian matrix as follows:
where
We apply the tools described in the previous section to the game described in Fig. 1, where the types are θ1.1, θ2.1 . First we compute all USCEs of the game using (1)–(12), then we study their stability computing their eigenvalues and eigenvectors by (27)–(28) from which we can forecast the learning dynamics of the system.
Consider Fig. 2.a (one type/one individual) that displays a learning trajectory projected onto the utility space. Notice that such representation may hide some information since we are interested to study the stability properties of strategy profiles that may map into the same point of the utility space. An example is reported in Tab. 2, where x|y| denotes y eigenvalues of value x. The strategy profile
Consider Fig. 2.c (one type/one individual). The trajectory that starts outside the convex hull of USCEs quickly moves into such region, and after being attracted by the saddle B it converges to A. According to the initial strategy, the learning system converges to the SPE (either to A or to one of the points between D–C).
When multiple individuals and types are considered, the learning dynamics is influenced also by the presence of FHSCEs. Like non–SPE USCEs, these equilibria are rest points with, in general, eigenvalues positive and negative, so they are saddle points. They can be reachable saddle points only by nullifying the probability with which some individual plays some sequences. All these equilibria determine slight changes to the trajectory produced in the single–type single–individual case, but they considerably slow down the convergence of the system.
In Fig. 2.b (one type/two individuals), two trajectories converging to FHSCEs (different from SPE) are shown. Both trajectories start close to a USCE placed on the B–C line and end into equilibria that are not SPEs. An example of eigenvalues of some of these equilibria are reported in Table 3, where individuals are drawn by a uniform distribution over Λ1. As previously said, this behavior has been obtained by forcing to zero the probabilities of some sequences. The white trajectory starts moving due to the attraction of the FHSCE placed between C and D, and then it converges to D. The dynamics of the black strategy is more complex, moving between FHSCEs and USCEs until it converges to the FHSCE between A and B.
Conclusions
In this paper, we studied extensive–form games in which heterogeneous finite or infinite populations of individuals repeatedly play without common information. We extended the concept of SCE to this setting, we provided two mathematical programming formulations to compute an equilibrium, and we discuss the problem to verify whether a given solution is an equilibrium. Furthermore, we discussed how all the equilibria can be enumerated and how their dynamical properties (i.e. attractor, saddle, repeller) can be evaluated. Finally, we applied all these instruments to a simple case study showing that the learning dynamics are dramatically conditioned by SCEs and therefore that our tools play a prominent role in the study of the learning dynamics.
The most interesting future work is the adoption of our tools to design game mechanisms. A designer could remove equilibria to change or remove learning dynamics, e.g. providing small utility, or to change the dynamical properties of some equilibrium, strengthening the attracting power to speed up the dynamics towards such equilibrium or weakening the attracting power to make the dynamics move farther from such equilibrium. Other future works consist in the extension of our tools with more than two agents where USCEs and NEs can be radically different.
