Abstract
Decisions in many dilemmas are based on a combination of factors, including as incentive, punishment, reputation, and memory. The impact of memory information on cooperative evolution in multi-round games is a decision-making process in group evolution. The iterated prisoner’s dilemma is an excellent model for the development of cooperation amongst the payoff-maximizing individuals. Since tit-for-tat proved successful in Axelrod’s repeated prisoner’s dilemma tournaments, there has been a great deal of interest in creating new strategies. Every iterative prisoner’s dilemma method bases its decision-making on a specific duration of past contacts with the opponent, which is referred to as the memory’s size. This study examines the impact of strategy memory size on the evolutionary stability of n-person iterated prisoner’s dilemma strategies. In this paper, we address the role that memory plays in decision-making. We interested in the model of the Iterated Prisoner’s Dilemma game for three players with memory two, and we will look at strategies with similar behavior, such as Tit-For-Tat (TFT) strategies as well as Win Stay-Lose Shift (WSLS) strategies. As a result of this paper, we have shown that the effect of memory length is almost non-existent in the competitions of strategies that we studied.
Keywords
Introduction
Game theory does not tell us how to play the game; rather, it discusses some possible strategies for playing that you could find appealing. Additionally, the area of applied arithmetic is utilized to develop the best possible approach for contending with uncertainty and inadequate knowledge like most real-life scenarios. It is the mathematical analysis of conflict modeling and decision-making that occurs in all kinds of fields and companies on a daily basis.
Since the game theory is the theory of “trategic thinking”, it has been developed for military purposes and defense. In the past, it has also been used as an alternative and complementary approach to deal with robustness in the presence of worst-case uncertainties or disturbances in many areas such as economics [1, 4], engineering [5], computer science, politics [8, 15], and sociology. As well as to the biological sciences such as evolutionary biology [3, 12]. Moreover, scientists tend to use game-theoretic tools to avoid traffic flows or predict blackouts in power networks. In the 21 st century, game theory applies to a wide range of behavioral relations.
The mathematician John Von Neumann and the economist Oskar Morgenstern wrote a book [6] in 1944 that contained the first mention of game theory. This book’s second edition offered an axiomatic theory of expected utility that enabled economists and statisticians to study decision-making in the face of uncertainty. In the 1950s, a large number of academics greatly expanded the field of game theory. In the 1970s, it was specifically used in relation to evolution.
There are two types of games: simultaneous games [2, 9] and alternating games [10, 13]. In a simultaneous game, both players make their choices in the same time without knowing the other player’s choice. While in the alternating game, the first player makes his choice and then the other player makes his choice after knowing the first player’s choice. In this paper, we will discuss one of these two types, which is the simultaneous game through a three-player competition model in the iterative game.
The Prisoner’s Dilemma is one of the most popular and simple strategy models developed by Merrill Flood and Melvin Dresher in 1950. In this model, no matter how many participants, each participant has only two decisions. Either he chooses to cooperate and plays C or defect and then plays D. Accordingly, in two-player Iterated Prisoner’s Dilemma games, we have four outcomes (C, C) , (C, D) , (D, C) and (D, D), so there are 24 possibilities for the transition arrows made a total of 16 different strategies denoted by S0, S1, . . . , S15. Whereas, in three-player Iterated Prisoner’s Dilemma games, we have eight outcomes (C, (C, C)), (C, (C, D)), (C, (D, C)), (C, (D, D)), (D, (C, C)), (D, (C, D)), (D, (D, C)) and (D, (D, D)). In our study, we look at symmetric games, so the outcomes become six only (C, (C, C)), (C, (C, D)), (C, (D, D)), (D, (C, C)), (D, (C, D)) and (D, (D, D)). Wherefore, there are 26 possibilities for the transition arrows made a total of 64 different strategies denoted by S0, S1, . . . , S63. We can categorize strategies based on their behavior, such as Tit-For-Tat, Win Stay-Lose Shift Strategies and other Strategies.
Tit for tat is a highly effective strategy in game theory. An agent using this strategy will first cooperate, then repeat opponent’s previous decision. If the opponent previously was cooperative, the agent is cooperative. If the opponent previously was defective, the agent is defective. In Win Stay-Lose Shift strategies, if the game in the previous round has a succeeds, then the player plays the same strategy in the next round. Alternatively, if the game resulted in a failure, the player switches to another action.
Previous studies focused on studying the repeated prisoner’s dilemma game for two players only, and recently we studied the repeated prisoner’s dilemma game for three players but with memory-one [2,17, 2,17], meaning that the player relies in his decision on the previous round only. In this paper, we will discuss the repeated prisoner’s dilemma game for three players but with memory-two. We will focus on Tit-For-Tat and Win Stay-Lose Shift strategies and study the competition between them. At the end of this paper, we will answer the question of what does memory two do in the competition of Tit-For-Tat strategies with each other, and also the competition of Win Stay-Lose Shift strategies with each other.
Payoff and transition matrices
We will study the simultaneous game on Iterated Prisoner’s Dilemma model. We assume that both players play simultaneously without knowing the other player’s decision. Both players get a reward,
Based on the aforementioned, we have four outcomes O1 = (C, C) , O2 = (C, D) , O3 = (D, C) and O4 = (D, D) and if we assume that player I plays with probability P = (p1, p2, p3, p4) and player II with probability Q = (q1, q2, q3, q4), where p i is the probability of player I for playing C after outcome O i . Therefore, the Markov transition matrix for two players [9] is given by
where p1 (1 - q1) is the transition probability from state O1 to O2, where player I plays C by probability p1 after state O1 and (1 - q1) probability of player II playing D after state O1.
As well, in Three-player Iterated Pensioner’s Dilemma games, the player’s payoffs are
The three-player Iterated Prisoner’s Dilemma, we have six outcomes O1 = (C, (C, C)), O2 = (C, (C, D)), O3 = (C, (D, D)), O4 = (D, (C, C)), O5 = (D, (C, D)) and O6 = (D, (D, D)) and if we assume that player I plays with probability P = (p1, p2, p3, p4, p5, p6), player II with probability Q = (q1, q2, q3, q4, q5, q6) and player III with W = (w1, w2, w3, w4, w5, w6). Therefore, the Markov transition matrix for three-player [17] is given by
The purpose of this section is to explain the significance of the Tit-For-Tat and Win Stay-Lose Shift strategies with memory-two in the context of the three-player Iterated Prisoner’s Dilemma game. A two-state automata is adopted to represent the strategies where each player is automaton which can be in one of two states, namely C and D. The sixty-four strategies will be represented by six coordinates of zeros and ones using a binary scheme. Each digit denotes the player’s response when any of the six known round outcomes occur

Automates of TFT strategies.

Automates of WSLS strategies.
We will introduce four different strategies S36, S38, S52 and S54, which are called TFT 1, TFT 2, TFT 3 and TFT 4, respectively. These four strategies are presented by the automatons in Fig. 1, such that:
We study an example for the three players S36, S38 and S52, such as player I (S36) against player III (S52) with player II (S38). We will get the following eight cases, and every case has eight sub-cases as follows:
Case 1: in round 1, players I, II and III play C, and in round 2, we discuss all eight possible cases.
Case 2: in round 1, players I and II play C, but player III plays D and in round 2, we discuss all eight possible cases.
Case 3: in round 1, players I and III play C, but player II plays D and in round 2, we discuss all eight possible cases.
Case 4: in round 1, player I plays C, but players II and III play D and in round 2, we discuss all eight possible cases.
Case 5: in round 1, player I plays D, but players II and III play C and in round 2, we discuss all eight possible cases.
Case 6: in round 1, players I and III play D, but player II plays C and in round 2, we discuss all eight possible cases.
Case 7: in round 1, players I and II plays D, but player III plays C and in round 2, we discuss all eight possible cases.
Case 8: in round 1, players I, II and III play D, and in round 2, we discuss all eight possible cases.
•Case 1.1: If in state 1, the three players (S36, S38, S52) start with C and also in state 2, the three players (S36, S38, S52) start with C.
•Case 1.2: If in state 1, the three players (S36, S38, S52) start with C and in state 2, players I and II (S36, S38) start with C but player III (S52) starts with D.
•Case 1.3: If in state 1, the three players (S36, S38, S52) start with C and in state 2, players I and III (S36, S52) start with C but player II (S38) starts with D.
•Case 1.4: If in state 1, the three players (S36, S38, S52) start with C and in state 2, players I (S36) starts with C but player II and III (S38, S52) start with D.
•Case 1.5: If in state 1, the three players (S36, S38, S52) start with C and in state 2, players II and III (S38, S52) start with C but player I (S38) starts with D.
•Case 1.6: If in state 1, the three players (S36, S38, S52) start with C and in state 2, players II (S38) starts with C but players I and III (S36, S52) start with D.
•Case 1.7: If in state 1, the three players (S36, S38, S52) start with C and in state 2, player III (S52) starts with C but players I and II (S36, S38) start with D.
•Case 1.8: If in state 1, the three players (S36, S38, S52) start with C and in state 2, the three players (S36, S38, S52) start with D.
From the previous case, we have three regimes
When we do the perturbation, we get the following: In regime R1, if S36 and S38 play D instead of C, then regime R1 will transfer to R3, and if S52 plays D instead of C, then regime R1 will transfer to R2. In regime R2, if S36 and S38 play D instead of C in column 1, then regime R2 will transfer to R5, and if S52 plays D instead of C in column 1, then regime R2 will transfer to R4. But in column 2, if S36, S38 and S52 play C instead of D, then regime R2 will not change in all these cases. In regime R3, if S36 and S38 play D instead of C in columns 1 and 3, then regime R3 will transfer to R6, and if S52 plays D instead of C in columns 1 and 3, then regime R3 will transfer to R5. But in column 2, if S36 and S52 play D instead of C, then regime R3 will transfer to R2 and if S38 plays C instead of D, then regime R3 will transfer to R1. While in column 4, if S36 plays C instead of D, then regime R3 will transfer to R1 and if S38 and S52 play D instead of C, then regime R3 will transfer to R2. In regime R4, S36, S38 and S52 play C instead of D, then regime R4 will not change in all these cases. In regime R5, if S36 plays C instead of D in column 1, then regime R5 will transfer to R2, and if S38 and S52 play D instead of C in column 1, then regime R5 will transfer to R4. In columns 2 and 4, S36, S38 and S52 play C instead of D, then regime R5 will not change in all these cases. While in column 3, if S36 and S52 play D instead of C, then regime R5 will transfer to R4, and if S38 plays C instead of D, then regime R5 will transfer to R2. In regime R6, if S36 and S52 play D instead of C in column 1, then regime R6 will transfer to R5, and if S38 plays C instead of D in column 1, then regime R6 will transfer to R3. But in column 2, if S36 plays C instead of D, then regime R6 will transfer to R3, and if S38 and S52 play D instead of C, then regime R6 will transfer to R5. Thus, the corresponding transition matrix becomes
By calculate the left eigenvectors for the eigenvalue 1, we get the following equations:
By solving the linear system of the previous equations from Equations (14) to (19) with the equation v1 + v2 + v3 + v4 + v5 + v6 = 1, then we obtain the eigenvector V as
Now, we can get the payoff values by
According to Equation (21), the payoff vector Π = (π1, π2, π3, π4, π5, π6) is equal to (0, 0, 0, 0, 0, 1). Using the same approach depending on the algorithm in the Appendix section; we examined all possibilities for studying three strategies from the four TFT strategies, which are 64 possibilities times eight sequences. Therefore, we obtained the results shown in the following tables from Tables 1 to 4.
The payoff for player I against player III when player II fixed with the strategy TFT 1 (S36)
The payoff for player I against player III when player II fixed with the strategy TFT 2 (S38)
The payoff for player I against player III when player II fixed with the strategy TFT 3 (S52)
The payoff for player I against player III when player II fixed with the strategy TFT 4 (S54)
We will introduce five different strategies S3, S35, S33, S49 and S48, which are called WSLS 1, WSLS 2, WSLS 3, WSLS 4, and WSLS 5, respectively. These five strategies are presented by the automatons in Fig. 2, such that:
Now, we study an example for the three players S3, S35 and S33, such as player I (S3) against player III (S33) with player II (S35). We will get the following eight cases and every case has 8 sub-cases as follows:
Case 1: in round 1, players I, II and III play C, and in round 2, we discuss all eight possible cases.
Case 2: in round 1, players I and II play C, but player III plays D and in round 2, we discuss all eight possible cases.
Case 3: in round 1, players I and III play C, but player II plays D and in round 2, we discuss all eight possible cases.
Case 4: in round 1, player I plays C, but players II and III play D and in round 2, we discuss all eight possible cases.
Case 5: in round 1, player I plays D, but players II and III play C and in round 2, we discuss all eight possible cases.
Case 6: in round 1, players I and III play D, but player II plays C and in round 2, we discuss all eight possible cases.
Case 7: in round 1, players I and II plays D, but player III plays C and in round 2, we discuss all eight possible cases.
Case 8: in round 1, players I, II and III play D, and in round 2, we discuss all eight possible cases.
•Case 1.1: If in state 1, the three players (S3, S35, S33) start with C and also in state 2, the three players (S3, S35, S33) start with C.
•Case 1.2: If in state 1, the three players (S3, S35, S33) start with C and in state 2, players I and II (S3, S35) start with C but player III (S33) starts with D.
•Case 1.3: If in state 1, the three players (S3, S35, S33) start with C and in state 2, players I and III (S3, S33) start with C but player II (S35) starts with D.
•Case 1.4: If in state 1, the three players (S3, S35, S33) start with C and in state 2, players I (S3) starts with C but player II and III (S35, S33) start with D.
•Case 1.5: If in state 1, the three players (S3, S35, S33) start with C and in state 2, players II and III (S35, S33) start with C but player I (S3) starts with D.
•Case 1.6: If in state 1, the three players (S3, S35, S33) start with C and in state 2, players II (S35) starts with C but players I and III (S3, S33) start with D.
•Case 1.7: If in state 1, the three players (S3, S35, S33) start with C and in state 2, player III (S33) starts with C but players I and II (S3, S35) start with D.
•Case 1.8: If in state 1, the three players (S3, S35, S33) start with C and in state 2, the three players (S3, S35, S33) start with D.
From the previous case, we have two regimes
When we do the perturbation, we get the following: In regime R1, in columns 1 and 7, if S3, S35 and S33 play D instead of C, then regime R1 will not change in all these cases. In columns 2, 6 and 10, if S3 plays D instead of C and if S35 and S33 play C instead of D then regime R1 will transfer to R2.
In columns 3 and 9, if S3 plays C instead of D and if S35 plays D instead of C, then regime R1 will not change, but if S33 plays D instead of C, then regime R1 will transfer to R2. In columns 4, 8 and 12, if S3 and S33 play C instead of D and if S35 plays D instead of C, then regime R1 in all these cases will transfer to R2. In columns 5 and 11, if S3 and S35 play C instead of D, then regime R1 will transfer to R2, but and if S33 plays C instead of D, then regime R1 will not change. In regime R2, in column 1, if S3, S35 and S33 play D instead of C, then regime R2 in all these cases will not change. But in column 2, if S3 and S35 play C instead of D, then regime R2 will transfer to R1, and if S33 plays C instead of D, then regime R2 will not change. While in column 3, if S3 plays C instead of D and if S35 plays D instead of C, then regime R2 will not change, but if S33 plays D instead of C, then regime R2 will transfer to R1. In regime R3, in column 1, if S3 and S33 play C instead of D and S35 play D instead of C, then regime R3 in all these cases will transfer to R1. Also in column 2, if S3 plays D instead of C and if S35 and S33 play C instead of D, then regime R3 in all these cases will transfer to R1. Thus, the corresponding transition matrix becomes
By calculating the left eigenvectors for the eigenvalue 1, we get the following equations:
By solving the linear system of the previous equations from Equations (27) to (29) with the equation v1 + v2 + v3 = 1, then we obtain the eigenvector V as
Now, we can get the payoff values by
According to Equation (31), the payoff vector Π = (π1, π2, π3, π4, π5, π6) is equal to
Using the same approach depending on the algorithm in the Appendix section, we examined all possibilities for studying three strategies from the five WSLS strategies, which are 64 possibilities times eight sequences. Therefore, we obtained the results shown in the following tables from Tables 5 to 9.
The payoff for player I against player III when player II fixed with the strategy WSLS 1 (S3)
The payoff for player I against player III when player II fixed with the strategy WSLS 2 (S35)
The payoff for player I against player III when player II fixed with the strategy WSLS 3 (S33)
The payoff for player I against player III when player II fixed with the strategy WSLS 4 (S49)
The payoff for player I against player III when player II fixed with the strategy WSLS 5 (S48)
In this section, we will discuss the domination between the Tit-For-Tat strategies and the domination between the Win Stay-Lose Shift strategies for (3P-IPD). We note that S n is outcompeted by S m if both a nm > a nn and a mm > a nm , where a nn , a nm , a mn and a mm are elements of the payoff matrix. If the strategy S n is outcompeted by S m , we can write S n << S m .
The domination between TFT strategies according to Tables 1–4, is given as and the domination between WSLS strategies, according to Tables 5–9, is given as
Conclusion
We studied the effect of memory length on the strategies and payoffs of players in a repeated three-player prisoner’s dilemma game to examine the extent of its effect. This is done by studying the competition between strategies that have the same behavior (TFT and WSLS strategies) to each other. Based on Tables 1–11, and comparing these results with their similarity with memory-one in other papers [2, 17], we conclude that memory-two does not affect on the behavior of strategies in the three-player prisoner’s dilemma game.
A list of strategies outcompeting S n
A list of strategies outcompeting S n
Conflicts of interest
The authors declare no conflicts of interest regarding the publication of this paper.
Footnotes
Appendix
The following is the pseudo-code for our proposed parallel learning algorithm for simulating the iterated prisoner dilemma game with three players using the notion of memory.
Strategy1 ← Dic2bin(S1),
Strategy2 ← Dic2bin(S2),
Strategy3 ← Dic2bin(S3),
Strategies ← [S1,S2,S3],
Prop ← 8,
G-Index ← 25,
Cases ← Zeros(3,G-Index),
Results ← [ ],
Positions ← [5,3,2;6,4,1;7,1,4;8,2,3;1,7,6;2,8,5;3,5,8;4,6,7],
Payoffs ← [’P’, ’L’, ’l’, ’T’, ’S’, ’K’, ’k’, ’R’],
Payoffs ← Zeros(20,20);
Cases ← Play (C-1,Cases);
Cases ← Next (G,strategies,Cases);
Mem[C] ← Cases;
Blocks ← Merge (Mem[First],Mem[Second]);
Pay[R] ← Payoff (Blocks[R]);
Duplicate (Results [i,j]);
N ← Count (Regimes);
Size ← Length (Regimes (u));
First ← Getpos (Payoffs, Regimes [u,v]);
Second ← Getpos (Payoffs, Regimes [u,v+1]);
Change (First,Second);
Change (Second,First);
T[u,v]=Transition [u,v]/(3* Length (Regimes [u]));
Dist - V ← Solve (T);
Row ← Fpos;
Col ← Positions (Spos, w);
New ← Results (Row, col);
M ← Mem (Regimes, New);
Transition (u,M) ← Transition (u,M)+1;
where
- Zeros (n, m): initiate the elements of the given matrix nxm by zeros,
- Dic2bin (X): convert the input decimal X to its equivalent binary sequence of bits,
- Play (C): initialize the player’s moves for each probability index C,
- Next (I): return the moves of the next round of the given game index I,
- Merge (B1, B2): integrate the values of the given two sequences B1 and B2 into one sequence,
- Redundant (B): eliminate redundant payoffs from the block B,
- Payoff (r): compute the payoff vector from the moves of round r,
- Unique (R): delete repeated strings from the input set R,
- Count (R): count the number of strings constituting the given set R,
- Length (s): evaluate the length of the string s,
- Getpos (P, R): get the position of the payoff P for the regime R,
