Abstract
Previous findings on punishment have focused on deterministic environments in which the outcomes are known with certainty. In this article, we conduct experiments to investigate how punishment affects cooperation in stochastic social dilemmas where each person can decide whether to cooperate, when the outcomes of alternative strategies are specified probabilistically. Two types of punishment mechanisms are studied: (1) an unrestricted punishment mechanism—both persons can punish—and (2) a restricted punishment mechanism—only cooperators can punish noncooperators. We compare behavior in a two-person deterministic prisoner’s dilemma game (DPD) with a two-person stochastic prisoner’s dilemma (SPD). In all treatments, participants are given information on the other person’s actions. We find that in both games, the restricted punishment mechanism promotes more cooperative behavior than unrestricted punishment. However, the difference in the degree of effectiveness between the two mechanisms is smaller in the SPD game than in the DPD game because noncooperative behavior is less likely to be punished when there is outcome uncertainty. Our findings provide useful information for designing efficient incentive mechanisms to induce cooperation in a stochastic social dilemma environment.
Punishment can be used to enforce cooperation in social dilemma situations (Yamagishi 1986; Ostrom, Walker, and Gardner 1992). Controlled laboratory experiments reveal that individuals are often willing to incur costs to punish defectors, even in non-repeated games, and that this willingness to punish can be strong enough to enforce cooperation by others (Fehr and Gächter 2000). These studies of punishment have focused on deterministic outcome environments where agents’ actions determine the outcomes with certainty. In reality, however, outcomes in social dilemmas are often determined not only by agents’ actions but also by external uncertainty (Bereby-Meyer and Roth 2006; Kunreuther et al. 2009; Bendor 1987). Such stochastic social dilemmas include problems involving interdependent risks in naturally occurring environments such as airlines investing in security measures or countries deciding whether to implement the Kyoto protocol to reduce carbon emissions (Heal and Kunreuther 2005). In these stochastic social dilemmas, an agent often must decide whether to incur the costs of reducing its risk of experiencing a negative outcome, knowing that even if it invests in a risk-reducing measure, it still faces the chance of an indirect loss from others who have chosen not to follow suit. Climate change, for instance, does not respect political boundaries, so countries that do not reduce carbon emissions increase the likelihood of global warming.
An important feature of stochastic social dilemmas is that when an agent does not undertake risk-reducing measures, the resulting outcome is uncertain. For example, there is uncertainty today about the impact that carbon emissions will have on temperature change and on the resulting sea level rise and flood damage. 1 In cases where a flood does not occur, the losses will be zero. We hypothesize that this feature of a stochastic social dilemma is important when considering the design of institutions to enforce cooperation, as we will elaborate subsequently.
To understand how people punish others when there is external outcome uncertainty, we study the impact of punishment on cooperative behavior in a two-person stochastic prisoner’s dilemma (SPD) game where the outcomes of alternative strategies are specified probabilistically. In this SPD game, two agents determine whether to incur a cost to invest in protection so as to reduce the risk of losses due to the occurrence of a particular negative event. We compare behavior in this environment with actions taken in a two-person deterministic prisoner’s dilemma (DPD) game where the certain loss in the DPD game is approximately the same as the expected loss from the negative event in the SPD game, given the actions of each player.
In both games, the negative event will not occur if both players invest. In the DPD game, both players will suffer losses whenever one player does not invest. In the SPD game, there is a well-specified probability that neither player will experience a loss even if they both decide not to invest in protection. It is important to note that the stochastic social dilemma situations studied in this article differ from those where there is uncertainty due to imperfect monitoring as to the actions of other agents (Aoyagi and Fréchette 2009; Ambrus and Greiner 2012; Grechenig, Nicklisch, and Thöni 2010; Patel, Cartwright, and Van Vugt 2010; Fudenberg, Rand, and Dreber 2012). In our experiments, there is perfect monitoring so that an agent knows what actions others have taken. 2
Although numerous studies have argued that introducing peer punishment can enhance cooperation in social dilemmas (Fehr and Gächter 2000; Ostrom, Walker, and Gardner 1992; Yamagishi 1988), recent research has also revealed that the unconstrained peer punishment mechanism has its limitations due to the possibility of antisocial punishment (i.e., punishing cooperators) and retaliatory behavior by those who are punished (see Casari and Luini 2009; Cinyabuguma, Page, and Putterman 2006; Denant-Boèmont, Masclet, and Noussair 2007; Dreber et al. 2008; Falk, Fehr, and Fischbacher 2005; Herrmann, Thöni, and Gächter 2008; Nikiforakis 2008; Rand et al. 2010; Wu et al. 2009; Gächter, Herrmann, and Thoeni 2010; Gächter and Herrmann 2011; Rand and Nowak 2011). 3 For example, Herrmann, Thöni, and Gächter (2008) investigate the peer punishment mechanism in public goods games using sixteen different subject pools from different cities around the world, such as Boston, Seoul, and Zurich. Their data show that peer punishment may not be effective in achieving its objective if those who cooperate are not protected from being punished. For this reason, attention has recently been focused on designing more restricted punishment mechanisms in the context of deterministic social dilemma problems where antisocial punishment is constrained (Casari and Luini 2009; Ertan, Page, and Putterman 2009; Faillo, Grieco, and Zarri 2010).
In view of these previous studies on punishment, we compare the effectiveness of two punishment options in SPD and DPD games with a prespecified number of periods. In option 1, each person can incur a cost to punish his or her counterpart at the end of any given period after learning what strategy the counterpart pursued in that period and the resulting outcomes to both players (henceforth BothPun). In option 2, only an individual investing in protection is allowed to incur a cost to punish a counterpart who has not invested in protection in that period (henceforth InvPun). The InvPun mechanism studied in this article reflects how punishment is applied in many real world settings. For example, in formal contractual relationships, when an individual reneges on his or her obligation, the victim often has the right to punish the defector.
This article contends that peer punishment mechanisms, even restricted ones, can be less effective in promoting cooperation in a stochastic social dilemma game than in the standard (deterministic) social dilemma game because it is unclear if a person’s actions should be punished if there is external uncertainty with respect to the resulting outcomes. More specifically, if a person’s noncooperative behavior does not necessarily impose an additional cost on one’s counterpart, then there may be a feeling that this person does not need to be punished. Previous studies have shown that punishment decisions are correlated with norm violations (Fehr and Gächter 2000; Fershtman and Gneezy 2001; Bicchieri 2006; Xiao 2013). In a stochastic environment, however, the outcome uncertainty may lead to normative conflict, as several norms may exist as to how one should behave in this situation. For example, with respect to climate change, some may think that countries should take the risk that there will not be a significant sea level rise in the future and thus not incur the cost of reducing carbon emissions. Others may feel that countries should adhere to the Kyoto Protocol to reduce the risk of future damage due to climate change. Previous research has shown that a punishment mechanism is less effective in promoting cooperation when normative conflict exists (Reuben and Riedl 2011; Nikiforakis, Noussair, and Wilkening 2012).
Punishment decisions are also related to the perceived negative intentions of the decision makers. In a stochastic environment, an individual may interpret noncooperative behavior as risk taking rather than indicating negative intentions with respect to others and thus reduce the desire to punish noncooperative behavior (Blount 1995; Nelson 2002; Offerman 2002; Charness and Levine 2007; Cushman et al. 2009).
For these reasons, it is not surprising that we find in our experiments that those who do not invest in protection are less likely to be punished in SPD games than in DPD games. Although the InvPun mechanism is more effective than the BothPun mechanism in promoting cooperation, it is less effective when outcomes are uncertain (i.e., in SPD games) than when they are specified with certainty (i.e., in DPD games). Interestingly, we do not observe differences in the effectiveness of the BothPun mechanism in SPD games and DPD games. In particular, we find that BothPun does not significantly increase cooperation in either game. One explanation is that the BothPun mechanism leads to retaliation toward the punisher, and less punishment means less retaliation. Therefore, less punishment does not necessarily lead to less cooperation. Our data also suggest that the occurrence of a negative event in an SPD game does not affect agents’ punishment decisions. We discuss the policy implications of our findings at the end of this article.
Literature Review
Stochastic Social Dilemma Games
Uncertainty can influence the outcomes of stochastic social dilemma games in two ways: (1) one cannot always determine a specific counterpart’s behavior based solely on the outcomes and (2) agents’ payoffs are affected by external risk as well as their own actions as illustrated by the global climate change problem discussed earlier.
The effectiveness of punishment on the first type of uncertainty has recently received much attention in the economic literature with most research focusing on unrestricted peer punishment mechanisms (similar to BothPun). For example, Patel, Cartwright, and Van Vugt (2010) show that the peer punishment mechanism is less effective when there is uncertainty regarding who is a free rider in a public goods game (also see Grechenig, Nicklisch, and Thöni 2010; Ambrus and Greiner 2012). Bornstein and Weisel (2010) show that punishment opportunities are not effective in promoting cooperation when information about individual endowments is incomplete in public goods games.
Taken together, these studies reveal that when the behavior (or intention) of the counterpart is not known with certainty, cooperators are more likely to be punished, and free riders are less likely to be punished than in social dilemmas without noise. None of these studies address the effectiveness of peer punishment in stochastic social dilemmas where agents’ behavior is transparent but there is outcome uncertainty. Furthermore, these studies did not compare the two types of punishment mechanisms examined in our experiments.
Bereby-Meyer and Roth (2006) examined games with outcome uncertainties (probabilistic prisoner's dilemma [PD] games) and compared them with deterministic PD games but did not study the impact of punishment on behavior. They focused on learning effects and argued that due to the noise of the payoffs, people learn to cooperate more slowly in the repeated probabilistic PD game than in the repeated deterministic PD game.
Restricted versus Unrestricted Peer Punishment Mechanisms and Cooperation
There is a large body of research on how introducing peer punishment can enforce cooperation (Yamagishi 1986, 1988; Ostrom, Walker, and Gardner 1992; Fehr and Gächter 2000; Dickinson 2001; Andreoni, Harbaugh, and Vesterlund 2003; Fehr and Fischbacher 2004; Fowler 2005; Xiao and Houser 2005, 2011; Carpenter 2007; Sefton, Shupp, and Walker 2007; Houser et al. 2008; also see Chaudhuri [2011] for a review).
Recently, researchers have drawn attention to the possibility of antisocial punishment and retaliation toward punishers when group members are free to decide whether and whom to punish, provided they are willing to pay the cost (Cinyabuguma, Page, and Putterman 2006; Dreber et al. 2008; Falk, Fehr, and Fischbacher 2005; Denant-Boèmont, Masclet, and Noussair 2007; Herrmann, Thöni, and Gächter 2008; Nikiforakis 2008; Nikiforakis and Engelmann 2011; Rand et al. 2010). Others have argued that to improve the effectiveness of punishment, it is important to restrict those who have the ability to punish (Herrmann, Thöni, and Gächter 2008). A body of research has examined various forms of restricted punishment mechanisms (all in a deterministic environment) and suggests that punishment is more effective at promoting cooperation when it precludes the antisocial punishment that may emerge under an unrestricted punishment mechanism similar to the BothPun option. For example, Casari and Luini (2009) investigated a “consensual institution” in a public goods game whereby a request to punish a specific group member will be implemented only if at least two agents request such a punishment. Ertan, Page, and Putterman (2009) studied a public goods game where subjects can vote on who should be punished and found that this mechanism can promote cooperation compared to a procedure where there were no restrictions placed on who could be punished.
Faillo, Grieco, and Zarri (2010) conducted a repeated public goods experiment where a person could punish only those who contributed less than he or she had contributed. They found that the level of cooperation doubled compared with unrestricted punishment. Casari and Plott (2003) studied a mechanism where subjects could decide whether to pay a cost to monitor others, and free riders would be punished if discovered. They found this mechanism doubled the group’s total earnings compared to the case where no sanction was imposed. Andreoni and Gee (2012) show that people are willing to pay a delegated police officer to fine only the lowest contributor in a public goods environment and that this “hired gun mechanism” can avoid revenge and increase social welfare.
All these studies focus on deterministic environments and show how restricting the freedom of punishment can improve its effectiveness in inducing cooperation. We show in this article that when there is external uncertainty related to the outcome, a restricted punishment mechanism, InvPun, is less effective in promoting cooperation in an SPD setting than it is in a DPD environment, although still much more effective in both situations than the unrestricted punishment mechanism, BothPun.
Experiment
Design
Our experiment consists of six treatments determined by the baseline case and two punishment mechanisms applied to either a DPD game or an SPD game. In each treatment, subjects are told that they will anonymously play with the same person for ten periods after which they will be matched with a different participant to play the same game for another ten periods.
Baseline cases without punishment—In the baseline SPD game, two subjects are paired and play together for ten periods. At the beginning of each period, each subject is given 48 talers (2 talers = US$1). The two players must make a decision simultaneously on whether to invest in a risk-reduction measure to prevent a random negative event (with a loss of 24 talers). Subjects know the probability of the occurrence of a random event as a function of the actions of each player. If both players choose to Invest, then the negative event will not happen. The investment cost to each player is 12 talers. If only one player invests, then there is a 40 percent chance that the negative event will happen. Thus, there is a 40 percent chance the investor will lose 36 talers and the non-investor will lose 24 talers, and there is a 60 percent chance that the investor will lose 12 talers and the non-investor will lose 0 talers. If both choose to Not Invest, then there is a 64 percent chance that the negative event will happen in which case each player will lose 24 talers. Thus, there is a 36 percent chance that each will lose 0 talers in this situation. Kunreuther et al. (2009) and Gong, Baron, and Kunreuther (2009) examined the degree of cooperation outcomes in SPD games with different parameters reflecting the degree of outcome uncertainty. We chose these parameters based on previous results that showed the cooperation level was not very high in SPD games (Kunreuther et al. 2009; Gong, Baron, and Kunreuther 2009). This enables us to investigate how punishment mechanisms may promote cooperation. 4
In the DPD treatment without punishment, the outcomes approximate the expected losses of the corresponding scenarios of the SPD game. 5 In particular, if both players choose to invest, the negative event will not happen and each player only pays the investment cost of 12 talers. If only one player invests, it will cost the one who invests 12 talers. In addition, both players will suffer an equal financial loss of 10 talers from the negative event. If both choose not to invest, then each player suffers an equal financial loss of 15 talers from the negative event. At the end of each period, each player is informed about (1) his or her counterpart’s decision, (2) whether the negative event occurred, and (3) total losses to each player.
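The correspondence between the two baseline games can be checked with a short calculation. The sketch below uses only the parameters stated above (a loss of 24 talers, an investment cost of 12 talers, and event probabilities of 0, 0.40, and 0.64 for two, one, and zero investors). The function and variable names are illustrative assumptions and are not taken from the experimental software.

```python
# Illustrative check: expected SPD losses versus the certain DPD losses.
LOSS = 24          # talers lost when the negative event occurs
INVEST_COST = 12   # talers paid by a player who invests

# Probability of the negative event, indexed by the number of investors in the pair.
P_EVENT = {2: 0.0, 1: 0.40, 0: 0.64}

# Certain event losses used in the DPD game, as stated in the text.
DPD_EVENT_LOSS = {2: 0, 1: 10, 0: 15}


def spd_expected_loss(invests: bool, other_invests: bool) -> float:
    """Investment cost plus expected event loss for one player in the SPD game."""
    n_investors = int(invests) + int(other_invests)
    cost = INVEST_COST if invests else 0
    return cost + P_EVENT[n_investors] * LOSS


def dpd_loss(invests: bool, other_invests: bool) -> int:
    """Investment cost plus certain event loss for one player in the DPD game."""
    n_investors = int(invests) + int(other_invests)
    cost = INVEST_COST if invests else 0
    return cost + DPD_EVENT_LOSS[n_investors]


if __name__ == "__main__":
    for me in (True, False):
        for other in (True, False):
            print(me, other, round(spd_expected_loss(me, other), 2), dpd_loss(me, other))
```

Under these parameters the expected SPD event losses are 9.6 and 15.36 talers, which the DPD game rounds to 10 and 15. In both games, not investing yields the smaller (expected) loss whatever the counterpart does, yet mutual investment (12) costs each player less than mutual non-investment (about 15), which is the prisoner's dilemma structure.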
Treatments with Punishment—In the DPD_BothPun treatment, after each agent makes a decision on whether or not to invest, he or she sees the counterpart’s decision and his or her own current payoff. He or she then decides whether to punish the counterpart. The DPD_InvPun treatment differs from the DPD_BothPun treatment in that only those who have invested can punish a counterpart if the counterpart has not invested.
In the SPD_BothPun treatment, agents decide whether or not to punish their counterparts after they are informed of their counterpart’s decision and whether the negative event occurred. We reveal the outcome of the negative event to the subjects before they decide whether to punish because it enables us to determine the impact that the occurrence of a negative event may have on their punishment decision. In the SPD_InvPun treatment, an agent who invested can determine whether to punish a counterpart who has not invested after being informed whether the negative event occurred.
In all the treatments with punishment mechanisms, subjects must determine whether to punish the other person, and if so by how much, before learning about their counterpart’s punishment decision. In each period, a punisher can have up to 24 talers deducted from the counterpart’s payoff. Every 3 talers deducted from the counterpart costs the punisher 1 taler. In the BothPun treatments, subjects might have negative earnings in one or more periods. In this case, their earnings for that period were set at zero. At the end of each period, each subject learns what his or her earnings were in that period and whether his or her counterpart inflicted any punishment and, if so, its severity.
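The punishment rule can be summarized in a few lines of code. This is a minimal sketch under stated assumptions: the 24-taler cap, the 3:1 deduction-to-cost ratio, and the floor at zero come from the description above, while the function name, the argument names, and the treatment of deductions that are not multiples of 3 (a proportional cost) are assumptions made for illustration.

```python
MAX_DEDUCTION = 24  # most talers a punisher can have deducted from the counterpart
COST_RATIO = 3      # every 3 talers deducted cost the punisher 1 taler


def period_earnings(payoff_before: float,
                    deduction_assigned: float,
                    deduction_received: float) -> float:
    """Earnings for one period after the punishment stage.

    payoff_before      -- talers held after the investment stage
    deduction_assigned -- talers this player has deducted from the counterpart
    deduction_received -- talers the counterpart has deducted from this player
    """
    for d in (deduction_assigned, deduction_received):
        if not 0 <= d <= MAX_DEDUCTION:
            raise ValueError("deductions must lie between 0 and 24 talers")
    # Assumption: the cost scales proportionally for amounts that are not multiples of 3.
    punishment_cost = deduction_assigned / COST_RATIO
    # Negative period earnings are set to zero, as described for the BothPun treatments.
    return max(0.0, payoff_before - punishment_cost - deduction_received)
```

For example, a non-investor in the SPD game who holds 24 talers after the negative event, assigns a 3-taler deduction, and receives the maximum 24-taler deduction would end the period at zero rather than at −1 taler.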
Procedure
We conducted the experiment at the Behavioral Lab at the Wharton School, University of Pennsylvania, by recruiting 310 subjects from the general student population at the University of Pennsylvania: 64 in SPD; 62 in SPD_BothPun; 46 in SPD_InvPun; 48 in DPD; 46 in DPD_BothPun, and 44 in DPD_InvPun. Each subject participated in only one treatment. The experiment was conducted using Z-tree (Fischbacher 2007). Subjects were told that, in addition to a fixed payment of US$10, they might receive a payment based on the outcome of one randomly selected period. About 20 percent of the subjects were randomly chosen at the end of the experiment and received their actual payoff from one random period in the game they played. Each subject was randomly assigned to one computer terminal. Before the experiment started, each subject completed an exercise to make sure he or she understood the payoffs under different strategies.
Hypotheses
Unlike the public goods game with punishment opportunities, where subjects are often unsure about who punished them (Fehr and Gächter 2000; Herrmann, Thöni, and Gächter 2008), those who receive punishment in two-player PD games always know that their counterpart punished them. This knowledge may lead to more retaliation than in a social dilemma game where there are three or more players. Furthermore, retaliatory punishment may lead to antisocial punishment where cooperators are punished when they inflict punishment on others (Falk, Fehr, and Fischbacher 2005; Herrmann, Thöni, and Gächter 2008; Nikiforakis 2008; Dreber et al. 2008; Rand et al. 2010).
In a repeated game, the punishment opportunity under the BothPun mechanism has a dual role: norm enforcing and retaliation. An individual is more likely to cooperate when he or she expects his or her counterpart to incur costs for punishing noncooperative behavior (i.e., norm-enforcing impacts). However, being punished can also lead to retaliation against the punisher. This retaliatory punishment is likely to lead to less cooperation. In fact, the BothPun mechanism can lead to less cooperation than in the baseline case with no punishment if the detrimental effect of retaliatory punishment is greater than the positive norm-enforcing effect of punishment. In contrast, the InvPun mechanism places constraints on antisocial and retaliatory actions since a person can punish only if he or she has cooperated and his or her counterpart has not. We thus hypothesize that InvPun will promote greater cooperation than will the baseline treatment and the BothPun mechanism.
Previous social dilemma studies with deterministic outcomes show that noncooperators are much more likely to be punished than cooperators (Fehr and Gächter 2000). In those studies, as in the DPD game, noncooperative decisions reduce the payoff for the group. In an SPD game, however, noncooperative behavior (in our experiment, to not invest) does not necessarily lead to lower earnings. In fact, both individuals earn the highest payoff if neither invests and the negative event does not occur. We hypothesize that, compared to a DPD game, this feature of an SPD game makes it less clear whether a person should be punished for not investing for the following reasons.
First, previous research has shown that normative conflict can influence the effectiveness of punishment mechanisms in enforcing cooperation (Reuben and Riedl 2011; Nikiforakis, Noussair, and Wilkening 2012). Normative conflict may arise in an SPD game due to outcome uncertainty. In particular, some people may approve of risk-taking behavior and view the decision to not invest as optimal whether or not other agents decided to invest in protection since agents may be spared losses even if they do not undertake these measures. Others are likely to view non-investment behavior as inappropriate since they are more likely to suffer large losses than if their counterpart had invested in protective measures.
Second, studies have found the decision to punish to be positively correlated with perceived negative intentions of the counterpart (Blount 1995; Nelson 2002; Offerman 2002; Charness and Levine 2007). Outcome uncertainty in the SPD game clouds the intentions underlying the non-investment decisions of the counterpart in the SPD game: noncooperative behavior might simply reflect the counterpart’s risk preferences that are unknown to the other player. In contrast, a noncooperative decision in a DPD game clearly reveals the individual’s intention. 6
More specifically, we hypothesize that under the BothPun mechanism in a DPD game, non-investors are more likely to be punished than are investors. However, the difference in the probability of being punished when one invested and when one did not invest is smaller in the SPD game than the DPD game. Under the InvPun mechanism, non-investors are less likely to be punished in the SPD game as compared with those in the DPD game.
We next discuss the implication of the above hypotheses on the effectiveness of the two punishment mechanisms in promoting cooperation in SPD and DPD games. The difference in punishment behavior between SPD and DPD environments may have a dual impact on the effectiveness of the BothPun punishment mechanism in promoting cooperation. More specifically, compared with DPD games, less implementation of punishment on non-investors in SPD environments may diminish the norm-enforcing function of punishment, but it also may lead to less retaliation. Thus, the relative effectiveness of the BothPun mechanism depends on the magnitude of the two effects in the SPD and DPD games.
In contrast, the InvPun mechanism places constraints on antisocial and retaliatory punishments. We expect that when subjects are less likely to punish their counterparts who did not invest, the norm-enforcing function of punishment will be diminished. Therefore, we predict that the InvPun mechanism is less effective in promoting cooperation in the SPD game than in a DPD game because a non-investor is less likely to be punished when outcomes are stochastic than when they are known with certainty.
Results
Each subject played the game with one partner for ten periods and with a different partner for another ten periods. Data in both ten-period sequences support both hypotheses, although we observe some differences between the two sequences. In this section, we first report the aggregate analysis of investment rates in each treatment and then investigate how subjects make their punishment decisions and how investment decisions are affected by punishment in the previous period from the data in the first ten periods where subjects did not have any experience. 7
Aggregate Analysis
Figure 1 plots the dynamics of investment rates over time in each treatment. It shows that in both the DPD and SPD environments, the InvPun mechanism achieves a significantly higher cooperation rate in almost every period than the BothPun mechanism. Table 1 reports the average investment rate in each treatment. In support of Hypothesis 1, the InvPun mechanism has a much higher investment rate than the BothPun mechanism, especially in the DPD game (SPD game: 65.43 percent vs. 48.23 percent, Mann–Whitney test, p = .14; DPD game: 84.77 percent vs. 51.53 percent, Mann–Whitney test, p < .01). 8

Figure 1. Investment rate over period by treatment.
Table 1. Investment Percentage and Average Earnings by Treatment.
Note: SPD = stochastic prisoners’ dilemma; DPD = deterministic prisoners’ dilemma. Standard error (SE) is calculated using each pair as one observation (Obs.).
*This column reports the p value of Mann–Whitney test of the investment percentage (or average earnings) between the punishment treatment and the baseline without punishment treatment under each condition. We calculate the average investment rate (or average earnings) of ten periods for each pair. We count each pair as one observation.
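The pair-level procedure described in the note above can be sketched as follows. The code is illustrative only: the function names are assumptions, any numbers passed to the functions would be hypothetical placeholders rather than experimental data, and only the Mann–Whitney U statistic is computed (a p value would in practice come from a statistics package).

```python
from statistics import mean


def pair_investment_rate(decisions):
    """Average investment rate of one pair.

    decisions -- list of (invest_a, invest_b) booleans, one tuple per period.
    Each pair yields a single number, so each pair counts as one observation.
    """
    return mean((int(a) + int(b)) / 2 for a, b in decisions)


def mann_whitney_u(x, y):
    """U statistic for sample x against sample y.

    Counts the (xi, yj) combinations in which xi exceeds yj; ties count one half.
    """
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
```

Collapsing the ten periods into one value per pair keeps the observations independent across pairs, since decisions within a pair are correlated over the repeated interaction.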
Table 1 also shows that the DPD_InvPun treatment achieves a statistically significantly higher investment rate than the SPD_InvPun treatment (84.77 percent vs. 65.43 percent, Mann–Whitney test, p = .01), thus supporting Hypothesis 2. The investment rate in the corresponding baseline SPD and DPD treatments is approximately the same. We also conducted a tobit regression analysis using each pair’s average investment rate over the ten periods as the dependent variable and the six treatments’ dummy variables as the independent variables; the results are reported in Table 2. We find that the difference between the coefficients of SPD_InvPun and SPD differs significantly from that between the coefficients of DPD_InvPun and DPD (F-test, p = .04). These results provide additional evidence that the InvPun mechanism is more effective in promoting cooperation in a DPD game than in an SPD game. Such differences in the cooperation rate between a DPD and an SPD game are not observed under the BothPun mechanism (F-test, p = .70).
Table 2. Tobit Regression Analysis of Average Investment Rate of Each Group.
Note: SPD = stochastic prisoners’ dilemma; DPD = deterministic prisoners’ dilemma; obs. = observations; SE = standard error.
***Significant at 1 percent level. **Significant at 5 percent level. *Significant at 10 percent level.
Previous studies suggest that although punishment can promote cooperation, it may not improve overall efficiency due to the cost of punishment incurred by both the punishee and the punisher (Fehr and Gächter 2000; Ostrom, Walker, and Gardner 1992; Yamagishi 1988; Dreber et al. 2008). We also examine whether the two punishment mechanisms promote efficiency by comparing average earnings across treatments. As shown in Table 1, the average earnings (in talers) are lower in DPD_BothPun than in DPD (30.8 vs. 33.5), although the difference is not statistically significant. The earnings in DPD_InvPun are about the same as in DPD and are significantly higher than in the DPD_BothPun treatment (33.6 vs. 30.8, Mann–Whitney test, p = .05). This suggests that in our deterministic social dilemma environment, the InvPun mechanism is not only more effective in promoting cooperation than the BothPun mechanism but also more efficient. In the stochastic environment, the average earnings (in talers) are 33.3 in SPD, 30.9 in SPD_BothPun, and 33.1 in SPD_InvPun. Similar to the deterministic environment, earnings are lowest under the BothPun mechanism. However, none of the pairwise comparisons are significant (Mann–Whitney test, p > .10).
Overall, our data suggest that, relative to unrestricted punishment, the restricted punishment mechanism is not only more effective in promoting cooperation but may also reduce the potential efficiency losses due to the cost of punishment, especially in deterministic environments. The low earnings in the BothPun treatments may be attributed to the high frequency of punishment and the negative reaction toward punishment (i.e., a lower willingness to invest) under BothPun mechanisms, as we will show next.
Individual Analyses of Punishment Decisions
Descriptive analyses—We observe a substantial amount of punishment behavior in all treatments (Supplemental Appendix A reports descriptive data on the frequency of punishment decisions and their magnitude in the SPD and DPD games). Under the BothPun mechanism, the frequency of punishment is about 18 percent and the average number of talers players paid to punish the counterpart is about 0.9 in both SPD and DPD games. Under the InvPun mechanism, the frequency of punishment drops by more than half, to about 8 percent, and the average number of talers paid to punish is only around 0.4. The data suggest that punishment tends to be much less frequent and less severe under the InvPun mechanism than under the BothPun mechanism.
Figure 2 plots the percentage of investors/non-investors who received punishment in each treatment. We separate the case when the subject punished the counterpart in the previous period from the case when he or she did not. 9 The comparison of these two cases within each treatment indicates to what degree punishment may be triggered by retaliation. Figure 2 shows that in the BothPun option, the frequency of receiving punishment is much higher when one punished the counterpart in the previous period than when one did not punish the counterpart, but this is not the case in the InvPun option for both the DPD and SPD games. Thus, our data show that the InvPun mechanism significantly reduces retaliatory punishment. For example, in the SPD_BothPun treatment when subjects did not invest, they are punished 70 percent of the time (42 of 60 cases) if they inflicted punishment on their counterparts in the previous period. In contrast, if the non-investors did not punish their counterparts in the previous period, they are punished only 8 percent of the time (19 of 239 cases). This difference suggests that the implementation of punishment is dominated by retaliatory motives.

Figure 2. Frequency of receiving punishment in period t under each condition.
Figure 2 also suggests that, in the BothPun treatments, there is a substantial amount of antisocial punishment. For example, investors who punished their counterparts in period t−1 are punished 76 percent of the time (31 of 41 cases) in period t in the SPD_BothPun treatment and 42 percent of the time (11 of 26 cases) in period t in the DPD_BothPun treatment.
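The conditional frequencies quoted in the two preceding paragraphs follow directly from the reported case counts. The short Python sketch below reproduces that arithmetic using only the counts given in the text; the dictionary keys and the function name are illustrative labels, not the authors' variable names.

```python
# Counts reported in the text: (times punished in period t, total cases).
# Keys are illustrative labels:
# (treatment, role in period t, punished the counterpart in period t-1).
reported_counts = {
    ("SPD_BothPun", "non-investor", True): (42, 60),
    ("SPD_BothPun", "non-investor", False): (19, 239),
    ("SPD_BothPun", "investor", True): (31, 41),
    ("DPD_BothPun", "investor", True): (11, 26),
}


def punishment_rate(punished: int, total: int) -> float:
    """Share of cases in which the subject received punishment, in percent."""
    return 100.0 * punished / total


for (treatment, role, punished_before), (k, n) in reported_counts.items():
    # Rounded to whole percentages, as in the text.
    print(treatment, role, punished_before, round(punishment_rate(k, n)))
```

Rounded to whole percentages, these counts give the 70, 8, 76, and 42 percent figures cited above.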
Regression analyses. Statistical evidence for the presence of retaliatory punishment is provided by a random individual effect ordered probit regression analysis of punishment decisions. We find that an individual is significantly more likely to punish his or her counterpart when he or she was punished in the previous period (χ² test, p < .01; see Supplemental Appendix B, Table B1 for the regression result).
One potential difference between SPD and DPD games is that noncooperators may receive less punishment in the SPD game if individuals' decisions on whether to impose a costly punishment depend on whether they experienced a negative outcome. To examine whether this form of outcome bias (see Baron and Hershey 1988; Cushman et al. 2009) exists, we compute the frequency of punishment in the SPD game when a loss occurred and when it did not. In the SPD_BothPun treatment, the frequency of punishment is 17 percent following the occurrence of a negative event, compared to 19 percent when the negative event did not happen. In the SPD_InvPun treatment, for the cases where one invested and the counterpart did not, about 62 percent of investors punished the non-investor counterpart when the negative event occurred and about 53 percent did so when it did not occur. We conducted a random individual effect ordered probit regression analysis of individuals' punishment decisions and found that the occurrence of a loss does not have a significant effect on punishment decisions (see Table B2 for the regression results). This result suggests that an outcome bias in judgment and choices does not necessarily extend to punishment decisions in repeated stochastic social dilemma environments.
One possible explanation, based on Hypothesis 2, is that the uncertainty of a negative outcome in the SPD game might create a normative conflict, so that it is less clear whether an invest or not-invest decision should be punished in SPD environments. That is, in the stochastic environment, people may interpret defection as risk taking rather than as a norm violation or an indication of negative intentions. Thus, even if the outcome bias leads people to think the noncooperative decision is worse when the bad outcome occurs, this may not affect their punishment decision because the decision is viewed as risk-taking behavior rather than as morally wrong.
To test Hypothesis 2, we compare the difference in the proportion of subjects being punished when they invested and when they did not invest in the SPD and DPD environments for both punishment mechanisms. Supporting Hypothesis 2, this difference is much smaller in the SPD_BothPun treatment (21 percent of non-investors and 16 percent of investors were punished) than in the DPD_BothPun treatment (31 percent of non-investors and 8 percent of investors were punished). Under the InvPun mechanism, about 86 percent of non-investing counterparts were punished in the DPD game but only 56 percent in the SPD game.
To provide statistical evidence for the difference in punishment behavior between SPD and DPD games, we conducted a random individual effect ordered probit regression analysis for the SPD_BothPun and DPD_BothPun treatments, using the punishment amount the subject received as the dependent variable. We use three categories to ensure that the number of observations in each cell is sufficiently large. The dependent variable, CPunAmtReceived_{i,t}, is set to (1) "0" if subject i received no punishment, (2) "1" if subject i received a positive punishment amount of no more than twelve, and (3) "2" if subject i received punishment exceeding twelve. We include the subject's investment decision as the independent variable (Inv_{i,t} and NotInv_{i,t}) and allow different coefficients for each treatment (SPD_BothPun or DPD_BothPun). Given the evidence of retaliatory punishment reported above, we also allow different coefficients for the cases in which the subject punished the counterpart in the previous period (Punisher_{i,t−1} = 1) and those in which he or she did not (NPunisher_{i,t−1} = 1; see Table 3, Regression 1).
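The three-category coding of the dependent variable described above can be sketched as a small Python function. This is only an illustration of the stated coding rule; the function name is hypothetical, and the rejection of negative amounts is an added assumption, since the text describes only zero and positive amounts.

```python
def code_punishment_received(amount: float) -> int:
    """Map the punishment amount a subject received to the three ordered
    categories described in the text (hypothetical helper)."""
    if amount < 0:
        # Assumption: punishment amounts are never negative.
        raise ValueError("punishment amount cannot be negative")
    if amount == 0:
        return 0  # no punishment received
    if amount <= 12:
        return 1  # positive amount of no more than twelve
    return 2      # amount exceeding twelve


# Example: code a few illustrative (made-up) amounts.
print([code_punishment_received(a) for a in (0, 3, 12, 13)])
```

Collapsing the amounts into three ordered bins keeps each cell populated, which is the reason the text gives for using categories rather than the raw amounts.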
Table 3. Random Individual Effect Ordered Probit Regression Analysis of Punishment Amount Received.
Note: Inv_{i,t} = 1 if i invested in period t; = 0, o.w. NotInv_{i,t} = 1 if i did not invest in period t; = 0, o.w.
NPunisher_{i,t} = 1 if i did not punish the counterpart in period t; = 0, o.w. Punisher_{i,t} = 1 if i punished the counterpart in period t; = 0, o.w. The baseline for each regression is the case when subject i in the stochastic prisoner's dilemma treatments did not invest in period t and punished the counterpart in period t−1; SE = standard error; obs. = observations.
(a) Includes only the cases where the individual did not invest but his or her counterpart invested in the current period.
***Significant at 1 percent level. **Significant at 5 percent level. *Significant at 10 percent level.
We found the difference between the coefficients of Inv_{i,t} * NPunisher_{i,t−1} * SPD_BothPun (−1.70) and NotInv_{i,t} * NPunisher_{i,t−1} * SPD_BothPun (−1.09) to be significantly smaller than the difference between the coefficients of Inv_{i,t} * NPunisher_{i,t−1} * DPD_BothPun (−2.02) and NotInv_{i,t} * NPunisher_{i,t−1} * DPD_BothPun (−0.42; χ² test, p = .02). This result suggests that when subjects did not punish the counterpart in the previous period (NPunisher_{i,t−1} = 1), the difference between the punishment amount received by investors (Inv_{i,t} = 1) and that received by non-investors (NotInv_{i,t} = 1) is significantly smaller in the SPD environment than in the DPD environment. This result from the BothPun conditions provides statistical evidence supporting Hypothesis 2.
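The comparison in the preceding paragraph is a difference between two coefficient gaps. The Python sketch below works through the arithmetic using only the four coefficients reported in the text. The sign convention (non-investor coefficient minus investor coefficient) is chosen here for readability, and the χ² test itself is not reproduced because it requires the estimated covariance matrix, which is not reported in the text.

```python
# Coefficients reported in the text for the NPunisher_{i,t-1} = 1 cases
# (Table 3, Regression 1). Keys are illustrative labels.
coef = {
    ("SPD_BothPun", "Inv"): -1.70,
    ("SPD_BothPun", "NotInv"): -1.09,
    ("DPD_BothPun", "Inv"): -2.02,
    ("DPD_BothPun", "NotInv"): -0.42,
}


def investor_gap(treatment: str) -> float:
    """Non-investor coefficient minus investor coefficient for a treatment."""
    return round(coef[(treatment, "NotInv")] - coef[(treatment, "Inv")], 2)


spd_gap = investor_gap("SPD_BothPun")
dpd_gap = investor_gap("DPD_BothPun")
print(spd_gap, dpd_gap, round(dpd_gap - spd_gap, 2))
```

The gap is 0.61 in the SPD_BothPun treatment and 1.60 in the DPD_BothPun treatment, so punishment separates investors from non-investors less sharply when outcomes are stochastic.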
We ran a similar regression analysis to compare the punishment amount received by non-investors in the SPD_InvPun and DPD_InvPun treatments (see Table 3, Regression 2). We again find that, among subjects who did not punish the counterpart in the previous period, non-investors are less likely to be punished in the SPD_InvPun treatment than in the DPD_InvPun treatment. More specifically, the coefficient of NPunisher_{i,t−1} * SPD_InvPun is significantly smaller than that of NPunisher_{i,t−1} * DPD_InvPun (0.15 vs. 1.51, χ² test, p < .01). Thus, the results from the InvPun treatments also support Hypothesis 2.
In summary, consistent with previous studies (Nikiforakis 2008; Dreber et al. 2008), we find evidence of retaliatory punishment in DPD games. In addition, we find that retaliatory punishment occurs in SPD games. Interestingly, the occurrence of the negative event does not affect punishment decisions. More important, punishment behavior suggests that it is less clear whether a counterpart should be punished in the SPD environment than in the DPD environment.
Effect of Punishment on Investment Decision
We ran a random individual effect probit regression analysis on each individual's investment decision in period t for each punishment treatment to determine whether the magnitude of the punishment increased or decreased the likelihood of a person investing in the next period. The independent variables include whether individual i invested in period t−1 (Inv_{i,t−1}), whether the counterpart invested in period t−1 (CtpInv_{i,t−1}), and the amount of punishment that individual i received in period t−1 (PunAmtReceived_{i,t−1}). As we pointed out earlier, punishment may have a detrimental effect when it is antisocial (i.e., punishment is imposed on the investors); therefore, we allow the investor and the non-investor to have different coefficients for this variable (NotInv_{i,t−1} * PunAmtReceived_{i,t−1} and Inv_{i,t−1} * PunAmtReceived_{i,t−1}) in the BothPun treatment for the SPD and DPD games.
The regression results reported in Table 4 suggest that the coefficient of Inv_{i,t−1} * PunAmtReceived_{i,t−1} (i.e., the amount of punishment subject i received in the previous period when he or she invested) is significantly negative in both the SPD_BothPun and DPD_BothPun treatments (p < .01). This implies that under the BothPun mechanism, the larger the punishment incurred in period t−1 when a person invested, the less likely the person is to invest in period t.
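To make the regressor construction concrete, the sketch below builds the lagged variables for one subject from the previous period's record, including the two interaction terms that let investors and non-investors have different punishment coefficients. The class, field, and key names are hypothetical and only mirror the variable descriptions in the text; estimating the probit model itself is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PeriodRecord:
    """One subject's record for a single period (hypothetical layout)."""
    invested: bool
    counterpart_invested: bool
    punishment_received: float


def lagged_regressors(prev: PeriodRecord) -> dict:
    """Build the period t regressors from the period t-1 record."""
    inv = 1 if prev.invested else 0
    not_inv = 1 - inv
    return {
        "Inv_prev": inv,
        "CtpInv_prev": 1 if prev.counterpart_invested else 0,
        # Separate punishment terms for investors and non-investors, so that
        # antisocial punishment can carry its own coefficient.
        "Inv_x_PunAmtReceived_prev": inv * prev.punishment_received,
        "NotInv_x_PunAmtReceived_prev": not_inv * prev.punishment_received,
    }


# Illustrative (made-up) record: an investor who was punished by 3 units.
row = lagged_regressors(PeriodRecord(True, False, 3.0))
print(row)
```

Because exactly one of the two indicators equals 1 in any period, the received punishment loads on exactly one of the two interaction terms.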
Table 4. Random Individual Effect Probit Regression Analysis of Investment Decisions.
Note: Inv_{i,t−1} = 1 if i invested in period t−1; = 0, o.w.; CtpInv_{i,t} = 1 if i's counterpart invested in period t; = 0, o.w.; NotInv_{i,t−1} = 1 if i did not invest in period t−1; = 0, o.w.; PunAmtReceived_{i,t−1}: the punishment amount imposed on subject i by his or her counterpart in period t−1; SE = standard errors; obs. = observations.
***Significant at 1 percent level. **Significant at 5 percent level. *Significant at 10 percent level.
On the other hand, as shown in Table 4, if one did not invest in period t−1, the punishment amount does not have a significant positive effect on the investment decision in period t. More specifically, the coefficient of NotInv_{i,t−1} * PunAmtReceived_{i,t−1} is not significant in the SPD_BothPun treatment (0.01) and is, in fact, marginally significantly negative in the DPD_BothPun treatment (−0.03). In contrast, the coefficient of PunAmtReceived_{i,t−1} is positive in both the SPD_InvPun and DPD_InvPun treatments, although it is significant only in the SPD_InvPun treatment (p = .04). The latter finding implies that in an SPD game under the InvPun mechanism, the larger the punishment imposed on a non-investor in period t−1, the more likely that person is to invest in period t.
Conclusion
We conducted experiments to investigate the impact of peer punishment on promoting cooperation in stochastic social dilemma environments where the payoffs are determined not only by agents' behavior but also by exogenous uncertainty. In particular, we studied two types of punishment mechanisms, an unrestricted one where both individuals can punish (BothPun) and a restricted one where only investors can punish non-investors (InvPun), and compared behavior with a baseline case of no punishment. We find that the InvPun treatment increases cooperation relative to both the baseline case and the BothPun treatment in the DPD and SPD games. However, the InvPun mechanism is less effective when there is external uncertainty related to the outcome.
Our study contributes to the understanding of how exogenous outcome uncertainty affects people's decisions in imposing and reacting to peer punishment. We provide supporting evidence for the hypothesis that non-investors are less likely to be punished in a more realistic stochastic environment than when outcomes are known with certainty. This finding suggests that for peer punishment mechanisms to be more effective in a stochastic social dilemma environment, it may not be enough to exclude antisocial punishment. When outcomes are uncertain, it is important to convey clear normative messages to the community as to desirable social behavior. For example, one may want to indicate that risk-taking behavior with respect to problems such as climate change can have negative global consequences and should be discouraged. Future studies are needed to explore how different instruments that help to convey normative messages can be applied with or without punishment mechanisms to enforce cooperation in stochastic social dilemma environments. It would also be interesting to investigate how a culture of risk taking may affect the implementation of global treaties or protocols designed to promote cooperation.
Outcome uncertainty is an important feature of many decision problems in naturally occurring environments, such as decisions to reduce carbon emissions so as to attenuate the impacts of climate change. Our findings highlight the importance of considering the impact of outcome uncertainty when designing institutions to promote cooperation. It would be useful to conduct future studies to examine how the likelihood of different outcomes influences punishment behavior under institutions designed to promote cooperation in SPD environments. For example, how will the desire to punish noncooperators vary with changes in the probability of different outcomes occurring?
Footnotes
Acknowledgements
We thank Min Gong, Daniel Houser, David Krantz, Nikos Nikiforakis, Roberto Weber, and seminar participants at the 2009 North-American ESA meeting and at the Center for Research on Environmental Decisions at Columbia University for valuable comments on an earlier version of the paper. We gratefully acknowledge partial support for this paper provided by the Wharton Risk Management and Decision Processes Center; the Climate Decision Making Center (CDMC) located in the Department of Engineering and Public Policy (Cooperative Agreement between the NSF (SES-0345798) and Carnegie Mellon University); CREATE (National Center for Risk and Economic Analysis of Terrorism Events); and NSF Cooperative Agreement SES-0345840 to Columbia University's Center for Research on Environmental Decisions (CRED).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
References
