Abstract

Bostyn, Sevenhant, and Roets (2018) were not the first to compare hypothetical and real-life trolley problems (see Gold, Colman, & Pulford, 2014; Gold, Pulford, & Colman, 2014, 2015). Bostyn et al. changed the victims to mice and the harm to electric shock. Some of the participants decided whether to press a button to redirect an electric shock from a cage containing five mice to a cage containing one; others imagined that they were faced with the same decision and were asked “Would you press the button?” The researchers found that 84% of participants actually pressed the button, compared with only 66% who predicted that they would.
This finding is consistent with studies showing that people are bad at predicting their own behavior, including moral decisions. Loewenstein (1996, 2000, 2005) has argued that there is a hot–cold empathy gap: In affectively “cold” states, people fail to appreciate fully how “hot” states will affect their preferences and behavior. People also underestimate how bad a negative visceral feeling can be (Kang & Camerer, 2013). This could explain why participants say that they will sacrifice more money to spare others from mild electrical shocks than they actually do when the shocks are real (FeldmanHall et al., 2012), and it could also explain Bostyn et al.’s results.
We should be careful about generalizing from decisions about causing pain to decisions involving other types of harm. In a previous study (Gold, Colman, & Pulford, 2014), we found the opposite result to Bostyn et al.’s in a trolley problem in which the victims were children in a Ugandan orphanage and the harm was losing a preallocated meal: In a real decision, 80% of British and 49% of Chinese participants clicked a switch to save five children from losing meals at the cost of one other child losing a meal instead, whereas in a hypothetical version of the task, 91% of British and 73% of Chinese participants predicted that they would click the switch. One obvious difference between the two studies is that Bostyn et al.’s involved harming mice, whereas ours involved harming people. Bostyn et al. state that “recent research suggests that there is a symmetry between how people tend to treat animals and other humans” (p. 1091), but this may not apply to trolley-type dilemmas. Preferences for redistributing pain are different from preferences for redistributing money (Story et al., 2015), but there is evidence that moral judgments in hypothetical trolley problems involving economic harms follow the same patterns as moral judgments involving life-or-death decisions and other major physical harms (Gold, Pulford, & Colman, 2013).
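As a rough illustration only, the percentages quoted above can be set side by side. The following minimal sketch uses nothing beyond the figures reported in the text; the names `reported`, `real_minus_hypothetical`, and `gaps` are hypothetical labels introduced for this example. Because sample sizes are not given here, the sketch shows only the direction and size of each gap and implies nothing about statistical significance.

```python
# Percentages quoted in the text, as (real-action rate, hypothetical-prediction rate).
# The dictionary keys are illustrative labels, not names used in the original studies.
reported = {
    "Bostyn et al. (2018), mice, electric shock": (84, 66),
    "Gold et al. (2014), British, lost meals": (80, 91),
    "Gold et al. (2014), Chinese, lost meals": (49, 73),
}


def real_minus_hypothetical(real_pct, hypothetical_pct):
    """Return the gap in percentage points; positive means more acted than predicted."""
    return real_pct - hypothetical_pct


# One gap per sample, keyed by the same illustrative labels.
gaps = {label: real_minus_hypothetical(*pcts) for label, pcts in reported.items()}

for label, gap in gaps.items():
    print(f"{label}: {gap:+d} percentage points")
```

The sign of the gap is positive for the mouse study and negative for both samples in the meals study, which is the reversal described in the paragraph above.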
Bostyn et al. did not answer the research question that motivated their article: “whether subjects’ hypothetical moral judgments are predictive of the actual behavior they would display in a dilemma-like situation in real life” (p. 1085). Most researchers using hypothetical moral dilemmas take moral judgment as the dependent variable of interest, but Bostyn et al. did not elicit moral judgments in their hypothetical and real-life scenarios. They argued that “hypothetical-dilemma research, while valuable for understanding moral cognition, has little predictive value for actual behavior and that future studies should investigate actual moral behavior along with the hypothetical scenarios dominating the field” (p. 1084). Their argument fails to acknowledge that investigating moral judgments in and of themselves—the approach taken in most trolley-problem research—is a valid and theoretically important approach to the study of moral reasoning, irrespective of the judgment–behavior discrepancy. Further, had Bostyn et al. elicited moral-appropriateness judgments in their hypothetical and real-life mouse dilemmas, then they would have been on firmer ground in making claims about a judgment–behavior discrepancy than they were by showing that behavior was not correlated with a measure of “moral preferences” for consequentialist reasoning derived from a hypothetical-dilemma battery.
Using the paradigm involving meals for Ugandan children described above, we elicited both actions and moral judgments about real-life trolley decisions (Gold et al., 2015). Participants could intervene in an on-screen animation to change which children lost their meals. In addition to testing a condition that corresponded to Bostyn et al.’s version of the trolley problem, we also tested the well-known footbridge variation, in which five children could be saved by dragging a photo of a single child into the path of the threat. Consistent with previous research, our results showed a difference in moral judgment between the standard and footbridge versions, with significantly more participants judging the action permissible in the standard version than in the footbridge version. However, we found no significant difference between the two versions in actions actually performed (clicking the switch in the standard version or dragging the photo in the footbridge version), although this nonsignificant result cannot be interpreted as evidence for the null hypothesis. Nevertheless, actions were significantly different from permissibility judgments, and in a regression model, rightness judgments were related to behavior. It seems possible, even likely, that hypothetical moral judgments have predictive power for actual behavior.
The relationship between moral judgments and moral decisions appears to be complex. To add to the complexity, moral judgments may differ depending on whether they are made on the basis of reading a vignette about a dilemma or actually viewing the dilemma in real life. In an experiment designed to compare moral judgments in a hypothetical trolley problem with moral judgments in a corresponding real-life scenario (where, in both, the victims were quiz participants who could lose their winnings), we found differences in moral judgments between the hypothetical and the real-life scenarios (Gold, Pulford, & Colman, 2014).
Some researchers have suggested that differences between moral judgments and moral actions indicate that judgment and action are underpinned by two separate processes (Tassy, Oullier, Mancini, & Wicker, 2013). An obvious alternative is that what people consider morally permissible does not exhaust the factors that they actually consider when making moral decisions (Gold et al., 2015).
The reasons for these discrepancies need to be investigated further but preferably without deception of participants. In Bostyn et al.’s experiment, mice were not shocked, even in the “real-life” version, although the participants were told that they would be. Deception is increasingly avoided in research on judgment and decision making, and in behavioral and experimental economics it has been prohibited since the 1990s (e.g., Davis & Holt, 1993). Experimental rigor relies on participants believing what they are told by experimenters, and the avoidance of deception in experimental economics has been cited as one of the principal reasons why economic experiments are replicated more successfully than psychological experiments (Camerer et al., 2016). Bardsley et al. (2010) explained the potential effects of deception:
“If deceptive practices are used by other researchers, knowledge of that fact might spread among a local subject pool (e.g., by word of mouth). More worryingly, it could spread more widely as knowledge of experimental method is disseminated, for example through journals, and teaching in which experimental research is discussed. The importance of this transmission route should not be underestimated, given that experimental subjects tend to be drawn from among university students” (p. 283).
Epley and Huff (1998) provided compelling evidence that participants who are deceived become suspicious and that their suspicion remains elevated for several months. Reviewing all the evidence a decade later, Hertwig and Ortmann (2008) reported, “We found evidence that suspicion has the potential to adversely impact research outcomes, both in the experiment at hand and in subsequent studies” (p. 81).
Another reason to avoid attempting to deceive experimental participants is that such attempts can easily fail without the experimenters realizing it. The Milgram experiment is now so well known that one wonders whether participants in Bostyn et al.’s experiment really believed that shocks would be delivered in the real-life condition. Participants reported in postexperimental debriefing sessions that they felt uncomfortable (p. 1089), but that is hardly convincing evidence that they believed the shocks were real. In deceptive experiments, we always need to consider who was deceived: the participants or the experimenters?
Footnotes
Action Editor
D. Stephen Lindsay served as action editor for this article.
Author Contributions
A. M. Colman drafted the manuscript, N. Gold and B. D. Pulford made revisions, and all authors approved the final version for submission.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
Preparation of this Commentary was supported by an award from the Arts and Humanities Research Council of the United Kingdom (Grant No. AH/H001158/1).
