Abstract

Recently, Bostyn, Sevenhant, and Roets (2018) assessed the real-world predictive power of hypothetical trolley-type dilemmas. Participants responded to such dilemmas and then made a real decision about harming one mouse versus five mice. The authors report that the trolley-type judgments did not predict the real decisions. We regard their research as valuable and endorse their most general conclusion: Studying hypothetical judgments cannot replace studying real decisions. However, a closer look at their data casts doubt on their central claim. Moreover, their research strategy reflects a common misunderstanding of what makes trolley dilemmas most useful.
Bostyn et al.’s hypothetical dilemmas employed a nonstandard response format. In nearly all research using trolley-type dilemmas, participants evaluate only the proposed utilitarian action (e.g., pushing the man off the footbridge to save five lives) and do not separately assess the deontological alternative (e.g., not pushing). Because participants give only a single judgment, their responses are inherently comparative, accounting for both “horns” of the dilemma. Bostyn et al., however, had participants separately evaluate the utilitarian and deontological options. Having done this, the most natural approach would have been to calculate a difference score to model the relative appeal of the two options according to each participant. This reflects the logic of their experiment, which was aimed at predicting a real choice between a utilitarian option and a deontological option. A logistic regression using difference scores reveals marginally significant evidence that hypothetical judgments predict real judgments: odds ratio (OR) = 1.56, z = 1.77, p = .077. Given a clear directional prediction, this effect is significant with a one-tailed test (p = .038). With age and gender controls, as in Bostyn et al.’s study, the estimates are OR = 1.62, z = 1.86, p = .063, one-tailed p = .031.
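The relation between the reported z statistics and their two-tailed and one-tailed p values can be checked with a short calculation. The following is a minimal sketch, assuming a standard normal reference distribution for z; the function name p_values_from_z is introduced here for illustration only. The one-tailed value is appropriate only when the sign of z matches the predicted direction.

```python
import math


def p_values_from_z(z):
    """Return (two_tailed, one_tailed) p values for a z statistic.

    Assumes a standard normal reference distribution. The one-tailed
    value is the tail area in the direction of the observed sign.
    """
    # Upper-tail area beyond |z| under the standard normal curve.
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * tail, tail


# z values as reported for the difference-score regressions.
for label, z in [("difference score", 1.77),
                 ("difference score with controls", 1.86)]:
    two_tailed, one_tailed = p_values_from_z(z)
    print(f"{label}: z = {z:.2f}, "
          f"two-tailed p = {two_tailed:.3f}, one-tailed p = {one_tailed:.3f}")
```

Because the one-tailed p value is half the two-tailed value, a result that is marginal at the two-tailed .05 level can fall below .05 when the direction of the effect was predicted in advance.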
Bostyn et al. took a different approach. They included both measures separately in their regression and report that evaluations of the hypothetical utilitarian options did not significantly predict the mouse-shocking decisions (p = .41). However, they mention only in an endnote that participants’ evaluations of the hypothetical deontological options were marginally significant predictors of mouse shocking (z = −1.75, p = .081, one-tailed p = .040). Participants’ evaluations of the hypothetical utilitarian and deontological options are equally relevant predictors in asking whether hypothetical judgments predict real judgments. The results described above (marginal or not) are inconsistent with claiming strong evidence for the null hypothesis. Repeating Bostyn et al.’s Bayesian analysis with the difference score (scaled, as per Gelman, Jakulin, Pittau, & Su, 2008) yields a Bayes factor in favor of the null hypothesis (BFH0) of 0.95, with a 95% credible interval for the regression coefficient of 0.00–1.89, indicating no evidence in favor of the null hypothesis. Thus, while their data provide no strong evidence that hypothetical trolley judgments predict real mouse-shocking decisions, their data also provide no evidence for the null hypothesis asserted by Bostyn et al.
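The construction of a scaled difference score can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the ratings are hypothetical, the direction of subtraction (utilitarian minus deontological) is assumed, and the rescaling to mean 0 and standard deviation 0.5 reflects one reading of the recommendation in Gelman et al. (2008) for nonbinary inputs. The use of the sample standard deviation is likewise an assumption.

```python
from statistics import mean, stdev

# Hypothetical ratings for illustration only (not Bostyn et al.'s data).
utilitarian_ratings = [5, 3, 6, 2, 4, 7]
deontological_ratings = [3, 5, 2, 6, 4, 1]


def difference_scores(utilitarian, deontological):
    """Relative appeal of the utilitarian option for each participant.

    Assumed direction: utilitarian rating minus deontological rating.
    """
    return [u - d for u, d in zip(utilitarian, deontological)]


def rescale(values, target_sd=0.5):
    """Center values at 0 and rescale them to the target standard deviation.

    Uses the sample standard deviation (an assumption of this sketch).
    """
    center = mean(values)
    spread = stdev(values)
    return [(v - center) * target_sd / spread for v in values]


diffs = difference_scores(utilitarian_ratings, deontological_ratings)
scaled = rescale(diffs)
print(diffs)
print([round(s, 3) for s in scaled])
```

A single predictor of this kind captures both horns of the dilemma at once, which is why it matches the comparative logic of a choice between the two options.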
Our broader concern, however, is with a widespread misunderstanding of what trolley-type dilemmas are supposed to do. What is most interesting about trolley dilemmas is the contrast between cases (Thomson, 1985). In the switch case, people reliably approve of hitting a switch that will turn a trolley away from five people and toward one person. In the footbridge case, people reliably disapprove of pushing one person off of a footbridge in order to save five people. Why such different answers? And what does this say about our moral thinking?
The dual-process theory (Greene, Sommerville, Nystrom, Darley, & Cohen, 2001; Shenhav & Greene, 2014) provides an answer: In response to both cases, people engage in simple, cost–benefit reasoning favoring action. But in the footbridge case, the harmful action is more emotionally salient, generating a competing response that makes most people disapprove (or approve reluctantly; cf. Bostyn et al.’s significant results for “doubt” and response times). This theory has received strong support from studies using manipulations targeting specific processes (e.g., Crockett, Clark, Hauser, & Robbins, 2010; Shenhav & Greene, 2014; Trémolière, De Neys, & Bonnefon, 2012) and studies of clinical populations with process-specific deficits, including patients with ventromedial prefrontal cortex and hippocampal lesions (Ciaramelli, Muccioli, Làdavas, & di Pellegrino, 2007; Koenigs et al., 2007; McCormick, Rosenthal, Miller, & Maguire, 2016), psychopathy (Koenigs, Kruepke, Zeier, & Newman, 2012), and frontotemporal dementia (Mendez, Anderson, & Shapira, 2005).
Critically, these studies focus on dissociating processes that exist within healthy people. This explains why people are so puzzled when they first confront the switch and footbridge cases together. Recently, however, some researchers have assumed that trolley-type dilemmas, in order to be useful, must make reliable predictions about differences between people, either as moral personality tests (Bartels & Pizarro, 2011; Kahane, 2015; Kahane et al., 2018; Kahane, Everett, Earp, Farias, & Savulescu, 2015) or as laboratory surrogates for real-world decisions (Bauman, McGraw, Bartels, & Warren, 2014; Kahane et al., 2015). This reflects a misunderstanding of what trolley dilemmas do best and what the dual-process theory is trying to explain—akin to criticizing the Müller-Lyer illusion for failing to predict people’s visual acuity.
But should trolley dilemmas not tell us something about real-world behavior? They should, and they do—indirectly. Psychopaths and various lesion patients have real-world moral deficits, and they respond to trolley dilemmas in ways that are precisely predicted by the dual-process theory, with affective deficits leading to more utilitarian judgment in footbridge-like cases (Bartels & Pizarro, 2011; Ciaramelli et al., 2007; Koenigs et al., 2012; Koenigs et al., 2007; Mendez et al., 2005). Likewise, a recent lesion-based network analysis credits the dual-process theory with explaining patterns in damage leading to criminal behavior (Darby, Horn, Cushman, & Fox, 2017). And contra the claims of Kahane et al. (2015), Conway, Goldstein-Greenwood, Polacek, and Greene (2018) have shown that utilitarian judgments also reflect prosocial motivations in healthy people.
Trolley-type dilemmas are best understood as high-contrast cognitive probes (like flashing checkerboards) that can dissociate processes within people, not as moral personality tests or surrogates for real-world emergencies. They can serve as individual-differences measures, especially with process dissociation (Conway & Gawronski, 2013; Conway et al., 2018), and there is some evidence (in addition to that presented above) that trolley dilemmas can predict real individual behavior (Dickinson & Masclet, 2018). But even if there were no individual variation to explain—for example, if everyone said “yes” to switch-type cases and “no” to footbridge-type cases—trolley dilemmas would retain their original interest and purpose. Their greatest value lies not in their ability to explain our moral differences, but in their ability to reveal the fault lines running through our shared capacity for moral cognition.
Footnotes
Action Editor
D. Stephen Lindsay served as action editor for this article.
Author Contributions
D. Plunkett analyzed the data. D. Plunkett and J. D. Greene interpreted the data, wrote the manuscript, and approved the final version of the manuscript for submission.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
