Abstract
The ephemeral-reward task involves giving subjects a choice between two distinctive stimuli, A and B, each containing an identical reward. If A is chosen, the food associated with A is obtained and the trial is over. If B is chosen, the food associated with B is obtained, but the food associated with A can be obtained as well. Thus, the food-maximizing solution is to choose B first. Although cleaner fish (wrasse) and parrots easily acquire the optimal response, choosing B, several primate species, as well as rats and pigeons, do not. To account for these paradoxical findings, we hypothesized that certain species fail to associate their initial choice with the second reinforcement because they often respond impulsively to the seemingly equal alternatives at the initial choice. To test this hypothesis, we separated the initial choice from the first reinforcement by imposing a 20-s delay between the choice and its outcome. Under these conditions, both pigeons and rats acquired the optimal choice response. We suggest that impulsive choice may make it difficult to acquire certain tasks and that imposing a delay between choice and outcome may decrease impulsivity and, paradoxically, allow for optimal task performance. The general implications of this finding are discussed.
When certain relatively simple learning tasks, such as serial reversal learning, have been used to compare the learning ability of different species, intuition about animal intelligence has been confirmed in many cases (e.g., Bitterman, 1965). Species assumed to be more like humans typically perform better than those that are less like us; apes tend to show more improvement over reversals than monkeys, monkeys more than rats, rats more than pigeons, and so on.
One exception to this general intuition is the performance of different species on a seemingly simple task we call the ephemeral-reward task. With this task, animals are given a choice between two distinctive stimuli (e.g., Plate A and Plate B), each containing an identical food reward. If, for example, the animal chooses the food on Plate A, Plate B is removed and the trial is over. But if the animal chooses the food on Plate B, it can also have the food on Plate A (Bshary & Grutter, 2002; Salwiczek et al., 2012). The food-maximizing solution is therefore always to choose the food on Plate B first, because doing so yields two rewards per trial, whereas choosing the food on Plate A yields only one.
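The payoff structure just described can be made concrete with a minimal sketch. It assumes one food item per plate and summarizes an animal's behavior as a single, hypothetical probability `p_choose_b` of taking Plate B first; that parameter is an illustration, not a quantity reported in the studies discussed here.

```python
def food_per_trial(first_choice: str) -> int:
    """Food items obtained on one trial, given which plate is chosen first."""
    if first_choice == "B":
        return 2  # B is eaten, and A remains available
    if first_choice == "A":
        return 1  # A is eaten, B is removed, and the trial ends
    raise ValueError("first_choice must be 'A' or 'B'")


def expected_food_per_trial(p_choose_b: float) -> float:
    """Expected items per trial if B is chosen first with probability p_choose_b."""
    return p_choose_b * food_per_trial("B") + (1 - p_choose_b) * food_per_trial("A")


for p in (0.0, 0.5, 1.0):
    print(f"P(B first) = {p:.1f}: {expected_food_per_trial(p):.1f} items per trial")
```

Expected intake rises linearly from one item (always A first) to two items (always B first), so indifference between the plates forgoes a quarter of the available food, and any preference for A forgoes more.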
Bshary and Grutter (2002) found that bluestreak cleaner wrasse (Labroides dimidiatus), a species of fish that cleans the mouth and gills of larger fish species (i.e., clients), are generally able to learn to choose optimally in under 100 trials. However, several primate species, including capuchin monkeys (Cebus apella), orangutans (Pongo spp.), and several chimpanzees (Pan troglodytes), when trained on this task, were surprisingly unable to learn to choose optimally in a comparable time (Salwiczek et al., 2012). Salwiczek et al. attributed the optimal performance by fish to the natural foraging behavior of this reef-dwelling species. They suggested that a client fish visiting the reef is an ephemeral resource that should be serviced immediately, or it will be serviced by other wrasse, whereas reef-dwelling clients are relatively permanent and can be serviced later. Thus, the wrasse has learned to service the “ephemeral” client first (the optimal alternative) rather than the “residents.” Primates, however, live in quite a different environment that does not encourage optimal choice. For example, they typically cannot leave food and come back to it later because of likely competition from conspecifics, and this potential competition may encourage them to be impulsive.
This hypothesis was called into question by the finding that grey parrots also find this task relatively easy, even though, according to Pepperberg and Hartsfield (2014), the parrots’ natural feeding environment is more like that of primates. Pepperberg and Hartsfield suggested that because parrots are small organisms that typically have a high metabolism, the energetic cost of a wrong decision might have greater consequences for their survival than it would for larger animals. Parrots may therefore need to pay particular attention in the task, whereas the larger primates can better afford the energetic cost of making the wrong decision. Alternatively, Prétôt, Bshary, and Brosnan (2016b) suggested that optimal performance might depend on the presence of more ecologically relevant cues or on better generalization from the wrasse’s experience in natural contexts, but it is not clear what those cues might be or why generalization should have been better for the wrasse and parrots than for the primates.
Pepperberg and Hartsfield (2014) made the more specific suggestion that fish and parrots choose between the two plates with their mouths (which requires sequential choice), whereas primates can reach with both hands even though the two food items cannot be acquired simultaneously. The presence of two identical food pieces that cannot be acquired simultaneously may trigger a level of frustration or stress (Salwiczek et al., 2012).
To test the hypothesis that animals that have only a single means of choosing can more easily learn this task, we tested pigeons using two different procedures (Zentall, Case, & Luong, 2016). In the first experiment, we used the manual presentation of the alternatives (as was done with the other species). Each alternative, A or B, was a uniquely colored (yellow or blue) disk containing a single dried pea. If the pigeon chose the pea on the A disk, the B disk was removed and the trial was over. But if the pigeon chose the pea on the B disk, it was free to eat the pea on the A disk as well (the optimal color was counterbalanced across subjects). We were surprised that the pigeons were not only unable to learn to choose the ephemeral alternative that allowed them to have both peas but also showed a significant preference for the suboptimal alternative—the one that provided them with a single pea. Apparently, choosing with mouths or beaks is not the distinguishing characteristic of species that can easily acquire this task.
In Experiment 2, we replicated the phenomenon with pigeons in an automated (operant) chamber in which pecking each color provided the pigeons with access to a grain feeder for 2.0 s. Once again, we found a significant tendency to choose the suboptimal alternative, even after 400 trials of training. Clearly, the pigeons did not appear to be associating the second reinforcer with their initial choice. But why did they show a significant preference for the suboptimal alternative? One possibility is that at the start of training, they would have received twice as many reinforcers immediately following a peck to the suboptimal alternative (see Fig. 1a). That is, every trial ended with reinforcement associated with the suboptimal alternative (see also Salwiczek et al., 2012).

Fig. 1. Design of Experiment 3 in Zentall, Case, and Luong (2016). For the control group (a), before learning occurred, twice as many reinforcements (RFs) were associated with the suboptimal choice (Yellow) as with the optimal choice (Blue), and all trials ended with a response to the suboptimal choice. For the experimental group (b), the optimal choice and suboptimal choice were associated with an equal number of reinforcements. The arrows indicate, for an optimal choice, the progression from one reinforcer to the next.
To test this hypothesis, in Experiment 3, we arranged the contingencies such that after initial choice of the optimal alternative and during reinforcement, the color of the suboptimal alternative changed to white, and a peck to the white stimulus was reinforced (see Fig. 1b). Thus, there should be no inherent bias, because given initial random choice, yellow, blue, and white would be associated with equal reinforcement. Preferences by this group were compared with those of a control group that experienced no change in color. Once again, the control group showed a significant preference for the suboptimal alternative; contrary to our prediction, however, although the experimental group chose the optimal alternative significantly more than did the control group, it did not acquire a significant preference for the optimal alternative (see Fig. 2).
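The reinforcer bookkeeping behind Fig. 1 can be sketched in a few lines. This is an illustration of the argument, not the authors’ analysis: it assumes the Fig. 1 color assignments (Blue optimal, Yellow suboptimal), credits each reinforcer to the color pecked just before it, and uses a hypothetical probability `p_optimal` of choosing the optimal color first. At chance (p = .5), the control arrangement credits Yellow with twice as many reinforcers as Blue, whereas the color-change arrangement equalizes Yellow, Blue, and White.

```python
def expected_reinforcers(p_optimal: float, sequences: dict) -> dict:
    """Expected reinforcers per trial credited to each color.

    `sequences` maps each first choice ("optimal" or "suboptimal") to the
    colors pecked, in order, on that trial; every listed peck is reinforced.
    """
    totals = {}
    for choice, pecks in sequences.items():
        p = p_optimal if choice == "optimal" else 1 - p_optimal
        for color in pecks:
            totals[color] = totals.get(color, 0.0) + p
    return totals


# Control group (Fig. 1a): after an optimal choice, the second peck is to Yellow.
CONTROL = {"optimal": ["Blue", "Yellow"], "suboptimal": ["Yellow"]}
# Color-change group (Fig. 1b): Yellow turns White after an optimal choice.
COLOR_CHANGE = {"optimal": ["Blue", "White"], "suboptimal": ["Yellow"]}

print(expected_reinforcers(0.5, CONTROL))       # {'Blue': 0.5, 'Yellow': 1.0}
print(expected_reinforcers(0.5, COLOR_CHANGE))  # {'Blue': 0.5, 'White': 0.5, 'Yellow': 0.5}
```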

Fig. 2. Pigeons’ performance on the ephemeral-reward task in the automated (operant) chamber. Choice of the optimal alternative is presented as a function of trial block. For the color-change group, choosing the optimal alternative provided reinforcement and changed the color of the other alternative to white. For the control group, there was no color change. Data are from Experiment 3 in Zentall, Case, and Luong (2016). Error bars indicate ±1 SEM.
Given the unexpected results with pigeons, we next examined whether rats, another species extensively studied by psychologists, would be able to acquire this task. Alternatively, they might show suboptimal choice similar to that of pigeons or, because they are mammals, they might show behavioral indifference to the two alternatives, as primates did. In our experiment (Experiment 1 in Zentall, Case, & Berry, 2017b), we used a procedure similar to that of Experiment 1 in Zentall et al. (2016). Although the rats also failed to learn to choose the optimal alternative in over 800 trials, unlike the pigeons, they did not show a significant preference for the suboptimal alternative (i.e., they performed at chance levels).
It appears that some species treat the task as a choice between two immediately present reinforcers and fail to consider the relation between the second reinforcement and the original choice. It occurred to us that this effect may be indirectly related to the phenomenon of delay discounting (Ainslie, 1975). In delay discounting, subjects are given a choice between a small immediate reinforcement (e.g., one pellet) and a larger but delayed reinforcement (e.g., four pellets). Although it is usually optimal to choose the larger, later reinforcer, under a variety of conditions, subjects choose the smaller, sooner one (Green & Myerson, 1995). The ephemeral-reward task is somewhat different because initially the two alternatives provide equal reinforcement, and, given the optimal choice, the second reinforcement is delayed, if only by 1 to 2.5 s. Nevertheless, the delay may be sufficiently long to reduce the likelihood that the reinforcement that follows the response to the second stimulus becomes associated with the initial choice of the optimal alternative.
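Delay discounting is often summarized with a hyperbolic function, V = A/(1 + kD), where A is the reinforcer amount, D its delay, and k a discounting rate fitted to each subject (the form usually associated with Mazur; the studies reviewed here do not commit to a particular equation). Assuming that form and an arbitrary, hypothetical k of 1 per second, the sketch below shows how much a 1- to 2.5-s delay could reduce the weight of a reinforcer relative to an immediate one. It is only an illustration: the argument above concerns whether the delayed reinforcer becomes associated with the initial choice, not a fitted value.

```python
def hyperbolic_value(amount: float, delay_s: float, k: float) -> float:
    """Discounted value under the hyperbolic form V = A / (1 + k * D)."""
    return amount / (1 + k * delay_s)


K_PER_S = 1.0  # arbitrary, hypothetical discounting rate (per second)

# Relative weight of a one-unit reinforcer at the delays mentioned in the text.
for delay in (0.0, 1.0, 2.5):
    print(f"delay {delay:.1f} s -> value {hyperbolic_value(1.0, delay, K_PER_S):.3f}")
```

With this steep k, the second reinforcer would retain roughly a third to a half of the weight of an immediate one; a shallower k would weaken it less.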
A review of the delay-discounting literature provides a possible procedural variation on delay discounting that not only may explain why certain species have so much trouble acquiring this task but also may identify conditions under which they can acquire it. Rachlin and Green (1972) describe a procedure in which pigeons that normally choose the suboptimal smaller, sooner reinforcer will show better “self-control” and prefer the larger, later reinforcer. Rachlin and Green required pigeons to choose whether, after a few seconds, they wanted (a) to make a choice of the smaller, sooner reinforcer or the larger, later reinforcer or (b) to receive the larger, later reinforcer (without a choice). That is, the pigeons were allowed to make a “commitment” to the larger, later reinforcer, and so they had no opportunity to impulsively choose the smaller, sooner reinforcer when it became available. By avoiding the smaller, sooner reinforcer, they could ensure that they would receive the larger, later reinforcer. When the prior commitment time was sufficiently long, most of the pigeons committed to the larger, later reinforcer.
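The commitment effect is often interpreted as a preference reversal under hyperbolic discounting: when both outcomes are still far away, the larger, later reinforcer is valued more, but as the smaller, sooner one becomes imminent, its value can overtake the larger one. The sketch below assumes the hyperbolic form V = A/(1 + kD), reuses the one-pellet versus four-pellet example from the delay-discounting paragraph, and picks a hypothetical extra delay of 5 s for the larger reinforcer and k = 1 per second; none of these values come from Rachlin and Green (1972).

```python
def hyperbolic_value(amount: float, delay_s: float, k: float = 1.0) -> float:
    """Hypothetical hyperbolic discounting, V = A / (1 + k * D)."""
    return amount / (1 + k * delay_s)


def preferred(front_delay_s: float, small: float = 1.0, large: float = 4.0,
              extra_delay_s: float = 5.0, k: float = 1.0) -> str:
    """Option valued more highly when the choice is made `front_delay_s`
    seconds before the smaller reinforcer would arrive. The larger reinforcer
    arrives `extra_delay_s` seconds after that. All values are hypothetical."""
    v_small = hyperbolic_value(small, front_delay_s, k)
    v_large = hyperbolic_value(large, front_delay_s + extra_delay_s, k)
    return "larger-later" if v_large > v_small else "smaller-sooner"


print(preferred(0.0))   # choice at the moment of temptation: 1.0 vs 4/6
print(preferred(20.0))  # early commitment: 1/21 vs 4/26
```

With these assumed values, the two value curves cross when the choice is made about 0.67 s before the smaller reinforcer, so a choice made 20 s in advance favors the larger, later reinforcer, whereas a choice made at the moment of availability favors the smaller, sooner one.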
In the ephemeral-reward task, both alternatives appear to provide equal, immediate amounts of food; the optimal alternative provides additional food, but only after a short delay. To determine whether prior commitment might facilitate optimal choice by reducing pigeons’ natural impulsivity, we examined whether they would be more likely to integrate the two reinforcements if the time between choice and the first reinforcement was longer. In this experiment (Zentall, Case, & Berry, 2017a), we used a 20-s fixed-interval schedule; that is, the first response made at least 20 s after the initial choice produced the first reinforcer. If the optimal alternative was chosen, a reinforcer was provided after the delay, and a single peck (fixed ratio 1) to the remaining stimulus provided a second reinforcer; if the suboptimal alternative was chosen, reinforcement followed the delay, but the trial was over. We included a control group in which, as in the standard task, a single peck to the chosen alternative produced the first reinforcer; we also controlled for the total duration of the trial.
The results of this experiment were clear. The control group showed the characteristic significant preference for the suboptimal alternative, although with continued training (forty 10-trial sessions), its choice of the suboptimal alternative approached chance level. The experimental group, in contrast, showed significant evidence of learning to make the optimal choice (see Fig. 3), although the pigeons did not acquire the task nearly as quickly as the fish or the parrots. Apparently, making a prior commitment allowed the pigeons to integrate the two reinforcers that occurred when the optimal alternative was chosen.
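The trial contingencies just described can be written as a small function. This is a hypothetical sketch, not the authors’ control software. It assumes the choice peck occurs at t = 0 and is included in `peck_times`, applies the fixed-interval rule (the first response at least 20 s after the choice is reinforced), and approximates the single-peck control group with a 0-s interval, so that the choice peck itself is reinforced. After an optimal choice, the single peck to the remaining stimulus is assumed to occur.

```python
from typing import Optional, Sequence, Tuple


def first_reinforced_peck(peck_times: Sequence[float], interval_s: float) -> Optional[float]:
    """Fixed-interval rule: time of the first peck made at or after
    `interval_s` seconds from the choice, or None if no peck qualifies."""
    for t in sorted(peck_times):
        if t >= interval_s:
            return t
    return None


def run_trial(choice: str, peck_times: Sequence[float],
              interval_s: float = 20.0) -> Tuple[Optional[float], int]:
    """Hypothetical sketch of one delayed-reinforcement trial.

    choice: "optimal" or "suboptimal".
    peck_times: times (s) of pecks to the chosen stimulus, including the
        choice peck at 0.0.
    interval_s: 20.0 for the FI 20-s group; 0.0 approximates the single-peck group.
    Returns (time of first reinforcer, number of reinforcers on the trial).
    """
    t_first = first_reinforced_peck(peck_times, interval_s)
    if t_first is None:
        return None, 0  # interval not completed within the pecks supplied
    if choice == "optimal":
        # One peck (fixed ratio 1) to the remaining stimulus earns a second reinforcer.
        return t_first, 2
    # Suboptimal choice: one reinforcer after the interval, then the trial ends.
    return t_first, 1


print(run_trial("optimal", [0.0, 4.8, 13.1, 21.6]))    # (21.6, 2)
print(run_trial("suboptimal", [0.0], interval_s=0.0))  # (0.0, 1)
```

Framing the control condition as a 0-s interval makes the single manipulated variable explicit: the two groups differ only in how long the choice is separated from its first outcome.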

Fig. 3. Pigeons’ performance in a delayed-reinforcement version of the ephemeral-reward task. Choice of the optimal alternative is presented as a function of session. Pigeons in the 20-s fixed-interval (FI 20s) group chose a color, and a reinforcer was delivered for the first response made at least 20 s later. Choosing the optimal alternative resulted in reinforcement after the delay, and a single peck to the remaining stimulus resulted in a second reinforcer; if the suboptimal alternative was chosen, reinforcement followed the delay, but the trial was over. Pigeons in the fixed-ratio 1 (FR1) group had to make a single peck to obtain initial reinforcement. Error bars indicate ±1 SEM. Data are from Zentall, Case, and Berry (2017a).
Because pigeons showed optimal choice when reinforcement was delayed, we decided to investigate whether rats would also benefit from the separation between the choice and the first reinforcement. Rats were required to complete a 20-s fixed-interval schedule to receive the first reinforcement, but only a single response was required to obtain the second (Zentall, Case, & Berry, 2017b, Experiment 2). Our results were very similar to the results found with pigeons (see Fig. 4). It appears that the insertion of a delay between choice and reinforcement allowed the rats to choose optimally as well. These findings are somewhat counterintuitive, in that one typically thinks of delay of reinforcement as not being conducive to optimal learning (e.g., see Capaldi, 1978), but in this case, the delay is between a choice and reinforcement, and the delay made it less likely that the choice would be made impulsively.

Fig. 4. Rats’ performance in a delayed-reinforcement version of the ephemeral-reward task. Choice of the optimal alternative is presented as a function of session. Rats had to complete a 20-s fixed-interval schedule to obtain initial reinforcement. Error bars indicate ±1 SEM. Data are from Zentall, Case, and Berry (2017b).
Why species such as wrasse and parrots appear to choose optimally when trained on this task without the need for the delay of reinforcement remains an open question. It may be that wrasse, which often swim into the mouths of larger predatory fish, have learned to make careful choices because impulsive choices may have serious consequences, especially given that the cleaners sometimes bite as they are cleaning (Gingins, Werminghausen, Johnstone, Grutter, & Bshary, 2013); consequently, the cleaners must naturally show more “self-control” (see Bshary & Grutter, 2005). It may be that animals that do not choose impulsively are more likely to integrate the first and second reinforcement in the ephemeral-reward task.
But what about parrots? It should be noted that the parrots in the Pepperberg and Hartsfield (2014) study had extensive prior training on a variety of tasks. One parrot had been exposed to “continuing studies on comparative cognition and interspecies communication” (p. 299), whereas the other two had received considerable training on referential communication. It is possible that this training had the effect of reducing their natural impulsivity. In fact, one of the parrots in the Pepperberg and Hartsfield study was found to show great impulse control when given a choice between an immediate desirable reward and a delayed (by as much as 15 min), more desirable reward (Koepke, Gray, & Pepperberg, 2015). It remains to be seen whether parrots that have not had a long history of training with other tasks would also show the same optimal choice with this task. However, this cannot explain the poor performance of the primates used as subjects in the Salwiczek et al. (2012) research because they too had been trained on a variety of cognitive tasks.
Despite their presumed superior intelligence, primates did not readily learn to choose optimally when trained on the original ephemeral-reward task. But primates are also known for their impulsive choice (for a review of self-control by primates, see Beran, 2015). There is evidence, however, that they can learn to choose optimally with this task under certain conditions. Prétôt, Bshary, and Brosnan (2016a) modified the ephemeral-reward task by presenting the stimuli on a computer monitor and requiring monkeys to use a joystick to respond by moving a cursor to the selected stimulus, and when they did so, the monkeys received rewards at a pellet dispenser; this procedure provided some separation between the initiation of the response and the reinforcement—rather than reaching directly for the food, as they did in the Salwiczek et al. (2012) study, they had to move the cursor to the chosen alternative, click on it, and then reach to the feeder for a pellet. It also prevented the monkeys from attempting to reach for multiple items simultaneously. The absence of the experimenter during testing and the absence of other distractions may also have played a role.
In another experiment, Prétôt et al. (2016b) found that monkeys could learn to choose optimally if each reinforcer was placed under a distinctively colored cup, a procedure that also separated the choice from immediate reinforcement, albeit only briefly. In a further experiment, the authors tested the monkeys with visible food that was distinctively colored pink or black. It may be that giving the food unusual colors reduced impulsive choice by making the food appear novel, thus causing the monkeys to choose more cautiously. Alternatively, the distinctive color of the food may have caused the monkeys to pay closer attention to the choice that they made.
The separation of choice from reinforcement also may be relevant to the acquisition of other tasks. In an often-cited study by Boysen, Berntson, Hannan, and Cacioppo (1996), a chimpanzee was trained on a reversed-reward contingency task. The chimpanzee was offered a choice between two plates of candy, one of which always had more pieces than the other. Because of the reversed-reward contingency, the chimpanzee always received the candy on the unchosen plate. Despite extensive training, however, she was unable to consistently choose the plate with fewer candies and thereby obtain the contents of the plate with more candies. As it happened, this chimpanzee had been trained to symbolically represent the number of objects in a set with the use of Arabic numerals (Boysen & Berntson, 1989). That is, she had learned the association between Arabic numerals and the number of objects that each represented. When the task was modified such that the choice was between two Arabic numerals, the chimpanzee quickly learned the reverse contingency. Once again, removing the candies from view may have reduced the impulse to choose the larger number of candies and allowed the chimpanzee to make the optimal choice.
Discussion
The ephemeral-reward task is a relatively novel task that provides somewhat surprising results: Wrasse and parrots acquire the optimal response quite easily, whereas primates, pigeons, and rats do not. The ecological account proposed by Salwiczek et al. (2012), that wrasse need to learn to service ephemeral visitors to the reef before servicing resident clients, does not appear to account for the parrots’ ability to acquire the optimal response (Pepperberg & Hartsfield, 2014). Furthermore, the hypothesis that fish and parrots choose with their mouths whereas primates choose with their hands does not explain why pigeons, which choose with their beaks, do not easily acquire the optimal response. We propose that impulsive choice may be responsible for the failure to acquire this task. When we forced pigeons and rats to choose 20 s before the first reinforcement was presented, we found that both species learned to choose optimally. Paradoxically, although delay of reinforcement is typically thought to retard task acquisition, in certain cases, the separation of the choice response from reinforcement may actually facilitate acquisition by reducing impulsivity.
These results may have implications for several kinds of maladaptive human behavior that appear to result from impulsive decision making in the face of temptation, such as addictive commercial gambling (see Zentall, Andrews, & Case, 2017) and excess caloric intake leading to obesity. If humans can be taught to make a commitment well before the to-be-avoided reward becomes available (the time of temptation), it may be possible for them to avoid making suboptimal impulsive choices (Ainslie & Haendel, 1983; Ainslie & Monterosso, 2003).
Recommended Reading
Rachlin, H., & Green, L. (1972). (See References). A study showing that with a delay-discounting procedure, impulsive choice can be remedied by inserting a delay between the choice and the outcomes.
Salwiczek, L. H., Prétôt, L., Demarta, L., Proctor, D., Essler, J., Pinto, A. I., Wismer, S., Stoinski, T., Brosnan, S. F., & Bshary, R. (2012). (See References). A study showing that with the ephemeral-reward task, cleaner fish maximize rewards, whereas under similar conditions, most primates do not.
Zentall, T. R., Case, J. P., & Berry, J. R. (2017a). (See References). A study showing that in the ephemeral-reward task, inserting a delay between choice and the first reward enabled pigeons to choose optimally.
Zentall, T. R., Case, J. P., & Berry, J. R. (2017b). (See References). A study showing that rats (like pigeons) were unable to maximize rewards on the ephemeral-reward task until a delay was inserted between choice and the first reward.
Zentall, T. R., Case, J. P., & Luong, J. (2016). (See References). An article reporting that pigeons were unable to maximize rewards on the ephemeral-reward task and that although training artifacts contributed to the deficit, pigeons still did not learn when the artifacts were eliminated.
Acknowledgements
The authors wish to acknowledge Jonathan R. Berry and Jasmine Luong, who were instrumental in collecting the data discussed in Zentall, Case, and Luong (2016) and Zentall, Case, and Berry (2017a, 2017b).
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
