Abstract
A change in motivational state does not guarantee a change in operant behaviour. Only after an organism has had contact with an outcome while in a relevant motivational state does behaviour change, a phenomenon called incentive learning. While ample evidence indicates that this is true for primary reinforcers, it has not been established for conditioned reinforcers. We performed an experiment with rats where lever-presses were reinforced by presentations of an audiovisual stimulus that had previously preceded food delivery; in the critical experimental groups, the audiovisual stimulus was then paired a single time with a strong electric shock. Some animals were reexposed to the audiovisual stimulus. Lever-presses yielding no outcomes were recorded in a subsequent test. Animals that had been reexposed to the audiovisual stimulus after the aversive training responded less than did those that had not received reexposure. Indeed, those animals that were not reexposed did not differ from a control group that received no aversive conditioning of the audiovisual stimulus. Moreover, these results were not mediated by a change in the food’s reinforcement value, but instead reflect a change in behaviour with respect to the conditioned reinforcer itself. These are the first data to indicate that the affective value of conditioned stimuli, like that of unconditioned ones, is established when the organism comes into contact with them.
Nothing ever becomes real till experienced—even a proverb is no proverb until your life has illustrated it.
Contrary to both folk psychological and influential historical (e.g., Hull, 1930, 1943) perspectives, behaviour is not a direct function of changes in primary motivation. Changes in primary motivation produce changes in operant behaviour only if the animal has had contact with the contingent outcome while in the relevant motivational state (Adams & Dickinson, 1981; Balleine, 1992; Changizi et al., 2002; Dickinson & Balleine, 1994; Dickinson & Dawson, 1989; Lopez et al., 1992). For example, Balleine (1992, Exp. 3) investigated the role of incentive learning in a factorial design in which he manipulated (1) whether rats were given a unique food reinforcer while sated before operant training, and (2) whether rats were sated prior to a final extinction test. Rats that were sated at test but had not been preexposed to the outcome while sated pressed the lever at the same high rate as rats under food restriction at test. Only the rats that had been preexposed to the food while sated reduced lever-pressing at test. These findings suggest the insufficiency of any explanation of learning dependent solely on the primary motivational state of the animal. Operant behaviour is also a function of the animal’s past response to outcomes themselves.
Taste devaluation provides another illustrative example (see, for example, Logue, 1979; Parker, 2003). Taste aversions are formed by repeatedly inducing (e.g., by an injection of LiCl) gastric distress following consumption of a food item. This results in a reduction in both consummatory responding and the operant behaviour upon which the food is contingent. There are diverging hypotheses regarding why behaviour should be impacted by the devaluation treatment. One possibility, the signalling hypothesis, suggests that the animal learns to avoid the food because it becomes a predictor of illness. The hedonic-shift hypothesis, in contrast, suggests that the initial affective value of a food item, itself learnt via pairing of the sensory properties of the food with nutritive gastric feedback, is altered by pairing the same sensory properties with toxicity-induced gastric feedback. In other words, devaluation effects result from a change in how the food tastes (Balleine, 2011). Both the signalling and hedonic-shift hypotheses make the same behavioural prediction for cases in which food and illness are repeatedly paired. However, when only one food–illness pairing occurs, the animals will have only tasted the food prior to being sick (i.e., when its affective and incentive value were intact). The hedonic-shift hypothesis predicts that animals will not alter their lever-pressing frequency until they are reexposed to the food; that is, until they come into consummatory contact with the food after the devaluation pairing. At that point, toxicity-induced feedback reduces the food’s incentive value. In contrast, the signalling account suggests that a single pairing of food and illness should be sufficient for rats to predict illness following the food, thus eliminating the need for reexposure. 
Research has supported the hedonic-shift account of the outcome devaluation effect and has weakened the case for the signalling hypothesis (e.g., Balleine, 2001; Balleine & Dickinson, 1991, 1992; but see also Paredes-Olay & López, 2000; Rescorla, 1992).
This suggests that the animal’s behaviour with respect to the food itself is a critical factor governing operant behaviour supported by food. Compelling support for the role of behaviour in revaluation was provided by Balleine et al. (1995). They concurrently established two distinct operant contingencies (e.g., reinforcing lever presses with sugar and chain-pulls with salt) before injecting animals with LiCl, which produced gastric distress. During reexposure, one outcome was consumed following the injection of an anti-emetic, which reduces the disgust response. The other was consumed after a vehicle injection. On a subsequent choice test, they found that animals performed more of the behaviour previously reinforced by the outcome reexposed under the anti-emetic. By chemically blocking disgust-related behaviour, the researchers successfully prevented the reduction of the reinforcing properties of the outcome (see also Balleine et al., 1994).
The bulk of relevant research in incentive learning, including the cases discussed here, has investigated shifts in value by direct manipulations of gustatory outcomes. Examples of incentive learning with other forms of primary reinforcers are rare (e.g., Everitt & Stacey, 1987; Hendersen & Graham, 1979). Furthermore, it is unknown whether direct responding to conditioned stimuli (i.e., secondary or conditioned reinforcers) is necessary to change the behaviour established by them. A conditioned reinforcer gains control of operant behaviour by virtue of its covariance with a primary reinforcer; for example, the sound of a clicker (e.g., Gillis et al., 2012; Pryor, 2019) may strengthen desirable behaviour if it is occasionally accompanied by food (i.e., a primary reinforcer). If an established conditioned reinforcer is paired once with an aversive event, does it then need to be presented again before a reduction in responding occurs by virtue of the reinforcer’s new relation to an aversive stimulus? Without such reexposure to the conditioned reinforcer, the animal’s response to it will never have reflected anything but its appetitive value. Therefore, the operant behaviour that had already been established by the conditioned reinforcer may be unaffected until and unless the conditioned reinforcer is reexposed. To date, no evidence has established whether this occurs. To our knowledge, the only study to attempt to modify the conditioned reinforcer did so by devaluing the primary reinforcer with repeated LiCl injections (Parkinson et al., 2005). The researchers found that the conditioned reinforcer still maintained operant behaviour, though Pavlovian conditioned approach, supported directly by the food, was indeed attenuated.
We investigated whether reexposure to a devalued conditioned reinforcer is necessary for operant performance to reflect a changed value of the reinforcer. In the first phase (see Table 1), an audiovisual stimulus C was paired with food, establishing a conditioned reinforcer. Lever-presses were then reinforced via contingent presentations of C in the second phase. After operant responding was established, the rats were divided into four groups for the next phase: In Groups Paired and Paired-Re, a single trace pairing of C with a strong shock (S*) was delivered; Group Unpaired, the control group, also received single presentations of C and S*, but separated by several minutes; animals in Group Two-Paired were given two pairings of C with shock. In the fourth phase, Group Paired-Re was reexposed to C alone prior to a test in which rats in all groups could lever-press in extinction. A signalling account predicts that a conditioned reinforcer, devalued once, need not be reexposed to result in reduced lever-pressing. A hedonic account of behaviour, however, suggests that the incentive value of the conditioned reinforcer will remain unchanged without reexposure to C. Two pairings of C and S* in Group Two-Paired allow for reexposure to C following the first C–S* pairing and should reduce the reinforcing properties of C, similar to the effects of repeated pairings of the primary reinforcer with LiCl in a taste aversion procedure. With reduced contiguity between C and S* in Group Unpaired (after, for example, Mahoney & Ayres, 1976), we predicted no change in the reinforcing characteristics of C. With no contact with C subsequent to the shock trial and, thus, no opportunity for the animals to learn about its new aversive properties, we expected Group Paired to lever-press more at test than the two groups reexposed to C (Paired-Re and Two-Paired).
Table 1. The design of the experiment.
C was a stimulus compound of a tone and light presented concurrently; LP indicates the opportunity for lever-pressing across both active and inactive levers, with LPa denoting active lever presses and LPi denoting inactive presses; arrows indicate serial contingent presentations of stimuli; “/” indicates when events were unpaired with respect to each other; “−” indicates an absence of punctate stimuli. Each group contained eight animals. There were four female rats in each of Groups Unpaired and Two-Paired; three females in Group Paired; and five females in Group Paired-Re.
Method
Subjects
Subjects were 32 (16 male, 16 female) experimentally naïve Long-Evans rats (Rattus norvegicus) obtained from Envigo Laboratories (Indianapolis, IN) and 5 months old at onset of the experiment. All subjects were housed in pairs with another animal of the same sex. Housing consisted of translucent plastic tubs with a substrate of wood shavings in a vivarium maintained on a 12 hr dark/12 hr light cycle; experimental manipulations were conducted during the light cycle. A progressive food restriction schedule was imposed over 3 weeks prior to the beginning of the experiment until all rats were within 80%–85% of their free-feeding weight. Rats were weighed three times a week during the experiment to ensure that they remained within this range. All animals were handled daily for 30 s during the 7 days prior to the initiation of the study. All research was conducted in accordance with the requirements of Texas Christian University’s Institutional Animal Care and Use Committee.
Apparatus
All tests occurred in standard operant chambers measuring 30 × 25 × 20 cm (l × w × h) housed within a sound- and light-attenuating environmental isolation chest (Med Associates, Fairfax, VT). The walls and ceiling of the chamber were composed of clear Plexiglas. The floor was constructed of stainless-steel rods measuring 0.5 cm in diameter, spaced 1.5 cm centre-to-centre, which allowed for the presentation of shock (1 mA with a 4-s duration). The chamber was equipped with a food dispenser that delivered chocolate-flavoured pellets (Bioserv, 50% sucrose w/w). The operant chamber included two retractable levers, one on either side of the magazine, spaced 16.5 cm apart (measured centre-to-centre) and 6.4 cm above the grid floor. A light was located approximately 4 cm above the left lever, as was a speaker that produced a pure 3,000 Hz tone. The houselight was located approximately 13 cm above the right lever. Ventilation fans in each enclosure and a white noise generator on a shelf outside of the enclosure created a consistent background noise of 75 dB (A).
Procedure
Magazine training
For the first 2 days, the rats were trained to eat from the food magazine. Each session began with a 120-s period in which no food was delivered, after which single food pellets were delivered on a variable time (VT) 60-s schedule. Sessions lasted for a maximum duration of 50 min. The houselight remained illuminated throughout the duration of the session.
Phase 1: Pavlovian
Trials were defined by the houselight being extinguished and the onset of audiovisual stimulus C, consisting of a 5-s presentation of tone and light. This was followed immediately by the delivery of a pellet and the illumination of the houselight. Ten trials per session were delivered on a VT 120-s schedule. The total duration spent in the magazine in the 30 s before and the 5 s during the audiovisual stimulus was recorded. A ratio of the responses during the tone over total (i.e., tone + [pretone/6]) responses was calculated for each rat; values above 0.50 indicate that the rat entered the magazine more during the tone than before it. There were a total of eight sessions, each lasting for 25 min.
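The discrimination ratio described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function and argument names are hypothetical, and raising an error when a rat never entered the magazine is an assumption, since the text does not say how such cases were handled. The pretone duration is divided by 6 because the 30-s pretone window is six times as long as the 5-s stimulus.

```python
def discrimination_ratio(tone_s: float, pretone_s: float,
                         window_ratio: float = 6.0) -> float:
    """Return tone / (tone + pretone / window_ratio).

    tone_s       -- magazine time (s) during the 5-s stimulus
    pretone_s    -- magazine time (s) during the 30-s pre-stimulus window
    window_ratio -- pretone window length / stimulus length (30 / 5 = 6)
    """
    scaled_pretone = pretone_s / window_ratio  # put both windows on a 5-s scale
    total = tone_s + scaled_pretone
    if total == 0:
        # Assumption: the ratio is undefined when no magazine time was recorded.
        raise ValueError("no magazine time recorded in either window")
    return tone_s / total


# Equal responding per unit time before and during the tone gives 0.50.
print(discrimination_ratio(tone_s=2.0, pretone_s=12.0))  # 0.5
# More responding during the tone than before it gives a value above 0.50.
print(discrimination_ratio(tone_s=3.0, pretone_s=6.0))   # 0.75
```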
Phase 2: conditioned reinforcement
For two consecutive sessions, rats were placed in the operant box with both levers extended. Lever presses at one location were reinforced by presentation of C on a random ratio (RR)-2 schedule, while presses on the other lever did not produce any programmed events. The position of the active and inactive levers was counterbalanced across subjects. Each session lasted for 60 trials or 30 min, whichever came first. Magazine entry durations were recorded in seconds. All rats were required to press each lever at least once during this phase to qualify for Phase 3. Subjects were then assigned to four groups (see Table 1) matched by their overall number of lever presses during this phase.
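For readers unfamiliar with random ratio schedules, the RR-2 contingency can be illustrated with a short simulation. On a random ratio schedule each response is reinforced with a fixed probability, here 1/2, so that on average every second active lever press produces C. The sketch below is illustrative only; the function name and the seeding are assumptions and do not describe the control software used in the experiment.

```python
import random


def simulate_random_ratio(n_presses: int, ratio: int = 2, seed=None) -> list:
    """Return one boolean per press: True if that press produced the stimulus.

    Each press is reinforced independently with probability 1 / ratio,
    which is the defining property of a random ratio (RR) schedule.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    p_reinforce = 1.0 / ratio
    return [rng.random() < p_reinforce for _ in range(n_presses)]


outcomes = simulate_random_ratio(n_presses=20, ratio=2, seed=1)
print(sum(outcomes), "of", len(outcomes), "presses produced C")
```

Unlike a fixed ratio schedule, the number of presses between successive stimulus presentations varies from one presentation to the next.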
Phase 3: conditioned reinforcer devaluation
During this single session, the levers were retracted and no food pellets were delivered. For each of the paired groups the audiovisual stimulus was presented for 5 s, followed by a 5-s interstimulus interval (ISI), and then by the onset of a 4-s, 1-mA unscrambled footshock. For the Two-Paired group, a 3-min interval was implemented following the termination of the first shock, followed by a second pairing of the audiovisual stimulus and the shock. For the Unpaired group, the audiovisual stimulus and the shock were separated by an ISI of 3 min. This session was 10 min in duration for all animals.
Phase 4: reexposure
Subjects were placed in the operant chamber with both levers retracted. The audiovisual stimulus was presented once, 5 min into the session, for the Paired-Re Group. Subjects in all other groups received no stimulus presentations. Sessions lasted 10 min.
Revaluation test
On the following day, both levers were extended and lever press responses recorded for 30 min. Lever press responses were nonreinforced. Lever presses and nose poke durations for the session were recorded.
Results
Phase 1: Pavlovian acquisition
These analyses were conducted using the group assignments (Unpaired, Paired, Paired-Re, and Two-Paired) established after performance matching from Phase 2. During Phase 1, single sample t-tests indicated that discrimination ratios for each group during the last two sessions were significantly above chance (0.5), ts ⩾ 4.48, ps ⩽ .001. As expected, discrimination ratios improved across training with no differences between groups. A mixed-measures analysis of variance (ANOVA) with Group and Block (first four vs. last four sessions) as factors revealed a main effect of block on discrimination ratio, F(1, 28) = 128.71, p ⩽ .001,
Phase 2: conditioned reinforcement
These analyses were conducted using the group assignments established after performance matching. One rat in Group Unpaired did not press either lever in Phase 2, and was thus eliminated from this and all subsequent analyses. A Group × Block (Day 1 vs. Day 2) × Lever (active vs. inactive) three-way mixed ANOVA was conducted on the number of lever presses and, as expected, revealed no main effect of group, F(3, 28) = 0.02, p = .99,
Revaluation test
An outlier analysis was performed on the total number of lever presses that took place at test. Any animal deviating more than three z-scores from the pooled mean was eliminated; one animal in Group Two-Paired (z = 3.97) was removed before the following analyses were performed.
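The exclusion rule can be sketched as below. The sketch assumes that z-scores were computed from the pooled mean and the sample standard deviation (n − 1 denominator) of all animals' test lever presses; the text does not state which standard deviation was used, and the function name and example scores are hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(scores, z_cutoff: float = 3.0) -> list:
    """Return the indices of scores whose |z| exceeds z_cutoff.

    z is computed against the pooled mean and sample SD of all scores
    (assumption: sample rather than population SD).
    """
    pooled_mean = mean(scores)
    pooled_sd = stdev(scores)
    if pooled_sd == 0:
        return []  # identical scores: nothing can deviate
    return [i for i, x in enumerate(scores)
            if abs((x - pooled_mean) / pooled_sd) > z_cutoff]


# Hypothetical lever-press totals: one animal responds far more than the rest.
presses = [10] * 19 + [100]
print(flag_outliers(presses))  # [19]
```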
A one-way ANOVA revealed that groups significantly differed in the number of lever presses emitted during test, F(3, 26) = 5.82, p = .004,

Figure 1. (a) Mean lever presses at test across groups. Error bars indicate standard errors of the mean. (b) Suppression ratio as a function of reexposure to the conditioned reinforcer during the revaluation test. The dotted line indicates the level at which response rates in Phase 2 and test are equivalent (i.e., an absence of suppression). A value of 0.0 indicates complete suppression of responding during test. White boxes indicate scores in the second quartile within group; grey boxes indicate those in the third quartile; and their junctions indicate median scores. Error bars represent the range of scores.
The number of active lever presses for Phase 2 was calculated by taking the average of the total number of lever presses during Sessions 1 and 2. Suppression ratios, which account for individual differences in baseline responding, were calculated for each rat. This was accomplished by dividing the number of active lever presses during the revaluation test by the sum of active lever presses during the revaluation test and the average number of active lever presses during Phase 2. A one-way ANOVA with group as the between-subjects factor analysed differences in active lever press suppression ratios. The results revealed a significant main effect of group, F(3, 26) = 7.32, p = .001,
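The suppression ratio defined in this paragraph can be written out as a short sketch. The function and argument names are hypothetical, and treating a rat with no presses in either phase as an error is an assumption. By this definition a ratio of 0.5 corresponds to equal responding in Phase 2 and at test, and 0.0 corresponds to complete suppression.

```python
def suppression_ratio(test_presses: float,
                      phase2_session1: float,
                      phase2_session2: float) -> float:
    """Return test / (test + baseline) for active lever presses.

    baseline is the mean number of active lever presses across the two
    Phase 2 sessions.
    """
    baseline = (phase2_session1 + phase2_session2) / 2
    denominator = test_presses + baseline
    if denominator == 0:
        # Assumption: the ratio is undefined if the rat never pressed.
        raise ValueError("no active lever presses in Phase 2 or at test")
    return test_presses / denominator


# Hypothetical rat: 30 and 50 presses in Phase 2 (baseline = 40).
print(suppression_ratio(40, 30, 50))  # 0.5, no suppression
print(suppression_ratio(0, 30, 50))   # 0.0, complete suppression
```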
A one-way ANOVA with group as the between-subjects factor compared the mean durations of magazine entry across groups at test (Unpaired: M = 215 s, SEM = 67.2 s; Paired: M = 182 s, SEM = 21.6 s; Paired-Re: M = 179 s, SEM = 48.2 s; Two-Paired: M = 216 s, SEM = 64.7 s). There were no significant differences between groups, F(3, 27) = 0.92, p = .45,
Discussion
Operant performance established by conditioned reinforcement was initially unaffected by a devaluation procedure consisting of a single pairing of the conditioned reinforcer C and shock. Rats in Group Paired continued to press the lever at a similar rate to those in which C was unpaired with respect to shock (i.e., Group Unpaired). However, rats that received the same devaluation procedure but were reexposed to C following devaluation exhibited a reduction in lever pressing comparable to that found in animals that received multiple C→S* pairings (i.e., Group Two-Paired). This is, to our knowledge, the first demonstration that the value of conditioned reinforcers is established by means similar to primary reinforcers (i.e., food; for example, Balleine, 1992). If primary reinforcers are established by pairing the sensory properties of a food with nutritive gastric feedback, then perhaps the conditioned reinforcer becomes an added part of the sensory and perceptual properties of food, in a manner like that of the taste of the food. Directly behaving with respect to a conditioned reinforcer following a single revaluation opportunity is necessary to modify the subsequent behaviour upon which it is contingent.
Numerous dimensions characterise stimuli. Some of these dimensions can be described in terms of physical units (e.g., their brightness, loudness, size, or duration in time). Other aspects of stimuli are best described in terms that relate to their effects on behaviour. Stimuli have motivational properties (see, for example, Konorski, 1967). Physical and motivational properties are typically entwined. For example, the physical features of a food item may come to be predictive of its motivationally relevant features (e.g., caloric density). Gastric feedback (e.g., disgust) after ingestion necessarily follows contact with a food’s flavour. Consequently, the item must be reexperienced following any change in the motivational properties of that outcome before behaviour previously established by the reinforcer will be affected. This experiment extends the generality of this relationship to conditioned reinforcers, which have little by way of unconditioned motivational characteristics.
Animals must be reexposed to a conditioned reinforcer following devaluation before its sensory features come to elicit aversion. In the current experiment, the conditioned reinforcer had appetitive properties when it was presented during the single devaluation trial. Without reexposure, the original incentive properties of the conditioned reinforcer remained intact. In other words, animals must have the opportunity to respond to the changed affective value of the conditioned reinforcer before operant performance established by that conditioned reinforcer is affected (e.g., Parkinson et al., 2005).
One might initially question whether the effect reported here (Figure 1) is representative of a change in the conditioned reinforcer. Perhaps it instead reflects a form of mediated conditioning (Holland, 1981; Holland & Wheeler, 2009). A reduction in operant performance may potentially indicate mediation of sucrose’s value by virtue of the conditioned reinforcer being a shared associate of both shock and pellets. This, however, is unlikely for at least two reasons. First, such an arrangement would require operant performance to be a function of devaluation of sucrose by shock delivery. While not theoretically impossible, this is unlikely. Flavour aversions are typically a function of contingent illness and not of external painful stimulation (e.g., Garcia & Koelling, 1966). Indeed, Holland (1981) found that mediated food aversions occurred only when an exteroceptive associate of food was paired with illness, and explicitly did not occur when paired with shock. While another paper (Krane & Wagner, 1975) found that food aversions could be established by presentation of shock, doing so required an interstimulus interval (ISI) far longer than that which we used (i.e., 210 s vs. the 5 s ISI in this experiment). In addition, if we did obtain mediated revaluation of sucrose, we should expect conventional measures of Pavlovian behaviour to have reflected this—but they did not. Contrast this with Parkinson et al. (2005), in which the primary reinforcer (i.e., sucrose) was devalued—these researchers reported a decrease only in Pavlovian magazine entry, and not in the operant behaviour upon which a conditioned reinforcer was contingent. Our results suggest that the observed reduction in operant performance is a function of the conditioned reinforcer and not one of the food’s value itself, though it is an admitted limitation that we did not directly confirm this with a consumption test.
These results suggest a wealth of possibilities for future work. Researchers may consider establishing the temporal dimensions of the phenomenon. For example, how does the strength of the effect of reexposure vary as a function of when it is presented relative to devaluation? Does the ISI in revaluation treatment matter similarly to that in traditional conditioning procedures? Future work might document, as analogous work has done (e.g., Balleine & Dickinson, 1991; Balleine, 1992), how behaviour changes when the conditioned reinforcer is again made contingent on the performance of the operant response. Researchers might also investigate whether the shock’s magnitude is relevant to the effect reported here—might a single pairing of a conditioned reinforcer with a stronger shock increase the likelihood of an immediate effect on operant behaviour (after Paredes-Olay & López, 2002)?
In sum, we report that a single pairing of a conditioned reinforcer with an aversive stimulus was by itself ineffective in diminishing the behaviour supported by it. Only after animals were reexposed to the conditioned reinforcer did they reduce behaviour upon which the reinforcer had been contingent. This result has wide-ranging implications for research on the establishment of value and the role of prediction in psychological science.
Footnotes
Authors’ note
All experimental procedures were conducted in accordance with the relevant ethical guidelines and were approved by TCU’s Institutional Animal Care and Use Committee.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
