Abstract
I propose a dual-self model in which two selves have conflicting preferences over the action to be taken by an agent. Departing from existing dual-self models, the two selves are treated symmetrically. They have identical instantaneous utility, and only differ in their time preference. The default action of the agent is modelled as the outcome of a Tullock contest among the selves, where the self who wins chooses their preferred action. Viewing the outcome of this contest as the point of disagreement, the selves are allowed to negotiate to a mutually preferred outcome, and this negotiation is modelled as a Nash bargaining problem. I show that multiple well documented ‘behavioural’ deviations from standard utility maximizing behaviour can be generated from this model, including time inconsistent behaviour such as diminishing impatience, as well as violations of independence of irrelevant alternatives in choice problems. Notably the preference reversals from time inconsistency are ‘smooth’, as opposed to the singular reversal in quasi-hyperbolic discounting, the standard model used in the literature. Further, the model implies correlation of these deviations due to their dependence on the same parameters. Finally, this approach provides insight on evaluating the welfare effects of various interventions.
‘The idea of self-control is paradoxical unless it is assumed that the psyche contains more than one energy system, and that these energy systems have some degree of independence from each other.’ (McIntosh, 1969)
Introduction
The observable behaviour of decision-making agents includes a number of ubiquitous effects not consistent with the predictions of standard utility maximization with geometric time discounting. Among these, agents exhibit time inconsistency: their preferences reverse when outcomes are delayed. Agents also violate independence of irrelevant alternatives in choice problems; in general, their decision may depend on the entire choice set, and they may be tempted ‘towards’ an option that they nevertheless do not choose. Related to both of these, agents seek out commitment devices, whether to prevent preference reversals or to remove temptations, even when such devices are costly.
An individual containing conflicting internal preferences is certainly one potential cause of these phenomena. One strand of work in neurology views the brain as operating with a ‘team of rivals’ architecture, wherein different sections of the brain compete with each other directly for control over the actions of the individual; see, for example, Eagleman (2011). MRI evidence is consistent with the notion that decisions made with different time horizons engage very different areas of the brain; see, for example, McClure et al. (2004). Threads of research in psychology also address the idea of conflict between multiple selves, for example, Ainslie (1986).
Without laying claim to being a model of the brain, which it is not, the goal of this work is to formalize this neurological inspiration and account for the described empirical regularities in behaviour. I do this through the use of a novel ‘dual-self’ model. The model gives rise to a smooth form of time inconsistency having the same qualitative implications as hyperbolic time discounting, as well as a two-sided temptation effect, in which unchosen alternatives alter the decisions of the agent. The model predicts the use of costly commitment devices by the agent, provides strong intuition about the nature of these phenomena, and provides some novel insight into welfare evaluation.
I model an individual decision-maker, or agent, as consisting of two selves; one patient and one impatient. The selves are taken to share the same payoff utility, and each discounts time geometrically, but they differ in their time discount factor. Thus, while they agree on the immediate utility granted by a decision, such as which flavour of ice cream is the most delicious, they will in general disagree with regard to decisions which have consequences over time, such as whether it is a good idea to eat the ice cream. If the selves cannot agree on what action to take, it is assumed that they engage in a costly conflict over control of the action, modelled as a variant on a Tullock (1980) rent-seeking game. The self that wins the conflict then chooses their most preferred action.
I allow the selves to negotiate in order to select an action mutually preferred to this costly conflict; this negotiation is modelled as a Nash bargaining problem. The bargaining set is the set of possible utility vectors created by available actions, or lotteries over actions, and the outcome of the costly conflict is treated as the disagreement point for the bargaining. Thus, in equilibrium, the costly conflict will not occur, as the selves will negotiate to a better outcome. This bargaining can result in either a deterministic choice in some applications, such as consumption-savings, or an agreed upon mixing between discrete choices in some discrete menu choice applications; both of these are addressed. Section 2 details the model in full.
The first empirical regularity, addressed in Section 3, is diminishing impatience, a particular form of time inconsistent preference reversal where, as the consequences of a decision are pushed into the future, an individual’s choices exhibit less impatience regarding the outcome. As a simple example, an individual given the choice between $100 today and $120 tomorrow may choose the $100 today, but when presented with the choice between $100 in seven days and $120 in eight days may switch and now desire to wait the extra day for the greater reward. As another example, an individual may desire to save a large portion of his next pay check for retirement; however, when payday arrives, he may change his mind when faced with the immediate reality of reducing his consumption. Proposition 1 establishes that the multiself bargaining model reflects a smooth form of diminishing impatience. In discrete decisions, as payoffs at different times are both pushed into the future, the observed patience of the agent increases continuously, in that they choose later rewards with continuously increasing probability. For continuous decisions, such as consumption-savings, the model predicts that their decision reflects greater patience as the decision is made farther in advance, such as through a higher savings rate.
The empirical evidence for diminishing impatience is broad and spans disciplines, attracting particular interest in psychology and economics; see, for example, Thaler (1981), Loewenstein and Prelec (1992), Kirby and Herrnstein (1995), Frederick et al. (2004) and Fang and Silverman (2009). Many models attempt to incorporate the phenomenon by utilizing forms of time discounting other than geometric. Evidence, for example, Ainslie (1992) and Myerson and Green (1995), suggests that the form of time discounting reflected in the decisions of agents is well fit by hyperbolic discounting, in which the discount factor falls steeply at first, but then less rapidly. 1 Consumption data is shown by some work, for example, Angeletos et al. (2001), to be much better fit by hyperbolic models than geometric. The most commonly used model of diminishing impatience, quasi-hyperbolic discounting, also referred to as ‘beta-delta’, was introduced by Laibson (1997). Quasi-hyperbolic discounting has one time discount factor between the current period and the next period, and a second, higher, time discount factor for evaluating between all periods thereafter. This form of discounting was a critical development in the analysis of time inconsistency, but has the drawback that it allows only a discontinuous form of diminishing impatience: a stark division between ‘now’ and ‘later’ that is sensitive to the precise definition of the length of a period. One advantage of the multiself bargaining model is that the form of diminishing impatience it creates more closely reflects hyperbolic than quasi-hyperbolic discounting, avoiding this discontinuity and period sensitivity.
The effects of tempting options on choices are addressed in Section 4. Temptation here is used to refer to cases where the addition of an option to a choice set alters the decision made by the individual even when the new option is not chosen; in other words, a violation of independence of irrelevant alternatives. This is documented in choices from discrete sets, as well as in consumption-savings decisions, for example, Munnell (1974; 1976), Ashraf et al. (2006) and Huang et al. (2013). Existing literature on temptation primarily relies on choices being assigned some explicit temptation or self-control value, for example, Gul and Pesendorfer (2001). Actions are assigned differing levels of ‘temptation’ which the agent has a limited, or costly, capacity to resist. In contrast with this literature, in this model temptation effects will arise endogenously from the differing utility evaluations of the selves. Proposition 2 establishes that when the utility granted to a self by their most preferred action increases, the agent will choose an action, or lottery, that grants that self a higher expected utility.
Section 5 investigates the implications of the model for welfare evaluation, with the primary insight being that while the model is more limiting than the classic model for welfare evaluation, we can recapture much that is lost by models that use non-geometric time discounting. This is accomplished by viewing the selves as individuals for the purposes of examining welfare, so that we can evaluate policies from the standpoint of how they affect the welfare of both selves.
Section 6 discusses related literature, most significantly the existing multiself literature, and underlines the divergences of this model from similar ones.
Section 7 concludes.
The Model
Overview
An agent here is assumed to have two inner selves, one more patient than the other. The two selves will usually bargain and come to an agreement on the best course of action. If they fail to do so, there is a conflict between them, in which they both exert costly effort in order to gain control over the actions of the agent. How much effort they exert is influenced by how much difference they see between the choices available. Due to the costly nature of this conflict, bargaining occurs in equilibrium, and the conflict serves as a disagreement point.
Decision Problem and Notation
Time is continuous, but there are a finite number of discrete times t1 = 0, t2, t3, …, tN at which decisions are made and payoffs received, with ∆n = tn+1 − tn ≥ 0. The decision made at time tn is referred to as decision n. An agent is assumed to consist of two selves. Both selves share an identical utility function over payoffs, u(·); they differ only in their time discount rate, ρ. One self is referred to as ‘long-term’ with ρl, and one is referred to as ‘short-term’ with ρs. It is assumed that these time discount rates satisfy ρs > ρl > 0; note that this means the discount factor of the long-term self is larger. 2
The choice of continuous time here allows for ease in comparative statics exercises to follow, but as utility is not evaluated continuously, the model should conceptually be thought of as consisting of discrete periods with varying period lengths. In particular, for fixed intervals, ∆1 = ∆2 = … = ∆N−1 = ∆, the model becomes a standard discrete model, with time discount factors given by
\[ \delta_i = e^{-\rho_i \Delta}, \qquad i \in \{s, l\}. \]
The actions available to the agent at decision n are given by the action set An, which is assumed to be either a discrete choice set or a compact Euclidean space. The set of lotteries over actions is denoted 𝒜n; with slight abuse of notation, An is also used for this lottery set below where the context is clear. At each decision point the agent must select one lottery over actions, αn ∈ 𝒜n. Let the history of realized actions up to decision n be given by hn = {a1, a2, …, an−1}. In general, An, and therefore 𝒜n, may depend on this history. In addition to its effect on future action sets, each realized action grants some payoff vector, and the corresponding payoff utility is denoted u(an).
Each self evaluates their own welfare using standard expected utility with geometric time discounting, so that the time 0 discounted utility of a realized action stream a1, a2,… an to self i is given by
Every lottery αn at decision n creates an expectation over realized actions, and thus an expected payoff utility, given by
The decision procedure employed by the agent will be described recursively, starting from the final decision. Consider the decision made at time tN. As there are no future decisions, the concern of the selves is entirely the payoff utility derived from the possible actions. As the selves agree on payoff utility, they agree on rankings of lotteries for this final decision. Thus, the lottery chosen is simply
where D(AN(hN)) indicates the decision made from lottery set AN(hN). The expected utility that self i derives from this decision at time tN will be denoted Ui,N(D(AN(hN)), hN). 4
Now consider decision N–1, at time tN−1, where the lottery is being selected from AN−1(hN−1). Each possible lottery αN−1 creates not only an expected payoff utility but also an expectation over the future history, hN. This future history in turn influences the future action set, which influences the future decision made, and therefore the future utility. Specifically, the discounted utility to self i at time N–1 from lottery αN−1 is given by
This utility will also be denoted as Ui(αN−1) to conserve notation; it consists of the expectation of payoff utility, u(aN−1), as well as the expected discounted future utility. Note that {hN−1, aN−1} is the future history, and the expectation is over what value aN−1 will take. Essentially, the selves are (correctly) projecting the action that will be taken at the final decision point based on the action taken today, and discounting the utility they will receive from that action based on the time difference between the current decision and the later one, ∆N−1. Note that this implies the assumption of sophistication of the selves. Now, in contrast to the final decision, due to the differing discount rates, the selves do not agree on the ranking for this decision, and generically may prefer different lotteries.
Each lottery αN−1 creates a utility vector (Us(αN−1), Ul(αN−1)), as determined by the above valuation. Denote by UN−1(AN−1, hN−1) the set of all such utility vectors for decision N–1, which will be abbreviated UN−1. Since the choices are lotteries over actions, UN−1 will be a convex set. A conflict/bargaining procedure, described in the next two subsections, is used to select a single utility vector from UN−1, and the lottery chosen by the agent is the one corresponding to that utility vector. These steps can then be applied recursively backward, as the selves can now project the lottery chosen at decision N–1, and so on. The utility of lottery αn to self i at time tn is given by Ui,n(αn, hn) in accordance with the above formula, or equivalently Ui(αn).
Finally, note that the prior formula is not a Bellman equation. D is a decision process, but one that depends on the utility values of both selves, not just self i, so it cannot be expressed as an optimization decision made by self i.
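The discounted valuations used throughout this subsection can be sketched numerically. The snippet below is only an illustration under stated assumptions: it takes each self to discount a payoff received at time t by exp(−ρt), and the function names, payoff dates, payoff utilities and discount rates are all hypothetical values chosen for the example rather than taken from the paper.

```python
import math

def discount_factor(rho, delta):
    """Discount factor over an interval of length delta at discount rate rho."""
    return math.exp(-rho * delta)

def discounted_utility(rho, times, payoff_utils):
    """Time-0 value of a realized stream: sum over k of exp(-rho * t_k) * u(a_k)."""
    return sum(discount_factor(rho, t) * u for t, u in zip(times, payoff_utils))

# Hypothetical stream: payoff utilities u(a_k) received at unevenly spaced dates t_k.
times = [0.0, 1.0, 2.5]
payoff_utils = [1.0, 2.0, 4.0]

# Hypothetical rates with rho_s > rho_l > 0, as the model requires.
rho_s, rho_l = 0.7, 0.1

U_s = discounted_utility(rho_s, times, payoff_utils)
U_l = discounted_utility(rho_l, times, payoff_utils)
print(U_s, U_l)
```

Both selves see the same payoff utilities; only the weights on later dates differ, which is why the long-term self assigns the stream the larger value here.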
Conflict
Now we turn to describing the procedure by which the agent selects an action when the selves do not agree.
Consider decision n made at time tn, and corresponding set of utility vectors, Un, created by the lotteries in An. First, note that each self will have a bliss action in An: the action which grants them the highest discounted utility.
This bliss action is given by
These bliss actions induce a pair of bliss points in Un, given by:
Thus,
Sn is the non-negative difference in utilities that the short-term self will receive from the two bliss points; similarly, Ln is the corresponding difference for the long-term self. Figure 1 illustrates these terms.

By default, it is assumed that the selves will engage in a conflict modelled as a slight variant of the Tullock (1980) rent-seeking game.
First, the selves simultaneously commit to an effort choice ei. Second, the winner is determined based on the effort choices. The probability that the short-term self is the winner is given by
with 0 ≤ γ. Third, the winner selects the action taken by the agent. Naturally, the self that wins will select their own bliss action. Considering potential equilibria, note that es = el = 0 is not one, as both selves would have incentive to exert marginal effort. Thus, the short-term self selects es to maximize:
This creates an expected utility vector for the selves given by
To ensure uniqueness, we will restrict attention to 0 ≤ γ ≤ 1 in the conflict game. 8 A higher γ means that the effort choices of the selves will have more impact on the probability of winning. This naturally results in an equilibrium in which they exert higher effort, as we see. Further, the probability of one self winning increases with the difference between the options for them (Sn or Ln), and decreases with the differences between the options for the other self.
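The comparative statics described above can be checked in a small sketch. It assumes the textbook Tullock contest success function, e_s^γ / (e_s^γ + e_l^γ), prizes equal to the stakes Sn and Ln, and a marginal effort cost of 1; since the text describes its game as a slight variant of Tullock (1980), this functional form, the closed-form efforts derived from it, and the function names are assumptions of the example, not the paper's own statement.

```python
def tullock_equilibrium(S, L, gamma=1.0):
    """Interior equilibrium of the assumed two-player contest.

    S, L  : stakes of the short-term and long-term self (both > 0)
    gamma : decisiveness of effort, with 0 < gamma <= 1
    Returns (e_s, e_l, p_s): equilibrium efforts and the short-term
    self's probability of winning.
    """
    denom = (S**gamma + L**gamma) ** 2
    e_s = gamma * S ** (gamma + 1) * L**gamma / denom
    e_l = gamma * S**gamma * L ** (gamma + 1) / denom
    p_s = S**gamma / (S**gamma + L**gamma)
    return e_s, e_l, p_s

def net_gain_s(e_s, e_l, S, gamma):
    """Short-term self's objective: win probability times stake, minus effort."""
    return e_s**gamma / (e_s**gamma + e_l**gamma) * S - e_s

# Hypothetical stakes: the short-term self has more to gain from winning.
S, L, gamma = 2.0, 1.0, 0.5
e_s, e_l, p_s = tullock_equilibrium(S, L, gamma)
print(e_s, e_l, p_s)
```

In this sketch efforts are proportional to γ and to each self's own stake, so the self with more at stake exerts more effort and wins more often, in line with the verbal description.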
Bargaining
The randomization over bliss actions resulting from the equilibrium of the conflict game is treated by the selves as a default outcome. However, the model allows the selves to bargain, or negotiate, to a mutually preferable option. The outcome of this bargaining is assumed to be the Nash bargaining solution as applied to the set of utility vectors, Un, using the outcome of the conflict game as the disagreement point.
For the remainder of the paper, we will add the cost of effort from the conflict game back to the disagreement point, leaving the disagreement point as a simple mixing between the utility vectors induced by the two bliss actions. This step will allow for cleaner intuition and understanding: the actual cost of effort in the conflict game is not the source of any interesting behaviour in the model, and we should be agnostic about whether the selves are truly ‘spending’ effort in any meaningful way. Fortunately, the qualitative nature of the results is not changed by this step, and where relevant both disagreement points are shown to yield the same result in the proofs. This gives a simpler form to the disagreement point derived from the conflict game, now given by
The standard Nash bargaining solution maximizes the product of the gains of the two selves relative to a disagreement point; this product is denoted the Nash product. Thus, the action taken is given by
The chosen utility vector is illustrated as α in Figure 2. 9 Note that this procedure implies that conflict does not occur in equilibrium. Rather, anticipation of conflict drives the bargaining between selves. Both selves are able to project the outcome of conflict, and it is this commonly anticipated outcome, in the event that bargaining fails, that drives the outcome of the bargaining.

An affine transformation could also be applied to both effort costs in the conflict game without changing the outcome, but the costs have been normalized so that the marginal cost of effort is 1. The last part of Lemma 2 is essentially the Symmetry axiom of Nash bargaining: the decision process does not favour one self over the other; the classic axiom, however, is based on an exogenous, symmetric disagreement point.
Observe that if the Pareto frontier consists only of mixtures between the two bliss points, then the disagreement point (itself being a mixture between the bliss points) will lie on the Pareto frontier, and thus coincide with the outcome vector. So, for example, if there are only two actions, the outcome can be interpreted as the selves agreeing on the same mixing that would result from conflict, and by doing so bypassing the actual costs of conflict. 10
Illustrative Example
To illuminate the workings of the model, we look at a simple example of savings-consumption. Consider an agent endowed with $1 at time t1, and nothing at time t2, with ∆1 = t2 − t1 = 1. He must decide how much to consume at time t1 and how much to save; assume there is no interest on savings. We take his instantaneous utility function to be
Denote the amount saved at time t1 as a. Modelling the action as being the choice of a, we have A1 = [0, 1], with each action granting a payoff utility as well as constraining the action set at time t2. At t2, both selves will agree to consume the full amount remaining, so decision 2 is trivial. Thus, each self in the first decision has a discounted utility given by
which gives
This creates bliss points
Thus,
Finally, the amount of savings undertaken by the agent is found by solving
which gives a ≈ 0.354. This outcome is illustrated in Figure 3.
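The steps of this example can be traced in a short numerical sketch. The primitives are assumptions, since they are not stated in the surviving text: u(c) = √c, γ = 1, and one-period discount factors of 0.5 for the short-term self and 1 for the long-term self. They were picked because they are consistent with the figures quoted in the paper. The sketch also uses the simplified disagreement point (effort costs added back) and searches only over deterministic savings levels on a grid, which presumes that mixing cannot improve on the frontier.

```python
import math

# Assumed primitives (see lead-in): sqrt utility, gamma = 1, discount factors 0.5 and 1.
DELTA_S, DELTA_L, GAMMA = 0.5, 1.0, 1.0

def U(a, delta):
    """Discounted utility of saving a: consume 1 - a now and a one period later."""
    return math.sqrt(1.0 - a) + delta * math.sqrt(a)

grid = [k / 10000 for k in range(10001)]       # candidate savings levels in [0, 1]

# Bliss actions of the two selves.
a_s = max(grid, key=lambda a: U(a, DELTA_S))
a_l = max(grid, key=lambda a: U(a, DELTA_L))

# Stakes: what each self gains from its own bliss action over the other's.
S = U(a_s, DELTA_S) - U(a_l, DELTA_S)
L = U(a_l, DELTA_L) - U(a_s, DELTA_L)

# Probability that the short-term self wins the projected conflict.
p_s = S**GAMMA / (S**GAMMA + L**GAMMA)

# Disagreement point: the p_s-mixture of the two bliss points.
d_s = p_s * U(a_s, DELTA_S) + (1 - p_s) * U(a_l, DELTA_S)
d_l = p_s * U(a_s, DELTA_L) + (1 - p_s) * U(a_l, DELTA_L)

def nash_product(a):
    """Product of both selves' gains over the disagreement point (-1 if either loses)."""
    gain_s, gain_l = U(a, DELTA_S) - d_s, U(a, DELTA_L) - d_l
    return gain_s * gain_l if gain_s >= 0 and gain_l >= 0 else -1.0

a_star = max(grid, key=nash_product)
print(a_s, a_l, round(p_s, 3), round(a_star, 3))
```

Under these assumptions the bliss actions are 0.2 and 0.5, and the bargained savings level comes out close to the a ≈ 0.354 reported above.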
Diminishing Impatience
The source of diminishing impatience in the model is best introduced through a simple example. Consider an individual with payoff utility function given by u(w) = w who, at time t1, is given the choice between receiving $100 at time t2, or receiving $120 at time t3. The utility vectors created by the two options are, respectively,

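As this is a binary decision, the resulting At consists of all mixing lotteries between the two options, and Ut is a line segment as a result. Consider the interesting case where the selves disagree, with the short-term self preferring the earlier reward and the long-term self the later one:
\[ 120\,e^{-\rho_s \Delta_2} < 100 < 120\,e^{-\rho_l \Delta_2}, \]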
so that,
We will use γ = 1, so we have
As ρs −ρl > 0, it is straightforward to see that

The result is generalized in the following.
MH at time t3, with u(MH) > u(ML), denote by pL the probability an individual will choose ML. Then,
with the derivative strict wherever 0 < pL < 1.
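The binary-choice result can be illustrated with the $100 versus $120 example. The sketch below assumes linear utility, the win probability S^γ / (S^γ + L^γ) of a standard Tullock contest, and the fact noted earlier that with two actions the outcome coincides with the conflict mixture. The discount rates, the one-period gap between the rewards and the function name are hypothetical choices made so that the selves disagree.

```python
import math

def prob_smaller_sooner(delta1, delta2=1.0, m_low=100.0, m_high=120.0,
                        rho_s=0.5, rho_l=0.05, gamma=1.0):
    """Probability p_L that the agent picks m_low (paid after delta1)
    over m_high (paid after delta1 + delta2), with u(w) = w."""
    # Stake of the short-term self: how much it prefers the earlier reward.
    S = math.exp(-rho_s * delta1) * (m_low - m_high * math.exp(-rho_s * delta2))
    # Stake of the long-term self: how much it prefers the later reward.
    L = math.exp(-rho_l * delta1) * (m_high * math.exp(-rho_l * delta2) - m_low)
    if S <= 0 or L <= 0:
        raise ValueError("the selves agree; no conflict to resolve")
    return S**gamma / (S**gamma + L**gamma)

delays = [0, 1, 2, 4, 8, 16]
probs = [prob_smaller_sooner(d) for d in delays]
for d, p in zip(delays, probs):
    print(d, round(p, 4))
```

With these illustrative rates the smaller-sooner reward is chosen more often than not when it is immediate, and its probability falls smoothly towards zero as both rewards are pushed into the future.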
Intuitively, as rewards are pushed into the future, both selves care less about the difference between the two rewards; that is, the discounted value of the utility difference shrinks. However, the difference shrinks at a faster rate for the short-term self. In relative terms, the long-term self sees a much bigger difference between far-future rewards than the short-term self does. As a result, the short-term self has little incentive to exert effort in the conflict game, and wins such a fight with low probability, shifting the disagreement point in bargaining to favour the long-term self. Observationally, as ∆1 increases the behaviour of the individual becomes closer to that predicted by standard geometric discounting, as though the long-term self were the entire individual. Now, the general result for diminishing impatience:
with the inequality strict if the Pareto frontier of U1 is smooth.
The term
The intuition of Proposition 1 is that as the consequences of a decision are pushed into the future, the outcome asymptotically approaches the bliss action of the long-term self. Observationally, the time discounting as predicted by the model approaches a geometric pattern for long time horizons, which reflects the geometric discounting of the long-term self. In the case of a discrete decision, this takes the form that the probability that the agent will choose the bliss action of the long-term self approaches 1. For a continuous decision, the action chosen by the agent approaches the bliss action of the long-term self asymptotically. Putting Proposition 1 in terms of utilities allows the capture of both of these cases.
I now illustrate Proposition 1 for a continuous decision. Consider again a consumption-savings example, with no interest and u(c) =
The short-term self would like to choose a to maximize
Determining S and L, we find that
The intuition becomes clear here again: the short-term self is discounting S1, so that self sees less difference between the bliss points as ∆1 increases. The long-term self discounts the difference to a smaller degree (in the chosen example, they do not discount at all, hence the absence of ∆1 from L1). We can now calculate the probability of the short-term self winning the conflict game:
So, we see for the continuous case that the short-term self again has a lower probability of winning the conflict game as the payoffs are pushed into the future. Since this lower probability is anticipated by both selves, this will translate into a bargaining outcome more favourable to the long-term self; in this case, a higher savings rate.

Figure 5 shows the result for ∆1 = 1; that is, the result when the agent is deciding on the amount of saving 1 period in advance. The utility for the short-term self contracts as compared to the Section 2.5 example (as seen by horizontal compression in the graph) and, as a result, the disagreement point moves closer to the bliss point of the long-term self; the savings rate increases from the previous 0.354 to 0.388.
The model thus predicts that when an individual is making a savings decision, the amount that they will choose to save increases the farther their decision is from the date of initial consumption potential. It is this ability to account for preference reversals between any two time delays that prevents the model from being reliant on period length specification, and aligns it more closely with hyperbolic discounting models, as opposed to the β-δ specification.
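The dependence of the savings decision on how far in advance it is made can be sketched as follows. As before, the primitives are assumptions rather than values stated in the surviving text: u(c) = √c, γ = 1, per-period discount factors 0.5 and 1, the simplified disagreement point, and a grid search over deterministic savings levels. The parameter `lead` is a hypothetical name for ∆1, the delay between the decision and the first consumption date.

```python
import math

# Assumed primitives (see lead-in).
DELTA_S, DELTA_L, GAMMA = 0.5, 1.0, 1.0
GRID = [k / 10000 for k in range(10001)]

def bargained_savings(lead):
    """Return (p_s, a): conflict win probability of the short-term self and
    the bargained savings level when deciding `lead` periods in advance."""
    def U(a, delta):
        # Both consumption dates are pushed back by `lead` periods.
        return delta**lead * (math.sqrt(1.0 - a) + delta * math.sqrt(a))

    a_s = max(GRID, key=lambda a: U(a, DELTA_S))
    a_l = max(GRID, key=lambda a: U(a, DELTA_L))
    S = U(a_s, DELTA_S) - U(a_l, DELTA_S)
    L = U(a_l, DELTA_L) - U(a_s, DELTA_L)
    p_s = S**GAMMA / (S**GAMMA + L**GAMMA)
    d_s = p_s * U(a_s, DELTA_S) + (1 - p_s) * U(a_l, DELTA_S)
    d_l = p_s * U(a_s, DELTA_L) + (1 - p_s) * U(a_l, DELTA_L)

    def product(a):
        gain_s, gain_l = U(a, DELTA_S) - d_s, U(a, DELTA_L) - d_l
        return gain_s * gain_l if gain_s >= 0 and gain_l >= 0 else -1.0

    return p_s, max(GRID, key=product)

results = {lead: bargained_savings(lead) for lead in (0, 1, 2)}
for lead, (p_s, a) in results.items():
    print(lead, round(p_s, 3), round(a, 3))
```

Under these assumptions the savings level rises from roughly 0.354 with no lead time to roughly 0.388 with one period of lead time, matching the figures quoted in the text, and it rises further at two periods as the short-term self's chance of winning the projected conflict keeps falling.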
Temptation
We now turn to temptation effects: the observation that agents’ decisions do not always satisfy independence of irrelevant alternatives. In the multiself bargaining model, the action decided upon by the agent will depend on the bliss actions of both selves, even if these actions are ‘irrelevant’ (never chosen) options. This is due to the fact that the bliss points determine the resolution of the projected conflict game, and therefore the disagreement point that the selves anticipate in their bargaining. Irrelevant alternatives that are not the bliss point of either self will not influence the outcome; in this sense the model is in accordance with existing literature on temptation, which primarily takes the view that it is only the most tempting point that is relevant. For expositional purposes, this section focuses exclusively on the bliss point of the short-term self, but the results apply similarly for the bliss point of the long-term self.

Consider an agent at a restaurant choosing between a (H)ouse salad, (G)rilled steak, and a (B)acon cheeseburger. The individual believes that B is the most delicious and H the least, but that the opposite is the case regarding the health effects of the choices. Suppose further that, as a result of health effects being something in the future, the preferences of the selves are such that
The left graph of Figure 6 illustrates the decision made when the individual is choosing between only H and G. In the right graph we see how the outcome changes when we add in the third option, B. The point α shows the lottery chosen by the agent in the first case, while the addition of B moves the lottery to α’, shown to the right. With the addition of B, though the individual still chooses B with zero probability (the outcome shown is a mixing between H and G), the weight placed on G grows. Intuitively, in the presence of an option more desirable to a self, that self is more willing to exert effort in the projected conflict game; this ‘pulls’ the disagreement point, and thus the ultimate decision of the agent, closer to the bliss point of that self. It is important to note, though, that the long-term self is also more willing to exert effort with the addition of point B, since that self is much more opposed to B than they were to G. Temptation is always bi-directional in the model, so that it is the differences in bliss utilities for both selves that determine the overall effect. It is therefore not always the case that adding an action to At which has a greater utility for the short-term self than their current bliss point moves the outcome in their favour.
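The restaurant example can be reproduced in miniature. The utility pairs below are made-up numbers that respect the stated rankings (the short-term self ranks B above G above H, the long-term self the reverse); they are not the values behind Figure 6. The sketch assumes γ = 1, the simplified disagreement point, and a brute-force search over lotteries on a grid of 1/200.

```python
def lotteries(k, steps):
    """Yield integer weight vectors over k actions that sum to `steps`."""
    if k == 1:
        yield (steps,)
        return
    for first in range(steps + 1):
        for rest in lotteries(k - 1, steps - first):
            yield (first,) + rest

def bargain(menu, gamma=1.0, steps=200):
    """Nash bargaining over lotteries on `menu` (name -> (U_s, U_l)),
    with the conflict mixture of the bliss points as disagreement point."""
    names = list(menu)
    us = [menu[n][0] for n in names]
    ul = [menu[n][1] for n in names]
    bs = max(range(len(names)), key=lambda i: us[i])   # short-term bliss action
    bl = max(range(len(names)), key=lambda i: ul[i])   # long-term bliss action
    S, L = us[bs] - us[bl], ul[bl] - ul[bs]            # assumes S > 0 and L > 0
    p = S**gamma / (S**gamma + L**gamma)
    d_s = p * us[bs] + (1 - p) * us[bl]
    d_l = p * ul[bs] + (1 - p) * ul[bl]
    best, best_val = None, -1.0
    for counts in lotteries(len(names), steps):
        w = [c / steps for c in counts]
        gain_s = sum(wi * u for wi, u in zip(w, us)) - d_s
        gain_l = sum(wi * u for wi, u in zip(w, ul)) - d_l
        if gain_s < -1e-9 or gain_l < -1e-9:
            continue                                   # worse than conflict for a self
        val = max(gain_s, 0.0) * max(gain_l, 0.0)
        if val > best_val:
            best, best_val = dict(zip(names, w)), val
    return best

# Hypothetical (U_s, U_l) pairs for the three dishes.
H, G, B = (0.0, 10.0), (6.0, 6.0), (8.0, 0.0)
without_B = bargain({"H": H, "G": G})
with_B = bargain({"H": H, "G": G, "B": B})
print(without_B)
print(with_B)
```

With these numbers the weight on G rises from 0.6 to about 0.85 once B joins the menu, even though B itself receives zero weight: an unchosen option changes the choice.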
To nail down the net temptation effect of a given option, I will break down the effect into component parts. First, Proposition 2 considers the effect of making the bliss point of the short-term self better or worse for the short-term self; in graphical terms this is moving B to the right or left.
with the inequalities strict if the Pareto frontier is smooth at (Us(αn), Ul(αn)).
The interpretation of Proposition 2 is straightforward: if we increase the utility granted to the short-term self by their bliss point, then the action chosen by the agent changes to one that grants a higher utility to the short-term self. This is the temptation effect in its purest and most intuitive form: as the desserts on the menu become more delicious, the agent is pulled more towards them. If the Pareto frontier is smooth, then strictness of this change implies a continuous temptation effect, one which applies even if the bliss point is chosen with zero probability. However, if the slope is not defined at the outcome, then the temptation effect ceases to be strictly increasing.
For example, in Figure 6, if we move point B to the right, the decision will shift smoothly to the right as well; move B far enough, and the individual will eventually choose G for certain, and remain there for an interval as B is moved farther to the right. Once B is moved far enough, the individual would begin mixing between G and B, and the temptation effect would again be continuously increasing. 11 The utility granted to the long-term self by B remains the same, and thus the difference to the long-term self between the bliss points remains the same (Ln constant). This means that the long-term self’s incentive to exert effort in a potential conflict is unchanged, but the short-term self’s incentive is increased. Thus, the projected probability of the short-term self winning a conflict is conclusively increasing, moving the disagreement point closer to B along the mixing line between H and B; this is compounded by the fact that the mixing line itself is shifting in favour of the short-term self (since the B endpoint is moving in favour of the short-term self). Finally, the fact that the disagreement point has shifted in favour of the short-term self means that the bargaining outcome will as well.
The relevance of requiring that a section of the Pareto frontier surrounding the decision be unchanged is to ensure that the changing point is an irrelevant alternative. Doing away with that condition, we obtain a more limited result:
with the inequality strict if the Pareto frontier is smooth at (Us(αn), Ul(αn)).
Without the limitation on the Pareto frontier, an improvement in the bliss utility of the short-term self still moves the outcome in a direction favourable to the short-term self. However, it may also add Pareto improvements on the previous outcome, and so the net effect on the utility granted to the long-term self is ambiguous. This would occur if, in Figure 6, B was shifted far enough to the right that the Pareto frontier became the line segment between H and B (so that G was no longer on the frontier).

I close this section by illustrating the temptation effect in a continuous action set. Consider again our illustrative savings example, and suppose that the agent has committed to saving at least 0.3. Then, when it is time to make the final savings decision, the bliss action of the short-term self is to save exactly 0.3, as 0.2 (their former bliss action) is no longer in A. As a result, the agent is less tempted towards low saving rates, and the action chosen by the agent moves from a = 0.354 to a = 0.367. This is shown in Figure 7.
Welfare Implications
While it presents more challenges than the canonical expected utility model, the dual-self bargaining model presented here need not be silent on questions of welfare. By choosing to view an agent as actually consisting of two individuals, there are several results that grant leverage for welfare evaluation. This is an unorthodox notion of welfare, differing from other dual-self models, such as Fudenberg and Levine (2006), that consider the long-term self’s welfare to be the sole metric of importance. However, such an approach would not make sense in the context of this model. The model does have an advantage over models in which there is a single individual whose preferences change in each period due to non-geometric time discounting: rather than one set of preferences for each period, here there are only two in total. There is also the observation that utilities are intrapersonal here, rather than interpersonal. Selves share a payoff utility from actions, and so utility comparisons arguably carry more weight here than they would when comparing across individuals. I first address the notion of Pareto improvements: actions that improve the utility for both selves over another action.
This has a close relation with the notion of the unambiguous choice relation developed by Bernheim and Rangel (2009). Essentially, if a is never chosen when b is available, then we say that b is unambiguously preferred to a, written as bP∗a, in the terminology defined by their work. Thus, in this model, b is unambiguously preferred to a if it represents a Pareto improvement over a in regard to the two selves. An immediate extension is that for a given action, a, if there exists another action, b, which is unambiguously preferred, a can definitively be said to be Pareto inefficient from a welfare perspective. It also has the important implication that if an agent always makes the same decision from a binary choice (probability 1), then the choice made is a Pareto improvement over the other, and thus can be definitively said to be welfare improving.
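The Pareto comparison between the two selves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the action set is finite, and the action labels and utility numbers are hypothetical rather than taken from the model's examples.

```python
from typing import Dict, List, Tuple

# Hypothetical utilities (U_l, U_s) granted to the long-term and short-term self.
Utilities = Dict[str, Tuple[float, float]]


def pareto_improves(b: Tuple[float, float], a: Tuple[float, float]) -> bool:
    """True if b improves the utility of both selves relative to a."""
    return b[0] > a[0] and b[1] > a[1]


def pareto_inefficient(actions: Utilities) -> List[str]:
    """Actions for which some other available action is a Pareto improvement."""
    return sorted(
        name_a
        for name_a, u_a in actions.items()
        if any(pareto_improves(u_b, u_a)
               for name_b, u_b in actions.items() if name_b != name_a)
    )


example: Utilities = {
    "save_high": (1.0, 0.2),  # favoured by the long-term self
    "save_low": (0.3, 0.9),   # favoured by the short-term self
    "waste": (0.2, 0.1),      # worse for both selves than either alternative
}
print(pareto_inefficient(example))
```

Only the action that both selves rank below some alternative is flagged; the two actions over which the selves disagree are not Pareto ranked against each other, which is the case taken up next.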
Now I turn to the more difficult question of welfare evaluation of options when neither is a Pareto improvement over the other. If we wish to make statements about welfare on such questions, it is necessary to consider some aggregation of the utilities received by the two selves. One natural method by which to do so is a utility weighting welfare function, W(a) = wUl(a) + (1 − w)Us(a). Under such a function, several results follow.
Lemma 6 implies that, if welfare of an individual is evaluated by a weighting between options, observation of choice from a binary set of two actions is sufficient to say which is welfare superior. If w = 0.5, so that welfare is considered as an equal weighting between the selves, then the condition simply becomes p < 0.5, so that γ need not be known, nor which action is preferred by which self. This is appealing for application: it would imply that in a series of random choices between two options the agent would be observed to choose the option granting higher welfare more frequently.
We may also wish to consider welfare weightings that place greater weight on the utility of the long-term self.
⇒ Ul(a) − Ul(b) > Us(b) − Us(a) ⇒ w(Ul(a) − Ul(b)) > (1 − w)(Us(b) − Us(a)) for w ≥ 0.5
⇒ wUl(a) + (1 − w)Us(a) > wUl(b) + (1 − w)Us(b) for w ≥ 0.5.
This corollary says that if a utility weighting welfare function places at least equal weight on the long-term self’s utility, then observing the agent choose the long-term self’s preferred action more frequently than an alternate action implies that the more frequently chosen action grants higher welfare. This is of interest because if, for example, a bag of potato chips is resisted more often than not, then it implies that it is welfare improving to remove the bag of chips as an option. Additionally, it allows a degree of welfare evaluation to take place without taking a stance on whether an equal weighting between selves or a higher weighting on the long-term self is the correct choice in a welfare function.
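The weighting argument can be checked numerically. The following is a minimal sketch, assuming hypothetical utility values for which the long-term self gains more from a than the short-term self gains from b; it confirms that a then has higher welfare for every weight w between 0.5 and 1 on a grid.

```python
def welfare(u_l: float, u_s: float, w: float) -> float:
    """Utility weighting welfare function W(a) = w*U_l(a) + (1 - w)*U_s(a)."""
    return w * u_l + (1.0 - w) * u_s


# Hypothetical utilities: a is preferred by the long-term self, b by the short-term self.
U_L = {"a": 1.0, "b": 0.4}
U_S = {"a": 0.3, "b": 0.7}

# Premise: U_l(a) - U_l(b) > U_s(b) - U_s(a).
premise = U_L["a"] - U_L["b"] > U_S["b"] - U_S["a"]

# Welfare comparison for weights w = 0.5, 0.55, ..., 1.0.
weights = [0.5 + 0.05 * k for k in range(11)]
a_is_better = [
    welfare(U_L["a"], U_S["a"], w) > welfare(U_L["b"], U_S["b"], w)
    for w in weights
]
print(premise, all(a_is_better))
```

The check does not extend below w = 0.5: with enough weight on the short-term self, the ranking of a and b can flip even when the premise holds.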
Related Literature
In the literature on multiself models, Thaler and Shefrin (1981) introduced the ‘doer-planner’ dual-self interpretation. The planner is concerned with lifetime utility, while the doer is completely myopic. The doer exercises full control over the action taken by the individual, whereas the planner exercises costly action to constrain the action taken by the doer. Fudenberg and Levine (2006), building upon Thaler and Shefrin, bears the closest resemblance to the model presented here. They develop a model consisting of a single long-lived patient self, and a series of one period lived short-term selves, in which the short-term self has full control over the action in each period, and the long-term self exerts costly effort to constrain the actions available to the short-term self. Their model accounts for several of the empirical regularities discussed in this work, including diminishing impatience. In Fudenberg and Levine (2011), they extend their model to develop conditions on self-control cost that account for Allais paradox phenomena. In Fudenberg and Levine (2012), they relax the assumption of completely myopic short-term selves in order to remove the discontinuity in diminishing impatience. The dual-self bargaining model presented here departs from this literature by treating selves symmetrically, and by having only two selves (as opposed to a succession of myopic selves). The key conceptual innovation is that the behaviour of interest can be generated solely by a difference in geometric time preference between two selves, without the introduction of self-control cost as a modeling element.
Gul and Pesendorfer (2001; 2004) develop a very important representation result for preferences for commitment and self-control in which each outcome has two evaluations: u(·) representing the a priori ranking over singleton sets, and v(·) representing the instantaneous ‘urge’ for an action. An individual chooses a menu A in one period, and then in the next period chooses x from that menu to maximize
u(x) − (max y∈A v(y) − v(x)).
The term in parentheses is the difference in temptation utility between the choice made and the choice with the strongest temptation utility; this is interpreted as the self-control cost. Preferences are defined over {(A, x): x ∈ A} where A is the set of lotteries chosen in period 1, and x ∈ A is the lottery chosen in period 2. In other words, the preference is over both the menu and the choice from the menu. This extended preference allows them to explain apparent preference reversal without the preference relation so defined being violated, as well as account for self-control and temptation effects. Gul and Pesendorfer advance the view that the preferences they develop can be viewed as being dynamically consistent.
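To make the role of the self-control cost concrete, the sketch below evaluates a second-period choice under the objective u(x) minus the gap between the strongest temptation utility in the menu and v(x). This is my reading of the representation described above, and the menu items and utility numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def second_period_choice(
    menu: List[str], u: Dict[str, float], v: Dict[str, float]
) -> Tuple[str, float]:
    """Pick x in the menu maximizing u(x) - (max_y v(y) - v(x))."""
    strongest_temptation = max(v[y] for y in menu)

    def value(x: str) -> float:
        self_control_cost = strongest_temptation - v[x]
        return u[x] - self_control_cost

    best = max(menu, key=value)
    return best, value(best)


# Hypothetical commitment utility u and temptation utility v.
u = {"salad": 3.0, "cake": 1.0}
v = {"salad": 0.0, "cake": 1.5}

full_menu = second_period_choice(["salad", "cake"], u, v)
committed = second_period_choice(["salad"], u, v)
print(full_menu, committed)
```

With these numbers the same item is chosen from both menus, but the menu without the tempting option has the higher value, because the self-control cost disappears. That difference is what generates a preference for commitment in this representation.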
It is shown in Benabou and Pycia (2002) that Gul and Pesendorfer’s representation can also be interpreted through the dual-self view of Thaler and Shefrin, in which there is an endogenous probability of ‘losing control’ to your more myopic urge. Their re-interpretation bears an interesting conceptual connection to this model; they postulate two selves who ‘lobby’ the brain for control, each expending a resource cost, and receiving probability of control proportional to the expenditure of the effort. This bears a close resemblance to the conflict that we use to determine our disagreement point. There are important distinctions, the most salient of which is that the model here allows bargaining between the selves rather than stopping at strict randomization between their most favoured points; this allows the accommodation of both randomization in the case of discrete decisions, as well as deterministic decisions for actions such as savings-consumption choice. In contrast, Benabou and Pycia’s interpretation implies randomization in all cases where the selves prefer different outcomes. Further, I distinguish selves by a single parameter, rather than requiring two distinct evaluations u(x) and v(x) (also required by the general representation of Gul and Pesendorfer). This more restrictive approach allows the generation of novel insights.
Other literature on multiple selves includes Chatterjee and Krishna (2009), who develop a model of conflicting preferences in which an ‘alter-ego’ has a probability of appearing and overriding the decision of the far-sighted decision maker. Ambrus and Rozen (2013) show that a limitation on the number of selves is necessary for any multiself model to have predictive value; without such a limit, they show that any behaviour can be rationalized.
The classic work on time inconsistency is Strotz (1956), which formally analysed foreseeable changes in preference arising from non-geometric time discounting. Ainslie (1975) argues that time inconsistent behaviour is best fit by hyperbolic discounting, which creates a smoothly diminishing impatience. Frederick et al. (2004) provides an extensive review of the economic literature on time preference, including both hyperbolic discounting and the beta-delta discounting of Laibson (1997). Harris and Laibson (2013) provide an alternate approach to avoiding the discontinuity of predictions that come from the standard beta-delta model by introducing an element of uncertainty. In their model, there remains a discontinuous distinction between ‘now’ and ‘later’, as in standard beta-delta, but the agent, in evaluating the discounted value of rewards, is uncertain about when ‘now’ will end, and ‘later’ will begin. The interpretation of this is challenging, as it implies the agent is internally uncertain about how they themselves are discounting future rewards. However, it is shown that this model of uncertainty generates preferences equivalent to a deterministic discounting function which is qualitatively similar to true hyperbolic discounting. Thus, their work presents an alternate form of a continuous time discounting function (as opposed to the hyperbolic discounting function).
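The difference between the discontinuous and the smooth forms of diminishing impatience can be seen by comparing one-period discount ratios D(t + 1)/D(t). The sketch below uses common textbook forms of the three discount functions; the functional form chosen for hyperbolic discounting and all parameter values are assumptions made for illustration.

```python
from typing import Callable, List


def geometric(t: int, delta: float = 0.9) -> float:
    """Geometric discounting: D(t) = delta**t."""
    return delta ** t


def hyperbolic(t: int, k: float = 1.0) -> float:
    """A one-parameter hyperbolic form: D(t) = 1 / (1 + k*t)."""
    return 1.0 / (1.0 + k * t)


def beta_delta(t: int, beta: float = 0.7, delta: float = 0.95) -> float:
    """Quasi-hyperbolic discounting: D(0) = 1 and D(t) = beta * delta**t for t > 0."""
    return 1.0 if t == 0 else beta * delta ** t


def one_period_ratios(discount: Callable[[int], float], horizon: int = 4) -> List[float]:
    """Ratios D(t+1)/D(t); a higher ratio means more patience at delay t."""
    return [discount(t + 1) / discount(t) for t in range(horizon)]


for name, fn in [("geometric", geometric),
                 ("hyperbolic", hyperbolic),
                 ("beta-delta", beta_delta)]:
    print(name, [round(r, 3) for r in one_period_ratios(fn)])
```

Under these assumptions the geometric ratio is constant, the beta-delta ratio jumps once between the first and second period and is constant afterwards, and the hyperbolic ratio rises gradually with delay.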
Closely related is a branch of literature which considers models of multiple selves across time. Rather than selves existing concurrently, the decision maker today, and the decision maker tomorrow, are regarded as separate selves. As one example, Laibson (1998) refits an intergenerational consumption game as an intra-personal consumption game in which a decision maker competes with future versions of themselves; he shows that hyperbolic time discounting leads to multiple intrapersonal equilibria. Jamison and Wegener (2010) draw upon neurological studies to propose that decision makers regard future selves to be truly separate persons in their decision making process. Mullainathan and Banerjee (2010) propose a class of ‘temptation’ goods which generate utility for the current self, but not for previous selves that anticipate their consumption; this division implicitly creates a distinction in preference between present and future selves.
In the realm of bargaining, an existing strand of literature explores methods of generating endogenous disagreement points in bargaining problems, for example, Vartiainen (2007) and Bozbay et al. (2012). The methods of endogenizing the disagreement point developed in this literature require strict convexity of the bargaining set. Thus, applying such methods to the dual-self bargaining considered here would exclude from consideration discrete action sets (which create utility vector sets which are not strictly convex).
A branch of literature, for example, Ozdenoren et al. (2012), building on experimental evidence, models willpower as a depletable resource. While willpower is not explicitly included in the dual-self bargaining model presented here, this view of willpower provides intuition for why it may be reasonable to view effort in the conflict game between selves as having an associated cost.
More recently Jackson and Yariv (2015) showed that collective choice between individuals was necessarily time inconsistent. They comment on the mistake made in modeling collective organizations as time consistent agents for exactly this reason. Thus there is a strong parallel in which we might argue it is a mistake to model individuals as time consistent if, indeed, they can be seen as having conflicting selves.
Future Work and Conclusion
An important future test of the model will regard the implied correlation between diminishing impatience and violations of IIA. As both are directly dependent on the difference between the time parameters of the two selves, agents that exhibit more or stronger preference reversals should also exhibit more susceptibility to the influence of tempting options. This is an intuitively satisfying implication, in the sense that both can be regarded as departures from classical rationality. The model indicates that the degree of one can be used to predict the other. Should there be no correlation between the two, it would cast doubt on the usefulness of this model.
One alteration to this model to consider is the bargaining procedure used; Nash bargaining was selected here for tractability, but the general properties of this model are dependent on the conflict game, not the bargaining procedure. Indeed, it is not difficult to show that the qualitative results of diminishing impatience, preference for commitment, and temptation result from any bargaining procedure that satisfies two uncontroversial properties. First, the procedure should be invariant to affine transformations of utility. Second, the utility granted to each bargainer by the outcome should be monotonic in the utility granted to that bargainer by the disagreement point. Other bargaining procedures which satisfy these properties, then, generate the same qualitative results.
A future extension will be the infinite-time version of the model, and an equilibrium concept for such. A compelling concept would seem to consist of a set of beliefs held by the selves about the current-action dependent distribution of future actions of the agent which, when acted upon by the selves in the conflict/bargaining process, result in the realized distribution of actions coinciding with the beliefs. For example, beliefs would be in equilibrium if the selves believe that the agent will save 40 per cent at each decision and this belief causes the bargaining procedure to result in a savings rate of 40 per cent. It is not immediately clear that this equilibrium concept guarantees existence, however, or whether a less restrictive concept is desirable for stochastic action plans. Other future work will certainly include symmetric multiself models in which the selves vary in dimensions other than time discounting. In particular, selves that vary in a risk aversion parameter are of interest in attempting to generate regularities related to risk.
This work formalizes intuition about conflicting internal preferences into a model that provides a unified explanation for a number of behavioural regularities. A smooth time inconsistency arises from the relative difference in time preference between the selves. Temptation effects, or violations of independence of irrelevant alternatives, result from the differing incentives of the selves in conflict over control. All of these come from a tightly parameterized difference in time preference between the selves, which additionally creates novel intuition about the nature of such behaviour within an individual decision-maker.
Acknowledgement
I would like to thank David Dillenberger, Mallesh Pai and Andrew Postlewaite for their helpful input and support throughout this project. I am also grateful for comments and input from George Mailath, Aislinn Bohren, Hanming Fang and Garth Baughman.
Declaration of Conflicting Interests
Funding
The author received no financial support for the research, authorship and/or publication of this article.
Appendix: Proofs of Results
I begin the proofs by formalizing a few results, not included in the main body of the paper, that derive from the nature of the Nash bargaining solution.
Lemma A.1 formalizes the notion that, taking the line drawn between the disagreement point and the Nash bargaining outcome, if the disagreement point is moved to the right side of that line, the outcome will also move to the right, and vice versa. In Figure A.1, the shaded area represents where a moved disagreement point would move the outcome to the right. This result will be used in several later proofs.
Further, if
With the right inequalities strict if the Pareto frontier of U is smooth at
Now, consider the second case, where the original outcome remains a Pareto improvement on the new disagreement point. Note that the Pareto frontier of U is a continuous, not necessarily differentiable, curve with endpoints at the bliss points of the two selves. Take any continuous bijective mapping f that maps the interval [0,1] onto this curve. Without loss of generality, assume that it maps 0 to the bliss point of the long-term self, and 1 to the bliss point of the short-term self. f can then be divided into two mappings, one for each of the two coordinates of the points on the Pareto frontier, fs: [0,1] → {Xl,Xs} and fl: [0,1] → {Y l,Y s}. For x ∈ [0,1] denote by (Us(x), Ul(x)) ≡ (fs(x), fl(x)) the point on the Pareto frontier which x is mapped to. Note that Us(x) is strictly increasing in x, while Ul(x) is strictly decreasing in x. Then, we can rewrite the outcome of the Nash bargaining procedure to be the solution to:
Denote x1 the argument of the solution to this maximization (Nash bargaining gives us a unique solution). Then, the Nash bargaining outcome can be written as
so that the first order condition is:
So, if the frontier is smooth at the Nash bargaining outcome,
Now consider the new disagreement point. Denote
Assuming still that the Pareto frontier is smooth at
This last line shows that the Nash product, with the new disagreement point, is now decreasing in x at x1. This implies x2 < x1, since the Nash product is single-peaked along the Pareto frontier. Thus,
The rest of the cases for a smooth Pareto frontier follow similarly.
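As a hedged sketch of the smooth case, write the original disagreement point as (d_s, d_l) and the shift to the new one as (Δd_s, Δd_l). These labels, and the label m_1 for the slope of the line from the disagreement point to the outcome, are assumed here for illustration; U_s(x) and U_l(x) are the frontier coordinates defined above. Under that notation the first order condition and the derivative of the shifted Nash product at x_1 can be written as:

```latex
\begin{align*}
x_1 &= \arg\max_{x \in [0,1]} \bigl(U_s(x) - d_s\bigr)\bigl(U_l(x) - d_l\bigr), \\[4pt]
0 &= U_s'(x_1)\bigl(U_l(x_1) - d_l\bigr) + U_l'(x_1)\bigl(U_s(x_1) - d_s\bigr)
\;\Longrightarrow\;
m_1 \equiv \frac{U_l(x_1) - d_l}{U_s(x_1) - d_s} = -\frac{U_l'(x_1)}{U_s'(x_1)}, \\[4pt]
\frac{\partial}{\partial x}\Bigl[\bigl(U_s(x) - d_s - \Delta d_s\bigr)
\bigl(U_l(x) - d_l - \Delta d_l\bigr)\Bigr]_{x = x_1}
&= -U_s'(x_1)\,\Delta d_l - U_l'(x_1)\,\Delta d_s
= -U_s'(x_1)\bigl(\Delta d_l - m_1\,\Delta d_s\bigr).
\end{align*}
```

The second line says that the slope from the disagreement point to the outcome is the negative of the slope of the frontier. Since U_s'(x_1) > 0, the third line is negative exactly when Δd_l > m_1 Δd_s, that is, when the new disagreement point lies above the line through the original disagreement point and the original outcome; single-peakedness then gives x_2 < x_1.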
Now, we need to consider the case where the Pareto frontier is not smooth at
Thus, md > m1. As
Two subcases, now. If Δdl > 0, then md < 0, (d2 is to the left and above), and so d2 is above
Thus, d2 lies above
Now, consider some set of utility vectors, V, that has a Pareto frontier consisting of the line segment connecting U1 and U2. Note that all points on the line between U1 and U2 are necessarily elements of U, as U is convex. Consider now the bargaining problem given by < V, d1 >, and denote its solution vector as V 1. I claim that Vs1 ≤ Us1. This is because the Nash product is single-peaked along the Pareto frontier. Thus, if Vs1 > Us1, then any point to the right of U1 along the Pareto frontier of V induces a greater Nash bargaining product than that of U1. But, this would imply that U1 was not the Nash product maximizing choice from < U, d1 >, since there are points to the right of U1 that are elements of U. Therefore, Vs1 ≤ Us1. This means that the slope between d1 and V 1, denoted , is higher than .
Similarly, for the bargaining problem < V, d2 >, with solution V2,
However, since the slope of the Pareto frontier of V is smooth, from earlier in the proof we know that the line connecting a disagreement point to the outcome vector should be the negative of the slope of the frontier. This implies that
Lemma A.2 shows that if we have two different utility vector sets whose Pareto frontiers coincide in an open interval around the Nash bargaining outcome, then small variations in the disagreement point will have the same effect on the outcome chosen from both sets. This allows us to apply Lemma A.1 for different sets when the Pareto frontier around the outcome is the same for both sets.
Further, if
With the right inequalities strict if the Pareto frontier of U1 is smooth at
Now consider the second line. First, the Nash bargaining outcome chosen from U2 with disagreement point
Second, consider the Nash bargaining outcome chosen from U1 with disagreement point
where w is the weight placed on the bliss point of the short-term self. The disagreement point is given by:
To find the Nash bargaining solution, then, we choose w to maximize the product of gains over the disagreement points, given by:
The first order condition is
This is exactly the same weight placed on the short-term self gaining control, and thus on the bliss action of the short-term self, by the equilibrium of the conflict game.
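The coincidence of the bargaining weight and the contest probability can be checked numerically. The sketch below rests on several assumptions that are mine rather than stated in the surviving text: each self's utility from the other self's bliss point is normalized to 0, so that v_s and v_l are the stakes each self has in winning; the contest is a Tullock contest with exponent γ and linear effort cost, with the closed-form equilibrium coded in `contest_outcome`; and the disagreement point is the contest's expected utility net of effort.

```python
from typing import Optional, Tuple


def contest_outcome(v_s: float, v_l: float, gamma: float) -> Tuple[float, float, float]:
    """Assumed Tullock equilibrium: win probability of the short-term self and both efforts."""
    p = v_s ** gamma / (v_s ** gamma + v_l ** gamma)
    effort_s = gamma * p * (1.0 - p) * v_s
    effort_l = gamma * p * (1.0 - p) * v_l
    return p, effort_s, effort_l


def nash_weight(v_s: float, v_l: float, d_s: float, d_l: float,
                steps: int = 100_000) -> Optional[float]:
    """Grid search for the weight w on the short-term bliss point maximizing the Nash product."""
    best_w, best_product = None, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        gain_s = w * v_s - d_s            # short-term self's gain over disagreement
        gain_l = (1.0 - w) * v_l - d_l    # long-term self's gain over disagreement
        if gain_s < 0.0 or gain_l < 0.0:
            continue                      # not individually rational
        product = gain_s * gain_l
        if product > best_product:
            best_w, best_product = w, product
    return best_w


v_s, v_l, gamma = 2.0, 3.0, 0.8           # hypothetical stakes and contest exponent
p, effort_s, effort_l = contest_outcome(v_s, v_l, gamma)
d_s = p * v_s - effort_s                  # disagreement utility, short-term self
d_l = (1.0 - p) * v_l - effort_l          # disagreement utility, long-term self
w_star = nash_weight(v_s, v_l, d_s, d_l)
print(round(p, 4), round(w_star, 4))
```

Under these assumptions the two selves give up the same fraction γp(1 − p) of their stakes in effort, which is why the maximizer of the Nash product lands on p up to the grid resolution.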
The first and second derivatives of this objective are
similarly for the long-term self. Note that the second derivative is negative if 0 ≤ γ ≤ 1, satisfying the second order condition. We can write the first order condition of the selves as
Defining
Substituting into the short-term self’s first order condition,
We get el similarly, and the probability p of the short-term self winning the contest.
This gives us a unique interior local maximum for 0 ≤ γ ≤ 1; we must now verify that neither self prefers to deviate to ei = 0. Given the above es and el, the expected utility for the short-term self is
If the short-term self exerts zero effort, their expected utility is
Which is true for γ ≤ 1. Similarly for the long-term self. So, neither player wishes to deviate to ei = 0; note that this implies that neither bliss point is a Pareto improvement over the disagreement point. Thus, we have a unique Nash equilibrium and expected utility vector stated in Lemma 1.
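The equilibrium argument can be illustrated with a numerical best-response check. The sketch assumes a Tullock success function e_s^γ / (e_s^γ + e_l^γ), linear effort costs, stakes v_s and v_l measured relative to losing the contest, and the candidate equilibrium efforts e_i = γp(1 − p)v_i; all numbers are hypothetical.

```python
from typing import Tuple


def win_prob(e_s: float, e_l: float, gamma: float) -> float:
    """Probability that the short-term self wins the contest."""
    if e_s == 0.0 and e_l == 0.0:
        return 0.5  # tie-breaking convention, assumed for completeness
    return e_s ** gamma / (e_s ** gamma + e_l ** gamma)


def payoff_s(e_s: float, e_l: float, v_s: float, gamma: float) -> float:
    """Short-term self's expected gain over losing, net of effort."""
    return win_prob(e_s, e_l, gamma) * v_s - e_s


def payoff_l(e_s: float, e_l: float, v_l: float, gamma: float) -> float:
    """Long-term self's expected gain over losing, net of effort."""
    return (1.0 - win_prob(e_s, e_l, gamma)) * v_l - e_l


def candidate_efforts(v_s: float, v_l: float, gamma: float) -> Tuple[float, float]:
    """Efforts implied by the two first order conditions (assumed closed form)."""
    p = v_s ** gamma / (v_s ** gamma + v_l ** gamma)
    return gamma * p * (1.0 - p) * v_s, gamma * p * (1.0 - p) * v_l


v_s, v_l, gamma = 2.0, 3.0, 0.8
e_s, e_l = candidate_efforts(v_s, v_l, gamma)

# Unilateral deviations on a grid that includes zero effort.
grid = [k * 0.001 for k in range(3001)]
best_dev_s = max(payoff_s(e, e_l, v_s, gamma) for e in grid)
best_dev_l = max(payoff_l(e_s, e, v_l, gamma) for e in grid)
eq_s = payoff_s(e_s, e_l, v_s, gamma)
eq_l = payoff_l(e_s, e_l, v_l, gamma)
print(eq_s >= best_dev_s - 1e-9, eq_l >= best_dev_l - 1e-9)
```

With γ ≤ 1 each self's payoff is concave in own effort, so no grid deviation, including a deviation to zero effort, beats the candidate efforts in this example.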
Consider replacing the shared instantaneous utility function u(·) with v(·) = λu(·) + µ. Denote
Putting these into the formula for
If Un is such that (x, y) ∈ Un if and only if (y, x) ∈ Un, then the coordinates of the two bliss points are necessarily mirrors of each other.
If
Consider now the remaining case where:
so that the long-term self strictly prefers the later payoff, and the short-term self strictly prefers the sooner payoff. Then,
Thus, the probability the short-term self wins the conflict game is decreasing in ∆1. From Lemma A.3, the decision of the agent will have the same weightings on the actions as the conflict game. Thus, the probability that the bliss point of the short-term self (the smaller, sooner payout) is chosen is decreasing in ∆1.
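The smooth decline of this probability in ∆1 can be illustrated numerically. The sketch assumes exponential discounting exp(−ρt) with ρ_s > ρ_l, a smaller payoff available after a delay ∆1 and a larger one after ∆1 plus a fixed gap, and a Tullock win probability in which each self's stake is its utility gain from its own preferred option. The functional forms and every parameter value are assumptions for illustration.

```python
import math
from typing import List


def contest_probability(delay: float, rho_s: float = 0.5, rho_l: float = 0.05,
                        gap: float = 1.0, u_small: float = 10.0,
                        u_large: float = 12.0, gamma: float = 0.8) -> float:
    """Probability that the short-term self (smaller, sooner payoff) wins."""
    # Short-term self's gain from the sooner payoff over the later one.
    stake_s = math.exp(-rho_s * delay) * (u_small - math.exp(-rho_s * gap) * u_large)
    # Long-term self's gain from the later payoff over the sooner one.
    stake_l = math.exp(-rho_l * delay) * (math.exp(-rho_l * gap) * u_large - u_small)
    return stake_s ** gamma / (stake_s ** gamma + stake_l ** gamma)


delays: List[float] = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
probs = [contest_probability(d) for d in delays]
for d, p in zip(delays, probs):
    print(d, round(p, 3))
```

Because the ratio of the stakes shrinks by the factor exp(−(ρ_s − ρ_l)∆1), the probability falls continuously with the delay, passing through one half rather than jumping at a single date.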
Consider the effects of an ε increase in ∆1. Denote with superscript ε relevant terms after this increase. So,
where
We will show first that the first two bargaining problems result in the same chosen lottery, and second that the third bargaining problem improves the outcome for the long-term self relative to the second bargaining problem. This will then immediately imply that the ε problem gives higher utility to the long-term self.
Consider
Now consider
We must consider the disagreement point both with and without the effort cost added in. Recall that with the cost added back in:
which implies:
Now, consider
Now comparing
We now show that this first part also holds for the disagreement point without the effort added back in. In this case, the disagreement point is given by:
The disagreement points for the affine transformation case and the ε case are given by:
Now, if
As
Similarly for
Now consider the second part of Proposition 1. The utility granted to the long-term self is bounded below by the utility granted to them by the disagreement point. Thus, the difference between the long-term self’s bliss point utility and the utility granted to them by the action chosen by the agent is bounded above by the difference between their bliss point utility and the utility of the disagreement point. Thus,
So, let us consider the limit of the right hand side.
the last following from the fact that ρl – ρs < 0.
The result also follows with the disagreement point without effort costs added back in.
So, for either disagreement point,
Recalling that S =
From these,
with the last step from application of Lemma A.2, with the inequality strict if the Pareto frontier is smooth. Since by supposition there are no Pareto improvements on the original choice, we also have
also with the inequality strict if the Pareto frontier is smooth.
Now consider the case of the disagreement point without effort costs added back in,
The remainder of the proof follows identically to the first case.
the last connection from Lemma 1.
