Abstract
This article introduces a near-optimal bidding algorithm for use in real-time display advertising auctions. These auctions constitute a dominant distribution channel for internet display advertising and a potential funding model for addressable media. The proposed efficient, implementable learning algorithm is proven to rapidly converge to the optimal strategy while achieving zero regret and constituting a competitive equilibrium. This is the first algorithmic solution to the online knapsack problem to offer such theoretical guarantees without assuming a priori knowledge of object values or costs. Furthermore, it meets advertiser requirements by accommodating any valuation metric while satisfying budget constraints. Across a series of 100 simulated and 10 real-world campaigns, the algorithm delivers 98% of the value achievable with perfect foresight and outperforms the best available alternative by 11%. Finally, we show how the algorithm can be augmented to simultaneously estimate impression values and learn the bidding policy. Across a series of simulations, we show that the total regret delivered under this dual objective is less than that from any competing algorithm required only to learn the bidding policy.
Keywords
In 2018, U.S. advertisers spent $48 billion on internet display advertising, a 20% increase year-over-year. Display ad spending now represents 45% of all digital ad spending (Silverman 2019), and analysts expect double-digit growth through at least 2021 (eMarketer 2017). Much of this growth stems from advertiser preferences for more precise audience targeting, which is viewed as a key driver of return on investment ( Forbes 2015).
The demand for precise targeting has made real-time bidding (RTB) a dominant distribution channel for display ad impressions. In 2020, U.S. advertisers are expected to spend $26 billion purchasing online display advertising through real-time auctions. With a compound annual growth rate of 24%, this will represent nearly half of all display ad spending (Hoelzel and Ballve 2015). Under the RTB approach, individual impressions are sold in real time, frequently via second-price (Vickrey) auctions. Each time a user requests a web page from a publisher’s server, auctions are held to determine which advertisements will be served alongside the web page’s content. The immediacy and individual nature of RTB targeting allows advertisers to control the timing, location, and recipient of each exposure.
Because of the speed and volume of RTB transactions, advertisers must manage their campaigns via automated targeting and bidding strategies. There are, on average, 1.6 million RTB auctions per second, and each concludes within milliseconds (Shen et al. 2015). During this brief window, targeting algorithms are used to forecast the expected value of each impression opportunity. Advertisers commonly quantify this value as the probability that the recipient undertakes a desired outcome of interest (e.g., click, website visit, purchase), which is known as a conversion (Gordon et al. 2019). The advertiser’s bidding algorithm then converts the forecasted impression value to a monetary offer. Advertisers leverage a wide variety of bidding algorithms, including bidding a constant amount per opportunity, pacing campaign spend evenly over time, and dynamically adjusting bids on the basis of forecasted impression value (Google 2018b; Heise, Abou Nabout, and Skiera 2016; Zhang, Yuan, and Wang 2014). Because targeting algorithms and bidding strategies vary across advertisers, two sets of impressions that are valued equivalently by one advertiser may sell for dramatically different prices. Thus, a bidding strategy that balances each impression’s value with its cost can significantly improve campaign impact.
Despite following Vickrey protocol, bidding one’s valuation is not generally a dominant bidding strategy in RTB auctions. Under Vickrey protocol, each bidder submits a single, sealed bid, and the winner pays an amount equal to the second highest bid. When participating in a series of independent Vickrey auctions or a single such auction, bidding an amount equal to one’s true valuation is a well-known, weakly dominant strategy (Vickrey 1961). However, challenges in accurately attributing future incremental profits to individual impressions (i.e., the “attribution problem”) have led to the widespread use of intermediate valuation metrics and budgets. These budgets induce interdependence across auctions, as each bid must balance the value and cost of the current impression opportunity with the uncertain values and costs of future opportunities. As a result, truthful bidding is no longer optimal.
In this article, we develop a near-optimal bidding algorithm congruent with the features and constraints of the RTB ecosystem. We term this strategy “near-optimal” because it performs within 2% of the theoretic upper bound possible only with perfect foresight and outperforms the best available alternative by 11%. Furthermore, the algorithm leverages only the limited data available to bidders; the campaign budget, the forecasted impression value, and the sequence of impression costs incurred by the focal advertiser. It is capable of processing the enormous volume and velocity of auctions while meeting the rapid response times required by the advertising exchanges. It ensures that the advertiser’s budget constraint is satisfied, and it remains agnostic to competitors’ strategies and the distribution of auction attributes. Moreover, it constitutes a unique competitive equilibrium, ensuring that no advertiser has incentive to unilaterally deviate. Finally, it can be combined with Thompson sampling to simultaneously estimate impression values with respect to their incremental impact and optimize bidding behavior. Notably, it can do both while still outperforming the best available alternative algorithm, even when the alternative is provided accurate valuation information and asked only to optimize bidding behavior. In addition, Thompson sampling induces exogenous variation in the advertiser’s bidding behavior, mitigating selection biases common in display advertising measurement.
To achieve this, we frame the advertiser’s challenge as one of budget-constrained value maximization and draw parallels to the common online knapsack problem. The advertiser wants to purchase the set of impressions delivering greatest value conditional on its budget. In the offline version of the knapsack problem, commonly applied to constrained resource allocation problems in marketing (Anderson, Lodish, and Weitz 1987; Mantrala, Sinha, and Zoltners 1992), firms are assumed to know, with certainty and at the outset, the value and cost of each marketing instrument in which they might invest. This is decidedly unrealistic in the RTB setting, where advertisers must bid for each impression while the values and costs of future impressions remain uncertain. This uncertainty perfectly describes the online knapsack problem, and the extant literature has generally relied on algorithms assuming the distribution of future values and/or costs are at least partially known a priori. 1 In reality, these distributions are generally unknown and decidedly difficult to estimate (Cai et al. 2017; Zhang, Yuan, and Wang 2014). Without assuming knowledge of future impression values or costs, we prove near-optimal guarantees on the algorithm’s regret, a first among algorithmic solutions to the general class of online knapsack problems.
Our algorithm has immediate and significant implications for how $26 billion of display advertising impressions are purchased. Assuming an advertiser employs the best available alternative, it should expect an 11% increase in campaign effectiveness from adopting the proposed approach. Alternatively, it could obtain equivalent campaign value by spending just 89% of its current budget. This is a lower bound on the potential benefits, as many advertisers employ strategies that perform significantly worse than the best available alternative. For example, Google’s tools allow advertisers to bid a constant amount (Google 2018b), which our approach outperforms by 16% even when the constant amount is optimized with perfect foresight.
We expect this impact to grow as more marketers participate directly in these exchanges and more advertising formats adopt the RTB model. In a 2017 Association of National Advertisers survey, 35% of respondents indicated that they are expanding their in-house programmatic media buying capabilities, partially in an effort to reduce intermediary costs (Wolfe 2017). Without an efficient bidding strategy, these firms risk eroding much, if not all, of the potential cost savings. Our strategy also has implications for developing technologies such as addressable TV and radio that increasingly look to RTB as a funding model. DISH Network began testing an RTB exchange for addressable TV in 2015, reaching 8 million households across 210 designated market areas (Liyakasa 2015). In 2017, 15%–17% of advertisers purchased addressable television impressions, and an additional 20%–30% planned to experiment with the media in 2018 (Joe 2018). As the underlying auction formats mirror those in the RTB display advertising space, our bidding strategy and associated findings will be directly applicable.
The rest of the article is organized as follows. We review the RTB landscape, present the relevant literature on display advertising and bidding strategies, and examine concepts of competitive equilibrium in this ecosystem. Next, we formalize the advertiser’s problem and present the proposed algorithm. We then compare the proposed algorithm with an optimal bidding strategy requiring perfect foresight. We prove that the proposed algorithm converges rapidly to this optimal strategy, achieves zero regret, and constitutes a competitive equilibrium. Then, we show how the proposed bidding algorithm can be combined with Thompson sampling to simultaneously estimate impression values and learn the near-optimal bidding policy. We show that the total regret delivered under this dual objective is less than that from any competing algorithm required only to learn the bidding policy and seeded with a priori optimal parameters. This underscores the importance of employing an efficient bidding strategy, even when impression values are uncertain.
Background and Related Literature
The RTB Ecosystem
Within the RTB ecosystem, there are three fundamental roles: publishers, ad exchanges, and advertisers. Publishers manage web pages containing information in which users are primarily interested, and sell advertising nested within and around this content to generate revenue. Advertisers purchase this inventory to promote their brand, product, or service. They are ultimately responsible for designing ad creatives, establishing targeting rules, and specifying campaign objectives. Ad exchanges connect advertisers and publishers, creating a two-sided market in which publishers benefit from the presence of more buyers and advertisers are able to manage campaigns across multiple publisher websites.
When a user visits a publisher’s web page, an impression opportunity is created. To fill this opportunity, the publisher sends a bid request to the ad exchange, who then queries advertisers for bids. The bid request is typically accompanied by contextual information about the web page on which the ad would be served (e.g., domain name, URL, topic, keywords) and behavioral and demographic information about the user involved (e.g., cookie id, IP address, geographic location, interests inferred from browsing histories). Each advertiser then estimates the value of the impression opportunity, combining the information accompanying the bid request with any available private information on the user’s browsing and purchase history. In practice, this value is generally estimated with respect to the probability of a conversion outcome of interest (e.g., click, website visit, transaction) (Gordon et al. 2019). The advertiser then calculates a bid amount on the basis of the available inputs, and submits this to the advertising exchange. The ad exchange receives bids from all interested advertisers and determines the winner through an auction. The winning advertiser’s ad is then displayed to the user on the publisher’s web page. To deliver a seamless user experience, ad exchanges typically require advertisers to submit bids within 100 milliseconds. The challenges created by these short response windows are compounded by the volume of bid requests, 1.6 million per second on average (Shen et al. 2015). As a result, valuation and bidding decisions rely on automated algorithms.
Until recently, these auctions near-universally followed second-price rules. However, first-price auctions have risen in popularity, as publishers have begun sending the same impression opportunity to multiple exchanges simultaneously and determining the winner via a subsequent auction between the exchanges (Zawadziński 2018). Because multiple perverse outcomes can occur in the presence of intermediate second-price auctions (e.g., the highest bidder is not guaranteed to win the auction), many exchanges have moved to first-price rules. However, second-price auctions remain the standard when exchanges are the final arbiter of impression placement (see, e.g., OpenX 2019), and many in the industry advocate for publishers to adopt second-price rules for their auctions (PubMatic 2017). In this article, we focus on the scenario in which the final auction follows second-price rules and is preceded only by first-price auctions, if any. However, the dynamic nature of the RTB ecosystem highlights the need for research on effective bidding strategies for a variety of auction protocols.
Marketing Allocation and the Knapsack Problem
In presenting a near-optimal bidding algorithm for use in RTB auctions, we follow a long tradition of marketing research into the design and deployment of automated programs to help marketers allocate scarce resources (Leeflang and Wittink 2000; Little 1970). For example, Rust (1986) details a number of algorithms for allocating advertising spend across media, conditional on all costs and expected outcomes being known at the campaign outset. With such a priori knowledge, budget constrained resource allocation can be framed as an offline knapsack problem (Anderson, Lodish, and Weitz 1987; Mantrala, Sinha, and Zoltners 1992), for which there are known algorithmic solutions with appealing theoretical guarantees on regret (Cormen et al. 2009).
With the automation of media buying on the internet, interest in such systems has experienced a resurgence in marketing. For example, Skiera and Abou Nabout (2013) developed PROSAD, an optimized bidding algorithm designed to maximize keyword profit contribution for paid search media. Furthermore, some authors have tried to relax the strong assumptions on a priori knowledge. Schwartz, Bradlow, and Fader (2017) present a multi-armed bandit approach to optimizing display advertising allocation across websites, when the campaign budget and impression costs are known a priori, but impression values are uncertain. Building on the work of Danaher (1991), Paulson, Luo, and James (2018) propose a method to maximize campaign reach by optimally allocating spend across a set of predetermined publishers via RTB auctions. Notably, maximizing reach assumes that all impression values are known a priori, 2 though impression costs may vary.
In contrast, advertisers purchasing display advertising generally bid for each impression opportunity while uncertain of both the costs and values associated with future auctions. As a result, the advertiser must solve an online knapsack problem (Chakrabarty, Zhou, and Lukose 2007), learning the optimal policy to balance the value and cost of each opportunity with the uncertain values and costs of future realizations. Prior research on online knapsack problems has addressed this uncertainty by first assuming that items are drawn from a known distribution (Dean, Goemans, and Vondrák 2008; Kleywegt and Papastavrou 1998; Lueker 1998). Subsequent work relaxed this assumption, requiring only that the cost of all (Agrawal, Wang, and Ye 2014) or some (Amar and Renegar 2018) items be known a priori. Among the weakest assumptions are those made by Chakrabarty, Zhou, and Lukose (2007), who require that the bounds on the value-to-cost ratio be known for all items at the outset. However, the authors acknowledge that their algorithm’s performance can vary dramatically with these bounds, about which an advertiser is likely to be uncertain.
Overall, we contribute to this literature by presenting a general solution to the online knapsack problem that does not require the advertiser to know or even forecast future impression values and costs. Despite these relaxed assumptions, we prove near-optimal guarantees on the resulting average and total regret, a first among algorithmic solutions to the general class of online knapsack problems. For advertisers, this simplifies the bidding process and ensures that the algorithm can consistently deliver campaign value closely approximating that available with perfect foresight (i.e., via the offline knapsack), while meeting the realities of the RTB market.
RTB Strategies
Despite following Vickrey protocol, constraints of the RTB ecosystem preclude bidding one’s valuation as a dominant strategy (Choi et al. 2020). Across a series of Vickrey auctions, such truthful bidding is a well-known, weakly dominant strategy when bidders can accurately value each item and the auctions are independent (Vickrey 1961). Within RTB, this requires that individual impression values are accurately forecast in terms of the net present value of all incremental future cash flows, and the advertiser does not face a budget constraint. The former ensures that advertisers appropriately value impression opportunities, while the latter is required for the auctions to be independent. Neither of these criteria are generally met.
Calculating the net present value of all incremental future cash flows would require solving the attribution problem at the impression level. This includes accurately measuring display advertising’s impact on e-commerce and offline profits, allowing for cross-media synergies, and capturing potential carryover (Danaher and Van Heerde 2018; Dinner, Van Heerde, and Neslin 2014). Because RTB impressions are auctioned individually, such forecasts would need to be made accurately for each impression opportunity. In addition to the practical challenges of linking individual advertising exposures to all future profits, such measurement raises significant privacy concerns by linking online and offline data at the individual level. Faced with these constraints, researchers and practitioners typically rely on intermediate valuation metrics such as page views, clicks, and online sales.
Campaign budgets are also a ubiquitous, and often required, component of the RTB ecosystem. Forecasted impression values are uncertain, and this is amplified by the use of intermediate outcome metrics. Furthermore, advertisers frequently face binding capital constraints, limiting the amount they can spend on a campaign. Thus, budgets serve as a hedge against uncertain impression values and provide assurance that expenses do not exceed an advertiser’s ability to pay. As a result, platforms generally require advertisers to establish budgets before a campaign (Google 2018a), inducing interdependence between auctions. 3
Faced with these practical constraints, researchers have worked to develop feasible bidding strategies. These strategies generally take as input the forecasted value from the advertiser’s targeting algorithm and output a monetary bid amount. Early work focused on pacing algorithms (e.g., Lee, Jalali, and Dasdan 2013), selectively bidding in a subset of auctions to ensure that advertisers do not prematurely exhaust their budget and miss potentially valuable future impression opportunities. However, in doing so, they trade total campaign value for smooth budget delivery.
Others have focused on methodologies that explicitly balance impression values and costs. Zhang, Yuan, and Wang (2014) show that impression opportunities with lower conversion probabilities tended to be undervalued in the marketplace. Capturing this trade-off between bid amount and winning probability, they provide two bidding algorithms. Using a series of simulations and a field experiment, they show that their algorithms outperform several common strategies including bidding a constant amount, bidding below some threshold amount (commonly defined by expected cost per action), and linear form bidding (i.e., bidding proportional to the expected cost per action). However, their approach requires the distribution of highest competing bids to be directly estimated. This is problematic, because each opportunity is defined by a large number of attributes, only some of which are observable by each advertiser (i.e., private information is prevalent). Despite this, we use Zhang, Yuan, and Wang’s approach as a benchmark, overcoming challenges associated with estimating the distribution of highest competing bids by fixing the underlying parameters for their approach at the a posteriori optimal values. In contrast, our strategy must simultaneously bid and learn, providing a conservative test of the proposed approach.
Other researchers have directly addressed the strategic nature of the underlying auction. Balseiro, Besbes, and Weintraub (2015) introduce the fluid mean-field equilibrium (FMFE) concept. Relative to perfect Bayesian equilibrium strategies, FMFE strategies drastically reduce the information requirements for agents and dramatically increase computational feasibility. In the RTB environment, the authors show that FMFE strategies approximate best response strategies, even in thin markets with few advertisers. Balseiro and Gur (2019) develop a game-theoretic approach to bidding when all advertisers measure impression value in terms of expected profits. They cast the problem as a sequential game of incomplete information, in which advertisers try to guarantee themselves a portion of the campaign profit achievable with perfect foresight. Their strategy is based on a dynamic learning algorithm and requires only the incurred costs and a fixed learning rate parameter as inputs. This serves as an additional benchmark to the proposed strategy, and we again set the underlying parameter at its a posteriori optimal value.
To date, this literature has generally taken impression values as given, assuming that the advertiser’s targeting algorithm is both optimal and accurate. In practice, there is a mismatch between advertiser objectives and the targeting algorithms deployed. While maximizing campaign value requires that impressions be rated with respect to their incremental impact, commonly employed targeting algorithms score impressions based on the associated browser’s conversion probability (Choi et al. 2020). Waisman et al. (2019) tried to resolve this conflict, showing that when an advertiser is able to measure impression values in monetary units and faces no budget constraint, the problems of learning an optimal bidding policy and estimating the incremental effects of advertising exposure converge. This result stems from the advertiser’s incentive to bid its valuation in the absence of a budget. Under specific assumptions regarding the distribution of impression values and costs, the authors provide an algorithm capable of simultaneously learning an optimal bidding policy and estimating advertising response.
We contribute to the literature on RTB strategies by developing an algorithm that directly maximizes campaign value while satisfying the constraints faced by advertisers. In contrast to pacing strategies, our algorithm directly maximizes total campaign value, constraining spending only to the extent necessary to balance the value and cost of each impression. In contrast to proposed game-theoretic alternatives, the proposed strategy can accommodate any valuation metric, including common intermediate measures and incremental future profits, while still constituting a unique competitive equilibrium. Furthermore, we show that the distribution of competing bids need not be known or even directly estimated, greatly simplifying the advertiser’s task. Under these general conditions, we show how the algorithm can be used to learn the optimal targeting and bidding policies simultaneously, greatly expanding its applicability.
Problem Setup and Proposed Algorithm
Defining the Advertiser’s Problem
At the start of each campaign, we assume that the focal advertiser allocates a budget,
where
For now, we assume that the advertiser accurately forecasts impression value. In Web Appendix B, we examine how campaign value and the relative performance of the proposed bidding strategy are impacted by inaccurate forecasts. In the “Estimating Valuation While Learning the Bidding Policy” section, we show how the proposed strategy can be combined with Thompson sampling to also estimate impression values.
Because these are Vickrey auctions, the focal advertiser pays
To define the advertiser’s objective, we make the following assumptions. First, we assume that the value
Note that the highest competing bid,
However, an argument can be made that advertisers should value opportunities in terms of incremental profits. While this can be infeasible or even inappropriate in some instances (e.g., public universities targeting potential applicants), valuing opportunities in this manner is likely desirable in most cases. In such instances,
Motivating the Algorithm
To motivate the bidding algorithm and provide intuition, we briefly consider a version of the problem with three additional constraints. First, we require that the budget constraint in Equation 2 need only be satisfied in expectation. Second, we assume that the advertiser commits to a per-period strategy and does not update it with new information. Third, we assume that the joint distribution describing impression values and competing bids is static. We use these additional constraints only to motivate our zero-regret bidding strategy. We relax the first two assumptions when we introduce our bidding strategy, and we relax the third when we discuss potential dynamics in Web Appendix D.
Given these assumptions, the advertiser’s objective function can then be rewritten as
Because auctions are stochastically equivalent a priori, they can be treated interchangeably. Thus, maximizing the expected value from each auction also maximizes total campaign value.
Consider the Lagrangian of the optimization problem in Equation 3,
Recall that
To specify the complete bidding strategy, it remains to optimize for the choice of
Thus, the optimal value of
This bidding strategy has several intuitive features. First, it ensures that the advertiser wins the subset of auctions for which the value per expenditure,
Second,
Third, the optimal bid depends only on the focal advertiser’s valuation and
The Algorithm
In this subsection, we present a learning algorithm that converges to
From Equation 5,
where
If an advertiser were to select
Intuitively,
This algorithm relaxes the first two assumptions from the “Motivating the Algorithm” subsection. First, the algorithm stops at the end of
Analysis
In this section, we show that the proposed algorithm converges rapidly to the optimal bidding strategy, delivers a nearly optimal amount of forecasted value, and constitutes a unique competitive equilibrium. We present formal proofs for the underlying lemmas and theorems in Web Appendix F. We begin by establishing that the aforementioned update rule results in a sequence
Proving Convergence
Lemma 1 states that the sequence
Oracle and Regret
To quantify the performance of any bidding strategy, we compare the expected total value obtained through the strategy to that achievable with perfect foresight. Consider an oracle that has access to all the values and competing bids at the start of the campaign. The oracle’s objective reduces to the well-known zero-one knapsack problem. In the canonical example of the problem, a traveler is packing for a trip. He would like to bring all of his possessions but is limited by the weight he can carry. Thus, he packs the combination of possessions that will deliver the most value, subject to the amount of weight he can carry and the indivisibility of each item. For the oracle, each item is an auction, the weight constraint is the campaign budget, and the weight and value of each item are given by the cost and valuation, respectively.
Clearly, the optimal value obtained by the oracle is an upper bound on the value obtained by any bidding strategy lacking foresight into the future values and costs. The relaxed knapsack problem, in which items are assumed divisible, provides an upper bound to the zero-one knapsack solution, and has the benefit of being solved in polynomial time. With this relaxation, the optimal assortment can be selected by sorting the items in descending order according to the ratio of value and cost, and then selecting the largest prefix of items that does not violate the budget constraint. In other words, there exists a critical threshold,
We define the regret resulting from any bidding strategy to be the difference between the expected value obtained by the oracle’s solution to the relaxed knapsack problem and the expected total value obtained by the bidding strategy. A zero-regret bidding strategy is a strategy whose average regret per auction tends to zero with probability
Proving Appropriate Pacing
To prove that the proposed algorithm is a zero-regret strategy, we first establish that it is unlikely to prematurely exhaust the campaign budget and the risk of doing so decreases as the number of auctions grows large. Let
Theorem 1 states that the algorithm exhausts the capacity with at most
Proving Regret Bounds
Theorem 2 states that the algorithm achieves zero regret. Intuitively, the total regret is small, because the algorithm converges rapidly to the optimal strategy and does not prematurely exhaust the budget. Because the regret
The total regret achieved by the algorithm is also near-optimal. Marchetti-Spaccamela and Vercellis (1995) show that the expected regret even when
Establishing a Unique Competitive Equilibrium
When characterizing auction behavior, it is important to carefully consider the competitive landscape, including potential dynamic interactions between bidders. We frame these interactions as advertisers competing in a dynamic game, where the bidding strategy employed by one advertiser affects the bidding landscape for others. However, the number of potential competitors and limited competitive information generally precludes analysis relative to classic game-theoretic concepts of equilibrium (e.g., Nash equilibrium).
For advertisers, formulating direct competitive responses, a common requirement in classic game-theoretic analyses, is decidedly unrealistic and practically infeasible. In the RTB environment, bidders generally know little about the competition they face. They are typically unaware of the number of competing advertisers, much less their identities, valuations, budgets, objectives, or bidding strategies. During a campaign, advertisers generally bid on millions of impressions and compete with thousands of advertisers. Budgets vary greatly by advertiser and campaign, and not all advertisers actively bid in all auctions. Campaign objectives vary along myriad dimensions, and are not always directly comparable. Furthermore, the exchanges generally share only the second-highest bid, and only with the winning bidder. As a result, solving for classic equilibrium concepts in such a game is generally intractable, both analytically and computationally, and is unlikely to reflect the data generating process.
Instead, we leverage the concept of an FMFE to show that our strategy closely approximates the behavior of rational bidders in these markets. Introduced by Balseiro, Besbes, and Weintraub (2015), the FMFE approximation relies on two assumptions. First, the mean field approximation assumes that the number of advertisers is sufficiently large, such that no one advertiser can unilaterally influence the bidding landscape. As a result, advertisers can rely on a stationary distribution characterizing the maximum competing bids. Second, the fluid approximation assumes that the number of bidding opportunities for an advertiser is large and the expected cost per opportunity is small relative to the advertiser’s budget. Both these criteria fit well in the RTB context. Given these criteria, Balseiro, Besbes, and Weintraub (2015) show that FMFE strategies closely approximate the behavior of rational bidders, even in small markets with few advertisers, and establish the criteria for the existence and uniqueness of the equilibrium.
Theorem 3 establishes the competitive equilibrium. In the formal proof, we show that there exists a unique profile of scaling factors
Empirical Performance
In this section, we explore the empirical performance of the proposed bidding algorithm. First, we present a stylized example that provides the operational details of the algorithm and demonstrates the convergence of the bidding strategy. Next, we examine the algorithm’s finite sample performance across a series of 100 realistic simulations and 10 real-world campaigns. We evaluate this performance based on the difference in campaign value possible with perfect foresight (i.e., the oracle in the “Oracle and Regret” subsection) and that delivered by the proposed strategy. We also compare this performance with that of several alternative strategies identified from the extant literature and industry practice. Across the simulations and real-world campaigns, we consistently find that the proposed algorithm closely approximates the theoretic upper bound and significantly outperforms available alternatives.
Stylized Example
Consider an advertiser managing a retargeting campaign, who evaluates each impression opportunity based on two attributes: the age of the user and time elapsed since the user visited the advertiser’s website. The advertiser’s valuation, measured as the probability of conversion, is obtained through a logit model of the two attributes:
where
Next, we illustrate how the proposed algorithm can be applied to compute bids for the first few auctions of a campaign. We set the average budget available per auction
We arbitrarily initialize
The advertiser’s bid is then
Let the highest competing bid be
At auction 2, the advertiser observes
The advertiser’s bid is
At auction 3,
Table 1 repeats this computation for the first ten auctions, while Figure 1 demonstrates the convergence of the sequence
Stylized Example.

Stylized example: convergence of
To calculate this value, the oracle first sorts the auctions in descending order according to their value-to-cost ratio (see Table 1, Column 7). Here, the sorted order of auctions is
Simulations
We begin testing the finite sample performance of our algorithm using a series of 100 realistic simulations. For each impression opportunity, the focal advertiser’s valuation and the maximum competing bid are drawn from a joint distribution
Notably, the previous distributions are used only to simulate the underlying sequence of auctions. Throughout this article, we assume that the distribution of
The underlying distributions were selected in collaboration with a demand-side platform (DSP). This DSP purchases display advertising impressions through the real-time exchanges on behalf of hundreds of advertisers, and provided a random sample of impressions costs for 10.7 million impressions it served. This distribution is plotted in gray in Figure 2. The distribution of

Observed CPM distribution over 10.7 million impressions.
Each of the 100 simulations contains 10 million auctions, with
Numerical Efficiency
To test the speed at which the sequence

Convergence to
Comparison to Alternative Algorithms and the Theoretic Upper Bound
To understand the performance of our algorithm, we compare the total campaign value delivered by our approach to several alternatives. The details of each of the benchmark algorithms are provided in Web Appendix G. First, we focus on the ORTB1 algorithm introduced by Zhang, Yuan, and Wang (2014). This specification is designed for situations in which reserve prices are low or nonexistent, which accurately describes our simulations. Through a series of simulations and a field experiment, Zhang, Yuan, and Wang show that ORTB1 outperforms a number of alternative, common bidding strategies (e.g., bidding a constant amount, bidding below some threshold amount, and bidding in proportion to the expected cost per action). By showing that our algorithm outperforms ORTB1, we provide evidence that it also outperforms these alternatives.
To provide a strenuous test of our approach, we set the
Second, we compare our performance with the algorithm proposed by Balseiro and Gur (2019; BG hereinafter). We again set the underlying parameter at its a posteriori optimal value to provide a conservative comparison to the proposed strategy. As described in the “RTB Strategies” subsection, BG applies only when a monetary value can be assigned to each impression opportunity. Thus, we assume that the value assigned to each simulated opportunity is in monetary units. Because BG’s goal is to maximize profit, the campaign budget may not be exhausted when there exist too few profitable opportunities. Thus, we measure the total value achieved as the sum of the value obtained from purchased impressions and the budget remaining at the end of the campaign. While we present a single comparison to BG for the sake of parsimony, we have compared the approaches under a wide variety of assumptions on the prevalence of profitable opportunities. We consistently find the proposed algorithm delivers significantly greater value.
Finally, we compare our performance with an algorithm that bids a fixed amount for each opportunity (Fixed-Bid). While such a strategy has been shown to be inferior to available dynamic strategies (Zhang, Yuan, and Wang 2014), it is still commonly enabled by platforms such as Google’s ad network (Google 2018b). We provide this comparison to help researchers and advertisers understand the opportunity cost of employing such an approach. To provide the most conservative possible comparison and a lower bound on this opportunity cost, we bid the optimal fixed bid amount given perfect foresight in each simulation.
Each of these comparisons is made relative to the theoretic upper bound, provided by the oracle with perfect foresight. As described in the “Oracle and Regret” subsection, the oracle has access to all values and competing bids at the start of the campaign, making the value obtained by the oracle an upper bound on the value obtained by any algorithm that has access only to the history of values and observed costs. That is, there exists no online bidding algorithm that can outperform the oracle for a single campaign, much less in expectation. By using the oracle in our comparisons, we convey both the degree to which our approach outperforms available alternatives and the extent to which any potential future algorithm could outperform the proposed strategy.
Figure 4 plots the percentage loss from ORTB1, BG, Fixed-Bid, and our algorithm as compared to the relaxed knapsack with perfect foresight. These plots are based on 100 simulations, each containing 10 million auctions, with

Mean percentage loss: static simulations.
Empirical Performance: DSP Data
We further compare the empirical performance of our algorithm using a large panel data set of real impression values, bids, and outcomes provided by a DSP. The data detail advertising valuation, bidding, and exposures for 990,360 randomly selected users between September 29, 2014, and January 2, 2015. When the data were collected, the collaborating DSP purchased RTB impressions on behalf of 401 advertisers. Each time the DSP receives a bid request, it calculates the value of the impression opportunity for each of the campaigns it manages. This value, referred to as the “Targeting Score,” reflects the DSP’s belief that a browser will convert for a given advertiser (e.g., visit the advertiser’s website or make an online purchase) and is specific to the advertiser and impression opportunity. It is estimated using data provided by the exchange (e.g., IP address and website on which the ad will be served), purchased from third-party providers (e.g., inferred interests), and proprietary to the advertiser (e.g., purchases and browsing on its website). The DSP then decides whether and on behalf of which campaign to submit a bid.
Following bid submission, the DSP records data describing the opportunity for later analysis and reporting. This includes an identifier for the campaign for which the bid was submitted and the calculated value,

Process by which targeting scores are documented.
Stylized DSP Data.
Because impression costs are observed only when the DSP wins an auction, we restrict our attention to those opportunities. This does not restrict us to the set of impressions won by any given advertiser, as the DSP works for multiple advertisers simultaneously. Instead, for each advertiser, we observe impression values and costs for all auctions won by that advertiser as well as auctions won by other DSP clients for which the focal advertiser’s “Targeting Score” was saved. Building on Table 2, we present the resulting data for advertiser A in Table 3.
Stylized Campaign Data: Advertiser A.
These real-world data on counterfactual impression costs do not alter the sequence of data revealed to the online algorithms, but they do enrich our comparisons. The data allow us to compare the proposed bidding algorithm with that deployed by the focal DSP. They also establish the oracle’s strategy, and therefore regret. Finally, they are used to optimize parameter values for the competing algorithms a priori, providing a conservative test of the proposed approach. In contrast, the impression cost used by the online algorithms is still equal to the observed cost when the algorithm’s recommended bid exceeds that value and $.00 otherwise.
For each campaign, we thus compiled a chronologically ordered panel containing the targeting score and impression cost for a subset of opportunities. We focus on the ten campaigns with the most opportunities, the details of which are available in Table 4. Here, the second column contains the total number of opportunities for which we observe the targeting score, the third column reflects the number of impressions served by the DSP, and the fourth column contains the cost of purchasing those impressions. Note that for each campaign, the number of opportunities is greater than the number of impressions served, as discussed previously.
DSP Campaign Summary Statistics.
For each campaign, the data are used to evaluate the performance of the proposed algorithm and the alternatives described in the previous subsection. For the alternative approaches, we again fix all underlying parameters at their a posteriori optimal values while requiring our algorithm to optimize
Figure 6 plots the value of

DSP data:
As before, the efficacy of each algorithm is measured as the percentage regret relative to the theoretic upper bound. Figure 7 plots the regret produced by the DSP’s bidding strategy, ORTB1, BG, Fixed-Bid, and our algorithm for each of the ten campaigns. Across campaigns, the proposed algorithm produces regret of just 1.73% on average. That is, the proposed algorithm delivers on average 98.27% of the value achieved by the oracle. In contrast, ORTB1, BG, and Fixed-Bid deliver 80.46%, 84.03%, and 81.86% of this value on average. The DSP’s current algorithm performed the worst, delivering on average only 55.84% of the oracle’s value. Thus, across ten real-world campaigns, our algorithm performed within 2% of the theoretic maximum and outperformed the best available alternative by more than 14%. It is worth noting that the results indicate that the DSP’s loss in campaign value attributable to inefficient bidding is between 12.4% and 84% and varies greatly across campaigns.

Mean percentage loss: empirical data.
Estimating Valuation While Learning the Bidding Policy
To this point, we have assumed that the advertiser accurately forecasts the impression value for each auction. While the proposed algorithm consistently outperforms available alternatives when such forecasts are inaccurate (for further details, see Web Appendix B), total campaign value does decline. In this section, we show how the proposed bidding algorithm can be combined with Thompson sampling to simultaneously estimate impression values and learn the optimal bidding policy. For expositional simplicity, we focus on the static algorithm presented in the “Problem Setup and Proposed Algorithm” section. However, because the distribution of impression values may change as the advertiser updates its valuation function, the dynamic algorithm presented in Web Appendix D may offer improved performance. The underlying mechanism of estimating impression values does not depend on whether the bidding algorithm presented in the “Problem Setup” section or Web Appendix D is used, and the results presented here are robust to either.
Multi-Arm Bandit Formulation and Algorithm
The advertiser’s problem can be framed as a contextual multi-arm bandit. Each auction
This setting deviates from the traditional contextual bandit problem in several ways. First, the advertiser’s actions (i.e., bids) are continuous, not discrete and finite. Second, at each auction, the advertiser incurs a cost, equal to the highest competing bid if an impression is served and zero otherwise. Because of the budget constraint, this cost induces interdependence across auctions. Third, rewards are correlated across arms because multiple bids can produce the same reward. All bids exceeding
We address the second two challenges, correlated rewards across arms and contexts, using Thompson sampling, similar to Schwartz, Bradlow, and Fader (2017). Thompson sampling, a Bayesian heuristic to choose actions in a multi-arm bandit setting, parameterizes the reward distribution and learns the associated parameters through repeated interactions. At each auction
We model the probability of a conversion associated with auction
where
As this measure is based on an impression’s incremental impact on the outcome of interest, it overcomes many of the issues associated with valuing impressions based on the user’s absolute conversion probability, a common practice as discussed in the “RTB Strategies” subsection. The uncertainty in valuation arises from the uncertainty around the parameters
We now propose an algorithm using Thompson sampling to estimate these parameters. At the start of the campaign, we assume a normal prior for the parameters, such that
Each auction begins with the advertiser receiving the attribute vector
where
We now summarize the algorithm for completeness. The algorithm takes as input the total number of auctions Receives attribute vector Draws a random sample Computes Submits bid Observes whether an impression is served Computes the probability of conversion Updates its belief distribution Sets
:
The multi-arm bandit formalization of the advertiser’s problem and its solution presented previously provide several advantages. First, Thompson sampling has been shown to yield near-optimal asymptotic regret bounds (Agrawal and Goyal 2013) and state-of-the-art performance with finite samples (Chapelle and Li 2011; Kaufmann, Korda, and Munos 2012). This is reflected in the empirical performance of the proposed algorithm, as we demonstrate in the next subsection. Second, the proposed algorithm can be extended to the case where the advertiser chooses between multiple creatives in addition to the bid for each auction. Here, the probability of conversion can be modeled for each creative using creative-specific coefficients
Regret Analysis
To explore the relative impact of uncertainty over impression values and bidding policies, we use data from iPinYou, a Chinese DSP (Liao et al. 2014). The data contain information on RTB auctions for 15 display advertising campaigns managed by iPinYou from March 6–17, 2013, and October 19–27, 2013. Each auction is associated with contextual information, such as the publisher’s website, a time stamp, and a measure of ad slot visibility. iPinYou augmented these contextual attributes with user specific demographics (e.g., gender, location) as well as marketing information (e.g., customer segment assignments). Table 5 provides the complete list of attributes. Finally, the data include the bids submitted by iPinYou, the costs they incurred, and user feedback, measured by clicks on the display ad impressions.
iPinYou Auction Attributes.
We begin by constructing a binary classification model to estimate the probability a user clicks on an impression, given the contextual and user attributes. Following Amar and Renegar (2018), we opt for a binary logistic regression model; remove unique or nearly unique features such as iPinYou ID and URL; and dummy-code categorical variables such as device type, gender, and consumer segment. In our simulations, the resulting estimates serve as the ground truth for
Similar to the previous section, we then simulate 100 campaigns, each containing 10 million auctions. For each simulated auction, the contextual and user attributes are drawn randomly from their empirical distribution in the iPinYou data set. At the beginning of each campaign, we set
We again compare our algorithm’s performance to that of ORTB1, BG, and Fixed-Bid based on total regret. As in the “Empirical Performance” section, regret is defined as the number of conversions produced by an algorithm relative to the theoretic upper bound. We again seed competing algorithms with the optimal bidding parameters for each simulation, providing the most conservative possible test of our proposal. However, we do require that each algorithm learn impression values in real time using Steps 1–7 of the process outlined previously.
To compare the relative impacts of uncertainty over impression values and the bidding policy, we provide two upper bounds. The first, which we refer to as the TrueOracle, is produced by an algorithm that has perfect foresight into both the true valuations and the highest competing bid for each auction. This oracle is identical to that in the “Empirical Performance” section. The second, which we call the BiddingOracle, has perfect foresight into the highest competing bids
By comparing an algorithm’s performance with each oracle, we can identify the portions of regret attributable to uncertainty over the bidding policy and uncertainty over impression values. Because the BiddingOracle and the algorithm to which it is being compared share a belief distribution
Figure 8 presents the total regret produced by each algorithm over the series of 100 simulated campaigns, each containing 10 million auctions. The average regret attributable to uncertainty over the bidding policy (i.e., the regret relative to the BiddingOracle) is less than 2% for the proposed algorithm, but 19.62%, 16.37%, and 18.24% for ORTB1, BG, and Fixed-Bid, respectively. The regret due to uncertain valuations (i.e., the regret of each BiddingOracle relative to the TrueOracle) is approximately 8.2% for each of the bidding algorithms. This similarity across algorithms is largely a result of using the same Thompson sampling technique to estimate

Mean percentage loss: uncertain valuations and bidding policy.
Finally, it is worth noting that the total regret of the proposed algorithm, 10.1%, is less than the minimum regret attributable to the bidding policy of any other algorithm, 16.37% for BG. This indicates that the proposed algorithm consistently outperforms available alternatives, even when it is asked to simultaneously estimate impression values and learn the optimal bidding policy while the alternatives are provided with perfect foresight into impression values and seeded with optimal bidding parameters. This underscores the importance of employing an efficient bidding strategy, even when impression values are uncertain.
Conclusion
In this article, we introduced a near-optimal bidding algorithm for firms purchasing advertising through real-time auctions. This is now the dominant distribution channel for internet display advertising and a growing funding model for addressable television and radio. Because the impressions are sold as a user loads a web page, these exchanges allow advertisers to target each impression by individual, publisher, and time. With this flexibility, U.S. advertisers are expected to purchase $26 billion of display advertising through RTB exchanges in 2020, representing nearly half of all display ad spending (Hoelzel and Ballve 2015).
For the advertisers, the speed and volume of these auctions mandate the use of automated algorithms. By developing a near-optimal bidding algorithm, our work follows a long history of marketing research into the effective design and use of automated programs to help marketers allocate scarce resources (e.g., Leeflang and Wittink 2000; Little 1970; Rust 1986). It is also in line with more recent marketing research focused on the development of programmatic solutions to optimize digital advertising, (e.g., Paulson, Luo, and James 2018; Schwartz, Bradlow, and Fader 2017; Skiera and Abou Nabout 2013; Waisman et al. 2019). Finally, we contribute to the growing literature seeking solutions to the online knapsack problem. We are able to prove near-optimal average and total regret without assuming that the advertiser has any knowledge of the distributions describing future impression values and costs. To the best of our knowledge, this is unique among algorithmic solutions to the online knapsack.
In addition to being easily implemented, the proposed algorithm satisfies the real-world constraints of the RTB ecosystem. It is computationally efficient and capable of processing the volume and velocity of auctions while meeting the rapid response times required by the advertising exchanges. It is guaranteed to converge to the optimal strategy and achieves zero-regret based only on the sequence of historical auction costs incurred by the focal advertiser. Importantly, this algorithm allows advertisers to learn the optimal strategy in the course of a campaign without assuming or directly estimating the distribution describing impression values and costs. This mitigates a formidable challenge identified in the existing literature on bidding strategies. It also addresses competitive concerns, as following the proposed strategy constitutes a competitive equilibrium, allowing advertisers to be agnostic with respect to competitive strategies. Finally, we show how the bidding algorithm can be combined with Thompson sampling to simultaneously estimate impression values and learn the optimal bidding policy.
Across a series of simulations and real-world tests, the proposed algorithm significantly outperformed existing alternatives. Across 100 simulations containing 10 million auctions each, we found that our approach consistently delivered 11% more value than available alternatives and 99% of the value possible with perfect foresight. Using data from ten real-world campaigns, the proposed algorithm outperformed the next best approach by 14% and delivered 98% of the value possible with perfect foresight. Thus, any future solution to the problem can, at best, outperform the proposed algorithm by 2%. When asked to simultaneously estimate both impression values and the optimal bidding policy, the proposed algorithm continued to outperform available alternatives, even when these alternatives were provided accurate impression values and seeded with a priori optimal bidding parameters. This provides strong evidence of the algorithm’s potential impact, even when advertisers are uncertain with respect to impression values.
We expect this work to have a significant and increasing impact on how advertisers purchase media through the real-time auctions. The proposed approach outperforms the best available alternative by 11%, even when the parameters underlying competing algorithms were optimized a priori. Reframing this in terms of campaign spend, a cost-conscious advertiser that unilaterally deviates from this strategy to ours can expect to deliver equivalent campaign value while spending just 89% of the current budget. In many instances, the differences may be even more dramatic. Many large, sophisticated DSPs regularly recommend and enable advertiser bidding strategies that deviate even more significantly from the optimal (e.g., fixed-bid; Google 2018b). Partially in response to perceived inefficiencies and intermediary costs, advertisers have begun shifting programmatic media buying capabilities in-house (Wolfe 2017). Without a solid bidding strategy, these firms risk eroding much, if not all, of the potential cost savings. For advertisers, this article introduces a near-optimal bidding strategy, highlights the opportunity cost associated with alternative strategies, and presents a method to simultaneously estimate impression values based on their incremental impact. Looking forward, these implications extend beyond internet display advertising, as RTB is increasingly considered a potential funding model for addressable television and radio.
This work is not without limitations. We provide a near-optimal bidding strategy in the form of an implementable learning algorithm for use in sequential, stochastic Vickrey auctions. This is a common auction protocol in RTB for internet display advertising. However, ad exchanges and other intermediaries continually experiment with auction protocols. Soft floors and first-price auctions are two permutations currently receiving attention. Soft floors set a price threshold, unknown by the advertiser, below which the auction follows first-price protocols. While Zeithammer (2016) shows that this is revenue neutral at best, it is still used occasionally. Because of “header bidding,” many exchanges now operate as intermediaries rather than the final arbiter of impression placement. As a result, some exchanges have adopted first-price auction protocols, while some publishers, which now operate the terminal auctions, have adopted second-price auctions. So long as the terminal auction follows second-price rules and preceding auctions, if any, follow first-price rules, the proposed algorithm will perform as described. However, these industry dynamics highlight the need for more research on optimal bidding strategies, especially when the auctions are sequential and the protocols are mixed.
Supplemental Material
Supplemental Material, WebAppendices - A Near-Optimal Bidding Strategy for Real-Time Display Advertising Auctions
Supplemental Material, WebAppendices for A Near-Optimal Bidding Strategy for Real-Time Display Advertising Auctions by Srinivas Tunuguntla and Paul R. Hoban in Journal of Marketing Research
Footnotes
Associate Editor
Peter Danaher
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
