A new optimization layer for real-time bidding advertising campaigns

Abstract

While it is relatively easy to start an online advertising campaign, obtaining a high Key Performance Indicator (KPI) can be challenging. A large body of work on this subject has already been performed, and platforms known as DSPs that deal with such an optimization are available on the market. From the advertiser’s point of view, each DSP is a different black box, with its pros and cons, that needs to be configured. In order to take advantage of the pros of every DSP, advertisers are well-advised to use a combination of them when setting up their campaigns. In this paper, we propose an algorithm for advertisers to add an optimization layer on top of DSPs. The algorithm we introduce, called SKOTT, maximizes the chosen KPI by optimally configuring the DSPs and putting them in competition with each other. SKOTT is a highly specialized iterative algorithm loosely based on gradient descent that is made up of three independent sub-routines, each dealing with a different problem: partitioning the budget, setting the desired average bid, and preventing under-delivery. In particular, one of the novelties of our approach lies in taking the perspective of the advertisers rather than the DSPs. Synthetic market data is used to evaluate the efficiency of SKOTT against other state-of-the-art approaches adapted from similar problems. The results illustrate the benefits of our proposal, which greatly outperforms the other methods.

Keywords

Demand Side Platform (DSP), online advertising, gradient descent, optimization, Real Time Bidding (RTB)

1. Introduction

Online advertising is a vast market, worth several billion USD per year [11]. It is easy to understand the importance of optimization in such a market: every percent of increase in efficiency has a value on the order of millions of USD. From the advertiser’s point of view, however, optimization is often very difficult due to the constraints imposed by the structure of the ecosystem of online advertising. Let us see why.

There are currently two main paradigms in the market: sponsored search and Real-Time Bidding (RTB) auctions. Sponsored search, the main source of revenues for search engines, consists of showing relevant ads whenever a user inputs a query. For example, typing “football shoes” in the search bar will provide the user with several links to online shops selling sports apparel. In order to appear amongst the sponsored links, an advertiser must place a bid on the queries or keywords to which it wants to be connected.

Typically, advertisers pay if and only if their sponsored link is clicked on; this creates a collaborative effort between search engine and advertiser to show the most promising ad. Sponsored search has been the subject of many research papers over the years, dealing with topics such as budget optimization [8, 12] and click-through rate prediction [21, 28].

The RTB paradigm, which is the focus of our optimization work, is inherently different. In this case, advertisers participate in auctions to buy the available ad space on a website. The winning advertiser is allowed to display its ad (in the technical jargon, it has bought an “impression”). Unlike sponsored search, advertisers pay for all impressions they buy, even if they do not “generate a click”, i.e., users do not click on them to be redirected to the advertiser’s page. This changes everything, as it removes any incentive for the auctioneer to find an ad that is a good match to the current inventory, a task that is now completely left to the advertisers. As a consequence of these differences, the optimization results obtained in sponsored search are not directly applicable to RTB campaigns. As we already mentioned, this optimization process is made difficult by the very structure of the market that, in its simplest approximation, looks as follows (cf. Fig. 1).

Figure 1.

A simple, approximate representation of the RTB auctions market structure. The proposed algorithm, SKOTT, would work as an interface between advertisers and DSPs.

The central entity is the AdExchange, whose job is to run real-time bidding auctions and assign all available inventory (i.e., the screen space on which the advertising should be published) to its corresponding winning advertiser. On one side of AdExchanges there are Supply Side Platforms (SSPs), which provide the inventory and are in direct contact with the publishers (e.g., the owner of a web page). On the other side, on which we focus, are Demand Side Platforms (DSPs). DSPs bid on the available inventory on behalf of the advertisers according to the advertisers’ necessities.

Each step of this chain brings its own constraints and optimizations. For example, a DSP might work only with certain AdExchanges, effectively limiting the amount of inventory available to the advertiser, but it might also offer better performance on some particular indicators due to internal optimization algorithms.

It is easy for an advertiser to set up a campaign with a DSP and monitor its effectiveness by calculating certain Key Performance Indicators (KPIs). On the flip side, the use of a DSP prevents the advertiser from directly determining the bid on each individual auction, only allowing it to fix some average parameters to associate with larger sets of impressions. We call each of these abstract entities made of a set of impressions and its corresponding parameters a media object.

Each media object can almost be treated as a separate entity with its own associated budget that can change over time. The only global constraint is that the total budget of the campaign is fixed. A single media object makes the optimization easy to handle but inefficient because it treats all impressions in the same way, bidding roughly the same amount of money for all impressions and showing the same advertising to all users. A correct choice of the parameters of the media objects can therefore have a huge impact on the global optimization of the campaign.

But this setup leaves advertisers with many questions: What is the most appropriate DSP to use amongst the many available on the market? How to parametrize it? How much budget to assign to each of its media objects? These questions are often answered by human experts that base their decisions on past experience, intuition, and some off-line data analysis. However, there is no guarantee that the goals set by the experts are reachable, nor that they are optimal.

In recent decades, researchers have focused mainly on market models [10, 18, 26] and bidding algorithms for DSPs [5, 27, 9, 16, 20]. However, to the best of our knowledge, no paper tackles the optimal management of a campaign from the point of view of an advertiser using a DSP. The new constraints that come with such a perspective make other optimization algorithms studied in the literature difficult to compare with ours. For example, our need to partition the budget arises from the necessity to work with media objects: when an impression arrives that is a good fit to a particular media object, we need to be sure that the media object has a sufficient budget to spend. This problem does not exist when one can decide how much money to spend on a single impression basis, with the only constraint of the total campaign budget.

The main contribution of this paper is a new algorithm, the SKOTT algorithm, that solves this understudied problem: SKOTT automatically handles advertising campaigns, finding the best parameters to put inside each DSP in order to maximize the performance. It also allows the simultaneous use of different DSPs that are put in competition to further increase the performance. Therefore, not only does it give a recipe to fully take advantage of any single DSP, but it also adds a new layer of optimization on top of them. The algorithm reacts quickly to market variations and scales linearly with respect to the number of media objects.

The remainder of this paper is organized as follows. In Section 2 we discuss a few algorithms that deal with a similar problem and have been an inspiration for our work. In Section 3 we state the problem at hand and our goals. Sections 4.1–4.3 deal with the three independent sub-routines that compose SKOTT. The conducted experiments and results obtained are discussed in Section 5. Conclusions are drawn in Section 6. Finally, appendices give details on the creation of the synthetic market data, the algorithms we chose for comparison of the results, and the technical part of the implementation.

2. Related work

The problem of how to spend a budget in order to maximize the profit during an advertising campaign has never been studied, to the best of our knowledge, from the point of view of an advertiser using DSPs. Nevertheless, there are many works in the literature that are relevant because they try to solve a similar problem. We cite and discuss some of them in this section, mainly to highlight how our approach differs.

In [8] the authors propose to use a randomized uniform strategy for choosing how much to bid on every keyword in a sponsored search. That could be applied by analogy to our problem of Real-Time Bidding auctions through the use of DSPs, associating every keyword with a different media object. (We refer to the introduction and to the problem statement for the definition of media objects.) However, their model assumes a complete knowledge of the bidding landscape, that is, the probability distribution of the winning bids for each impression. This is information that advertisers don’t have in the case of RTB auctions through DSPs. Also, the model in [8] requires the bidding landscape to be static, a hypothesis that we don’t require.

The problem of collecting the largest possible reward of an advertising campaign with a constraint on the budget can also be written in the linear programming formalism. This is done, for example, in [5]. There, however, the authors: 1. take the point of view of a DSP, and 2. try to optimize the revenue of publishers and not of advertisers. In particular, they assign each piece of inventory to a different advertising campaign assuming that the total Cost-Per-Click (CPC) of each campaign is fixed. This last point is clearly not the case when we look at it from the advertiser’s side because, as we said, we consider the case where advertisers pay for the inventory they buy regardless of it being clicked on. This implies that the CPC is determined by the ratio between the average Cost-Per-Mille (CPM) and the Click-Through Rate (CTR) of a media object, and is therefore not constant, as we will see in detail when we analyze the SKOTT algorithm.

A similar work, explicitly based on [5], is [16], where a linear programming algorithm is proposed for DSPs wanting to maximize their revenues. Besides the different perspective, the authors assume a fixed CTR, known in advance. This is not required in our approach, where we infer the CTR from market data in real time and the only assumption we make on its analytical form is that it varies slowly with the bid. A linear programming approach inspired by these works will be tested against SKOTT in the simulations.

A third approach to the problem is via reinforcement learning. In this case, the bidder is considered as an agent that learns how much to bid for every individual auction. To direct its choice, it is aware of the budget constraints, the goals of the campaign, and all context information from the impression it is trying to buy. It is studied for example in [4] where, due to the huge size of the space of possible actions to take, the authors help the decision process by using a model-based approach. This approach is extremely interesting but ineffective in our case for two main reasons. First of all, the constraints under which we work prevent us from accessing individual auctions and, secondly, we obtain information not in real time but in batches that come in at larger time scales (roughly an hour). Nevertheless, an approach inspired by reinforcement learning is tested against SKOTT in the simulations: A multi-armed bandit algorithm [1, 3] is used to allocate the budget to the different media objects according to their results.

3. Problem statement

As stated in the introduction, advertisers can participate in RTB auctions by using a DSP. Typically, several DSPs are employed in order to increase the amount of people reached and to better respond to business necessities. Each DSP needs to be configured. The details of the configuration might differ from DSP to DSP, but there is a central core of abstraction that is common to all of them. We call it a media object: it is a set of instructions given by the advertisers, some of which are qualitative and set once and for all at instantiation while others are quantitative and can be changed at any moment. An example of the former is the creative associated to the media object, i.e., the actual advertising being shown.

Another example of a qualitative instruction is the set of filters on incoming auctions that select on which impressions to place a bid depending on the user and inventory characteristics. The hourly budget and the base bid for the auctions, on the other hand, belong to the latter. It is important to note that the media object is the most precise layer of abstraction that is accessible to advertisers. The only influence that they can have on the auctions, and therefore their only possibility for optimization, lies in the parameters of the media objects.

We consider an advertising campaign to be defined by: a total budget $\mathcal{B}$ , a start date, an end date, a desired spend profile (i.e., the amount of total money spent at any moment during the campaign), and a collection of $K$ media objects spread across different DSPs. The typical duration for a campaign is on the order of a few weeks up to a few months, while the value of $K$ depends heavily on the campaign and can vary from as few as 1 to over 100 000.

During the campaign lifetime, the advertiser receives information on the behavior of every media object from the different DSPs. Each data point contains hourly information on the impressions bought, the clicks generated, the money spent, and possibly other such quantities. Since advertisers don’t have access to the auctions individually but only through the media objects, they should only consider the average effect of their optimizations over all impressions. Therefore, for each media object we take an hourly average of the information received. In practice, we consider only the Click-Through Rate (CTR, the ratio of clicks generated to impressions bought) and Cost-Per-Click (CPC, the ratio of money spent to clicks generated).

The main goal of our algorithm is to change the media objects’ parameters in such a way as to optimize a certain KPI while keeping the desired delivery over time. In order to demonstrate a practical case, we have chosen to optimize the total number of clicks generated in the campaign. This is often a valid indicator to optimize because a user who is interested in purchasing something from the advertiser’s website will probably click directly on the ads, while only a small fraction of the people who click on the ads will actually make a purchase. The click is then correlated to the monetary return of the campaign while not being as rare as an actual purchase.

4. The design choices: SKOTT

SKOTT is an iterative algorithm made up of three sub-routines: budget partitioning, which rewards high-quality media objects by giving them more money; base bid setting, which controls the bid of each media object separately with the goal of increasing the media object’s quality; and pacing control, which prevents under-delivery. Figure 2 provides a schematic view of the different steps of the algorithm. The three sub-routines act independently of one another and can therefore be analyzed in any order.

Figure 2.

A schematic view of the SKOTT algorithm.

4.1 SKOTT: Budget partitioning

In this section we deal with the budget partitioning that defines which percentage of the hourly budget should be allocated to each media object.

A budget partition is a vector of $K$ weights $\vec{w}$ . Each element $w_{i}$ represents which fraction of the total budget is assigned to the corresponding media object $i$ . A uniform distribution, where all media objects are assigned the same budget, is represented by the vector $\vec{u}=\left({K}^{-1},K^{-1},\ldots,K^{-1}\right)$ . A greedy distribution is when a single media object takes all the available budget and is represented by the vector $\vec{w}_{g}^{(i)}=\left(0,\ldots,1,\ldots,0\right)$ .

The ideal algorithm for budget partitioning should:

• return a list of non-negative weights that sum to one at every decision epoch $t$,
• optimize a specific KPI (in our case the total number of clicks),
• promptly react to changes in the market, be they sudden or slow.

A very important point to consider when devising the algorithm is that the data must be bought through winning auctions. Reducing the budget of a media object well below the expected CPC will result in no clicks being bought, thereby gaining little to no useful information to estimate the quality of the media object. An advertiser may spend some time and money to explore the market randomly then concentrate their money on the best performing media objects. This would likely lead to an increase in the return of the campaign on average, but the price to pay is a high risk of getting stuck on a sub-optimal media object. A more dynamic algorithm that keeps exploring over time seems therefore a more reasonable choice.

There is a balance to strike between exploration and exploitation. The former is expensive, but mitigates risks and gives a better long-term investment. The latter increases the short-term reward, but might prove catastrophic over the long-term, locking the investment on media objects that are ultimately bound to fail.

Figure 3.

Algorithm for budget partitioning.

4.1.1 The update rule

The algorithm we propose is a variation of the exponentiated gradient descent method originally proposed in [14]. At each iteration, it updates the weight vector $\vec{w}$ in order to optimize the KPI. The algorithm is summarized in Fig. 3. It makes use of the concept of a reward assigned to each media object at every decision epoch. The reward is a numerical estimate of how well the media object did in the epoch. We will see explicitly what it looks like later on.

Given a vector of weights at epoch $t$ , $\vec{w}_{t}=[w_{1,t},w_{2,t},\ldots,w_{K,t}]$ , describing the distribution of the budget allocated to each of the $K$ media objects during the $t$ -th hour of the campaign, the algorithm will return a new vector of weights $\vec{w}_{t+1}$ that is closer to the minimum of the loss function

$\displaystyle\mathcal{L}_{t}(\vec{w}_{t})=-\sum_{i}R_{i}(\vec{w}_{t})+\frac{\lambda_{t}}{2}\lVert\vec{w}_{t}-\vec{u}\rVert^{2}$ (1)

Here, $R_{i}$ is the reward associated to every media object, $\lambda>0$ is the regularization parameter (that can depend on the epoch $t$), and $\vec{u}$ is the vector of uniform distribution with all entries equal to $1/K$ that we introduced before. The effect of the first term of the loss function is to favor the partitions that give a larger reward. The second term, known as the regularization term, requires the partition $\vec{w}$ to be close to the uniform distribution $\vec{u}$. In other terms, it enforces the exploration of the market, with the consequences discussed at the beginning of Section 4.1. The relative importance of the exploration is therefore given by the numerical parameter $\lambda$ that can be set at will. The easy interpretation of this parameter and its conceptual relevance is an important feature of our algorithm. We will discuss how to choose it at the end of this section.

The update rule defined by the exponentiated gradient descent is the following:

$\displaystyle\vec{w}_{t+1}=\frac{\vec{w}_{t}\cdot\exp\left(-\alpha*\nabla\mathcal{L}_{t}(\vec{w}_{t})\right)}{\displaystyle\sum_{i}\left[w_{i,t}\cdot\exp\left(-\alpha*\nabla_{i}\mathcal{L}_{t}(\vec{w}_{t})\right)\right]}$ (2)

where $\alpha$ is a real positive parameter known as the learning rate and $\nabla$ indicates the gradient of a function with respect to the vector of weights $\vec{w}$ .
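As a minimal sketch of Eq. (2), assuming the gradient of the loss is already available as a plain list of numbers, the update can be written in a few lines of Python. The function name `exponentiated_gradient_step` and the numerical values are illustrative choices and are not taken from our implementation.

```python
import math

def exponentiated_gradient_step(weights, gradient, alpha):
    """One update of Eq. (2): scale each weight by exp(-alpha * grad_i), then renormalize."""
    unnormalized = [w * math.exp(-alpha * g) for w, g in zip(weights, gradient)]
    total = sum(unnormalized)
    return [x / total for x in unnormalized]

# Illustrative values (assumptions): three media objects starting from the uniform partition.
weights = [1 / 3, 1 / 3, 1 / 3]
gradient = [-1.0, -0.5, 0.0]  # a more negative entry means the loss decreases faster
new_weights = exponentiated_gradient_step(weights, gradient, alpha=0.5)
```

Because the update is multiplicative and followed by a normalization, the new weights stay positive and sum to one, which is the first requirement listed for the budget partitioning algorithm.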

4.1.2 Explicit calculation of the derivative of the loss function

The rest of this section is devoted to deriving the value of $\nabla\mathcal{L}_{t}(\vec{w}_{t})$ that is needed to update the weights. To do so, we need to explicitly define the reward and find its gradient. In general, the reward is given by the goal of the advertising campaign. As we already mentioned, we will use the maximization of the number of clicks as an example. The reward is then simply the number of clicks that a media object obtains during an epoch: $\vec{R}=\vec{C}$. Its gradient represents the relative change in the number of clicks that a media object would have generated if we had given it a slightly different budget. This clearly cannot be obtained directly from the market. Our solution is to model the relation between clicks and budget analytically, derive the gradient, and then approximate it using the sampled results from the market. In equations, this reads:

$\displaystyle R_{i,t}(\vec{w}_{t})=Q_{i,t}(w_{i,t})\cdot w_{i,t}$ (3)

$\displaystyle\nabla\sum_{i}R_{i,t}=\vec{Q}_{t}(\vec{w}_{t})$ (4)

where $\vec{Q}_{t}(\vec{w_{t}})$ is the (unknown) vector of the coefficients that represents conceptually the quality of the media objects. Notice that to pass from Eq. (3) to Eq. (4) we have made the assumption that $\vec{Q}_{t}(\vec{w_{t}})$ varies slowly with the weight vector so that its derivative becomes negligible. This is clearly an approximation, but a useful one whose price we happily pay.

Since $\vec{w}_{t}=\vec{B}_{t}/\sum_{i}B_{i,t}$ , where $\vec{B}_{t}$ is the vector of budgets associated to each media object at time $t$ , the quality vector can be written as:

$\displaystyle\vec{Q}_{t}=\frac{\vec{R}_{t}}{\vec{w}_{t}}=\sum_{i}B_{i,t}\cdot\frac{\vec{C}_{t}}{\vec{B}_{t}}$ (5)

Let us notice that $\vec{Q}$ is quite similar to the vector of inverse CPCs, the two differences being the (unimportant) global positive multiplicative factor $\sum_{i}B_{i,t}$ and the presence at the denominator of $\vec{B}_{t}$ instead of $\vec{S}_{t}$ (the budget allocated instead of the money actually spent during the epoch). This is in accordance with our intuitive identification of $\vec{Q}$ with the quality of a media object because lower CPCs are desirable.

Let us also mention that, since the quality of the media objects depends on external factors, a rescaling is needed to keep the relative importance of the regularization parameter under control (hence the uselessness of the global multiplicative factor $\sum_{i}B_{i,t}$). We thus use the rescaled quality $\vec{\widetilde{Q}}_{t}$ defined as:

$\displaystyle\vec{\widetilde{Q}}_{t}=\frac{\vec{Q}_{t}}{\text{max}(\vec{Q}_{t})}$ (6)

which ensures that all the elements of the vector are non-negative and not larger than 1.

Under these conditions, we can rewrite the derivative of Eq. (1) as:

$\displaystyle\nabla\mathcal{L}_{t}(\vec{w}_{t})=-\vec{\widetilde{Q}}_{t}+% \lambda_{t}(\vec{w}_{t}-\vec{u})$ (7)
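To make Eqs. (5) to (7) concrete, the following sketch computes the rescaled quality and the gradient from one epoch of market data. It assumes that every media object has a strictly positive budget and, as a convention of this sketch only, returns a zero quality vector when no click was generated. The function names and the numbers are hypothetical.

```python
def rescaled_quality(clicks, budgets):
    """Eqs. (5) and (6): quality = total budget * clicks / budget, divided by its maximum."""
    total_budget = sum(budgets)
    quality = [total_budget * c / b for c, b in zip(clicks, budgets)]
    best = max(quality)
    if best == 0:
        # Assumption of this sketch: with no clicks at all, every quality is zero.
        return [0.0] * len(quality)
    return [q / best for q in quality]

def loss_gradient(clicks, budgets, weights, lam):
    """Eq. (7): gradient = -Q_tilde + lambda * (w - u), with u the uniform partition."""
    k = len(weights)
    q_tilde = rescaled_quality(clicks, budgets)
    return [-q + lam * (w - 1.0 / k) for q, w in zip(q_tilde, weights)]

# Illustrative epoch (assumed values): three media objects.
clicks = [4, 1, 0]
budgets = [10.0, 10.0, 20.0]
total = sum(budgets)
weights = [b / total for b in budgets]
gradient = loss_gradient(clicks, budgets, weights, lam=3.0)
```

In this example the third media object holds half of the budget and produced no clicks, so both terms of the gradient push its weight down, while the first media object is rewarded for its clicks and for being below the uniform share.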

4.1.3 Fighting the noise

Due to the stochastic nature of the data coming from the market, there are a few corrections to make to the model of the quality vector in order to improve the precision and the stability of the results.

Let us consider the quality factor as defined in Eq. (5). The problem is that clicks are extremely rare: A typical CTR is 0.1%, meaning that only one impression out of a thousand generates a click. However, it is always possible, albeit rare, that a media object buys a small amount of impressions and generates a click. This is, of course, just sampling noise due to the very nature of the quantities we are dealing with. However, if not taken into account, it would dominate the response of the algorithm and lead to very unstable situations. Even worse, it could lock the algorithm into a strategy whereby it puts all its money into a single, sub-optimal media object for a long time. To deal with that we make two corrections.

First of all, we put a hard bound on the gradient between $-10/\alpha$ and $+10/\alpha$, $\alpha$ being the learning rate of the gradient descent, to avoid exploding exponentials. This is very simple and straightforward, but it successfully prevents media objects with unusually large rewards from taking all of the budget.
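The hard bound can be sketched as an element-wise clipping of the gradient. The helper name `clip_gradient` and the sample values are assumptions made for illustration.

```python
def clip_gradient(gradient, alpha, bound=10.0):
    """Clip every component to [-bound/alpha, +bound/alpha] so that |alpha * g| <= bound."""
    limit = bound / alpha
    return [max(-limit, min(limit, g)) for g in gradient]

# Illustrative values (assumptions): one component is far outside the allowed range.
alpha = 0.5
clipped = clip_gradient([-1000.0, 3.0, 50.0], alpha)
```

After clipping, the argument of each exponential in the update rule lies between -10 and +10, so a single noisy epoch can change the ratio between two weights by a factor of at most $e^{20}$.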

Secondly, we claim that a better estimate of the quality of a strategy can be obtained using a cumulative discounted version of the clicks and budgets, i.e., a variable that takes into account not only the latest data but also past data weighted by a discount factor $\gamma$:

$\displaystyle\vec{Q}_{t}=\mathcal{B}_{t}\cdot\frac{\vec{\widehat{C}}_{t}}{\vec{\widehat{B}}_{t}}$ (8)

where $\vec{\widehat{C}}_{t}=\vec{C}_{t}+\gamma\ \vec{\widehat{C}}_{t-1}$, and $\vec{\widehat{B}}_{t}=\vec{B}_{t}+\gamma\ \vec{\widehat{B}}_{t-1}$. We call $\vec{\widehat{C}}$ the vector of cumulative discounted clicks, initialized with the rule $\vec{\widehat{C}}_{0}=\vec{C}_{0}$ (and similarly for the vector of cumulative discounted budgets $\vec{\widehat{B}}$). Here, $\gamma\in[0,1]$ controls the importance of past data in the estimation of the quality of the media object: when $\gamma=0$ we have no memory and $\vec{\widehat{C}}_{t}=\vec{C}_{t}$ (same for the budgets); we are back in the situation represented by Eq. (5). On the other hand, $\gamma=1$ implies that the data collected at time $t_{0}$ is considered relevant for all $t>t_{0}$ and is never to be forgotten. This is desirable only when the quality is guaranteed to be constant. Since this is not the case, we use $\gamma<1$ to slowly forget data that is no longer relevant. We can fix the exact value of $\gamma$ by choosing a time scale for our campaign. If, for example, we want to forget data that is $n$ time-steps old, we solve the equation $\gamma^{n}=\epsilon$, where $\epsilon$ is a small value of our choice. In the case $\epsilon=1/e$ and $n=7$, we obtain $\gamma=e^{-1/n}\approx 0.87$.
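The discounted accumulators and the choice of $\gamma$ can be sketched as follows. The helper names and the two-epoch history are hypothetical, and $\mathcal{B}_{t}$ is taken here to be the total budget of the latest epoch, which is an assumption of this sketch.

```python
import math

def discount_factor(n_steps, epsilon=1 / math.e):
    """Solve gamma ** n_steps = epsilon for gamma."""
    return epsilon ** (1.0 / n_steps)

def discounted_update(current, previous_hat, gamma):
    """Element-wise X_hat_t = X_t + gamma * X_hat_{t-1}."""
    return [x + gamma * p for x, p in zip(current, previous_hat)]

def discounted_quality(clicks_hat, budgets_hat, total_budget):
    """Eq. (8): quality from cumulative discounted clicks and budgets."""
    return [total_budget * c / b for c, b in zip(clicks_hat, budgets_hat)]

gamma = discount_factor(7)  # epsilon = 1/e and n = 7, as in the text

# Illustrative history (assumed values): two media objects over two epochs.
clicks_history = [[2, 0], [0, 1]]
budgets_history = [[10.0, 10.0], [10.0, 10.0]]

clicks_hat = list(clicks_history[0])    # initialization: C_hat_0 = C_0
budgets_hat = list(budgets_history[0])  # initialization: B_hat_0 = B_0
for clicks, budgets in zip(clicks_history[1:], budgets_history[1:]):
    clicks_hat = discounted_update(clicks, clicks_hat, gamma)
    budgets_hat = discounted_update(budgets, budgets_hat, gamma)

quality = discounted_quality(clicks_hat, budgets_hat, sum(budgets_history[-1]))
```

With the latest epoch alone, the first media object would be assigned a quality of zero; with the discounted accumulators it keeps most of the credit for the two clicks obtained one epoch earlier.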

4.1.4 The regularization parameter

We have said that an important feature of our algorithm is the relevance of the regularization parameter, that decides the trade-off between exploration and exploitation. Here, we explain what we chose in our simulations and why.

The regularization parameter that we used is defined as:

$\displaystyle\lambda_{t}=\eta\cdot K\cdot\gamma_{r}^{d(t)}$ (9)

where $\eta$ is a positive number that determines the exploration-exploitation trade-off (in our simulations it is set to 1), $K$ is the number of media objects, $\gamma_{r}$ is another discount factor that determines when exploitation should dominate over exploration, and $d(t)$ is the number of days that have passed since the beginning of the campaign.

The interest in rescaling with $K$ comes from the advantage of keeping the term $\lambda_{t}*\vec{u}$ appearing in the gradient of the loss function independent of the number of media objects. (Remember that $\vec{u}$ is the uniform distribution, whose elements are all $K^{-1}$.) This grants a comparable greediness (measured for example as the KL-divergence from the uniform distribution, see Eq. (28)) when running on campaigns with vastly different numbers of media objects.

The presence of the term $\gamma_{r}^{d(t)}$ stems instead from the advantage of having a larger exploration at the beginning of the campaign and a larger exploitation towards the end, where we want to monetize the knowledge we have acquired. The numerical value of $\gamma_{r}$ is determined in the same way as for the discount factor in the quality vector, just using a different time-scale. If, for example, we want to keep a large exploration for 20 out of the 30 days of the advertising campaign, we would fix $\gamma_{r}=1-1/20=0.95$.
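A short sketch of the schedule in Eq. (9), using $\eta=1$ and $\gamma_{r}=0.95$ as in the text, illustrates both properties. The function name and the campaign sizes are illustrative assumptions.

```python
def regularization(day, num_media_objects, eta=1.0, gamma_r=0.95):
    """Eq. (9): lambda_t = eta * K * gamma_r ** d(t)."""
    return eta * num_media_objects * gamma_r ** day

# Illustrative campaign (assumed values): K = 100 media objects over 30 days.
schedule = [regularization(day, 100) for day in range(31)]

# Each entry of lambda_t * u equals lambda_t / K, which does not depend on K.
per_entry_small = regularization(5, 10) / 10
per_entry_large = regularization(5, 1000) / 1000
```

The schedule starts at $\eta K$ and decays geometrically, so that after 20 days the weight of the exploration term has dropped to roughly 36% of its initial value.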

4.2 SKOTT: Base bid setting

In this section, we present the algorithm that dynamically changes the base bid of a media object. The base bid of a media object represents a sort of default value that is adjusted by the DSP depending on how valuable it deems a certain piece of inventory for said media object. Many DSPs make this adjustment by multiplying the base bid by a score calculated from data about a specific item of inventory. Typically, however, the base bid will represent the average bid offered during the campaign.

Clearly, a high base bid will lead to chronic overbidding. This is indicated by the fact that the average cost per impression is significantly below the bid, assuming that the inventory is priced based on the second highest bid (as is overwhelmingly the case, cf. [25]). Overbidding is very risky because it might lead to a very large expense on non-valuable impressions if another player in the market is making the same mistake. Conversely, a low base bid can cause the inability of a media object to buy inventory deemed valuable by the DSP, leading to an under-utilized budget.

Still following the example in which our goal is to maximize the number of clicks, we write a vectorial loss function $\vec{\mathcal{L}}_{t}=-\vec{C}_{t}$ where each element of the vectors represents a different media object. This sets a multi-dimensional optimization problem for the number of clicks for all media objects. For the sake of simplicity, since all the bids act independently, we will work only on a single media object and use scalar quantities everywhere. Notice that we do not add any regularization term, because so far we do not put any constraints on the base bids. Now, we need to express the number of clicks as a function of the bids and then use gradient descent to maximize the function. In practice, the function that we use is defined as:

$\displaystyle\mathcal{L}_{t}(b_{t})=-C_{t}(b_{t})=-\frac{S_{t}(b_{t})}{\text{CPC}_{t}(b_{t})}$ (10)

where $S_{t}(b_{t})$ and $\text{CPC}_{t}(b_{t})$ are, respectively, the total amount of money spent and the resulting Cost Per Click in the previous epoch as a function of the base bid, and the division is calculated element-wise. In principle, minimizing the loss function means maximizing the amount of money spent while decreasing the CPC of each inventory piece. These two objectives conflict: to spend more money one should increase the base bid to have access to more inventory, but to buy cheap clicks one should reduce the base bid.

In the following, we analyze separately the functions that appear on the right-hand side of Eq. (10), starting with the CPC. The full result is presented at the end of the section and is also summarized in Fig. 4.

Figure 4.

Algorithm for setting the base bid.

Figure 5.

One iteration of Nadam, adapted from [6].

4.2.1 The analysis of the CPC

The CPC can be rewritten as:

$\displaystyle\text{CPC}_{t}=\frac{\text{CPM}_{t}}{1\,000\ \text{CTR}_{t}}$ (11)

where CPM is the average Cost Per Mille, that is, the average price to pay for a thousand impressions, and CTR is the Click-Through Rate that we have already introduced. First of all, we assume the CTR to be independent of the base bid and we estimate it from the market data as:

$\displaystyle\text{CTR}_{t}=\frac{C_{t}}{N_{t}}$ (12)

where $N_{t}$ is the number of impressions bought and $C_{t}$ is the number of clicks. The assumption of independence is justified by the fact that, if the media object filters are accurately set, all elements that are accessible by a single media object should be equally valuable. Also, a correlation between CTR and bid would mean that there is a general consensus on what is the most promising impression to buy no matter the campaign advertisers are running. The truth, as usual, lies in the middle: there is a certain correlation between CTR and bid, but in the absence of better methods to estimate it we neglect it. Introducing it in later improvements of the method will only require adding a term to the bid loss function.
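As a quick consistency check of Eqs. (11) and (12), the sketch below computes the CPC both from the CPM and the CTR and directly as money spent over clicks. The CPM is taken as the money spent per thousand impressions bought; the function names and the hourly figures are made up for illustration.

```python
def click_through_rate(clicks, impressions):
    """Eq. (12): CTR = C / N."""
    return clicks / impressions

def cost_per_mille(spend, impressions):
    """Average price paid per thousand impressions."""
    return 1000.0 * spend / impressions

def cost_per_click(cpm, ctr):
    """Eq. (11): CPC = CPM / (1000 * CTR)."""
    return cpm / (1000.0 * ctr)

# Illustrative hourly data (assumed values) for one media object.
impressions, clicks, spend = 50_000, 50, 100.0

ctr = click_through_rate(clicks, impressions)
cpm = cost_per_mille(spend, impressions)
cpc = cost_per_click(cpm, ctr)
```

Both routes give the same value, since the number of impressions cancels out in the ratio of CPM to CTR.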

We now have to find the CPM. Following [27], let us assume that the probability of winning an auction with a base bid of $b$ is given by an expression of the form:

$\displaystyle P(b)=\frac{b}{b+\beta}$ (13)

where $\beta$ is the median winning bid over all the inventory (since bidding $\beta$ gives a 50% probability of winning). We can define a probability density function as:

$\displaystyle p(b)=\frac{dP(b)}{db}=\frac{\beta}{\left(b+\beta\right)^{2}}$ (14)

that gives the percentage of inventory whose winning bid is exactly $b$ .
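Eqs (13) and (14) can be checked numerically with a short sketch. The function names and the sample values of $b$ and $\beta$ below are illustrative assumptions, not part of the method itself.

```python
def win_probability(b: float, beta: float) -> float:
    """Eq. (13): probability of winning an auction with base bid b."""
    return b / (b + beta)


def winning_bid_density(b: float, beta: float) -> float:
    """Eq. (14): derivative of the win probability with respect to b."""
    return beta / (b + beta) ** 2


# Illustrative values (assumptions): beta = 2, bid = 3.
beta = 2.0
bid = 3.0

# Bidding exactly beta should win half of the auctions.
p_at_median = win_probability(beta, beta)

# Central finite difference of Eq. (13), to compare against Eq. (14).
h = 1e-6
numeric_density = (
    win_probability(bid + h, beta) - win_probability(bid - h, beta)
) / (2 * h)
analytic_density = winning_bid_density(bid, beta)
```

The finite-difference value and the closed form of Eq. (14) should agree up to numerical error, which is a quick way to catch sign or factor mistakes when re-implementing the model.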

RTB auctions often employ a second-price model to enforce truthful bidding [23, 17, 7]. In such a situation, and remembering that the bid is typically expressed in total offer per thousand impressions, the CPM is given by the total money spent divided by the total number of impressions bought:

$\displaystyle\text{CPM}_{t}=\frac{\int_{0}^{b_{t}}I_{\text{tot},t}\,x\,p(x)\,dx}{\int_{0}^{b_{t}}I_{\text{tot},t}\,p(x)\,dx}=\beta_{t}\left[\left(1+\frac{\beta_{t}}{b_{t}}\right)\ln\left(1+\frac{b_{t}}{\beta_{t}}\right)-1\right]$ (15)

where $I_{\text{tot},t}$ is the total amount of inventory that would be available with an infinite bid at epoch $t$ .

Note the logarithmic increase of the average CPM at very large bids: it represents competitors placing extremely high bids to acquire inventory, a strategy that gets rarer and rarer with increasing bids. From Eq. (15) we can estimate the value of the parameter $\beta_{t}$ by comparing the CPM predicted by the model with the actual CPM returned from the market during that epoch.
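The text does not say how the comparison is solved for $\beta_{t}$. One possible approach, sketched below, is a bisection on Eq. (15): for a fixed bid the model CPM increases monotonically with $\beta$ from 0 towards $b/2$, so a unique root exists whenever the observed CPM lies in that range. The solver choice, the function names, and the tolerances are assumptions.

```python
import math


def model_cpm(b: float, beta: float) -> float:
    """Eq. (15): average CPM predicted by the second-price model."""
    return beta * ((1.0 + beta / b) * math.log1p(b / beta) - 1.0)


def estimate_beta(observed_cpm: float, b: float,
                  tol: float = 1e-10, max_iter: int = 200) -> float:
    """Solve model_cpm(b, beta) = observed_cpm for beta by bisection."""
    # The model CPM lies strictly between 0 and b / 2 for any beta > 0.
    if not 0.0 < observed_cpm < b / 2.0:
        raise ValueError("observed CPM must lie strictly between 0 and b/2")
    lo, hi = 1e-12 * b, b
    # Enlarge the bracket until the model CPM at hi exceeds the target.
    while model_cpm(b, hi) < observed_cpm and hi < 1e12 * b:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if model_cpm(b, mid) < observed_cpm:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return 0.5 * (lo + hi)


# Round trip on synthetic values (assumptions): beta = 1.5, bid = 4.
synthetic_cpm = model_cpm(4.0, 1.5)
recovered_beta = estimate_beta(synthetic_cpm, 4.0)
```

An observed CPM above half the bid cannot be reproduced by the model of Eq. (13), which is why the sketch rejects it instead of returning an arbitrary value.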

The CPC as a function of all the basic quantities of the problem then reads:

$\displaystyle\text{CPC}_{t}=\frac{N_{t}}{1\,000\ C_{t}}\cdot\beta_{t}\left[\left(1+\frac{\beta_{t}}{b_{t}}\right)\ln\left(1+\frac{b_{t}}{\beta_{t}}\right)-1\right]$ (16)

To obtain the derivative of the CPC part of the loss function we thus only need to differentiate the CPM in Eq. (15). Calculations lead to:

$\displaystyle\frac{d\text{CPM}_{t}}{db_{t}}=\frac{\beta_{t}}{b_{t}}\left[1-\frac{\beta_{t}}{b_{t}}\ln\left(1+\frac{b_{t}}{\beta_{t}}\right)\right]$ (17)

$\displaystyle\frac{d\text{CPC}_{t}}{db_{t}}=\frac{N_{t}}{1\,000\ C_{t}}\cdot\frac{\beta_{t}}{b_{t}}\left[1-\frac{\beta_{t}}{b_{t}}\ln\left(1+\frac{b_{t}}{\beta_{t}}\right)\right]$ (18)
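As a sanity check of Eqs (15) to (18), the closed-form derivatives can be compared against finite differences. The sketch below uses hypothetical sample values (10 000 impressions, 50 clicks, $b=3$, $\beta=2$); the function names are assumptions.

```python
import math


def model_cpm(b: float, beta: float) -> float:
    """Eq. (15): average CPM under the second-price model."""
    return beta * ((1.0 + beta / b) * math.log1p(b / beta) - 1.0)


def d_cpm_db(b: float, beta: float) -> float:
    """Eq. (17): derivative of the CPM with respect to the base bid."""
    return (beta / b) * (1.0 - (beta / b) * math.log1p(b / beta))


def model_cpc(b: float, beta: float, impressions: float, clicks: float) -> float:
    """Eq. (16): CPC = CPM / (1000 * CTR), with CTR = clicks / impressions."""
    return impressions / (1000.0 * clicks) * model_cpm(b, beta)


def d_cpc_db(b: float, beta: float, impressions: float, clicks: float) -> float:
    """Eq. (18): the CTR is assumed independent of the bid."""
    return impressions / (1000.0 * clicks) * d_cpm_db(b, beta)


# Hypothetical sample values, for illustration only.
b, beta, n_impr, n_clicks, h = 3.0, 2.0, 10_000, 50, 1e-5

numeric_cpm = (model_cpm(b + h, beta) - model_cpm(b - h, beta)) / (2 * h)
numeric_cpc = (
    model_cpc(b + h, beta, n_impr, n_clicks)
    - model_cpc(b - h, beta, n_impr, n_clicks)
) / (2 * h)
```

Both derivatives are positive for these values, consistent with the statement that cheaper clicks require a lower base bid.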

4.2.2 The analysis of the amount of money spent

In Eq. (13) we have made an assumption about the probability of winning an auction based on the base bid $b_{t}$ that is well evidenced. We can try to leverage this assumption to find a relationship between $b_{t}$ and $S_{t}$ . Let us divide our discussion into two parts: the case of under-delivery and the case of correct delivery.

In the case of under-delivery, a media object buys the entire inventory that is available to it (because if more was available, it would buy it with the remaining money). This quantity can be estimated as:

$\displaystyle N_{t}(b_{t})=I_{\text{tot},t}\cdot P(b_{t})$ (19)

where $I_{\text{tot},t}$ is the total amount of inventory that would be available with an infinite bid at the epoch $t$ . The total money spent is then given exactly by:

$\displaystyle S_{t}=\frac{I_{\text{tot},t}}{1\,000}\ \int_{0}^{b_{t}}p(x)\,x\,dx=\frac{I_{\text{tot},t}}{1\,000}\ \beta_{t}\left[\ln\left(1+\frac{b_{t}}{\beta_{t}}\right)-\frac{b_{t}}{b_{t}+\beta_{t}}\right]$ (20)

where the factor 1 000 comes from the fact that bids are expressed in offer per thousand impressions. The derivative of Eq. (20) with respect to the bid is given by:

$\displaystyle\frac{dS_{t}}{db_{t}}=\frac{I_{\text{tot},t}}{1\,000}\ \frac{b_{t}\cdot\beta_{t}}{\left(b_{t}+\beta_{t}\right)^{2}}=\frac{N_{t}}{1\,000}\frac{\beta_{t}}{b_{t}+\beta_{t}}$ (21)

We could also have found the first equality by applying the fundamental theorem of calculus to Eq. (20), while the second equality comes from the substitution $I_{\text{tot},t}=N_{t}(b_{t})/P(b_{t})$ (see Eq. (19)), which does not depend on the base bid $b_{t}$ because the dependences of $N_{t}$ and $P$ cancel each other.
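The two forms of Eq. (21) can be verified against each other and against a finite difference of Eq. (20). The sketch below assumes a hypothetical inventory of one million impressions and illustrative values of $b$ and $\beta$.

```python
import math


def expected_spend(b: float, beta: float, total_inventory: float) -> float:
    """Eq. (20): money spent in the under-delivery regime."""
    return total_inventory / 1000.0 * beta * (
        math.log1p(b / beta) - b / (b + beta)
    )


def d_spend_db(b: float, beta: float, total_inventory: float) -> float:
    """Eq. (21), first form, written in terms of I_tot."""
    return total_inventory / 1000.0 * b * beta / (b + beta) ** 2


def d_spend_db_from_impressions(b: float, beta: float, impressions: float) -> float:
    """Eq. (21), second form, written in terms of the impressions bought."""
    return impressions / 1000.0 * beta / (b + beta)


# Hypothetical sample values, for illustration only.
b, beta, i_tot, h = 3.0, 2.0, 1_000_000.0, 1e-5

# Eq. (19): impressions bought when the whole accessible inventory is purchased.
impressions = i_tot * b / (b + beta)

numeric = (
    expected_spend(b + h, beta, i_tot) - expected_spend(b - h, beta, i_tot)
) / (2 * h)
first_form = d_spend_db(b, beta, i_tot)
second_form = d_spend_db_from_impressions(b, beta, impressions)
```

The second form is the convenient one in practice, because $N_{t}$ is reported by the market while $I_{\text{tot},t}$ is not directly observable.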

In the case of correct delivery, instead, some pieces of inventory are not bought by the media object. A change in the base bid would most probably modify the number of such pieces of inventory but would not change the total amount of money spent. Therefore, in this case, $S_{t}$ is constant with respect to $b_{t}$ and its derivative is 0.

In order to discriminate between the two delivery regimes, we use a Heaviside step function $\theta(\tau-\frac{S_{t}}{B_{t}})$ , where $\tau$ is an under-delivery threshold, typically set to 0.95 and not to 1 because a small amount of under-delivery is inherent to the discreteness of the problem.

4.2.3 Proposed loss function and gradient

We can now give the gradient with respect to the bids of the loss function $\mathcal{L}_{t}$ proposed in Eq. (10). It reads:

$\displaystyle\nabla\mathcal{L}_{t}(b_{t})=-C_{t}\left(\frac{1}{S_{t}}\frac{dS_{t}}{db_{t}}-\frac{1}{\text{CPC}_{t}}\frac{d\text{CPC}_{t}}{db_{t}}\right)$ (22)

which, with the results found so far, becomes

$\displaystyle\nabla\mathcal{L}_{t}(b_{t})=-C_{t}\cdot\frac{N_{t}}{1\,000\ S_{t}}\times\left\{\frac{\beta_{t}}{b_{t}+\beta_{t}}\ \theta\!\left(\tau-\frac{S_{t}}{B_{t}}\right)-\frac{\beta_{t}}{b_{t}}\left[1-\frac{\beta_{t}}{b_{t}}\ln\left(1+\frac{b_{t}}{\beta_{t}}\right)\right]\right\}$ (23)

We notice that this equation is always well-defined, except when $S_{t}=$ 0. This can happen in two situations: either there is no budget assigned to the strategy, in which case we impose no changes to be made since they would have no effect anyway; or there is a budget assigned but the strategy doesn’t manage to spend anything, in which case we are probably seriously underbidding and we fix the value of the gradient to be negative.
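Eq. (23) and the two special cases for $S_{t}=0$ can be sketched for a single media object as follows. The magnitude of the fallback gradient (`fallback_gradient`) is not given in the text and is a hypothetical choice here, as is the convention that the step function equals 1 only for strictly positive arguments.

```python
import math


def bid_gradient(clicks: float, impressions: float, spent: float,
                 budget: float, bid: float, beta: float,
                 tau: float = 0.95, fallback_gradient: float = -1.0) -> float:
    """Gradient of the bid loss, Eq. (23), for one media object."""
    if spent == 0.0:
        if budget == 0.0:
            # No budget assigned: a bid change would have no effect.
            return 0.0
        # Budget assigned but nothing spent: probably underbidding, so
        # return a negative gradient (gradient descent then raises the bid).
        return fallback_gradient
    # Heaviside step: 1 in the under-delivery regime, 0 otherwise.
    under_delivery = 1.0 if spent / budget < tau else 0.0
    spend_term = beta / (bid + beta) * under_delivery
    cpc_term = (beta / bid) * (1.0 - (beta / bid) * math.log1p(bid / beta))
    return -clicks * impressions / (1000.0 * spent) * (spend_term - cpc_term)


# Hypothetical values: the same object in the two delivery regimes.
g_under = bid_gradient(clicks=50, impressions=10_000, spent=20.0,
                       budget=100.0, bid=3.0, beta=2.0)
g_full = bid_gradient(clicks=50, impressions=10_000, spent=99.0,
                      budget=100.0, bid=3.0, beta=2.0)
```

With these numbers the gradient is negative under under-delivery (the bid is pushed up to reach more inventory) and positive under correct delivery (the bid is pushed down to buy cheaper clicks), which is the trade-off the loss function is meant to encode.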

With this loss function, we perform a Nadam gradient descent [6, 13] and then bound the result to be in between a minimal and a maximal bid set by the client. Unlike in budget partitioning, we choose an additive gradient descent because we don’t need any normalization.
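The update can be illustrated with a simplified Nadam step followed by clipping. This sketch uses a constant momentum coefficient rather than the schedule of [6], and the learning rate, decay rates, and bid bounds are placeholder assumptions; it is not a reproduction of the procedure in Fig. 5.

```python
import math
from dataclasses import dataclass


@dataclass
class NadamState:
    m: float = 0.0   # running average of the gradient
    v: float = 0.0   # running average of the squared gradient
    t: int = 0       # number of steps taken so far


def nadam_bid_step(bid: float, grad: float, state: NadamState,
                   lr: float = 0.05, mu: float = 0.9, nu: float = 0.999,
                   eps: float = 1e-8,
                   min_bid: float = 0.1, max_bid: float = 10.0) -> float:
    """One additive Nadam update of a base bid, clipped to [min_bid, max_bid]."""
    state.t += 1
    state.m = mu * state.m + (1.0 - mu) * grad
    state.v = nu * state.v + (1.0 - nu) * grad * grad
    # Nesterov look-ahead: bias-corrected momentum plus current gradient.
    m_hat = (mu * state.m / (1.0 - mu ** (state.t + 1))
             + (1.0 - mu) * grad / (1.0 - mu ** state.t))
    v_hat = state.v / (1.0 - nu ** state.t)
    new_bid = bid - lr * m_hat / (math.sqrt(v_hat) + eps)
    # Keep the bid inside the interval set by the client.
    return min(max(new_bid, min_bid), max_bid)


# A negative gradient raises the bid; the upper bound caps the result.
raised = nadam_bid_step(1.0, -2.0, NadamState())
capped = nadam_bid_step(9.99, -2.0, NadamState())
```

One state object per media object (and per hour, if day parting is used) would be kept between epochs, since the moment estimates are what give the method its memory.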

As a last remark on the base bid setting, there is currently a resurgence in first price auctions. Our method is still applicable even in this situation, provided a few changes are made to the form of the equations. In particular, Eqs (15) and (20) would read respectively:

$\displaystyle\text{CPM}_{t}=b_{t}$ (24)

$\displaystyle S_{t}=\frac{I_{\text{tot},t}}{1\,000}\ b_{t}\int_{0}^{b_{t}}p(x)\,dx$ (25)

giving rise to different, but nevertheless well-defined update rules.
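As an illustration of how the update rules change, the derivatives entering Eq. (22) can be worked out from Eqs (24) and (25) under the same win-probability assumption of Eq. (13). The following is a sketch derived here from those equations, assuming the under-delivery regime for the spend term and a CTR independent of the bid:

$\displaystyle\frac{d\text{CPM}_{t}}{db_{t}}=1,\qquad\frac{1}{\text{CPC}_{t}}\frac{d\text{CPC}_{t}}{db_{t}}=\frac{1}{b_{t}}$

$\displaystyle S_{t}=\frac{I_{\text{tot},t}}{1\,000}\,\frac{b_{t}^{2}}{b_{t}+\beta_{t}},\qquad\frac{dS_{t}}{db_{t}}=\frac{I_{\text{tot},t}}{1\,000}\,\frac{b_{t}\left(b_{t}+2\beta_{t}\right)}{\left(b_{t}+\beta_{t}\right)^{2}}=\frac{N_{t}}{1\,000}\,\frac{b_{t}+2\beta_{t}}{b_{t}+\beta_{t}}$

where the last equality uses the substitution $I_{\text{tot},t}=N_{t}(b_{t})/P(b_{t})$ from Eq. (19).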

Recently, another research paper that deals with the bidding algorithm was published [20]. While there are similarities between their approach and ours, we chose to maximize directly the number of clicks instead of defining another utility function that needs other hyperparameters such as the monetary value of each click. Moreover, the method we propose in this paper does not need one data point per impression (information that we assume is not at our disposal) but only the average over a certain period of time.

4.3 SKOTT: Pacing control

The third and last sub-routine is the one that controls the delivery ratio. It checks that the total amount of money spent in the campaign so far follows the desired profile. If that’s not the case, it increases the total budget available for the next epoch. Notice that our goal is not to determine the best delivery profile of the campaign over time, but only to stick to it as well as possible. This sub-routine is the simplest one since it only sets a single scalar parameter, unlike the previous two, which each set a vector-valued one.

Figure 6.

Algorithm for pacing control.

Typically, advertisers want to control exactly how much money they spend during the campaign. For example, the simplest delivery profile is the uniform one, where the ideal amount of money spent until $t$ is equal to the total budget of the campaign times the fraction of the campaign that has elapsed already. However, the money that was really spent on the market doesn’t always correspond to the desired amount: unforeseen technical issues, fluctuations in the available inventory, and sudden changes in the properties of the market can all contribute to a variation in the amount of money spent, typically resulting in under-delivery.

Before the budget partitioning and base bid setting sub-routines can react to the under-delivery and adapt their parameters, the actual delivery of the campaign will have lost ground to the ideal one. It is then desirable to take some measures in order to catch up with the ideal spend as soon as possible.

The hourly budget $\mathcal{B}_{t+1}$ set by the algorithm looks like this:

$\displaystyle\mathcal{B}_{t+1}=\bar{\mathcal{B}}_{t+1}+\Delta\mathcal{B}_{t+1}$ (26)

$\displaystyle\Delta\mathcal{B}_{t+1}=\frac{\eta}{T-t}\left(\bar{\mathcal{S}}_{t}-\mathcal{S}_{t}\right)$ (27)

where $\bar{\mathcal{B}}_{t+1}$ is the ideal hourly budget, $\bar{\mathcal{S}}_{t}$ and $\mathcal{S}_{t}$ are respectively the ideal and actual amount of money spent until epoch $t$ , $T-t$ is the number of epochs left, and $1\leqslant\eta\leqslant T-t$ is the aggressiveness parameter. If the aggressiveness is set to 1, the algorithm tries to spread the correction evenly over the rest of the campaign. Perhaps surprisingly, this is not a good choice: a small amount of under-delivery is very common and it would not be counteracted fast enough, forcing a rush to spend money toward the end of the campaign. Also, we typically want to regain the ideal spend curve at a higher speed. However, too large a value of aggressiveness is not desirable either, because it could mean a very large sudden injection of money, possibly reducing the quality of our inventory and breaking the simple assumptions we had to make to construct a model. We typically choose values between 2 and 20, while at the end of the campaign $\eta$ will be equal to 1.
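Eqs (26) and (27) translate into a few lines of code. In the sketch below the function and argument names are assumptions, and the clamping of $\eta$ to the interval $[1, T-t]$ is one way of enforcing the stated constraint so that $\eta$ reaches 1 in the last epoch.

```python
def next_epoch_budget(ideal_budget_next: float, ideal_spent: float,
                      actual_spent: float, epochs_left: int,
                      eta: float = 5.0) -> float:
    """Budget for the next epoch according to Eqs (26) and (27)."""
    if epochs_left < 1:
        raise ValueError("the campaign has no epochs left")
    # Enforce 1 <= eta <= T - t.
    eta = min(max(eta, 1.0), float(epochs_left))
    correction = eta * (ideal_spent - actual_spent) / epochs_left
    return ideal_budget_next + correction


# Hypothetical campaign state: 100 behind the ideal spend, 10 epochs left.
even_spread = next_epoch_budget(100.0, 1000.0, 900.0, 10, eta=1.0)
aggressive = next_epoch_budget(100.0, 1000.0, 900.0, 10, eta=5.0)
clamped = next_epoch_budget(100.0, 1000.0, 900.0, 10, eta=50.0)
```

With $\eta=1$ the shortfall of 100 is spread over the 10 remaining epochs, while $\eta=5$ recovers half of it in the next epoch alone; at the upper bound $\eta=T-t$ the whole shortfall is injected at once.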

A schematic view of the algorithm is presented in Fig. 6.

5. Experimental results

We tested our algorithm on a simulated environment. (More on the characteristics of the market simulator in Section A.) We will show the results as follows: First we will compare different budget partitioning algorithms while keeping no optimization on the base bids or on the pacing. Then, we will compare base bid setting algorithms on top of the budget partitioning we presented in this paper. Finally, we will show the advantages of introducing the pacing control algorithm on top of the budget partitioning and base bid setting algorithm we chose.

5.1 The comparison of budget partitioning algorithms

We compared our algorithm to three other algorithms: (1) A vanilla algorithm (codenamed vnl) that does absolutely nothing. (2) A multi-armed bandit algorithm inspired by [1, 3], codenamed mab. (3) A linear optimization algorithm inspired by [5], codenamed lop, that maximizes the clicks under the constraints of the total available budget and an interval of admitted budgets for every media object. More information about these algorithms can be found in Section D.

Table 1
Optimization results for budget partitioning

Algo	Spt	Clk	Cpc	Kld
vnl	91.2%	100.0%	0.990	0.000
mab	91.0%	102.6%	0.979	0.005
lop	84.5%	137.7%	0.664	0.503
skt1	96.2%	132.5%	0.785	0.095

Figure 7.

The evolution of the budget partitioning for 10 media objects, according to the four different algorithms we test, taken over a single run of the optimization. The solid lines show a smoothed version of the dotted lines for better visualization.

Figure 8.

The four metrics we use to evaluate the algorithms: money spent, clicks obtained, CPC, and divergence from the uniform distribution as obtained from an average of 20 optimizations on different randomly chosen starting points. The solid lines show a smoothed version of the dotted lines for better visualization.

Figure 7 compares these algorithms. We present a simulation with day parting, i.e., with a different algorithm running for each hour of the day. The total number of epochs is 30, the number of days in a month. The first thing we want to point out is that lop is quite greedy, as we expected, while mab is almost like vnl. This is due to the fact that only one media object per epoch is updated, giving just 30 small kicks to the initial situation. Our proposed algorithm, skt1, seems to strike a balance between the two: it moves quickly without becoming greedy.

To quantify this result, we measure greediness by calculating the KL-divergence [15] of the proposed budget repartition with respect to the uniform distribution $\vec{u}$ . The KL-divergence is a widely used method: for example, in reinforcement learning, it measures the distance between two policies, i.e., two different courses of action that optimize a given reward [2, 22, 19]. It is defined as:

$\displaystyle\Delta(\vec{w},\vec{u})=\sum_{i}w_{i}\log\left(\frac{w_{i}}{u_{i}% }\right)$ (28)

A value of $\Delta(\vec{w},\vec{u})=0$ means that the distribution of the weights is exactly the same as $\vec{u}$ . On the other hand, the maximal value is obtained by a greedy distribution $\vec{w}_{g}^{(i)}$ where one of the elements is 1 and all the others are 0. In this case, the KL-divergence equals $\log(K)$ , where $K$ is the number of media objects. The values in the lower right plot of Fig. 8 are values of KL-divergence rescaled by a factor $\log(K)$ , so that they are always constrained between 0 and 1 independently of the number of media objects.
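The rescaled greediness measure can be sketched as below. The function name is an assumption, and terms with $w_{i}=0$ are taken to contribute zero, following the usual convention $0\log 0=0$.

```python
import math


def normalized_kl_from_uniform(weights: list[float]) -> float:
    """Eq. (28) against the uniform distribution, divided by log(K)."""
    k = len(weights)
    if k < 2:
        raise ValueError("at least two media objects are needed")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    uniform = 1.0 / k
    # Zero weights contribute nothing to the sum (0 * log 0 = 0).
    divergence = sum(w * math.log(w / uniform) for w in weights if w > 0.0)
    return divergence / math.log(k)


uniform_score = normalized_kl_from_uniform([0.25, 0.25, 0.25, 0.25])
greedy_score = normalized_kl_from_uniform([1.0, 0.0, 0.0, 0.0])
mixed_score = normalized_kl_from_uniform([0.7, 0.1, 0.1, 0.1])
```

The uniform partition scores 0 and the fully greedy one scores 1, so intermediate partitions can be compared across campaigns with different numbers of media objects.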

These qualitative discussions find their quantitative conclusions in Fig. 8 and in Table 1: The first column of numerical values (spt) represents the percentage of the initial budget that was spent, the second (clk) the additional clicks in percent with respect to the vnl algorithm, the third (cpc) the total CPC of the campaign, and the fourth (kld) the distance of the budget repartition from the uniform distribution.

If one considers only the total number of assigned clicks as the metric to measure the performance of the algorithms, lop wins over skt1 by a small margin. Also, its total calculated CPC is slightly lower, meaning that every click costs less money. However, on the bottom right panel of Fig. 8, we can see how greedy lop is. This is reflected in the total amount of money spent on the top left panel: if the desired media object doesn’t have enough available inventory, this algorithm cannot react decisively. Nothing guarantees that this situation won’t happen in real life with even more damaging results, in particular, a severe under-delivery of the budget. On the contrary, skt1 manages to spend almost all the available budget (represented by the purple line) even without an explicit optimization on the bid and the pacing. The increased adaptability makes it more resistant to the real market test and thus more valuable.

5.2 The rest of the analysis

We now choose the skt1 algorithm for the budget partitioning and study the effect of different base bid setters. This time, we compare the new skt2 algorithm to two other algorithms: again vnl, where bids are not changed, and also pst, an algorithm that uses a predetermined set of rules. We also add to the analysis the comparison of skt2 on top of skt1 with the full SKOTT algorithm, which also contains the pacing control sub-routine.

Let us first explain a little about pst. The predetermined set of rules analyzes the CPC first: if it is higher than a certain goal set at the beginning of the campaign, the bid is reduced by a certain fixed multiplicative constant unless the media object is under-delivering. In that case pst will still try to slightly increase the bid. We can see several drawbacks in this approach compared to the algorithm we presented in Section 4.2: first of all, we need an additional external parameter, that is, the goal CPC. Furthermore, it makes very little use of the data from the market, reducing the adaptability.

Figure 9.

The evolution of the base bid for 10 media objects, according to the four different algorithms we test, taken over a single run of the optimization. The solid lines show a smoothed version of the dotted lines for better visualization.

Table 2

Optimization results for base bid and pacing

Algo	Spt	Clk	Cpc	Kld
skt1	96.2%	100.0%	0.785	0.095
skt1 $+$ pst	90.2%	136.9%	0.538	0.071
skt1 $+$ skt2	84.8%	231.0%	0.300	0.086
skt1 $+$ skt2 $+$ skt3	99.8%	251.6%	0.324	0.085

Figure 10.

Same as Fig. 8, but for the base bid setting algorithms. The solid lines show a smoothed version of the dotted lines for better visualization.

These limitations have an effect on the results, as can be seen from Fig. 10 and from Table 2. Differently from before, the baseline for the column (clk) is now skt1 and not vnl. The skt2 algorithm manages to outperform both vnl and pst by a vast amount, obtaining a larger number of clicks while spending less money.

From the top left corner of Fig. 10 one can see that the amount of money spent oscillates quite a bit at the beginning of our proposed algorithm. We see this stems from a similar oscillation in the algorithm’s attempts to find the appropriate bids, as can be seen from Fig. 9: the peaks of money spent correspond to bids slightly higher than the optimal and vice versa. Finally, we see that adding the third and last part of the algorithm manages to spend almost the entire initial budget, obtaining a small increase in clicks at the expense of a slightly higher CPC.

It is interesting to notice that the last plot, containing the KL-divergence, shows that the budget repartition changes slightly depending on which bidding algorithm we choose. This is understandable because, by changing the base bid, we actually modify the perceived quality of the media object and thus generate different inputs for the different iterations of the budget partitioning algorithm. However, these modifications are small enough to be neglected at first order.

6. Conclusion

We have introduced a method for advertisers to optimize the management of Demand Side Platforms when running an advertising campaign composed of many separate media objects. The method, which we call the SKOTT algorithm, is an iterative method that makes only a few general assumptions on the mathematical model of the market. We present it here applied to a campaign for the optimization of the number of generated clicks.

The SKOTT algorithm is composed of three complementary parts. First, the best partitioning for the budget across all media objects is calculated. This is achieved by estimating the quality of each media object and trying to obtain the maximum number of clicks through an exponentiated gradient descent method. Second, the best base bid for each media object is calculated. Here we use the assumption and corresponding evidence from [27] that relates the bid to the probability of winning the corresponding auction. We expand on this assumption to propose a model relating variation in bids to variation in the number of clicks obtained by each media object. We then apply a Nadam technique [6, 13] on the market data to find the best base bids for maximizing the number of clicks. Third, the amount of budget to use at every epoch is determined in order to stay as close as possible to the desired spend profile.

The proposed algorithm has been tested on a simulated environment that we created for the occasion and that we present in the appendices. Under these circumstances, the proposed algorithm gives impressive results, more than doubling the total amount of obtained clicks in the considered experiments.

Acknowledgments

We thank Soufian Aboulfaouz for numerous discussions (mainly concerning the market analysis as well as the mab and lop algorithms), and especially Yiming Wu for carefully reading and commenting on the manuscript. We also want to express our appreciation to MediaMath for allowing us to use their platforms and for their technical assistance.

Appendix

Model of the market

We created a back-test platform for analyzing different campaign management algorithms. The workflow of the platform is conceptually divided into five steps:

1. Parameters are chosen that describe the problem we have at hand.

2. Data is created using these parameters.

3. Loss functions are chosen.

4. All algorithms are launched independently. They work on the same data and their goal is to maximize the total number of clicks obtained during the campaign.

5. The results of the different algorithms are compared: we plot budget repartition, base bids, greediness of the algorithms, cumulative CPC, spend profiles, and collected clicks over time.

In this section, we will discuss the first two steps that relate to the creation of the back-test platform itself. Steps 3 and 4 are discussed in Sections 4.1–4.3 of the main body of the article, while some plots of the results are presented in Section 5.

Dealing with time variations

As already mentioned in Section A, there can be variations in the quality of media objects with time. While the causes are various, the main effect is the double periodicity induced by the day-night cycle and the weekly cycle [24]. In particular, there is a big drop in the volume of impressions dealt and the number of clicks at night that often leads advertisers to forgo buying impressions at these times.

But variations are not always periodic, nor globally affecting all the media objects at the same time. A change in the relative quality of the media objects can happen over time due to external factors. A typical example could be a media object advertising a live event: the distance in time from the day of the event is an important parameter for users to decide whether to buy a ticket.

Finally, some changes might be due to correlations not considered in our model. For example, a change in the base bid that we offer during auctions might lead to a modification of the CTR of the impressions we are able to buy.

The solution we take in order to deal with such issues is to launch 24 different algorithms, one for every hour. The advantage is clear: in case some media objects are turned off at a certain moment of the day, they won’t affect the perceived quality of the media object during other hours. However, there are obvious disadvantages as well: we discard data that might still give valuable information and convergence is 24 times slower.

For the aperiodic modifications over time, we want to have fast responses to the changes in the market. Since information on the changes is obtained through the purchase of impressions, we try to keep the algorithm as far from greedy as possible while still increasing the number of obtained clicks.

Data pre-processing

In the real world, when we receive the data, there are often missing values that need to be filled before starting the optimization. Here, we fill the missing values using a combination of three different approaches: a) backward filling, where we propagate the next valid observation backward; b) linear interpolation, a method of curve fitting that uses linear polynomials to construct the missing values; and c) weighted moving averaging, an averaging scheme whose multiplying factors give different weights to data points at different positions.

Let x = [ $x_{1}$ , $x_{2}$ , nan, $x_{4}$ , …, $x_{t}$ , nan, …, nan] be of assumed length $\tau$ . As one can see, there are some nans before epoch $t$ , and a series of nans between $t$ and $\tau$ . In the case of having some missing values at the beginning of the vector, we fill the nans using the backward filling method. For instance, [nan, $x_{i}$ , $x_{j}$ ] gives [ $x_{i}$ , $x_{i}$ , $x_{j}$ ].

Next, we fill the nans up to epoch $t$ using linear interpolation. As a result, there will be no missing values between the epochs 1 and $t$ .

Last, we fill the nans between the epochs $t$ and $\tau$ using a weighted moving average. Let us consider x’ = [ $x_{1}$ , …, $x_{t}$ , nan, …, nan], where for all $1\leqslant i\leqslant t$ , $x_{i}\neq$ nan, and for all $t+1\leqslant i\leqslant\tau$ , $x_{i}=$ nan. In a $t$ -point weighted moving average, the latest data point has weight $t$ , the second latest $t-1$ , and so on, terminating at one. Therefore, the estimate of data point $t+1$ is defined as:

$\displaystyle x_{t+1}=\frac{1\cdot x_{1}+2\cdot x_{2}+\dots+t\cdot x_{t}}{t\cdot(t+1)/2}$ (31)

All the missing values $x_{i}$ for $t+1\leqslant i\leqslant\tau$ are filled using Eq. (31).
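The three filling steps can be sketched in plain Python as follows. The function name is an assumption, and so is the reading that every trailing missing value receives the same estimate of Eq. (31) computed from the values up to epoch $t$ (a recursive variant that re-applies the average after each filled point would be an equally plausible reading).

```python
import math


def fill_missing(x: list[float]) -> list[float]:
    """Fill nans by backward fill, linear interpolation, then Eq. (31)."""
    y = list(x)
    valid = [i for i, v in enumerate(y) if not math.isnan(v)]
    if not valid:
        raise ValueError("at least one observed value is required")
    first, last = valid[0], valid[-1]

    # a) Backward fill: leading nans take the first valid observation.
    for i in range(first):
        y[i] = y[first]

    # b) Linear interpolation between consecutive valid observations.
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            y[i] = y[left] + frac * (y[right] - y[left])

    # c) Weighted moving average over epochs 1..t, with weights 1..t.
    t = last + 1
    wma = sum((k + 1) * y[k] for k in range(t)) / (t * (t + 1) / 2)
    for i in range(t, len(y)):
        y[i] = wma
    return y


nan = math.nan
filled = fill_missing([nan, 2.0, nan, 4.0, nan, nan])
```

For the sample vector the first four entries become 2, 2, 3, 4, and both trailing entries are set to (2 + 4 + 9 + 16) / 10 = 3.1.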

A quick glance at the competing algorithms

In Section 5 we mention three algorithms that we use as a comparison against our own. The vnl algorithm needs no explanation, because it corresponds to a non-optimized campaign in which the initial parameters are kept constant. On the other hand, mab and lop deserve a few words, which we will spend in the next paragraphs.

References

1. Auer, Cesa-Bianchi, Freund and Schapire R.E., The non-stochastic multi-armed bandit problem, SIAM J Comput 32(1) (2002), 48–77.

2. Auer and Ortner, UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, Periodica Mathematica Hungarica 61(1) (2010), 55–65.

3. Besbes, Gur and Zeevi, Stochastic multi-armed-bandit problem with non-stationary rewards, Advances in Neural Information Processing Systems (NIPS), 2014, pp. 199–207.

4. Cai, Ren, Zhang, Malialis, Wang and Guo, Real-time bidding by reinforcement learning in display advertising, Proceedings of the Tenth ACM International Conference on Web Search and Data Mining – WSDM ’17, 2017, pp. 661–670.

5. Chen, Berkhin, Anderson and Devanur N.R., Real-time bidding algorithms for performance-based display ad allocation, In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining – KDD ’11, New York, New York, USA, 2011. ACM Press, p. 1307.

6. Dozat, Incorporating Nesterov momentum into Adam, ICLR Workshop (1) (2016), 2013–2016.

7. Edelman, Ostrovsky and Schwarz, Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords, American Economic Review 97(1) (2007), 242–259.

8. Feldman, Muthukrishnan, Pal and Stein, Budget Optimization in Search-Based Advertising Auctions, In Proceedings of the 8th ACM conference on Electronic commerce – EC ’07, New York, New York, USA, 2006. ACM Press, p. 40.

9. Grigas, Lobos, Wen and Lee K.-C., Profit Maximization for Online Advertising Demand-Side Platforms, 2017.

10. Gusev, Kroujiline and Govorkov, Sell the news? A news-driven model of the stock market, Academia (2014), 1–65.

11. IAB, IAB internet advertising revenue report, Technical report, Interactive Advertising Bureau (IAB); PwC, 2017.

12. Karande, Mehta and Srikant, Optimizing budget constrained spend in search advertising, In Proceedings of the sixth ACM international conference on Web search and data mining – WSDM ’13, New York, New York, USA, 2013. ACM Press, p. 697.

13. Kingma D.P. and Ba, Adam: A method for stochastic optimization, ICLR, 2014, pp. 1–15.

14. Kivinen and Warmuth, Exponentiated gradient versus gradient descent for linear predictors, Information and Computation 132(1163) (1997), 1–63.

15. Kullback and Leibler R.A., On information and sufficiency, The Annals of Mathematical Statistics 22(1) (Mar 1951), 79–86.

16. Liu and Wang, Dual Based DSP Bidding Strategy and its Application, July 2017.

17. Myerson R.B., Optimal auction design, Mathematics of Operations Research 6(1) (1981), 58–73.

18. Pietersz and Pelsser, A comparison of single factor Markov-functional and multi factor market models, Review of Derivatives Research 13(3) (2010), 245–272.

19. Plappert, Houthooft, Dhariwal, Sidor, Chen R.Y., Chen, Asfour, Abbeel and Openai M.A., Parameter space noise for exploration, arXiv, 2017.

20. Ren, Zhang, Chang, Rong and Wang, Bidding machine: learning to bid for directly optimizing profits in display advertising, IEEE Transactions on Knowledge and Data Engineering 30(4) (Apr 2018), 645–659.

21. Richardson, Dominowska and Ragno, Predicting clicks, In Proceedings of the 16th international conference on World Wide Web – WWW ’07, 2007, p. 521.

22. Schulman, Wolski and Dhariwal, Proximal policy optimization algorithms background: policy optimization, pp. 1–10.

23. Vickrey, Counterspeculation, auctions, and competitive sealed tenders, The Journal of Finance 16(1) (1961), 8–37.

24. Yuan, Wang and Zhao, Real-time bidding for online advertising, Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, 2013, pp. 1–8.

25. Yuan, Wang and Qin, A survey on real time bidding advertising, In Service Operations and Logistics, and Informatics (SOLI), 2014 IEEE International Conference on, IEEE, 2014, pp. 418–423.

26. Zhang and Huang, A quantum model for the stock market, Physica A: Statistical Mechanics and its Applications 389(24) (2010), 5769–5775.

27. Zhang, Yuan and Wang, Optimal real-time bidding for display advertising, Proceedings of the 20th ACM SIGKDD, 2014, pp. 1077–1086.

28. Zhang, Dai, Feng, Wang, Bian, Wang and Liu T.-Y., Sequential click prediction for sponsored search with recurrent neural networks, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Apr 2014, pp. 1369–1375.
