Abstract
Optimal control models for limit order trading often assume that the underlying asset price is a Brownian motion since they deal with relatively short time scales. The resulting optimal bid and ask limit order prices tend to track the underlying price as one might expect. This is indeed the case with the model of Avellaneda and Stoikov [Quantitative Finance
Introduction
Limit orders play an essential role in today’s financial markets. How to optimally submit limit orders has therefore become an important research area. Limit order traders set the price of their orders, and the market determines how fast their orders are executed. Avellaneda and Stoikov proposed a stochastic control model [1] for a single limit order trader that optimizes an expected terminal utility of portfolio wealth. In this model, market orders are given by a Poisson flow with rate
The approach of Avellaneda and Stoikov has been analyzed and extended in [2,4,9–11,20]. In this paper, we use the same optimal control problem, but we are interested in longer time scales. On a short time scale, the reference price can be modeled by a Brownian motion as seems appropriate in high frequency trading. On a longer time scale corresponding to intermediate trading frequency, we may assume a mean reverting reference price modeled by an Ornstein–Uhlenbeck (OU) process. Reviews of mean reverting behavior in equity markets and associated time scales are presented in [6,12].
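The difference between the two time scales can be illustrated by comparing the conditional variance of the two price models. The sketch below assumes the standard parametrization dS = α(μ − S)dt + σ dW for the OU process; this form and the numerical values of σ and α are illustrative assumptions, not the calibrated setting of this paper.

```python
import math


def bm_variance(sigma: float, t: float) -> float:
    """Variance of sigma * W_t for a Brownian motion W."""
    return sigma ** 2 * t


def ou_variance(sigma: float, alpha: float, t: float) -> float:
    """Var[S_t | S_0] for the assumed OU dynamics dS = alpha*(mu - S) dt + sigma dW."""
    return sigma ** 2 * (1.0 - math.exp(-2.0 * alpha * t)) / (2.0 * alpha)


if __name__ == "__main__":
    sigma, alpha = 0.3, 2.0  # hypothetical values, for illustration only
    for t in (0.001, 0.01, 0.1, 1.0, 10.0):
        print(f"t={t:7.3f}  BM={bm_variance(sigma, t):.6f}  "
              f"OU={ou_variance(sigma, alpha, t):.6f}")
```

For t much smaller than 1/α the two variances nearly coincide, which is why a Brownian reference price is a reasonable short-time model; for t much larger than 1/α the OU variance saturates at σ²/(2α) while the Brownian variance keeps growing.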
In this paper, we present a numerical study of the long-time limit of the optimal limit order prices in the Avellaneda and Stoikov model with an OU price process. In addition, we study analytically the equilibrium value function of the optimal control problem. Long time behavior of a limit order control problem is studied by Gueant, Lehalle, and Fernandez-Tapia [10]. They use the Avellaneda and Stoikov model but with a Brownian motion price process instead of a mean reverting one. They impose inventory limits which, after some transformations, reduce the problem to a finite-dimensional system of ordinary differential equations. They show that the optimal spreads converge to inventory-dependent limits when time is far away from terminal. Zhang [20] and Fodra and Labadie [4,5] also study the Avellaneda and Stoikov model with an OU price process, although they do not consider the long time limit of the trader’s optimal strategy. Fodra and Labadie analyze the case where the reference price is away from its long term mean. The trader then anticipates and takes advantage of the tendency of the price to go back to the long term mean. In this paper, we are interested in how the trader would behave if he/she expects that the reference price is likely to oscillate around its long term mean for a relatively long time. We study the case in which the trading period consists of multiple mean reversion cycles of the reference price, while Fodra and Labadie [4] consider one or just a half of such a cycle.
Our main result is that the optimal limit order prices, instead of the optimal spreads, converge to limits that are independent of all the state variables in the model. This is shown numerically by two different computational methods. The limit value function is also studied analytically which confirms the accuracy and stability of our numerics. In addition, we observe numerically that the speed at which the optimal limit order prices become insensitive to the reference price is different from that of the inventory levels where the former converges much faster. When the trading period is sufficiently long, there are three stages in the optimal trading strategy:
(i) Far from the terminal time, the trader uses constant limit order prices to generate profit with little concern for risk aversion or leftover inventory.
(ii) At intermediate times, the trader maintains inventory levels by posting limit orders that depend on inventory levels.
(iii) Near the terminal time, the trading behavior is mostly determined by the exponential utility function.
We observe that in certain parameter regimes when time is away from terminal by several mean reversion cycles of the reference price, the trader updates limit order prices only according to the change of inventory levels independent of the reference price. These changes become smaller as time moves backwards and are effectively zero when time is far away from the terminal time, in which case the trader posts constant limit order prices. Near the terminal time, the optimal limit order prices are affected by the long-term variance of the reference price and the exponential terminal utility function. We also observe that, with other parameters fixed, the optimal limit order prices converge to their long-term limits faster when the market has more liquidity, which in this model is controlled by parameters in the Poisson flow of orders.
When the trader posts constant limit order prices, wealth accumulates from the difference between the sell and buy limit order prices rather than from the spread, namely the difference between these prices and the reference price. This strategy is somewhat analogous to a pairs trading strategy, and when the trading period is long enough, it appears to beat the strategy of tracking the reference price. However, by posting constant limit order prices, the trader gives up the ability to control the trading rate, which is then determined entirely by the fluctuations of the reference price. As a result, the variance of the inventory is large, which is undesirable towards the end of the trading period because of the terminal exponential utility. Therefore, before getting close to the end of the trading period, the trader needs to keep track of the reference price so as to control the trading flow and avoid a large leftover inventory.
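The constant-quote strategy can be sketched with a small Monte Carlo simulation. The code below is a minimal illustration under stated assumptions: the reference price follows dS = α(μ − S)dt + σ dW, fills arrive with the Avellaneda–Stoikov exponential intensity A·exp(−κδ) where δ is the distance between the quote and the reference price, and each fill changes the inventory by one unit. The function name, the time discretization, and all parameter values are hypothetical.

```python
import numpy as np


def simulate_constant_quotes(bid, ask, mu=100.0, alpha=2.0, sigma=1.0,
                             A=50.0, kappa=1.5, T=10.0, n_steps=10_000, seed=0):
    """Simulate a trader who keeps the bid and ask fixed for the whole period."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    decay = np.exp(-alpha * dt)                       # exact OU decay over dt
    noise_sd = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
    s = mu
    cash, inventory, n_buys, n_sells = 0.0, 0, 0, 0
    for _ in range(n_steps):
        # Assumed fill intensities: larger when the reference price is closer
        # to (or beyond) the posted quote.
        lam_ask = A * np.exp(-kappa * (ask - s))
        lam_bid = A * np.exp(-kappa * (s - bid))
        # Probability of at least one Poisson arrival in dt; at most one fill
        # per side and per step is booked.
        if rng.random() < 1.0 - np.exp(-lam_ask * dt):
            cash += ask
            inventory -= 1
            n_sells += 1
        if rng.random() < 1.0 - np.exp(-lam_bid * dt):
            cash -= bid
            inventory += 1
            n_buys += 1
        # Exact OU transition for the reference price.
        s = mu + (s - mu) * decay + noise_sd * rng.standard_normal()
    return {"cash": cash, "inventory": inventory, "n_buys": n_buys,
            "n_sells": n_sells, "final_price": s}


if __name__ == "__main__":
    print(simulate_constant_quotes(bid=99.0, ask=101.0))
```

Every completed round trip earns ask − bid regardless of where the reference price was at the time of each fill, while the trading rate on each side is driven only by the fluctuations of the reference price around the fixed quotes.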
By linearizing the exponential trading intensity, the Avellaneda and Stoikov model with an OU reference price is reduced to a model that can be solved analytically. This is done in Zhang [20] and also in Fodra and Labadie [4]. We compare our numerical solutions with the approximation in Zhang [20] and find good agreement when time is not too far away from terminal.
The structure of this paper is as follows: We first present the model in Section 2, then introduce the numerical methods used in Section 3. The numerical methods are discussed in more detail in the Appendix. In Sections 4 and 5, we discuss our results for the long-time behavior of the optimal limit order prices and compare them with what is expected analytically. We do not have a full analytical treatment of the long-time behavior of the HJB equation at present. However, in Section 6, we carry out an equilibrium analysis on the (time-independent) HJB equation and compare the analytical results obtained with those of our long-time numerical simulations. The result confirms the accuracy and stability of our numerical methods.
Settings
We assume that the reference price
The portfolio of the limit trader consists of two parts: cash and the risky asset. We denote the cash process by
Combining empirical results from econophysics in [7,8,13,16,19], Avellaneda and Stoikov proposed that the process of the fulfilled limit orders follows a doubly stochastic Poisson process with intensity
The trader aims to solve the optimal control problem
The parameters in our model are
Dynamic programming
Consider the value function
Because of the special form of the terminal utility, namely the CARA (constant absolute risk aversion) utility, it is known from the studies in Zhang [20] and Gueant, Lehalle, and Fernandez-Tapia [10] that the ansatz
We make a change of time
Note that (15) is highly nonlinear because the value function v appears in the exponent. Moreover, this equation involves both continuous variables, t and s, and a discrete variable q. There is no available theory on its well-posedness. On the other hand, for the case
Scaling
We use two scalings for our model, one on time and another one on price:
The optimal feedback controls are given by
We point out that:
After scaling, the price-related quantities
The function v and variable s in (17) and (18) are actually
The optimal controls in (18) are the ones in (13) scaled by γ.
In the subsequent sections, when discussing how the parameters would affect the model, we will be referring to the new parameters after the scaling instead of those in (7). Note that even though we dropped two parameters, namely α and γ, we have not lost any generality after those two scalings. For a model in (4), (5), and (6) with an arbitrary group of parameters, we can solve a model with scaled parameters constructed in (16), then convert it to a solution of the original model before scalings.
We briefly discuss two numerical methods that we will use to solve the optimal stochastic control problem described in Section 2, particularly equation (17), and produce all the results discussed in the subsequent sections.
The first method is a fully implicit finite difference scheme. This method has the advantages of being relatively simple to implement and numerically stable. However, it can be slow because of the iteration required at each time step. The second method is a split-step scheme, which treats the linear and nonlinear parts of the equation separately. We briefly describe the second method here and refer to the Appendix for more detail on both methods.
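The idea behind a split-step scheme can be illustrated on a toy problem. The sketch below is not the scheme applied to the HJB equation of this paper; it applies first-order (Lie) splitting to the scalar ODE y' = −a·y − b·y², chosen as an assumption because both sub-problems and the full equation have closed-form solutions, so the splitting error can be measured directly.

```python
import math


def lie_split_step(y: float, h: float, a: float, b: float) -> float:
    """One splitting step: exact linear flow, then exact nonlinear flow."""
    y = y * math.exp(-a * h)       # linear part     y' = -a*y
    y = y / (1.0 + b * h * y)      # nonlinear part  y' = -b*y**2
    return y


def integrate_split(y0: float, T: float, n: int, a: float, b: float) -> float:
    """Advance from 0 to T with n splitting steps of size T/n."""
    h = T / n
    y = y0
    for _ in range(n):
        y = lie_split_step(y, h, a, b)
    return y


def exact_solution(y0: float, T: float, a: float, b: float) -> float:
    """Closed form of y' = -a*y - b*y**2 (Bernoulli equation, u = 1/y)."""
    return 1.0 / ((1.0 / y0 + b / a) * math.exp(a * T) - b / a)


if __name__ == "__main__":
    y0, T, a, b = 1.0, 1.0, 1.0, 1.0
    ref = exact_solution(y0, T, a, b)
    for n in (10, 100, 1000):
        print(n, abs(integrate_split(y0, T, n, a, b) - ref))
```

No iteration is needed inside a step because each sub-problem is solved in closed form, which is the source of the speed advantage; the price is a splitting error that shrinks as the step size decreases.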
We consider the following transformation of the value function v in (17)
We split the PDE in (20) to two PDEs:
Here equation (22) can be solved via the Feynman–Kac formula, and equation (23) can be solved exactly using the method in Zhang [20] if we impose finite inventory limits for our problem, in which case the transformation
The feedback optimal limit prices produced by these two methods match very well if we discretize time and the reference-price space properly. Compared to the finite difference method, the split-step method is much faster since there is no iteration involved. Moreover, the split-step method uses the Feynman–Kac formula to handle the mean reversion feature in the model, which is fully implicit, stable, and suitable for observing the long-time behavior. However, because of (19), the function
Long time behavior
Studying the standard Avellaneda–Stoikov model, Gueant, Lehalle, and Fernandez-Tapia [10] observed a long-term stationary behavior of the optimal spreads
In our model, we observe a long-time behavior of the optimal limit order prices
Our numerical simulations presented in Section 5 indicate that the optimal feedback limit order prices given by
We have not yet developed an analytical proof of the convergence in (28) as this is work in progress. Note that
In the rest of this section, we discuss three closely related models that can be solved analytically and compare the limit of the optimal limit prices in those models with the ones in (27).
Model with constant reference price
In our case, the final limit of the optimal prices does not depend on the long-term standard deviation of the reference price. Instead, it uses the spread
Analysis of small κ
Both Fodra and Labadie [4] and Zhang [20] considered approximations of (15) with linearization. We briefly state Zhang’s results here.

Figure 1. We plot optimal ask limit order prices at different times from the model with small κ (
After a linearization of the exponential terms, Zhang shows analytically that the value function v becomes independent of s exponentially fast
We compare the optimal feedback ask limit prices computed by our numerical methods to those in the limit of Zhang’s approximation (see footnote 2) in Figure 1. We plot the feedback ask limit prices as functions of inventory q, as they have already become insensitive to the reference price s. A translation is applied to those feedback optimal prices to make them comparable. We refer to Figure 1 for more detail.
When time is far away from the terminal time, the trader has little pressure from risk aversion rooted in the terminal exponential utility, so we expect the trading pattern in such a scenario to be similar to the one in the model with linear utility
In [4], Fodra and Labadie have considered this case and have obtained the analytical solution for the optimal prices. In this case, the optimal feedback limit order prices would converge exponentially fast to
Recall that the limits of the dimensionless optimal feedback prices in our model with exponential utility are
In the linear utility case, the constant strategy is almost optimal when
Numerical results
We apply the numerical methods described in Section 3 to solve the HJB equation (17) for the value function and optimal controls in our model.
Evolution of optimal feedback limit order prices
We are interested in how the optimal feedback controls in our model, namely the optimal limit order prices, evolve as a function of the inventory and reference price. As stated in Section 4, we observe that the optimal prices converge to constants in (27) when time is away from terminal. In addition, as shown in Figure 2, we observe that the optimal limit order prices become insensitive to the reference price much faster than to the inventory, which leads to an “intermediate” regime where the optimal limit order prices only respond to the change of inventory.

Figure 2. Feedback optimal ask limit order prices, from top to bottom, corresponding to 0, 1, 4 and 800 mean reversion cycles from the terminal time. The prices are the ones before the price scaling described in Section 2.3, not the dimensionless ones after the scaling. Each line, as a function of the reference price, corresponds to a value of inventory. The optimal ask prices have already become independent of the reference price at 4 mean reversion cycles from the terminal time (the 3rd plot from top), while it takes 800 mean reversion cycles (backwards in time) for them to become independent of the inventory as well (the bottom plot). Here the 3rd plot from top corresponds to the intermediate regime and the bottom plot corresponds to the far-away-from-terminal regime.
For a model with unscaled parameters
the terminal time;
near-terminal regime: 1 mean reversion cycle of the reference price from the terminal time;
intermediate regime: 4 mean reversion cycles from the terminal time;
limit regime: 800 mean reversion cycles from the terminal time.
We can see that the near-terminal regime is short: moving backwards in time, trading quickly enters the intermediate regime, where the optimal prices become insensitive to the reference price. In the intermediate regime, the trader updates his limit orders only according to the inventory. By doing so, he keeps the variance of the inventory low, which reflects his risk aversion rooted in the terminal utility.
In Figure 2, it takes 800 mean reversion cycles to observe the insensitivity of the optimal prices to the inventory, as shown in the bottom plot. That is, for a large portion of the trading period, the trader posts limit orders whose prices are affected only by changes in his own inventory, ignoring the fluctuations of the reference price.
In some parameter ranges, the intermediate regime can be very long, so that we observe insensitivity of the optimal limit prices to the reference price but still observe dependence on inventory. That is, the intermediate regime is already in effect at the very beginning of the trading period, in which case we do not observe the limit regime at all. To illustrate this, we choose two sets of parameters and show the corresponding simulation results in the next section, where within the trading period we can only observe the intermediate and near-terminal regimes but not the limit regime.
Note that even though we do not calibrate our parameters to real data in this paper, there is literature on how to do this. The calibration of the liquidity parameters A and κ is studied in Chapter 4 of [3], which presents a calibration framework that can be extended to the model with a mean reverting reference price. The parameters α and σ characterizing the mean reverting reference price, which we assume is observed, can be calibrated by the maximum likelihood estimation (MLE) studied in [14,18].
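As an illustration of the kind of estimation involved, the sketch below fits α and σ to a discretely observed path by conditional maximum likelihood. It is not the procedure of [14,18]; it assumes the dynamics dS = α(μ − S)dt + σ dW observed on a regular grid, for which the exact transition is a Gaussian AR(1) model and the conditional MLE reduces to a least-squares regression of S at one observation on S at the previous one. Function names and parameter values are hypothetical, and the data are synthetic.

```python
import numpy as np


def simulate_ou(mu, alpha, sigma, dt, n, s0, seed=0):
    """Sample an OU path on a regular grid using the exact Gaussian transition."""
    rng = np.random.default_rng(seed)
    b = np.exp(-alpha * dt)
    sd = sigma * np.sqrt((1.0 - b ** 2) / (2.0 * alpha))
    z = rng.standard_normal(n)
    s = np.empty(n + 1)
    s[0] = s0
    for k in range(n):
        s[k + 1] = mu + (s[k] - mu) * b + sd * z[k]
    return s


def fit_ou_mle(s, dt):
    """Conditional MLE of (mu, alpha, sigma) from the AR(1) form
    S[k+1] = a + b*S[k] + eps, with b = exp(-alpha*dt)."""
    x, y = s[:-1], s[1:]
    b, a = np.polyfit(x, y, 1)            # returns slope first, then intercept
    resid = y - (a + b * x)
    var_eps = np.mean(resid ** 2)         # MLE variance (divides by n)
    alpha = -np.log(b) / dt               # requires 0 < b < 1
    mu = a / (1.0 - b)
    sigma = np.sqrt(var_eps * 2.0 * alpha / (1.0 - b ** 2))
    return mu, alpha, sigma


if __name__ == "__main__":
    path = simulate_ou(mu=100.0, alpha=2.0, sigma=1.0, dt=0.01, n=100_000, s0=100.0)
    print(fit_ou_mle(path, dt=0.01))
```

The estimate of σ is typically much tighter than that of α, whose accuracy is governed by the total observation length rather than by the sampling frequency.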
We show some simulation results of our trading models in Figure 3 and Figure 4. The unscaled parameters used in Figure 3 are

Figure 3. Simulation results for limit order prices, inventory, and spreads for 10 mean reversion cycles of the underlying reference price. The pattern clearly shows that near the terminal time, the trader tracks the reference price closely, whereas in the intermediate regime, the optimal limit prices effectively only respond to the change of inventory.

Figure 4. Similar plots to Figure 3, but with a larger parameter A representing a greater volume of incoming market orders. When there is a trend in the reference price, for instance, between time 2 and 4, there will also be a trend in optimal prices in the same direction but with a lag. The trend in optimal prices is a result of the trend in the inventory formed during a trend of the reference price when the volume of incoming market orders is large.
Figure 3 shows a simulation result for 10 mean reversion cycles of the reference price. Between time 0 and 8, the trading is in the intermediate regime in the sense that the optimal limit order prices remain almost constant when no limit order is taken and jump when the inventory changes.
In Figure 4, the model has the same parameters except that the market-order-volume parameter A is greater. In the top plot of Figure 4, the optimal prices seem to track the reference price, but a closer look shows that the pattern in this plot is essentially the same as the pattern in the top plot of Figure 3. That is, the limit order prices effectively respond only to the change of inventory and ignore the fluctuation of the reference price, which suggests that we are in the “intermediate regime.” For instance, between time 2 and 3 in the top plot of Figure 4, there is a significant drop of the reference price, but the limit prices do not drop accordingly. They begin to decrease only after the inventory increases. In this case, parameter A is sufficiently large that enough limit orders are taken in one trend of the price, which builds up a trend in the inventory and in turn creates a trend in the optimal limit order prices. This explains why, at first sight, the limit order prices follow the same trend as the reference price, and why there is a lag between the trend of the reference price and that of the limit order prices.
When the model moves from the near-terminal regime to the intermediate regime, the sensitivity of the optimal prices to the inventory is mainly affected by the scaled σ. We observe that the greater the scaled σ is, the greater the jump size of the optimal prices is when the inventory changes by one unit. The magnitude of a jump decays to 0 as time goes backwards, with the decay rate affected by A and κ; for greater values of A and κ, the jump size decays faster. Note that larger values of A and κ mean a larger market order flow and a shallower order book, respectively. These properties signify higher liquidity in the market. So one insight we can gain from this model is that, for a limit order trader trading a liquid asset with a mean-reverting price, his optimal limit prices converge faster backwards in time than they do when he trades a less liquid asset, and therefore his optimal limit prices are less sensitive to the change of inventory.
Recall that here the parameters are the ones after scalings described in Section 2.3, so A, κ, and
To check whether our numerical solution of the system in (17) is still valid even when time is far away from terminal, we analytically consider the equilibrium of that system and compare the result with our numerical solution of the time-dependent system.
As described in Section 4, our conjecture is that, for a solution v of the PDE (17) and any q and s,
Moreover, recall that Zhang [20] has obtained a closed form solution of the linearized model with small κ. We can compare the limit of a value function in our model in (37) and the one in Zhang’s small κ analysis in (30). In both equations, when
Note that the limit in (37) is indeed a solution of the HJB equation in (17), but it does not satisfy the initial condition. We now analyze equation (38) to gain some insight into the constant C and the solution θ.
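A problem of this type, in which a constant C and a function θ must be found together, is an eigenvalue problem. As a generic illustration only, the sketch below solves the one-dimensional problem −θ'' + V(x)θ = Cθ for its smallest eigenvalue and the corresponding positive eigenfunction by finite differences. The harmonic potential V(x) = x², the truncated domain, and the zero boundary values are assumptions made so that the answer (C = 1) is known; they are not the potential or boundary conditions of the equilibrium equation studied here.

```python
import numpy as np


def ground_state(potential, x_min=-8.0, x_max=8.0, n=801):
    """Smallest eigenpair of -theta'' + V*theta = C*theta on [x_min, x_max]
    with zero boundary values, using second-order central differences."""
    x = np.linspace(x_min, x_max, n)
    h = x[1] - x[0]
    main = 2.0 / h ** 2 + potential(x)          # diagonal entries
    off = -np.ones(n - 1) / h ** 2              # sub- and super-diagonal
    H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    vals, vecs = np.linalg.eigh(H)              # eigenvalues in ascending order
    theta = vecs[:, 0]
    theta = theta * np.sign(theta[np.argmax(np.abs(theta))])  # fix the sign
    theta = theta / np.sqrt(np.sum(theta ** 2) * h)           # unit L2 norm
    return vals[0], x, theta


if __name__ == "__main__":
    C, x, theta = ground_state(lambda x: x ** 2)
    print("smallest eigenvalue:", C)
```

Replacing the hypothetical potential with the one arising from the model would give a numerical value of C that can be compared against the long-time limit of the time-dependent solution.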
Schrödinger equation
First we could assume
Numerical results on the equilibrium equation
We would like to solve the equation (40) to find the constant
We need to specify the initial conditions
Consider two models with shared parameters

We compare
We numerically solve (42) for
In this paper, we consider the limit order book model of Avellaneda and Stoikov [1] with a mean reverting underlying price. Our main result is that when time is far from terminal, it is optimal to post constant limit order prices instead of tracking the underlying price. We use two different numerical methods to solve the HJB equation, and both of them confirm the long-time behavior. This result implies that when the underlying price is mean reverting and time is far from terminal, it is optimal to focus on the mean price and ignore the fluctuations around it. This observation, admittedly from a stylized model, confirms what limit order traders might expect.
The numerical results also show that between the time regime where constant limit order prices are optimal and the one close to the terminal time, there is an intermediate time period where limit order prices are influenced by the inventory of outstanding orders. The duration of this intermediate period depends on the parameters A and κ that quantify the liquidity of the market.
We also study the equilibrium of the optimal control problem. The equilibrium of the HJB equation can be transformed to a Schrödinger equation, as an eigenvalue problem. The solution agrees with the long-time limit of our numerical result of the time-dependent model, which confirms the validity and accuracy of our numerical methods, even for long time. When the liquidity parameter κ is small, the numerical solutions also match the analysis in Zhang [20].
Even though the numerical calculations strongly suggest convergence of the optimal limit order prices, the proof remains open and needs further study.
Footnotes
Numerical methods
Constant absolute risk aversion.
The shared parameters are
