Implied volatility parameterization based on a machine learning polynomial approach

Abstract

Implied volatility reflects the market's anticipation of future price fluctuations and therefore plays a crucial role in option pricing. Machine learning can be a powerful tool for modeling implied volatility and predicting its future values, with the aim of improving the validity of the final outcomes. Since most traders and investors prefer a simple model that is easy to understand, we provide a lightweight method to this end. In this paper, motivated by the smile-shaped behavior of implied volatility, we propose a machine learning polynomial approach with a regularization penalty term to fit out-of-the-money (OTM) volatility data, and we compare the results with its prominent counterpart, SVI. The promising numerical results show that the proposed algorithm yields an implied volatility smile that is free from static arbitrage for OTM European call options most of the time, and that it outperforms SVI in prediction.

Keywords

Implied volatility; static arbitrage; parameterization; machine learning; regularization

1. Introduction

Market data, particularly since the stock market crash of 1987, have not supported the constant-volatility assumption of the Black-Scholes model [4]. Implied volatility modeling has been one of the major topics of research in financial mathematics in recent years. Several models have been proposed to parameterize implied volatility with the aim of predicting its future values, some of which are related to the Black-Scholes model; however, most of them involve complex mathematics and often make assumptions about the underlying characteristics of the market.

Implied volatility is derived from option prices and reflects what the market expects about the stock's future variation; it reveals the market's view of the size of potential moves in the stock, but not their direction. High implied volatility means the market expects large price fluctuations in either direction, whereas low implied volatility means the stock is not expected to move much in the near future.

A commonly used alternative measure is historical volatility, the annualized standard deviation of stock price returns, which measures price changes over the previous year. From an option trader's perspective, implied volatility is more relevant than historical volatility because it looks forward over the whole life of the option rather than back over the previous year. For instance, if a company plans to announce earnings or expects a major court ruling, these events will influence the implied volatility of options expiring in the same month. Implied volatility thus helps traders evaluate how much impact news may have on the underlying stock. In general, options on the same underlying but with different strike prices and times to maturity have different implied volatilities [8]. This is generally viewed as evidence that the volatility of the underlying is not constant but depends on factors such as the underlying's recent price variance, its price level, and the passage of time.

A parametric model of implied volatility comes with certain advantages. Observed implied volatilities, and hence call prices, can be extrapolated, so a parametric implied volatility model can be used to price new contracts for which no market price exists. In a parametric model, implied volatility is an explicit analytical function of strike price and time to maturity.

Machine learning is a branch of artificial intelligence concerned with the design and development of algorithms that adapt to input data. It has numerous applications, including pattern recognition and market modeling. Statistical learning plays a crucial role in many fields, including science, industry and finance. In a typical scenario, we have an outcome measurement, either quantitative (such as a stock price) or categorical (such as malignant versus benign), that we attempt to predict from a set of features (such as diet and clinical measurements). We observe the outcome and the features in a training set of data, to which a learning algorithm is fitted; in general, the more empirical observations the algorithm is given, the more accurate its future predictions become.

In recent decades, several researchers have addressed the problem of predicting market fluctuations in stock price data, but only a few of them examine arbitrage conditions in their proposed models. Popular implied volatility models include the Stochastic Volatility Inspired (SVI) parameterization [9], the stochastic alpha, beta, rho (SABR) parameterization [10] and the Vanna-Volga (VV) model [2]. For a fixed time to maturity, SVI is the most popular of these, since it comes with conditions on its parameters that guarantee the absence of static arbitrage. Roux [14] proposed a quadratic regression model in terms of time to maturity and strike price, based on several empirical observations from a particular sample period.

Implied volatility has also been modeled with machine learning approaches. Malliaris and Salchenberger [11] focus mainly on forecasting future market volatility with a neural network. In another work, the authors of [16] suggested that implied volatility could be characterized as a function of time to maturity and moneyness, and, following that paper, Alentorn [1] improved the model by making implied volatility an explicit function of moneyness and time to expiration. However, none of these machine learning studies attempted to make their models free of arbitrage.

In this paper, we propose a polynomial method to parameterize the implied volatility of out-of-the-money (OTM) European Black-Scholes call options with a fixed time to maturity. Since the plot of implied volatility against moneyness resembles a smile, we map moneyness to its n-th degree polynomial features to learn better from implied volatility data, and we aim to keep the parameterization free from static arbitrage.

The paper is organized as follows. Section 2 reviews preliminary definitions of option pricing and implied volatility, together with the basic machine learning material needed in the rest of the paper. Section 3 presents the proposed machine learning procedure to parameterize implied volatility and discusses its validity. Section 4 reports a numerical implementation supporting the idea of this paper. Finally, Section 5 concludes with suggestions for further research.

2. Preliminaries

2.1 Option pricing and Black-Scholes

One of the major areas of focus in finance is the pricing of derivatives. Many existing models for option pricing build on the Black-Scholes model. The Black-Scholes formula is one of the most prominent and frequently used methods for computing European option prices, and it is derived under restrictive assumptions [4], including randomness driven by a Brownian motion in the underlying, no transaction costs, and constant volatility and interest rate. The Black-Scholes formula for a call option with no dividends is given by

$\displaystyle C_{BS}(S_{0},K,\tau,\sigma,r)=S_{0}N(d_{1})-e^{-r\tau}KN(d_{2})$

$\displaystyle d_{1}=\frac{\ln\left(\frac{S_{0}}{K}\right)+\left(r+\frac{\sigma^{2}}{2}\right)\tau}{\sigma\sqrt{\tau}},\quad d_{2}=d_{1}-\sigma\sqrt{\tau}$ (1)

where $S_{0}$ is the current stock price, $K$ is the strike price, $\tau$ is the time to expiration, $\sigma$ is the volatility, i.e., the annualized standard deviation of the stock's returns, $N$ is the cumulative distribution function of a standard normal random variable, and $r$ is the risk-free interest rate.

The relationship between the option price and these five variables is complex and nonlinear, and empirical studies have shown that the formula suffers from systematic biases, known as the volatility smile, that stem from the simplifying assumptions behind its pricing dynamics.
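As a minimal illustration of Eq. (1), the following Python sketch prices a European call with no dividends. The function names `norm_cdf` and `bs_call` are our own hypothetical helpers; the normal CDF is written with `math.erf` so that only the standard library is needed.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(S0: float, K: float, tau: float, sigma: float, r: float) -> float:
    """Black-Scholes price of a European call with no dividends, Eq. (1)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S0 * norm_cdf(d1) - math.exp(-r * tau) * K * norm_cdf(d2)


# At-the-money call: S0 = K = 100, one year, 20% volatility, 5% rate.
price = bs_call(S0=100.0, K=100.0, tau=1.0, sigma=0.2, r=0.05)
print(round(price, 4))  # 10.4506
```

The price increases with `sigma`, which is what makes the implied volatility of Section 2.2 well defined.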
2.2 Implied volatility

In the Black-Scholes formula, all parameters are observable in the market except the volatility of the stock price. Although this parameter can be estimated from past realizations of the stock price, the resulting Black-Scholes prices usually differ from market option prices because the assumptions of the Black-Scholes model do not hold in real markets. To overcome this drawback, option traders use implied volatility to reconcile market option prices with the Black-Scholes formula; in fact, they quote option prices in terms of Black-Scholes implied volatility.

Let a call option be written on the underlying $S$ at time $t$ with strike price $K$ and expiry time $T$, so that $\tau=T-t$, and let $C$ be its observed market price. The implied volatility of the option is the unique value $\sigma_{\textit{imp}}$ that solves the equation

$\displaystyle C=C^{BS}(\tau,K,\tau\sigma_{\textit{imp}}^{2},S,r,t)$ (2)

An alternative but similar definition of implied volatility is obtained by replacing the underlying price process with the forward price:

$\displaystyle C^{B}(\tau,K,\tau\sigma_{\textit{imp}}^{2},S,r,t)=F_{[t,t+\tau]}N(d_{1})-KN(d_{2})$ (3)

$\displaystyle d_{1}=\frac{\log\left(\frac{F_{[t,t+\tau]}}{K}\right)+\frac{1}{2}\tau\sigma_{\textit{imp}}^{2}}{\sqrt{\tau\sigma_{\textit{imp}}^{2}}},\quad d_{2}=\frac{\log\left(\frac{F_{[t,t+\tau]}}{K}\right)-\frac{1}{2}\tau\sigma_{\textit{imp}}^{2}}{\sqrt{\tau\sigma_{\textit{imp}}^{2}}}$

where $F_{[t,t+\tau]}=e^{r\tau}S_{t}$ is the forward price; this form expresses the call price in forward (undiscounted) terms, that is, $e^{r\tau}$ times the price given by Eq. (1). A major reason for working with implied volatility is that it can be read directly from market option prices, whereas the volatility of the underlying price process cannot be observed directly and must be estimated, for example as the standard deviation of stock returns over the past year.
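Equation (2) has no closed-form solution for $\sigma_{\textit{imp}}$, so it is solved numerically. The sketch below assumes bisection as the root finder (Newton's method is a common alternative) and relies on the Black-Scholes call price being strictly increasing in $\sigma$, which makes the root unique. The helper names and the search interval $[10^{-6},5]$ are our own assumptions. The last line converts the result into the total implied variance $\tau\sigma_{\textit{imp}}^{2}$.

```python
import math


def bs_call(S0, K, tau, sigma, r):
    """Black-Scholes call price, Eq. (1)."""
    def n(z):  # standard normal CDF
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S0 * n(d1) - math.exp(-r * tau) * K * n(d2)


def implied_vol(C, S0, K, tau, r, lo=1e-6, hi=5.0, tol=1e-10, max_iter=200):
    """Solve C = C_BS(sigma) for sigma by bisection on [lo, hi]."""
    if not bs_call(S0, K, tau, lo, r) <= C <= bs_call(S0, K, tau, hi, r):
        raise ValueError("market price is outside the range attainable on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S0, K, tau, mid, r) < C:
            lo = mid  # model price too low: the root lies above mid
        else:
            hi = mid  # model price too high (or equal): the root lies at or below mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price an OTM call at 25% volatility, then recover that volatility.
tau = 0.5
C_mkt = bs_call(100.0, 110.0, tau, 0.25, 0.01)
sigma_imp = implied_vol(C_mkt, 100.0, 110.0, tau, 0.01)
w_imp = tau * sigma_imp**2  # total implied variance
```

Bisection is slower than Newton's method but needs no derivative (vega) and cannot leave the bracketing interval.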

2.3 Dynamic arbitrage

A dynamic arbitrage is an arbitrage opportunity whose exploitation requires rebalancing the portfolio over time; an example is buying an underpriced option in the Black-Scholes world while continuously delta hedging. In other words, a dynamic arbitrage is a costless trading strategy that yields a profit with positive probability and carries no possibility of a loss. If $V_{t}$ is the value of a self-financing portfolio at time $t$, the strategy is an arbitrage if $V_{0}=0$, $P(V_{t}\geqslant 0)=1$ and $P(V_{t}>0)>0$, and the market is free from dynamic arbitrage if no such strategy exists [18]. The problem with this definition is that it depends on more data than is desired, or even available, in practice. In continuous time, for instance, the definition depends on the path properties of the underlying price process, whereas in practice only past prices at discrete times are observable. Working with this type of arbitrage is therefore difficult, because in the real world the price path of a stock over the period from $0$ to $t$ may not be accessible. For this reason, we focus instead on removing another type of arbitrage, called static arbitrage.

2.4 Static arbitrage

We first introduce static arbitrage through its mathematical definition given in [13] and discuss a practical view of it; then we state two major theorems giving conditions on the call price surface and on the volatility surface for the absence of static arbitrage.

Definition 1.

A call price surface $C$ is free from static arbitrage if there is a non-negative Markov martingale $X$ on a filtered probability space $(\Omega,\Im,F=(\Im_{t})_{t\geqslant 0},P)$ such that the call prices are given by

$\displaystyle C(K,\tau)=E\left((X_{\tau}-K)^{+}\,|\,\Im_{0}\right),\quad\forall(K,\tau)\in[0,\infty)\times[0,\infty)$ (4)

In other words, there exists a non-negative Markov martingale that is associated in distribution with the underlying stock price process.

A static arbitrage opportunity is an arbitrage opportunity in which the positions in the underlying at a given time may depend only on that time and the corresponding current price, not on the path of the underlying price process over a period of time; in particular, it does not require any rebalancing of the portfolio. For example, suppose a company offers a mini euro future for 76,000 euros and a big euro future worth 152,000 euros. One could sell one big future and buy two mini futures, and this would be a static arbitrage since the mini euro contract is more tradable in any market. Working with static arbitrage is a natural first step, because a market that admits static arbitrage also admits dynamic arbitrage: a static strategy is a special case of a dynamic one. The next two theorems [6] give conditions on the call and volatility surfaces for the absence of static arbitrage.

Theorem 1.

An observed surface of call option prices written on some underlying $S$ expiring at time $T$

$\displaystyle C:\left({0,\infty}\right)\times R\to\left({0,\infty}\right)$ $\displaystyle\left({\tau,k}\right)\to E[{({S_{T}}-k)^{+}}]$

that is in $C^{2,2}$ is free of static arbitrage if the following five conditions hold

1. $\partial_{\tau}C>0$

2. $\lim_{k\to\infty}C(\tau,k)=0$

3. $\lim_{k\to-\infty}C(\tau,k)+k=a,\ a\in\mathbb{R}$

4. $C(\tau,k)$ is convex in $k$

5. $C(\tau,k)\geqslant 0$
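On real data the call surface is observed only on a grid of strikes and maturities, so the conditions of Theorem 1 can only be checked in a discrete form. The sketch below is our own illustration, not part of the original method: it tests non-negativity (condition 5), convexity in strike through non-decreasing finite-difference slopes (condition 4), and, as a discrete stand-in for condition 1, that prices at each strike do not decrease with maturity. It assumes prices are quoted as in Theorem 1, i.e. $C=E[(S_{T}-k)^{+}]$ without discounting; the function names are hypothetical.

```python
import numpy as np


def check_slice(strikes, prices, tol=1e-12):
    """Discrete versions of conditions 4 and 5 of Theorem 1 for one maturity."""
    k = np.asarray(strikes, dtype=float)
    c = np.asarray(prices, dtype=float)
    slopes = np.diff(c) / np.diff(k)  # finite-difference estimate of dC/dk
    return {
        "nonnegative": bool(np.all(c >= -tol)),           # condition 5
        "convex": bool(np.all(np.diff(slopes) >= -tol)),  # condition 4: no butterfly arbitrage
    }


def check_calendar(prices_short, prices_long, tol=1e-12):
    """Discrete stand-in for condition 1 on a common strike grid."""
    diff = np.asarray(prices_long, dtype=float) - np.asarray(prices_short, dtype=float)
    return bool(np.all(diff >= -tol))


strikes = [90.0, 95.0, 100.0, 105.0, 110.0]
good = check_slice(strikes, [12.0, 8.0, 5.0, 3.0, 2.0])
bad = check_slice(strikes, [12.0, 8.0, 5.0, 4.0, 1.5])  # kink at 105: butterfly arbitrage
```

In `bad`, the slope steepens from $-0.2$ to $-0.5$ between the last two strike intervals, so buying the calls at 100 and 110 and selling two calls at 105 would be a butterfly arbitrage.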

Theorem 2.

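Conditions 1 through 5 on call prices in Theorem 1 follow from the following conditions on the implied volatility surface, where $w_{\textit{imp}}=\tau\sigma_{\textit{imp}}^{2}$ denotes the total implied variance and $x$ the log-moneyness: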

1. $\partial_{\tau}w_{\textit{imp}}>0$

2. $\lim_{k\to\infty}d_{1}(k)=-\infty$

3. $\tau\sigma_{\textit{imp}}\geqslant 0$

4. $\left(1-\frac{x}{2w_{\textit{imp}}}\partial_{x}w_{\textit{imp}}\right)^{2}-\frac{1}{4}\left(\frac{1}{w_{\textit{imp}}}+\frac{1}{4}\right)\left(\partial_{x}w_{\textit{imp}}\right)^{2}+\frac{1}{2}\partial_{xx}w_{\textit{imp}}\geqslant 0$

The first condition in Theorem 2, which implies the first condition in Theorem 1, means that total implied variance must be an increasing function of time to maturity. If this condition holds, the volatility surface is said to be free of calendar spread arbitrage; if it does not, a calendar spread arbitrage opportunity emerges in the market, that is, a costless trading strategy is available at a given moment. In practice, a calendar spread trade consists of buying the nearby option and selling the farther one when the spread between them is large, and selling the nearby option and buying the farther one when the spread is narrow.

The second condition in Theorem 2 is equivalent to its counterpart in Theorem 1, which states that the option price tends to zero for large strike prices, a conceptually reasonable requirement in real option markets. The third condition in Theorem 2 is always satisfied since we use implied volatilities obtained from the Black-Scholes call price formula. Finally, inequality 4 is known as Durrleman's condition [19]: the second derivative of the call price with respect to strike equals its left-hand side multiplied by a positive factor, so the condition is equivalent to the existence of a non-negative probability density. Conditions 2 and 4 of Theorem 2 together make a volatility surface free of butterfly arbitrage. For example, let $C_{1}$ and $C_{2}$ be call options with expiry time $T$ and strike prices $k_{1}<k_{2}$, and suppose that a call with the same expiry time $T$ and strike price $K=(k_{1}+k_{2})/2$, so that $k_{1}<K<k_{2}$, exists in the market. If buying $C_{1}$ and $C_{2}$ and selling two calls with strike $K$ yields a risk-free profit, the market admits butterfly arbitrage.
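Because the fitted total variance in this paper is a polynomial in $x$, its first and second derivatives are available exactly, and Durrleman's function (the left-hand side of condition 4) can be evaluated on a grid of log-moneyness values. A minimal sketch with NumPy's polynomial module; the name `durrleman_g` is ours, and coefficients are ordered $\theta_{0},\theta_{1},\dots,\theta_{n}$ as in Eq. (8). Negative values of $g$ on the grid flag butterfly arbitrage.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def durrleman_g(theta, x):
    """Durrleman's function g(x) for w(x) = theta[0] + theta[1] x + ... + theta[n] x^n."""
    x = np.asarray(x, dtype=float)
    w = P.polyval(x, theta)                 # total implied variance
    w1 = P.polyval(x, P.polyder(theta))     # dw/dx
    w2 = P.polyval(x, P.polyder(theta, 2))  # d^2 w / dx^2
    if np.any(w <= 0):
        raise ValueError("total implied variance must be positive on the grid")
    return (1.0 - x * w1 / (2.0 * w)) ** 2 - 0.25 * w1**2 * (1.0 / w + 0.25) + 0.5 * w2


x_grid = np.linspace(0.0, 0.5, 51)
g_flat = durrleman_g([0.04], x_grid)              # flat smile: g = 1 everywhere
g_steep = durrleman_g([0.01, 0.0, 2.0], x_grid)   # an illustrative, very steep wing
butterfly_free = bool(np.all(g_steep >= 0))
```

For the steep example, $g$ is positive near $x=0$ but becomes negative near $x=0.5$, so `butterfly_free` is `False`.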

2.5 Parameterization of the implied volatility

The SVI parameterization of the total implied variance for a fixed time to maturity [7] is:

$\displaystyle w_{\textit{imp}}^{\textit{SVI}}(x)=a+b\left(\rho(x-m)+\sqrt{(x-m)^{2}+\sigma^{2}}\right)$

$\displaystyle a\in\mathbb{R},\quad b\geqslant 0,\quad\left|\rho\right|<1,\quad m\in\mathbb{R},\quad\sigma>0,\quad x=\log\frac{K}{F_{[t,t+\tau]}}$ (5)

where $x$ is the log-moneyness and $\left\{a,b,\sigma,\rho,m\right\}$ is the parameter set, whose roles are as follows:

•
Changing $a$ causes a vertical change of smile upward or downward.
•
Increasing $b$ increases the slopes of both wings, making the smile tighter.
•
Increasing $\sigma$ reduces the ATM curvature of the smile.
•
Increasing $\rho$ decreases the left slope and increases the right slope and vice versa.
•
Increasing $m$ shifts the smile to the right.

Since market volatility data are sensitive to both time to maturity and strike price, the parameterization is carried out on the total implied variance $w_{\textit{imp}}^{\textit{SVI}}(x)=\tau\sigma_{\textit{imp}}^{2}$, which incorporates time to maturity; moreover, total implied variance is more natural than variance in the Black-Scholes framework, since in the pricing formula volatility always appears multiplied by $\sqrt{\tau}$. Furthermore, market data show that total implied variance is a smile-shaped function of $x$, so SVI provides a nonlinear hypothesis in $x$. The intuition behind SVI is that the first part of the model describes the nearly linear wings of the smile, corresponding to deep out-of-the-money options, while the second part represents the nonlinear middle of the smile. In addition, this parameterization has received particular attention in volatility modeling because its author derived conditions on the parameters that guarantee the absence of static arbitrage.
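For comparison with the polynomial approach, Eq. (5) translates directly into code. The sketch below uses a hypothetical helper, `svi_total_variance`, that enforces the parameter restrictions of Eq. (5). The usage lines illustrate two of the parameter roles listed above: $m$ shifts the smile horizontally, and, far in the right wing, the slope approaches $b(1+\rho)$ (a standard property of the formula, so increasing $\rho$ steepens the right wing).

```python
import numpy as np


def svi_total_variance(x, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(x) of Eq. (5); x is log-moneyness."""
    if b < 0 or abs(rho) >= 1 or sigma <= 0:
        raise ValueError("need b >= 0, |rho| < 1 and sigma > 0")
    x = np.asarray(x, dtype=float)
    return a + b * (rho * (x - m) + np.sqrt((x - m) ** 2 + sigma**2))


params = dict(a=0.02, b=0.1, rho=0.3, sigma=0.1)
w_atm = svi_total_variance(0.0, m=0.0, **params)  # a + b * sigma at x = m
# Shifting m moves the whole smile to the right: w(x; m) = w(x - m; 0).
shifted = svi_total_variance(0.5, m=0.2, **params)
unshifted = svi_total_variance(0.3, m=0.0, **params)
# Far in the right wing, the slope approaches b * (1 + rho).
right_slope = (svi_total_variance(101.0, m=0.0, **params)
               - svi_total_variance(100.0, m=0.0, **params))
```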
2.6 Machine learning approach

Machine learning has a wide variety of applications and many algorithms for modeling the behavior of natural phenomena and predicting their future outcomes. The basic idea is that there is a training set consisting of empirical data $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\ldots,(x^{(m)},y^{(m)})$, where $m$ is the number of training examples, and a learning algorithm (learning hypothesis) $h_{\theta}$ is fitted to the data so that the parameter vector $\theta$ is estimated by the following strategy, where $V$ is a loss function:

$\displaystyle\hat{\theta}=\arg\min_{\theta}J(\theta)=\arg\min_{\theta}\frac{1}{m}\sum_{i=1}^{m}V\left(h_{\theta}(x^{(i)}),y^{(i)}\right)$ (6)

However, choosing an adequate learning algorithm that describes the trend of the data outside the training set is often difficult, and a wrong choice can cost much time without leading to a real conclusion. It is therefore important to know which avenue is the most promising to pursue. If the selected hypothesis predicts $y$ from $x$ very well for the examples in the training set but not for those outside it, we face the problem of over-fitting (high variance); if, on the other hand, the hypothesis predicts poorly both on and outside the training set, we face the problem of under-fitting (high bias). To address these problems, a regularization penalty term is added to the cost function, and the parameters are estimated as follows:

$\displaystyle\hat{\theta}=\arg\min_{\theta}\frac{1}{m}\left(\sum_{i=1}^{m}V\left(h_{\theta}(x^{(i)}),y^{(i)}\right)+\lambda R(h_{\theta})\right)$ (7)

The parameter $\lambda$ is called the regularization parameter; it is chosen to prevent both high bias and high variance by controlling the trade-off between them. $R$ is the regularization function, which penalizes the complexity of the hypothesis and so imposes restrictions on the parameter space that depend on the goal of the investigation. The regularization term also helps the hypothesis generalize to data beyond the training set [15].

There are several ways to debug a learning algorithm that suffers from high bias or high variance. To fix high variance, one can collect more training examples, try smaller sets of features, or increase ${\lambda}$; to fix high bias, one can add features, add polynomial features, or decrease ${\lambda}$ [17].

3. Procedure of model selection

In this section, we propose a learning algorithm to model implied volatility. Since the total implied variance of a stock price is a smile-shaped function of log-moneyness, we follow a machine learning approach to curve fitting [17] and fit a polynomial model to implied volatility data. In other words, instead of learning from the input $x$ alone, we learn from a mapping of $x$ to its n-th degree polynomial features; moreover, we add a penalty term to the cost function to control the trade-off between bias and variance and to eliminate butterfly arbitrage.

3.1 Polynomial approach

Fitting a simple linear model to the total implied variance smile evidently leads to under-fitting, so, following the usual strategy of machine learning practitioners, we add polynomial features and tune the regularization parameter to overcome under-fitting. Hence, we consider a regularized polynomial model, starting from the hypothesis

$\displaystyle w_{\theta}(x)=\theta_{0}+\theta_{1}x+\theta_{2}x^{2}+\dots+\theta_{n}x^{n}$ (8)

where the data consist of log-moneyness $x$ and total implied variance $w$. The data set is divided into three portions, namely a training set, a cross-validation set and a test set, using the customary ratio of 60%, 20% and 20%, respectively. To obtain a better outcome, observations are assigned to the three portions at random.
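The two preparatory steps just described can be sketched with NumPy as follows. The hypothetical helper `poly_features` builds the design matrix $[1,x,x^{2},\dots,x^{n}]$ of Eq. (8), and `random_split` assigns observations at random to the training, cross-validation and test sets in the 60/20/20 ratio. The fixed random seed and the synthetic smile are our own choices, for reproducibility.

```python
import numpy as np


def poly_features(x, n):
    """Design matrix with columns 1, x, x^2, ..., x^n (Eq. (8))."""
    return np.vander(np.asarray(x, dtype=float), N=n + 1, increasing=True)


def random_split(x, w, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly assign (x, w) pairs to training, cross-validation and test sets.

    The test set receives whatever remains after the first two portions.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(x))
    n_train = int(round(fractions[0] * len(x)))
    n_cv = int(round(fractions[1] * len(x)))
    parts = (idx[:n_train], idx[n_train:n_train + n_cv], idx[n_train + n_cv:])
    return [(x[p], w[p]) for p in parts]


x = np.linspace(0.01, 0.3, 10)         # log-moneyness of OTM calls
w = 0.02 + 0.1 * x + 0.5 * x**2        # illustrative total implied variance
train, cv, test = random_split(x, w)
X_train = poly_features(train[0], 4)   # one row per example, n + 1 = 5 columns
```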

First, using the training set, we fit the following models for a fixed value of $\lambda$:

$\displaystyle w_{\theta}^{Q^{2}}(x)=\theta_{0}+\theta_{1}x+\theta_{2}x^{2}$

$\displaystyle w_{\theta}^{Q^{3}}(x)=\theta_{0}+\theta_{1}x+\theta_{2}x^{2}+\theta_{3}x^{3}$

$\displaystyle w_{\theta}^{Q^{4}}(x)=\theta_{0}+\theta_{1}x+\theta_{2}x^{2}+\theta_{3}x^{3}+\theta_{4}x^{4}$

The main reason for this strategy is that the quadratic model is the simplest polynomial that can represent a smile-shaped function such as total implied variance, but under some market conditions higher-degree polynomials may be needed. Empirical implementations [3] show that the graph of implied volatility against log-moneyness does not always resemble a quadratic; for example, when the VIX is low, a higher-degree polynomial may be required, possibly because of more data points or a higher trading volume. We therefore include higher-degree polynomials in the procedure to reach an adequate model of total implied variance; in addition, to avoid the over-fitting that occurs with high-degree polynomials, we use polynomials of degree at most 4 in the empirical implementation.

3.2 Cost function and optimization method

The strategy we propose for estimating the model parameters of each volatility slice is as follows:

$\displaystyle\hat{\theta}=\arg\min_{\theta}\frac{1}{m}\left(\sum_{i=1}^{m}\left(w_{\theta}^{Q^{n}}(x^{(i)})-w^{(i)}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right)$

where $w^{(i)}=\tau\left(\sigma_{\textit{imp}}^{(i)}\right)^{2}$ is the observed market total implied variance of the i-th training example, with $\sigma_{\textit{imp}}^{(i)}$ computed from Eq. (2). The regularization term keeps the parameters small, which makes the hypothesis relatively simple and helps avoid over-fitting; in addition, to guard against the over-fitting that occurs with high-degree polynomials in curve fitting, we set the maximum polynomial degree to 4. We therefore have three polynomial models of different degrees and seek the one most compatible with the trend of the market data.
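The optimizer used for this cost is not specified. Because the cost is quadratic in $\theta$, one option, assumed here, is to solve it in closed form: setting the gradient to zero gives $(X^{\top}X+\lambda D)\theta=X^{\top}w$, where $X$ is the polynomial design matrix and $D=\mathrm{diag}(0,1,\dots,1)$ leaves $\theta_{0}$ unpenalized, matching the sum over $j\geqslant 1$. The function names are hypothetical.

```python
import numpy as np


def fit_regularized_poly(x, w, n, lam):
    """Minimise (1/m) * (sum_i (w_theta(x_i) - w_i)^2 + lam * sum_{j>=1} theta_j^2)."""
    X = np.vander(np.asarray(x, dtype=float), N=n + 1, increasing=True)
    D = np.eye(n + 1)
    D[0, 0] = 0.0  # the intercept theta_0 is not penalised
    return np.linalg.solve(X.T @ X + lam * D, X.T @ np.asarray(w, dtype=float))


def cost(theta, x, w, lam):
    """Regularised cost of Section 3.2 evaluated at theta."""
    theta = np.asarray(theta, dtype=float)
    X = np.vander(np.asarray(x, dtype=float), N=len(theta), increasing=True)
    resid = X @ theta - np.asarray(w, dtype=float)
    return (resid @ resid + lam * np.sum(theta[1:] ** 2)) / len(resid)


x = np.linspace(0.01, 0.3, 20)
w = 0.02 + 0.1 * x + 0.5 * x**2                           # exactly quadratic test data
theta_plain = fit_regularized_poly(x, w, n=2, lam=0.0)    # recovers (0.02, 0.1, 0.5)
theta_ridge = fit_regularized_poly(x, w, n=2, lam=1e-3)   # penalised coefficients shrink
```

For degree 4 on a narrow moneyness range the normal equations become ill-conditioned; a QR-based least-squares solve on an augmented system would be a more robust alternative.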

For each of the three models, we learn the parameter vector from the training set and compute the training error and the cross-validation error of the learned hypothesis. The learning curve, i.e., the training and cross-validation errors plotted against the size of the training set, helps us diagnose whether the model suffers from high bias or high variance. The cross-validation error is computed for different values of $m$ by the following formula:

$\displaystyle J_{cv}(\theta)=\frac{1}{m}\left(\sum_{i=1}^{m}\left(w_{\theta}^{Q^{n}}(x_{cv}^{(i)})-w_{cv}^{(i)}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right)$
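A sketch of the learning-curve computation follows. It re-declares a small closed-form solver (our own assumption about the optimizer) so that it stands alone; for each training-set size $m$ the model is fitted on the first $m$ training examples and both errors are evaluated. Following the formula above, the errors include the penalty term; calling `error` with `lam=0.0` instead gives the unpenalized errors that are also common in practice. Sizes start at $n+1$ so that the unregularized fit is determined. All helper names and the synthetic data are hypothetical.

```python
import numpy as np


def fit(x, w, n, lam):
    """Closed-form minimiser of the regularised cost (intercept not penalised)."""
    X = np.vander(x, N=n + 1, increasing=True)
    D = np.eye(n + 1)
    D[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ w)


def error(theta, x, w, lam):
    """(1/m) * (sum of squared residuals + lam * sum_{j>=1} theta_j^2)."""
    resid = np.vander(x, N=len(theta), increasing=True) @ theta - w
    return (resid @ resid + lam * np.sum(theta[1:] ** 2)) / len(x)


def learning_curve(x_tr, w_tr, x_cv, w_cv, n, lam):
    """Training and cross-validation errors as functions of the training-set size m."""
    sizes, j_train, j_cv = [], [], []
    for m in range(n + 1, len(x_tr) + 1):
        theta = fit(x_tr[:m], w_tr[:m], n, lam)
        sizes.append(m)
        j_train.append(error(theta, x_tr[:m], w_tr[:m], lam))
        j_cv.append(error(theta, x_cv, w_cv, lam))
    return np.array(sizes), np.array(j_train), np.array(j_cv)


def smile(x):
    return 0.02 + 0.1 * x + 0.5 * x**2  # illustrative quadratic total variance


x_tr = np.linspace(0.01, 0.3, 12)
x_cv = np.array([0.05, 0.12, 0.2, 0.27])
sizes, j_train, j_cv = learning_curve(x_tr, smile(x_tr), x_cv, smile(x_cv), n=2, lam=0.0)
```

On real, noisy data, a large persistent gap between `j_train` and `j_cv` points to high variance, while two high curves that converge point to high bias.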

To overcome the effects of bias and variance, for each model the validation curve, i.e., the training and cross-validation errors plotted against the regularization parameter, helps us select the value of $\lambda$ that minimizes the cross-validation error of that model. This gives three models with three pairs $(n,\lambda)$, from which we choose the one that best describes the behavior of total implied variance.
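The $\lambda$ selection step can be sketched as a loop over a grid of candidate values. The grid, the helper names, the synthetic data and the closed-form solver are assumptions of this illustration. As in the formula for $J_{cv}$, the reported errors include the penalty term. On noise-free synthetic data the unregularized fit ($\lambda=0$) is selected, as expected; on real data an interior value is typical.

```python
import numpy as np


def fit(x, w, n, lam):
    """Closed-form minimiser of the regularised cost (intercept not penalised)."""
    X = np.vander(x, N=n + 1, increasing=True)
    D = np.eye(n + 1)
    D[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ w)


def error(theta, x, w, lam):
    """(1/m) * (sum of squared residuals + lam * sum_{j>=1} theta_j^2)."""
    resid = np.vander(x, N=len(theta), increasing=True) @ theta - w
    return (resid @ resid + lam * np.sum(theta[1:] ** 2)) / len(x)


def validation_curve(x_tr, w_tr, x_cv, w_cv, n, lambdas):
    """Training and CV errors for each candidate lambda, plus the CV-minimising lambda."""
    j_train, j_cv = [], []
    for lam in lambdas:
        theta = fit(x_tr, w_tr, n, lam)
        j_train.append(error(theta, x_tr, w_tr, lam))
        j_cv.append(error(theta, x_cv, w_cv, lam))
    best_lam = lambdas[int(np.argmin(j_cv))]
    return np.array(j_train), np.array(j_cv), best_lam


def smile(x):
    return 0.02 + 0.1 * x + 0.5 * x**2  # illustrative, noise-free total variance


x_tr = np.linspace(0.01, 0.3, 18)
x_cv = np.array([0.03, 0.11, 0.19, 0.28])
lambdas = [0.0, 1e-4, 1e-2, 1.0]
j_train, j_cv, best_lam = validation_curve(x_tr, smile(x_tr), x_cv, smile(x_cv), 2, lambdas)
```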

Choosing the best of the three proposed models depends on the behavior of the training data and on the error given by the cost function. As mentioned in Section 3.1, the data set is divided into three portions: the training set is used for optimization, and the cross-validation and test sets for checking the validity of the model. After choosing the optimal ${\lambda}$, we refit the model on the training set and compute the test error with this value of ${\lambda}$ to find the model most compatible with the trend of the data. The test error is computed as follows:

$\displaystyle J_{\textit{test}}(\theta)=\frac{1}{m}\left(\sum_{i=1}^{m}\left(w_{\theta}^{Q^{n}}(x_{\textit{test}}^{(i)})-w_{\textit{test}}^{(i)}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right)$

Finally, after evaluating all estimated models on the test set, the pair with the lowest test error is the one we are looking for. That is, using the test set data together with the estimated parameters and the corresponding regularization parameters ${\lambda}$, we compute the test error separately for each of the three models and choose the model with the lowest test error as the best one.
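The final selection among the three candidates can be sketched as follows, assuming each degree $n\in\{2,3,4\}$ already comes with its chosen $\lambda$. The helper names, the closed-form solver and the synthetic cubic data are illustrative assumptions; as in $J_{\textit{test}}$ above, the test error includes the penalty term.

```python
import numpy as np


def fit(x, w, n, lam):
    """Closed-form minimiser of the regularised cost (intercept not penalised)."""
    X = np.vander(x, N=n + 1, increasing=True)
    D = np.eye(n + 1)
    D[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ w)


def error(theta, x, w, lam):
    """(1/m) * (sum of squared residuals + lam * sum_{j>=1} theta_j^2)."""
    resid = np.vander(x, N=len(theta), increasing=True) @ theta - w
    return (resid @ resid + lam * np.sum(theta[1:] ** 2)) / len(x)


def select_model(train, test, lam_by_degree):
    """Return the degree with the lowest test error and the test error of every candidate."""
    (x_tr, w_tr), (x_te, w_te) = train, test
    errors = {}
    for n, lam in lam_by_degree.items():
        theta = fit(x_tr, w_tr, n, lam)
        errors[n] = error(theta, x_te, w_te, lam)
    best_n = min(errors, key=errors.get)
    return best_n, errors


def cubic_smile(x):
    return 0.02 + 0.1 * x + 0.5 * x**2 + 3.0 * x**3  # a smile a quadratic cannot match


x_tr = np.linspace(0.01, 0.3, 15)
x_te = np.array([0.04, 0.15, 0.26])
best_n, errors = select_model(
    (x_tr, cubic_smile(x_tr)), (x_te, cubic_smile(x_te)), {2: 0.0, 3: 0.0, 4: 0.0}
)
```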

The regularization term appears to play a crucial role in removing butterfly arbitrage: most of the time, a regularized cost function yields a volatility surface with no butterfly arbitrage, which is not the case for cost functions without a regularization term. In other words, a strategy for overcoming high bias and high variance also helps keep the parameterization free from butterfly arbitrage. In Section 4, the plot of Durrleman's condition for the selected model shows whether or not butterfly arbitrage exists in the estimated model. Notably, the proposed method produced a volatility surface free from butterfly arbitrage for all times to maturity considered. This suggests that the regularization penalty term helps eliminate butterfly arbitrage.

3.3 The algorithm, step by step

To give a better understanding of the proposed algorithm, this section lists it step by step as simple pseudocode:

1.
For each of the three polynomial models, start with volatility data $(x^{(i)},w^{(i)})$ for a fixed time to maturity.
2.
Using training set data for each model, estimate parameters by minimizing the cost function for a fixed value of $\lambda$ (For the first implementation let $\lambda=0$ ).
3.
Using the cost function and the estimated parameters, compute training error and cross-validation error for different values of training set size.
4.
In a single figure, plot the learning curve, i.e., the training error and the cross-validation error versus the size of the training set.
5.
a) If the learning curve shows no drawback of over-fitting and under-fitting, compute test error of each model using these estimated parameters and the test set data.

b) Otherwise, plot the validation curve, i.e., the training error and the cross-validation error versus the regularization parameter ${\lambda}$, choose the value of ${\lambda}$ that minimizes the cross-validation error, and return to step 2.
6.
Choose the model with the lowest test error as the best model and use it to plot Durrleman’s function.

3.4 Machine learning versus SVI

As noted in Section 2.5, the SVI parameters control the variability of a smile only in a few directions, such as shifting it up or down and tightening it; moreover, SVI behaves somewhat like a quadratic function and cannot describe curves that follow higher-degree polynomials. With the proposed polynomial approach, we can not only capture the quadratic behavior of the smile but also fit more of the variability in the data by using higher-degree polynomials. For instance, when total implied variance data fluctuate strongly or move sharply over a short range of log-moneyness, SVI may not fit well, since it is not designed to capture such behavior, and a higher-degree polynomial is needed. The proposed machine learning approach therefore covers more of the smile shapes that can occur in real markets than SVI does, which makes it a useful method of volatility parameterization in its own right.

4. Implementation and numerical results

In this section, we apply the proposed machine learning algorithm to parameterize Black-Scholes implied volatility data obtained from S&P 500 European call options traded on December 15, 2014. To assess the robustness of our approach, we apply it to six different times to maturity and compare the resulting butterfly arbitrage with that of the prominent SVI model, which is implemented following the methodology of the corresponding paper [9].

4.1 Data preparation

In this paper, a learning strategy is used to fit the total implied variance observed in the market, but the market data must first be prepared. To show the robustness of the model, we remove in-the-money (ITM) and at-the-money (ATM) data and apply our strategy only to out-of-the-money (OTM) data, for two reasons. First, OTM option traders receive a risk premium for the higher risk they bear compared with their ATM and ITM counterparts. Second, for OTM options the volatility risk premium is more effective in predicting option returns: returns on S&P 500 options have been shown to be effectively predictable by the volatility risk premium for OTM options, with predictability dropping when moving to ITM options [5]. We therefore perform our strategy on OTM data, which represent the riskiest and worst situation for a holder of stock options.
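The data preparation step can be sketched as below, assuming the quotes for one maturity are given as strikes with their Black-Scholes implied volatilities. Log-moneyness uses the forward price $F_{[t,t+\tau]}=e^{r\tau}S_{t}$ from Section 2.2, and only OTM calls ($x>0$, i.e. $K>F_{[t,t+\tau]}$) are kept. Treating every $x>0$ as OTM, with no band around ATM removed, is our simplifying assumption, and the function name is hypothetical.

```python
import numpy as np


def prepare_otm_slice(strikes, implied_vols, spot, r, tau):
    """Return log-moneyness x and total implied variance w for the OTM calls of one maturity."""
    strikes = np.asarray(strikes, dtype=float)
    vols = np.asarray(implied_vols, dtype=float)
    forward = spot * np.exp(r * tau)  # F = e^{r tau} S_t
    x = np.log(strikes / forward)     # x = log(K / F), as in Eq. (5)
    w = tau * vols**2                 # total implied variance tau * sigma_imp^2
    otm = x > 0                       # drop ITM (x < 0) and ATM (x = 0) quotes
    return x[otm], w[otm]


x, w = prepare_otm_slice(
    strikes=[90.0, 100.0, 110.0, 120.0],
    implied_vols=[0.30, 0.25, 0.22, 0.21],
    spot=100.0, r=0.0, tau=0.5,
)
```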

4.2 Optimization procedure

Once the optimized parameters of the best model are obtained, we can determine whether the resulting smiles are free of static arbitrage. We examine condition 4 of Theorem 2, Durrleman's condition, which is a benchmark for the existence of butterfly arbitrage; in addition, the optimization approach we use for parameter calibration includes extra penalties across slices of total implied variance to guarantee the absence of calendar spread arbitrage. These penalties differ from the one in Section 3: they are not part of the cost function but act as constraints on the optimization of each slice, so that the slices do not cross each other. There are several possible ways to define penalties that avoid calendar spread arbitrage, but thanks to the simplicity of polynomial functions it is easy to eliminate this arbitrage in the machine learning approach.

First, we apply the algorithm proposed in Section 3 separately to each expiry and determine the best polynomial degree for each one. Then, based on the estimated parameters for the shortest expiry, we impose conditions on the parameters of the second shortest expiry in the corresponding optimization so that the second slice lies everywhere above the first. Similarly, from the third expiry up to the last one, we impose constraints on the optimization of each slice so that the resulting volatility surface is free from calendar spread arbitrage. In other words, by making the constraints for each slice depend on the parameters estimated for the previous slice, we proceed step by step from the shortest to the longest time to maturity and calibrate the parameters slice by slice. The number of optimizations we perform is therefore equal to the number of distinct expiry times for a particular option. The optimization for each slice is carried out by the algorithm in Section 3.3. The condition used for each slice is not predetermined; it depends on the parameter values estimated for the previous slice. However, because we work with OTM call data, the moneyness $x$ is always positive, so it is easy to define a reasonable constraint in any slice that makes the corresponding curve strictly greater than the previous one. For example, when two consecutive slices have the same polynomial degree, we can require each parameter of the new slice to be greater than its counterpart in the previous slice; since every power $x^k$ is positive for $x > 0$, the difference of the two polynomials is then positive. Working forward in this way, at the i-th step we keep the function $\left[{w(\eta_{i+1},x)-w(\eta_{i},x)}\right]$ positive for all values of $x$ around ATM, where $\eta_{i}$ is the set of parameters for the i-th slice.
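The forward, slice-by-slice idea can be illustrated with the same-degree example above. The sketch below is hypothetical and rests on assumptions not stated in this section: each slice is fitted by ridge-penalized least squares in the monomial basis, all slices share one degree, and the constraint "each coefficient of the new slice is at least its counterpart in the previous slice" is enforced by projected gradient descent, a standard method for convex problems with box constraints. The authors' own optimizer (Section 3.3) may differ, and the names `fit_slice` and `calibrate_forward` are illustrative.

```python
import numpy as np


def vandermonde(x, degree):
    # Columns 1, x, x^2, ..., x^degree (monomial basis, increasing powers).
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)


def fit_slice(x, w, degree, lam, lower=None, n_iter=5000):
    """Ridge fit of w(x) = sum_k a_k x^k, optionally with a >= lower elementwise.

    Minimizes (1/m) * ||V a - w||^2 + lam * ||a||^2. The objective is a convex
    quadratic, so projected gradient descent with step 1/L, where L is the
    largest eigenvalue of the Hessian, converges to the constrained minimizer.
    """
    V = vandermonde(x, degree)
    w = np.asarray(w, dtype=float)
    m = len(w)
    H = 2.0 * (V.T @ V / m + lam * np.eye(degree + 1))  # Hessian
    b = 2.0 * V.T @ w / m                                # gradient = H a - b
    a = np.linalg.solve(H, b)                            # unconstrained ridge fit
    if lower is None:
        return a
    step = 1.0 / np.linalg.eigvalsh(H).max()
    a = np.maximum(a, lower)                             # feasible starting point
    for _ in range(n_iter):
        a = np.maximum(a - step * (H @ a - b), lower)    # gradient step, then project
    return a


def calibrate_forward(slices, degree=2, lam=1e-3):
    """Calibrate slices from shortest to longest maturity.

    `slices` is a list of (x, w) pairs sorted by maturity. Each slice after the
    first is constrained to have every coefficient >= the previous slice's.
    """
    coeffs, prev = [], None
    for x, w in slices:
        a = fit_slice(x, w, degree, lam, lower=prev)
        coeffs.append(a)
        prev = a
    return coeffs
```

Because the constraint is coefficient-wise and $x > 0$ on OTM data, $w(\eta_{i+1},x)-w(\eta_{i},x)=\sum_k (a^{(i+1)}_k - a^{(i)}_k)\,x^k \ge 0$, and it is strictly positive unless every coefficient ends up on its bound. Slices with different degrees would need a different constraint, which is consistent with the text describing the conditions as slice dependent.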

Table 1
Machine learning implementation on total implied variance for S&P 500, traded on DEC 15, 2014

Time to maturity (years)	Expiry date	n	Best $\lambda$ for n	Best pair $(n,\,\lambda)$
0.0136	12/20/2014	2	0.001	(2, 0.001)
		3	0.003
		4	0.005
0.0438	12/31/2014	2	0.3	(3, 0.001)
		3	0.001
		4	0.003
0.0684	01/09/2015	2	3	(2, 3)
		3	9.95
		4	10
0.0904	01/17/2015	2	0.01	(2, 0.01)
		3	0.05
		4	0.1
0.126	01/30/2015	2	0.05	(2, 0.05)
		3	3
		4	3
0.178	02/20/2015	2	3	(3, 3)
		3	3
		4	10
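Table 1 reports, for each slice and each degree $n \in \{2,3,4\}$, the $\lambda$ with the lowest test error, and then the overall best pair. A grid search of that kind might look like the sketch below. It is only an illustration under assumed choices (a single random hold-out split, mean squared error, a ridge penalty on all coefficients including the intercept, and a hypothetical $\lambda$ grid); the paper's exact cross-validation scheme is the one in Section 3.3, so this sketch is not expected to reproduce the numbers in Table 1.

```python
import numpy as np


def ridge_poly(x, w, degree, lam):
    """Closed-form minimizer of (1/m)||V a - w||^2 + lam ||a||^2."""
    V = np.vander(x, degree + 1, increasing=True)
    m = len(w)
    A = V.T @ V / m + lam * np.eye(degree + 1)
    return np.linalg.solve(A, V.T @ w / m)


def select_degree_and_lambda(x, w, degrees=(2, 3, 4),
                             lambdas=(0.001, 0.003, 0.01, 0.05, 0.1, 0.3, 1.0, 3.0, 10.0),
                             test_frac=0.3, seed=0):
    """Return ({n: (best_lam, test_mse)}, (best_n, best_lam)) from one hold-out split."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(x))
    n_test = max(1, int(round(test_frac * len(x))))
    test, train = idx[:n_test], idx[n_test:]
    best_per_degree = {}
    for n in degrees:
        errors = []
        for lam in lambdas:
            a = ridge_poly(x[train], w[train], n, lam)
            pred = np.vander(x[test], n + 1, increasing=True) @ a
            errors.append(np.mean((pred - w[test]) ** 2))  # test MSE
        k = int(np.argmin(errors))
        best_per_degree[n] = (lambdas[k], errors[k])        # best lambda for this n
    best_n = min(best_per_degree, key=lambda n: best_per_degree[n][1])
    return best_per_degree, (best_n, best_per_degree[best_n][0])
```

The first return value corresponds to the "Best $\lambda$ for n" column and the second to the "Best pair" column.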

Figure 1.

Total implied variance following the forward slice-by-slice method of Section 4.2.

Figure 2.

Durrleman function of ML and SVI, implemented for $\tau = 0.0136$.

Figure 3.

Durrleman function of ML and SVI, implemented for $\tau = 0.0438$.

Figure 4.

Durrleman function of ML and SVI, implemented for $\tau = 0.0684$.

Figure 5.

Durrleman function of ML and SVI, implemented for $\tau = 0.0904$.

Figure 6.

Durrleman function of ML and SVI, implemented for $\tau = 0.126$.

Figure 7.

Durrleman function of ML and SVI, implemented for $\tau = 0.178$.

The results of the machine learning implementation for the six times to expiration are shown in Table 1. The last two columns list the best value of the regularization parameter for each polynomial degree and the best pair, i.e. the one with the lowest test error, for each slice. Figure 1 plots total implied variance against moneyness for all six times to expiration; since the curves do not cross, the calibration exhibits no calendar spread arbitrage. This confirms that the forward, slice-by-slice penalization described in this section keeps the surface free from calendar spread arbitrage, so the first condition of Theorem 2 is satisfied empirically. The second condition of Theorem 2, or equivalently the second condition of Theorem 1, is reasonable because, for European call options, very large strike prices mean that the option will not be exercised, so its price tends to zero. Furthermore, as mentioned before, the third condition of Theorem 2 is satisfied because we work with Black-Scholes implied volatility. The Durrleman functions of both the machine learning algorithm and the SVI approach are shown in Figs 2 to 7 for the different slices. For each selected pair $(n,\,\lambda)$ of the proposed machine learning approach, the Durrleman function is strictly positive even for far out-of-the-money data, so the fourth condition of Theorem 2 is confirmed. Hence, all conditions of Theorem 2 are verified empirically and the parameterization is free from static arbitrage.
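The no-crossing property read off Figure 1 can also be checked numerically. The hypothetical helper below evaluates each fitted slice on a moneyness grid (coefficients in increasing order of power, slices sorted from shortest to longest maturity, degrees allowed to differ) and tests whether total variance increases strictly with maturity at every grid point. As with the Durrleman check, a grid test is empirical evidence on the sampled range, not a proof.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def slices_do_not_cross(coeff_list, x_grid):
    """True if w_{i+1}(x) > w_i(x) for all consecutive slices and grid points.

    coeff_list holds one coefficient array per slice (increasing powers),
    ordered from shortest to longest maturity; degrees may differ.
    """
    x_grid = np.asarray(x_grid, dtype=float)
    values = np.array([P.polyval(x_grid, c) for c in coeff_list])  # (slices, grid)
    return bool(np.all(np.diff(values, axis=0) > 0))
```

For instance, a quadratic slice followed by a cubic slice whose coefficients are all at least as large (one strictly larger) passes on any grid with $x > 0$, while the same two slices in reverse order fail.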

The SVI parameterization of the same data behaves differently. SVI parameters do not always produce a surface free of butterfly arbitrage: in some cases the Durrleman function fails to be strictly positive near the money, and the parameters then have to be re-picked using the method proposed in [9]. Our strategy, by contrast, yields a model with no static arbitrage most of the time, so on most occasions the estimated parameters do not need to be re-picked. For this reason, the proposed machine learning approach outperforms SVI in prediction: its estimated parameters are not altered afterwards, as they are under the re-picking recipe applied to SVI. The figures show that the SVI Durrleman function is not strictly positive for four of the six times to maturity (Figs 3–5 and 7) and is strictly positive only in Figs 2 and 6, whereas the machine learning approach shows no butterfly arbitrage for any slice. Indeed, for each of the six data sets, the best pair $(n,\,\lambda)$ of the machine learning approach has a strictly positive Durrleman function, which supports the robustness of the proposed model.

According to Table 1, the third-degree polynomial is the best choice for the second and sixth slices. For both of these slices the Durrleman function of SVI is negative in some regions, whereas that of the machine learning algorithm is not, which suggests that SVI does not describe the behavior of volatility data well when the data are more variable. Since SVI behaves roughly like a quadratic function of moneyness around its minimum, it may not be an adequate predictor when the data do not follow a quadratic trend. As mentioned in Section 3.4, total implied variance is not always a quadratic function of moneyness [3], and under some market fluctuations a higher-degree polynomial may be necessary for a good fit.
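The remark that SVI is roughly quadratic can be made precise with the raw SVI form of [7, 9]. In the display below $a, b, \rho, m, \sigma$ are the SVI parameters ($\sigma$ here is an SVI parameter, not the implied volatility), and the expansion is a standard Taylor argument rather than a result of this paper:

$$w_{\mathrm{SVI}}(x) = a + b\left\{\rho\,(x-m) + \sqrt{(x-m)^2 + \sigma^2}\right\}.$$

For $|x-m| \ll \sigma$, using $\sqrt{1+u} \approx 1 + u/2$ with $u = (x-m)^2/\sigma^2$,

$$w_{\mathrm{SVI}}(x) \approx a + b\sigma + b\rho\,(x-m) + \frac{b}{2\sigma}\,(x-m)^2,$$

while in the wings

$$w_{\mathrm{SVI}}(x) \sim b(1+\rho)\,x \quad (x \to +\infty), \qquad w_{\mathrm{SVI}}(x) \sim -b(1-\rho)\,x \quad (x \to -\infty).$$

Near the money, SVI therefore offers a single curvature coefficient $b/(2\sigma)$ and becomes linear in the wings, whereas a cubic slice $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ adds an $x^3$ term that can bend the OTM call wing separately. This is offered as intuition for why the higher-degree fits in Table 1 may help, not as a proof.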

There is also an effective method [9] for re-picking the SVI parameters so that the surface is free from butterfly arbitrage, but it changes the initially estimated parameters, which can reduce the precision of the fit. In other words, with SVI we give up some accuracy in fitting the smile in order to obtain a volatility surface with no butterfly arbitrage, whereas our implementation shows that the machine learning method removes butterfly arbitrage better than SVI without sacrificing the quality of the fit.

5. Conclusion

The preliminary results show that using machine learning to parameterize total implied variance yields models with a remarkable gain over SVI, since most of the time they produce a volatility surface free from static arbitrage and need no re-picking to remove it. Although SVI is a valuable parameterization of total implied variance with attractive properties, our polynomial approach provides a cheap and simple model with high accuracy (low cross-validation error) that mostly results in the absence of static arbitrage. Moreover, it opens the way to investigating stochastic volatility surfaces with machine learning methods aimed at removing arbitrage.

To sum up, this area of research has room for improvement. Other machine learning kernels could be introduced for option pricing, using features such as the underlying price process and implied volatility, to increase our confidence in eliminating arbitrage opportunities and in predicting the market value of options.

References

1. Alentorn, Modelling the implied volatility surface: an empirical study for FTSE options, see the website: www.theponytail.net/CCFEA, 2004.

2. Castagna and Mercurio, Option pricing: The vanna-volga method for implied volatilities, Risk 20(1) (2007), 106.

3. Zhu, Implied Volatility Modeling, MSc dissertation, University of Waterloo, 2013.

4. Black and Scholes, The pricing of options and corporate liabilities, Journal of Political Economy 81(3) (1973), 637–654.

5. The volatility risk premium and the predictability of index option returns, 2016.

6. Kellerer, H.G., Markov-Komposition und eine Anwendung auf Martingale, Mathematische Annalen 198(3) (1972), 99–122.

7. Gatheral, A parsimonious arbitrage-free implied volatility parameterization with application to the valuation of volatility derivatives, Presentation at Global Derivatives & Risk Management, Madrid, 2004.

8. Gatheral, The volatility surface: a practitioner's guide, Vol. 357, John Wiley & Sons, 2011.

9. Gatheral and Jacquier, Arbitrage-free SVI volatility surfaces, Quantitative Finance 14(1) (2014), 59–71.

10. Avellaneda, From SABR to geodesics, Conference presentation slides, 2005.

11. Malliaris and Salchenberger, Using neural networks to forecast the S&P 100 implied volatility, Neurocomputing 10(2) (1996), 95–183.

12. Roper, Implied volatility: General properties and asymptotics, The University of New South Wales, 2009, 2–3.

13. Roper, Arbitrage free implied volatility surfaces, preprint, 2010.

14. Roux, A long-term model of the dynamics of the S&P 500 implied volatility surface, North American Actuarial Journal 119(4) (2007), 61–75.

15. Nilsson, N.G., Introduction to machine learning, Department of Computer Science, Stanford University, CA 94305, 2005.

16. Cont and Da Fonseca, Dynamics of implied volatility surfaces, Quantitative Finance 2(1) (2002), 45–60.

17. Hastie, Tibshirani and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Biometrics, 2002.

18. Bjork, Arbitrage theory in continuous time, Oxford University Press, 2009.

19. Durrleman, A note on initial volatility surface, Unpublished manuscript, February, 2003.
