Abstract
Implied volatility reflects the market's anticipation of future price fluctuations and therefore plays a crucial role in option pricing. Machine learning can serve as a powerful tool for modeling implied volatility and predicting its future values, with the aim of improving the validity of the final outcomes. Since most traders and investors prefer a simple model that is easy to understand, we provide a lightweight method to reach this goal. In this paper, motivated by the smile-shaped behavior of implied volatility, we propose a machine learning polynomial approach with a regularization penalty term to fit out-of-the-money volatility data, and we compare the results with those of the prominent SVI parameterization. The promising numerical results show that the proposed algorithm yields an implied volatility smile that is free from static arbitrage for out-of-the-money European call options most of the time, and that it outperforms SVI in prediction.
Introduction
Market data, particularly since the stock market crash of 1987, have not exhibited the constant volatility assumed in the Black-Scholes model [4]. Implied volatility modeling has been one of the major focuses of research in financial mathematics in recent years. Several models have been proposed in finance to parameterize implied volatility with the aim of predicting its future values, some of which are related to the Black-Scholes model; nevertheless, most of them involve complex mathematics and often make assumptions about the underlying characteristics of the market.
Implied volatility is derived from option prices and shows what the market expects about the stock's future variation. It reveals the market's opinion of the stock's potential moves, but not the direction of price changes. If implied volatility is high, the market expects the stock to be prone to large price fluctuations in either direction, whereas low implied volatility implies that the stock is not expected to move as much in the near future.
Another common measure of volatility is historical volatility, the standard deviation of stock price returns over the previous year, which measures past price changes. From the option trader's perspective, implied volatility is more relevant than historical volatility because it reflects expectations over the whole remaining life of the option rather than only the previous year. For instance, if a company plans to announce earnings or expects a major court ruling, these events will influence the implied volatility of options expiring in the same month. Implied volatility therefore helps traders evaluate how much impact news may have on the underlying stock. In general, options on the same underlying but with different strike prices and times to maturity have different implied volatilities [8]. This is usually viewed as evidence that the underlying volatility is not constant but depends on factors such as the underlying's recent price variance, its price level and the passage of time. A parametric model of implied volatility comes with certain advantages. Observed implied volatilities, and hence call prices, can be extrapolated, so a parametric implied volatility model can be used to price new contracts that have no market price. In a parametric model, implied volatility is an explicit analytical function of strike price and time to maturity.
Machine learning is a branch of artificial intelligence concerned with the design and development of algorithms that adapt to the input data. It has numerous applications in pattern recognition, market modeling, and more. Statistical learning plays a crucial role in many fields, including industry and finance. In a typical scenario, we have an outcome measurement that we attempt to predict from a set of features. The outcome may be quantitative, such as a stock price, or categorical, such as malignant versus benign, and the features might be diet and clinical measurements. Both the outcome and the features are observed in a training set of data. The algorithm learns from this set in such a way that the more empirical observations we provide, the more precise its future predictions become.
In recent decades, some researchers have attempted to predict market fluctuations in stock price data, but only a few of them examine arbitrage conditions in their proposed models. There are several popular models of implied volatility, the most popular being the Stochastic Volatility Inspired (SVI) parameterization [9], the stochastic alpha, beta, rho (SABR) model [10] and the Vanna-Volga (VV) model [2]. SVI is the most popular model of implied volatility for a fixed time to maturity because conditions on its parameters that guarantee the absence of static arbitrage are available. Roux [14] proposed a quadratic regression model in terms of time to maturity and strike price, based on several empirical observations from a particular sample period.
Some studies have also modeled implied volatility with machine learning approaches. Malliaris and Salchenberger [11] focused mainly on forecasting future market volatility with a neural network. In another work, the authors of [16] suggested that implied volatility could be characterized as a function of time to maturity and moneyness. Following that paper, Alentorn [1] improved the model by making implied volatility explicitly a function of moneyness and time to expiration. However, none of these machine learning practitioners tried to make their models free of arbitrage.
In this paper, we propose a polynomial method to parameterize implied volatility for out-of-the-money European Black-Scholes call options with a fixed time to maturity. Since the plot of implied volatility against moneyness resembles a smile, we map moneyness to its polynomial features up to degree n in order to learn better from implied volatility data. We also aim to make the parameterization free from static arbitrage.
The paper is organized as follows. Section 2 gives preliminary definitions of option pricing and implied volatility, together with the basic machine learning material needed for the rest of the paper. Section 3 discusses the validity of the proposed machine learning procedure for parameterizing implied volatility. In Section 4, a numerical implementation supports the idea behind this paper. Finally, Section 5 concludes the paper with suggestions for further research.
Preliminaries
Option pricing and Black-Scholes
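One of the major areas of concentration in finance is the pricing of derivatives. There are many existing models for pricing an option, most of which build on the Black-Scholes model. The Black-Scholes formula is one of the most prominent and frequently used methods of computing European option prices. It is derived under restrictive assumptions [4]: randomness enters only through the Brownian motion driving the underlying, there are no transaction costs, and volatility and the interest rate are constant. The Black-Scholes formula for a call option on a non-dividend-paying stock is given by
$$C(S_t, t) = S_t N(d_1) - K e^{-r(T-t)} N(d_2),$$
where
$$d_1 = \frac{\ln(S_t/K) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t},$$
$N(\cdot)$ is the standard normal cumulative distribution function, $S_t$ is the stock price at time $t$, $K$ is the strike price, $r$ is the risk-free interest rate, $T-t$ is the time to maturity and $\sigma$ is the volatility of the stock.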
The relationship between the price of an option and these five variables is complex and nonlinear. Empirical investigations have shown that the formula suffers from systematic biases, known as the volatility smile, because of the simplifying assumptions behind its pricing dynamics.
In the Black-Scholes formula, all parameters are observable in the market except the stock price volatility. This parameter can be estimated from past realizations of the stock price, but the resulting Black-Scholes prices usually differ from market option prices because the assumptions of the Black-Scholes model do not hold in real markets. To overcome this drawback, option traders use implied volatility to reconcile market prices with the Black-Scholes formula. In fact, they quote an option price in terms of its Black-Scholes implied volatility.
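Implied volatility is obtained by inverting the Black-Scholes formula for $\sigma$, so a minimal sketch of that inversion may help. The sketch assumes a non-dividend-paying stock and solves for $\sigma$ by bisection. The function names and the search interval $[10^{-6}, 5]$ are illustrative choices and are not part of the paper's method.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S: float, K: float, r: float, tau: float, sigma: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying stock.

    tau is the time to maturity T - t in years; sigma is the volatility.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def implied_vol(price: float, S: float, K: float, r: float, tau: float,
                lo: float = 1e-6, hi: float = 5.0, tol: float = 1e-10) -> float:
    """Solve bs_call(S, K, r, tau, sigma) = price for sigma by bisection.

    The call price is strictly increasing in sigma, so the bracket [lo, hi]
    contains exactly one root once the price lies between its endpoint values.
    """
    if not bs_call(S, K, r, tau, lo) <= price <= bs_call(S, K, r, tau, hi):
        raise ValueError("price is outside the range spanned by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, tau, mid) < price:
            lo = mid  # model price too low: the volatility must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage: recover a known volatility from a Black-Scholes price.
price = bs_call(100.0, 110.0, 0.01, 0.5, 0.2)
sigma_hat = implied_vol(price, 100.0, 110.0, 0.01, 0.5)
```

Bisection is slower than Newton's method, but it cannot diverge, and monotonicity in $\sigma$ is all it needs.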
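Let a call option be written on the underlying $S_t$ with strike price $K$ and maturity $T$, and let $C^{\mathrm{mkt}}(K,T)$ denote its observed market price. The Black-Scholes implied volatility $\sigma_{BS}(K,T)$ is the value of $\sigma$ for which the Black-Scholes formula reproduces this price, i.e. $C_{BS}(S_t, K, T; \sigma_{BS}(K,T)) = C^{\mathrm{mkt}}(K,T)$. When it exists, it is unique, because the Black-Scholes call price is strictly increasing in $\sigma$.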
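An alternative but similar definition of implied volatility is obtained by replacing the underlying price process with the forward price, so that the call price is written as
$$C = e^{-r(T-t)}\left[F_t N(d_1) - K N(d_2)\right], \qquad d_{1,2} = \frac{\ln(F_t/K) \pm \sigma^2(T-t)/2}{\sigma\sqrt{T-t}},$$
where $F_t = S_t e^{r(T-t)}$ is the forward price of the underlying for maturity $T$. In terms of the log-moneyness $k = \ln(K/F_t)$ and the total implied variance $w = \sigma^2 (T-t)$, this reads $d_{1,2} = -k/\sqrt{w} \pm \sqrt{w}/2$.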
A dynamic arbitrage is an arbitrage opportunity that requires the portfolio to be rebalanced over time. An example would be buying an underpriced option in the Black-Scholes world while continuously delta hedging. In other words, a dynamic arbitrage opportunity is a costless trading strategy that yields a future profit with positive probability and has no possibility of a loss. If
Static arbitrage
We first introduce static arbitrage through its mathematical definition given in [13] and discuss it from a practical point of view. We then introduce two major theorems that give conditions on the call price surface and on the volatility surface under which each is free from static arbitrage.
Definition.
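A call price surface $C$ is free from static arbitrage if there is a non-negative martingale $X$ with the Markov property on a probability space $(\Omega, \mathcal{F}, \mathbb{Q})$ such that $C(K,T) = \mathbb{E}^{\mathbb{Q}}\left[(X_T - K)^+\right]$ for all strikes $K > 0$ and maturities $T \ge 0$.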
In other words, there exists a non-negative Markov martingale whose marginal distributions reproduce the observed call prices, so that it matches the underlying stock price process in distribution.
A static arbitrage opportunity is an arbitrage in which the position held in the underlying at a given time depends only on that time and the corresponding price, not on the path of the underlying price process over a period of time. It is therefore an arbitrage that does not require any rebalancing of the portfolio. For example, suppose a mini euro future has a notional of 76,000 euros and a big euro future a notional of 152,000 euros. If the two contracts were mispriced relative to each other, one could sell one big future and buy two mini futures (or vice versa) and hold the positions to expiry without rebalancing; this would be a static arbitrage. Ruling out static arbitrage is a natural first step towards ruling out dynamic arbitrage, because a static strategy is a special case of a dynamic one. A market that admits static arbitrage therefore admits dynamic arbitrage as well. The next two theorems [6] give conditions on the call surface and the volatility surface under which each is free from static arbitrage.
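The full statements of the two theorems are not reproduced here. However, the most familiar static-arbitrage conditions on call prices can be screened directly on discrete market quotes. These are the standard price bounds, monotonicity in strike, and convexity in strike (no negative-cost butterfly). The sketch below assumes one maturity slice with sorted strikes and a non-dividend-paying underlying. It is a necessary-condition screen that we add for illustration, not the full characterization of [6], and the function name is hypothetical.

```python
import math


def static_arbitrage_violations(strikes, calls, S, r, tau, tol=1e-12):
    """Screen one maturity slice of call prices for basic static arbitrage.

    strikes must be sorted in increasing order. Checks on the discrete grid:
      * price bounds  max(S - K e^{-r tau}, 0) <= C(K) <= S,
      * monotonicity  C is non-increasing in K (no free call spread),
      * slope bound   dC/dK >= -e^{-r tau},
      * convexity     C is convex in K (no negative-cost butterfly).
    Returns a list of messages; an empty list means no violation was found.
    """
    disc = math.exp(-r * tau)
    issues = []
    for K, C in zip(strikes, calls):
        if C < max(S - K * disc, 0.0) - tol or C > S + tol:
            issues.append(f"price bounds violated at K={K}")
    slopes = [(calls[i + 1] - calls[i]) / (strikes[i + 1] - strikes[i])
              for i in range(len(strikes) - 1)]
    for i, s in enumerate(slopes):
        if s > tol:
            issues.append(f"call price increases between K={strikes[i]} and K={strikes[i + 1]}")
        if s < -disc - tol:
            issues.append(f"call spread too steep between K={strikes[i]} and K={strikes[i + 1]}")
    for i in range(len(slopes) - 1):
        if slopes[i + 1] < slopes[i] - tol:
            issues.append(f"convexity (butterfly) violated around K={strikes[i + 1]}")
    return issues


# Usage: a convex, decreasing slice passes; bending the middle price breaks convexity.
clean = static_arbitrage_violations([90.0, 100.0, 110.0], [12.0, 5.0, 1.0], 100.0, 0.0, 1.0)
bent = static_arbitrage_violations([90.0, 100.0, 110.0], [12.0, 8.0, 1.0], 100.0, 0.0, 1.0)
```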
Theorem 1.
An observed surface of call option prices written on some underlying
that is in
Theorem 2.
Conditions 1 through 5 on call prices in Theorem 1 follow from the following conditions on the implied volatility surface
The first condition in Theorem 2, which implies the first condition in Theorem 1, means that total implied variance must be a non-decreasing function of time to maturity. If this condition holds, the corresponding volatility surface is said to be free of calendar spread arbitrage. If it does not, a calendar spread arbitrage opportunity emerges in the market: one can set up, at a given moment, a costless trading strategy that yields a riskless profit. In practice, calendar spread trades involve buying the nearby option and selling the farther one when the spread between the two options is large, and selling the nearby option and buying the farther one when the spread is narrow.
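On a fitted surface, the calendar condition can be checked numerically. At every fixed log-moneyness, total implied variance must not decrease from one maturity to the next; equivalently, the slices must not cross. The sketch below assumes each slice is available as a callable $w(k)$ and tests only a finite grid of $k$ values. The function name and the grid are hypothetical.

```python
def calendar_violations(slices, k_grid, tol=0.0):
    """Check that total implied variance is non-decreasing in maturity.

    slices maps each maturity T to a callable w(k) giving the total implied
    variance of that slice at log-moneyness k. Returns every (k, T1, T2) with
    T1 < T2 consecutive maturities and w(k, T2) < w(k, T1) - tol.
    """
    maturities = sorted(slices)
    bad = []
    for T1, T2 in zip(maturities, maturities[1:]):
        for k in k_grid:
            if slices[T2](k) < slices[T1](k) - tol:
                bad.append((k, T1, T2))  # slices cross: calendar spread arbitrage
    return bad


def w_short(k):
    return 0.01 + 0.05 * k * k


def w_long(k):
    return 0.02 + 0.04 * k * k


# Usage: these two quadratic slices cross for |k| > 1.
crossings = calendar_violations({0.1: w_short, 0.5: w_long}, [-2.0, -0.5, 0.0, 0.5, 2.0])
```

Checking consecutive maturities is enough, because on each grid point the ordering is transitive.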
The second condition in Theorem 2 is equivalent to its counterpart in Theorem 1. It states that the option price tends to zero for large strike prices, which is a conceptually acceptable property of real-world option markets. The third condition in Theorem 2 is always satisfied because we use the implied volatility obtained from the Black-Scholes call price formula. Finally, inequality 4 is known as Durrleman's condition [19]. It appears as a factor in the second derivative of the call price surface with respect to strike price, and it determines whether a non-negative probability density exists. Conditions 2 and 4 of Theorem 2 together make a volatility surface free of butterfly arbitrage. For example, let
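The SVI parameterization of the total implied variance for a fixed time to maturity [7] is
$$w(k; \chi) = a + b\left(\rho(k - m) + \sqrt{(k - m)^2 + \sigma^2}\right),$$
where $k$ is log-moneyness and $\chi = \{a, b, \rho, m, \sigma\}$ with $a \in \mathbb{R}$, $b \ge 0$, $|\rho| < 1$, $m \in \mathbb{R}$, $\sigma > 0$ and $a + b\sigma\sqrt{1-\rho^2} \ge 0$.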
Table: effect on the smile of changing or increasing each SVI parameter.
Since market volatility data are sensitive to time to maturity and strike price, parameterization of implied volatility is done over
Machine learning has a variety of applications, and many of its algorithms are used to model the behavior of natural phenomena and predict their future outcomes. The basic intuition behind this methodology is that there is a training set consisting of empirical data
However, it is sometimes difficult to choose an adequate learning algorithm, one that best describes the trend of the data outside the training set. A wrong learning algorithm may cost us much time without leading to a real conclusion, so we should know which avenue is most promising to pursue. If our selected hypothesis does an excellent job predicting
This parameter is called the regularization parameter; it is introduced to prevent both high bias and high variance by controlling the trade-off between them, and
There are several ways to debug a learning algorithm that suffers from high bias or high variance. To fix high variance, we can get more training examples, try smaller sets of features and try increasing
In this section, we propose a learning algorithm for modeling implied volatility. The total implied variance of a stock price is a smile-shaped curve as a function of log-moneyness. Our strategy is therefore to fit a polynomial model to implied volatility data, following a machine learning viewpoint on curve fitting [17]. In other words, instead of just learning from input data
Polynomial approach
Fitting a simple linear model to the total implied variance smile evidently causes under-fitting. Following standard machine learning practice, we therefore add polynomial features and control the regularization parameter to overcome the drawbacks of under-fitting. Hence, the model we seek to validate is a regularized polynomial model. The idea starts with the following hypothesis
and the training set of this investigation includes
First of all, using the training set data, we implement a machine learning approach for a fixed value of
The main reason for using this strategy in volatility modeling is that a quadratic model is the natural choice for parameterizing a smile-shaped function such as total implied variance. Under some market conditions, however, higher-degree polynomials may be necessary to model total implied variance. According to some empirical studies [3], the graph of implied volatility versus log-moneyness does not always resemble a quadratic. For example, when the VIX is low, a higher-degree polynomial may be needed for the parameterization, possibly because of more data points or higher trading volume. We therefore include higher-degree polynomials in the procedure to obtain an adequate model for total implied variance. To avoid the over-fitting that occurs with high-degree polynomials, we use polynomials of degree at most 4 in the empirical implementation.
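The regularized polynomial fit of a single total-variance slice can be illustrated with a small ridge-regression sketch. The exact cost function is not reproduced here. We therefore assume the common squared-error cost with an $L_2$ penalty on all coefficients except the intercept, which is minimized in closed form through the normal equations. The helper names are hypothetical, and a production version would typically use a numerical library instead of hand-written elimination.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_regularized_poly(k, w, degree, lam):
    """Fit w(k) ~ theta_0 + theta_1 k + ... + theta_n k^n.

    Minimizes (1/2m) [sum_i (h(k_i) - w_i)^2 + lam * sum_{j>=1} theta_j^2],
    whose minimizer solves (X^T X + lam * D) theta = X^T w with
    D = diag(0, 1, ..., 1), i.e. the intercept is not penalized.
    """
    n = degree + 1
    X = [[ki ** j for j in range(n)] for ki in k]  # polynomial features of k
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * wi for row, wi in zip(X, w)) for i in range(n)]
    for j in range(1, n):
        A[j][j] += lam
    return solve_linear(A, b)


def predict(theta, k):
    """Evaluate the fitted polynomial at log-moneyness k."""
    return sum(t * k ** j for j, t in enumerate(theta))


def half_mse(theta, k, w):
    """Unregularized error (1/2m) sum (h - w)^2, used as training/CV/test error."""
    return sum((predict(theta, ki) - wi) ** 2 for ki, wi in zip(k, w)) / (2 * len(k))


# Usage on synthetic (not market) data: an exact quadratic smile.
k_obs = [-0.3 + 0.05 * i for i in range(13)]
w_obs = [0.04 - 0.02 * x + 0.5 * x * x for x in k_obs]
theta = fit_regularized_poly(k_obs, w_obs, 2, 0.0)
shrunk = fit_regularized_poly(k_obs, w_obs, 2, 1.0)
```

Increasing `lam` shrinks the non-intercept coefficients, which is the bias-variance lever the text describes.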
The strategy we propose for estimating the model parameters in each volatility slice is as follows:
where
For each of the three models, we learn the parameter vector from the training set and then compute the training error and the cross-validation error from the learned hypothesis. The learning curve, i.e. the plot of the cross-validation error and the training error versus the size of the training set, helps us diagnose whether the investigation suffers from high bias or high variance. The cross-validation error is computed for different values of
To overcome the effects of bias and variance for each model, the validation curve, i.e. the cross-validation error and the training error plotted against the regularization parameter, helps us select the value of
Which of the three proposed models is best depends on the behavior of the training data and on the error given by the cost function. As mentioned in Section 3.1, the data are divided into three portions. The optimization is performed on the training set, and the validity of the model is tested on the cross-validation set and the test set. After choosing the optimum value of
Finally, we evaluate all estimated models on the test set, and the pair with the lowest test error is the one we are looking for. That is, we use the test set data together with the estimated parameters, including the corresponding regularization parameters
The regularization term appears to play a crucial role in removing butterfly arbitrage. Most of the time, a regularized cost function yields a volatility surface with no butterfly arbitrage, whereas cost functions without a regularization term do not. In other words, a strategy that overcomes high bias and high variance also helps keep the parameterization free from butterfly arbitrage. In Section 4, the plot of the Durrleman function for the selected model shows whether butterfly arbitrage exists in the estimated model. Notably, the proposed method made the volatility surface free from butterfly arbitrage for all the times to maturity considered. The regularization term thus acts as a penalty that helps us remove butterfly arbitrage.
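Durrleman's condition (condition 4 of Theorem 2) is the test used for butterfly arbitrage, so a sketch of how it might be evaluated on a fitted polynomial slice may help. We assume inequality 4 takes the standard form $g(k) \ge 0$, with $g(k) = \left(1 - \frac{k w'}{2w}\right)^2 - \frac{w'^2}{4}\left(\frac{1}{w} + \frac{1}{4}\right) + \frac{w''}{2}$ for a total-variance slice $w(k)$. The function names and grids are hypothetical. A non-negative $g$ on a finite grid says nothing about strikes outside that grid.

```python
def poly_eval(theta, k):
    """Evaluate theta_0 + theta_1 k + ... + theta_n k^n."""
    return sum(t * k ** j for j, t in enumerate(theta))


def poly_deriv(theta):
    """Coefficients of the derivative of the polynomial with coefficients theta."""
    return [j * t for j, t in enumerate(theta)][1:]


def durrleman_g(theta, k):
    """Durrleman's function g(k) for the total-variance slice w(k) = poly(theta).

    g(k) = (1 - k w'/(2w))^2 - (w'^2/4) (1/w + 1/4) + w''/2
    """
    w = poly_eval(theta, k)
    if w <= 0:
        raise ValueError(f"total implied variance must be positive, got {w} at k={k}")
    d1 = poly_deriv(theta)
    w1 = poly_eval(d1, k)               # w'(k)
    w2 = poly_eval(poly_deriv(d1), k)   # w''(k)
    return (1 - k * w1 / (2 * w)) ** 2 - (w1 ** 2 / 4) * (1 / w + 0.25) + w2 / 2


def butterfly_free_on_grid(theta, k_grid):
    """True if g(k) >= 0 at every grid point (says nothing off the grid)."""
    return all(durrleman_g(theta, k) >= 0 for k in k_grid)


# Usage: a mild smile passes on a narrow grid; a very steep one fails at k = 1.
mild_ok = butterfly_free_on_grid([0.04, 0.0, 0.1], [-0.2, 0.0, 0.2])
steep_ok = butterfly_free_on_grid([0.04, 0.0, 2.0], [0.0, 1.0])
```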
The algorithm, step by step
To provide a better understanding of the proposed algorithm, this section lists simple pseudocode for it:
1. For each of the three polynomial models, start with a volatility slice.
2. Using the training set data, estimate the parameters of each model by minimizing the cost function for a fixed value of the regularization parameter.
3. Using the cost function and the estimated parameters, compute the training error and the cross-validation error for different training set sizes.
4. On the same figure, plot the learning curve, i.e. the training error and the cross-validation error versus the size of the training set.
   a) If the learning curve shows neither over-fitting nor under-fitting, compute the test error of each model using these estimated parameters and the test set data.
   b) Otherwise, plot the validation curve, i.e. the training error and the cross-validation error versus the regularization parameter, select the value of the regularization parameter from it, and compute the corresponding test error.
5. Choose the model with the lowest test error as the best model and use it to plot Durrleman's function.
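The pseudocode can be sketched end to end under several assumptions of ours:

- a random 60/20/20 split into training, cross-validation and test sets;
- a small, hypothetical grid of regularization values;
- the squared-error ridge cost with an unpenalized intercept;
- automatic selection of the regularization parameter by the lowest cross-validation error, in place of inspecting the learning and validation curves by eye.

All names are hypothetical, and the synthetic data below are not market data.

```python
import random


def ridge_poly_fit(k, w, degree, lam):
    """Minimize (1/2m)[sum (h - w)^2 + lam * sum_{j>=1} theta_j^2] (intercept unpenalized)."""
    n = degree + 1
    X = [[ki ** j for j in range(n)] for ki in k]
    # Augmented normal equations [X^T X + lam*D | X^T w], D = diag(0, 1, ..., 1).
    M = [[sum(r[i] * r[j] for r in X) + (lam if i == j and i > 0 else 0.0) for j in range(n)]
         + [sum(r[i] * wi for r, wi in zip(X, w))] for i in range(n)]
    for c in range(n):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def half_mse(theta, k, w):
    """Unregularized error (1/2m) sum (h - w)^2 used for training, CV and test error."""
    return sum((sum(t * ki ** j for j, t in enumerate(theta)) - wi) ** 2
               for ki, wi in zip(k, w)) / (2 * len(k))


def split_data(k, w, fractions=(0.6, 0.2), seed=0):
    """Randomly split (k, w) into training, cross-validation and test sets."""
    idx = list(range(len(k)))
    random.Random(seed).shuffle(idx)
    a = int(fractions[0] * len(k))
    b = a + int(fractions[1] * len(k))

    def take(ids):
        return [k[i] for i in ids], [w[i] for i in ids]

    return take(idx[:a]), take(idx[a:b]), take(idx[b:])


def learning_curve(train, cv, degree, lam):
    """(training-set size, training error, CV error) for growing subsets of the training set."""
    (kt, wt), (kc, wc) = train, cv
    curve = []
    for m in range(degree + 1, len(kt) + 1):
        theta = ridge_poly_fit(kt[:m], wt[:m], degree, lam)
        curve.append((m, half_mse(theta, kt[:m], wt[:m]), half_mse(theta, kc, wc)))
    return curve


def select_model(k, w, degrees=(2, 3, 4), lams=(0.0, 1e-4, 1e-3, 1e-2, 1e-1, 1.0)):
    """Pick lam per degree by lowest CV error, then the degree with lowest test error."""
    train, cv, test = split_data(k, w)
    best = None
    for d in degrees:
        lam = min(lams, key=lambda lam_: half_mse(ridge_poly_fit(*train, d, lam_), *cv))
        theta = ridge_poly_fit(*train, d, lam)
        test_err = half_mse(theta, *test)
        if best is None or test_err < best["test_error"]:
            best = {"degree": d, "lam": lam, "theta": theta, "test_error": test_err}
    return best


# Usage on a synthetic cubic smile, which a quadratic cannot fit exactly.
k_obs = [-0.5 + i / 40 for i in range(41)]
w_obs = [0.04 - 0.02 * x + 0.3 * x ** 2 + 0.2 * x ** 3 for x in k_obs]
best = select_model(k_obs, w_obs)
train, cv, test = split_data(k_obs, w_obs)
lc = learning_curve(train, cv, 3, 0.0)
```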
As noted in Section 2.5, SVI parameters can change the smile only in a few limited ways, such as shifting it upward or downward or making it narrower. Moreover, SVI behaves roughly like a quadratic function and cannot describe curves that follow higher-degree polynomials. With the proposed polynomial approach, we can capture the quadratic behavior of the smile and also fit more of the variability in the data by using higher-degree polynomials. For instance, when total implied variance data fluctuate strongly and show dramatic movements, SVI does not fit well. It is not designed to capture large fluctuations of total implied variance over a short range of log-moneyness, and a higher-degree polynomial is needed in such cases. The proposed machine learning approach therefore covers more of the smile shapes that can occur in real markets than SVI does. Hence, it can itself be a helpful method of volatility parameterization.
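The raw SVI slice can be written down directly, which makes the comparison concrete. The sketch assumes the raw form $w(k) = a + b\left(\rho(k - m) + \sqrt{(k - m)^2 + \sigma^2}\right)$ as the SVI parameterization referred to here, and the parameter values are arbitrary. It checks two elementary properties of this form. First, the slice has a single minimum, $a + b\sigma\sqrt{1 - \rho^2}$. Second, its wings become straight lines with slopes $b(1 - \rho)$ and $b(1 + \rho)$. A raw SVI slice therefore has a fixed, hyperbola-like shape with one trough and linear wings.

```python
import math


def svi_raw(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) for one maturity slice."""
    return a + b * (rho * (k - m) + math.sqrt((k - m) ** 2 + sigma ** 2))


def svi_minimum(a, b, rho, m, sigma):
    """Minimum of the raw SVI slice, a + b*sigma*sqrt(1 - rho^2) (requires |rho| < 1)."""
    return a + b * sigma * math.sqrt(1 - rho ** 2)


def svi_wing_slopes(b, rho):
    """Asymptotic slopes |dw/dk| of the left and right wings: b(1 - rho), b(1 + rho)."""
    return b * (1 - rho), b * (1 + rho)


# Usage with arbitrary (hypothetical) parameters.
params = dict(a=0.02, b=0.1, rho=-0.3, m=0.05, sigma=0.1)
grid = [-1 + i * 1e-4 for i in range(20001)]
numeric_min = min(svi_raw(k, **params) for k in grid)
# Far in the wings, a unit step in k changes w by (almost exactly) the wing slope.
right_slope = svi_raw(1001.0, **params) - svi_raw(1000.0, **params)
left_slope = svi_raw(-1001.0, **params) - svi_raw(-1000.0, **params)
```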
Implementation and numerical results
In this section, we implement the proposed machine learning algorithm to parameterize Black-Scholes implied volatility data obtained from S&P 500 European call options traded on December 15, 2014. To assess the robustness of our approach, we apply it to six different times to maturity and compare the resulting butterfly arbitrage behavior with that of the prominent SVI model. SVI is calibrated following the methodology of the corresponding paper [9].
Data preparation
In this paper, a learning strategy is used to fit the total implied variance data observed in the market, but the market data must first be prepared. To show the robustness of the model, we remove in-the-money and at-the-money data and apply our strategy only to out-of-the-money data. We have two main reasons. First, OTM option traders receive a risk premium for the higher amount of risk they bear compared with their ATM and ITM counterparts. Second, for OTM options the volatility risk premium is more effective at predicting option returns: it has been shown [5] that returns on S&P 500 OTM options are effectively predictable from the volatility risk premium, whereas this predictability declines for ITM options. We therefore perform our strategy on OTM data, which represent the riskiest and worst situation for a holder of stock options.
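The preparation step can be sketched as a filter followed by a change of variables. The sketch assumes the quotes are available as (strike, implied volatility) pairs for one maturity and that the forward price is supplied by the caller. The quote values are invented for illustration, and `atm_band`, a width used to drop near-the-money strikes, is a hypothetical parameter. Each kept quote is converted to the log-moneyness and total implied variance coordinates used in the fitting procedure.

```python
import math


def prepare_otm_slice(quotes, forward, tau, atm_band=0.0):
    """Turn call quotes of one maturity into (log-moneyness, total variance) pairs.

    quotes: iterable of (strike, Black-Scholes implied volatility).
    Only out-of-the-money calls are kept, i.e. k = ln(K / F) > atm_band, which
    drops ITM calls (k < 0) and ATM/near-ATM quotes (0 <= k <= atm_band).
    Total implied variance is w = sigma^2 * tau.
    """
    rows = []
    for strike, iv in quotes:
        k = math.log(strike / forward)
        if k > atm_band and iv > 0:
            rows.append((k, iv * iv * tau))
    return sorted(rows)


# Usage with made-up quotes: only the two strikes above the forward survive.
quotes = [(1900.0, 0.20), (2000.0, 0.18), (2100.0, 0.15), (2200.0, 0.14)]
slice_data = prepare_otm_slice(quotes, forward=2000.0, tau=0.5)
```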
Optimization procedure
Once the optimized parameters of the best model are obtained, we can determine whether these smiles are free of static arbitrage. We examine condition 4 of Theorem 2, the Durrleman condition, which is the benchmark for the existence of butterfly arbitrage. In addition, the optimization approach we use for parameter calibration includes extra penalties across the slices of total implied variance to guarantee the absence of calendar spread arbitrage. These penalties differ from the one in Section 3. They are not included in the cost function; instead, they act as constraints on the optimization of each slice, so that the slices do not cross each other. Penalties against calendar spread arbitrage can be defined in several ways, and because polynomial functions are simple, calendar spread arbitrage is easy to eliminate in the machine learning approach.
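Because polynomial slices are simple, one crude way to keep consecutive slices from crossing can be sketched in a few lines. This is a hypothetical simplification of ours, not the parameter conditions used in the paper's optimization. After fitting each slice independently, we walk from the shortest to the longest maturity and raise the intercept of each slice just enough to lie above the previous one on a chosen grid of log-moneyness. The ordering is only enforced on that grid. Adding a constant also changes the Durrleman function, so condition 4 would need to be re-checked afterwards.

```python
def poly_eval(theta, k):
    """Evaluate theta_0 + theta_1 k + ... + theta_n k^n."""
    return sum(t * k ** j for j, t in enumerate(theta))


def lift_above(theta_prev, theta_new, k_grid, margin=0.0):
    """Raise the intercept of theta_new just enough that, on k_grid,
    w_new(k) >= w_prev(k) + margin. Other coefficients are left unchanged."""
    gap = max(poly_eval(theta_prev, k) + margin - poly_eval(theta_new, k) for k in k_grid)
    lifted = list(theta_new)
    if gap > 0:
        lifted[0] += gap
    return lifted


def calibrate_forward(fitted, k_grid):
    """Walk from the shortest to the longest maturity, lifting each independently
    fitted slice so that it does not cross the (already adjusted) previous slice."""
    adjusted, prev = {}, None
    for T in sorted(fitted):
        theta = fitted[T] if prev is None else lift_above(prev, fitted[T], k_grid)
        adjusted[T] = theta
        prev = theta
    return adjusted


# Usage with hypothetical quadratic fits that cross for |k| > 1.
fitted = {0.1: [0.01, 0.0, 0.05], 0.5: [0.02, 0.0, 0.04]}
grid = [-2.0 + 0.5 * i for i in range(9)]
surface = calibrate_forward(fitted, grid)
```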
First, we apply the algorithm proposed in Section 3 separately to each expiry time and determine the best polynomial degree for each. Then, based on the estimated parameters for the shortest expiry time, we impose conditions on the parameters of the second-shortest expiry time in the corresponding optimization, so that the second slice lies everywhere above the first. Similarly, from the third expiry time up to the last, we impose penalties on the optimization of each slice to obtain a volatility surface free from calendar spread arbitrage. In other words, the conditions on each slice's optimization depend on the parameters estimated for the previous slice, and we calibrate the parameters slice by slice from the shortest time to maturity to the longest. The number of optimizations therefore equals the number of expiry times for the option under consideration. The optimization for each slice is performed by the algorithm in Section 3.3. The condition we use for each slice is not predetermined and depends on the parameter values obtained for each slice, but moneyness
Machine learning implementation on total implied variance for S&P 500, traded on DEC 15, 2014
Total implied variance following the forward slice-by-slice method of Section 4.2.
Durrleman function of ML and SVI, implemented for 
Durrleman function of ML and SVI, implemented for 
Durrleman function of ML and SVI, implemented for 
Durrleman function of ML and SVI, implemented for 
Durrleman function of ML and SVI, implemented for 
Durrleman function of ML and SVI, implemented for 
The results of the machine learning implementation for six different times to expiration are shown in Table 1. The last two columns of the table give the best value of the regularization parameter for each polynomial degree, along with the best pair (the one with the lowest test error) for each slice. Figure 1 shows total implied variance versus moneyness for all six times to expiration. The curves do not cross, so the calibration exhibits no calendar spread arbitrage. This confirms that the calibration method described in this section, a forward strategy that imposes penalties slice by slice, makes the surface free from calendar spread arbitrage. Hence, the first condition of Theorem 2 is satisfied empirically. The second condition of Theorem 2, or equivalently the second condition of Theorem 1, holds on logical grounds. For European call options, very large strike prices mean that the option is not exercised, so its price tends to zero. Furthermore, as mentioned before, the third condition of Theorem 2 is satisfied because we work with the Black-Scholes implied volatility. Durrleman functions for both the machine learning algorithm and the SVI approach are shown in Figs 2 to 7 for the different slices. For each selected pair of
The SVI parameterization of the same data behaves differently. SVI parameters do not always produce a surface with no butterfly arbitrage, because in some cases the Durrleman function fails to be strictly positive near the money. The parameters then need to be re-picked by the method proposed in [9]. In contrast, our strategy results in a model with no static arbitrage most of the time, so on most occasions we do not need to re-pick the estimated parameters to remove static arbitrage. In this sense, the proposed machine learning approach outperforms SVI in prediction, since its estimated parameters do not have to be altered as in the re-picking recipe used for SVI. The figures show that for four of the six times to maturity (Figs 3–5 and 7), the Durrleman functions of the SVI method are not strictly positive; only in Figs 2 and 6 are they positive. The machine learning approach shows no butterfly arbitrage for any slice. In fact, in the case of the machine learning approach the best pair
According to Table 1, the third-degree polynomial is the best one for the second and sixth slices. For both of these slices, the Durrleman function is negative in some regions when we implement SVI, but not for the machine learning algorithm. It therefore seems that SVI does not describe the behavior of the volatility data well when the data show more variability. Since SVI behaves roughly like a quadratic function, it cannot act as an adequate predictor when the data do not follow a quadratic trend. As mentioned in Section 3.4, total implied variance is not always a quadratic function of moneyness [3], and under some market fluctuations higher-degree polynomials may be necessary for the fitting procedure.
There is also a well-known method to re-pick the parameters of SVI so that it is free from butterfly arbitrage. However, it changes the initially estimated parameters, which can reduce the precision of the fit. In other words, with SVI we give up some accuracy in fitting the smile to obtain a volatility surface with no butterfly arbitrage. The implementation of the machine learning method shows that it removes butterfly arbitrage better than SVI without sacrificing the accuracy of the fitting procedure.
The preliminary results show that using machine learning to parameterize total implied variance yields models with a notable advantage over SVI. Most of the time, the machine learning approach produces a volatility surface free from static arbitrage and needs no re-picking to remove static arbitrage. Although SVI is a valuable model for parameterizing total implied variance, with nice properties, our proposed polynomial approach provides a cheap and simple model with high accuracy (low cross-validation error) that mostly results in the absence of static arbitrage. Moreover, it opens the door to investigating stochastic volatility surfaces with machine learning methods aimed at removing arbitrage.
To sum up, this area of research has room for improvement. Other machine learning kernels could be introduced for option pricing, using features such as the underlying price process and implied volatility. Such kernels could increase our confidence in eliminating arbitrage opportunities and in predicting the market value of options.
