Abstract
Implied volatility plays a pivotal role in the options market, and the collection of implied volatilities across strikes and maturities is known as the implied volatility surface (IVS). To capture the dynamics of the IVS, this study examines the latent states of the IVS and the transitions between them within the regime-switching framework of the hidden Markov model (HMM). Cross-sectional models are first fitted to daily implied volatilities, and the resulting regression factors serve as proxies for the IVS. The HMM is then applied to these latent factors to model the dynamics of the IVS. Taking advantage of the HMM, the hidden state of each trading day is identified, which yields the distribution of the states over time, their characteristics, and the transitions between them. The empirical study is conducted on Shanghai 50ETF options, and the results indicate that the HMM captures the latent factors of the IVS. The identified states reflect different financial characteristics, and some of their typical features and transitions are associated with specific events. In addition, using the HMM to predict the regression factors of the cross-sectional models makes it possible to forecast implied volatilities. The autoregressive integrated moving average (ARIMA) model, the vector autoregression (VAR) model, and the support vector regression (SVR) model serve as benchmarks. The results show that the HMM outperforms these models in implied volatility prediction.
Introduction
Volatility plays an important role in financial markets and appears in a broad spectrum of theories and applications [1]. Observed option prices implicitly contain information on market participants' expectations of volatility [2]. Under the classical Black-Scholes option pricing model, there is a one-to-one mapping between the observed option price and the unobservable implied volatility. This mapping makes it possible to compare the values of options with different strike prices, expirations, and underlying assets. The implied volatility derived from the Black-Scholes model reflects investors' expectations of the future volatility of the underlying asset, and accurate volatility forecasts are essential for sound decision-making. In recent years, numerous studies have addressed the modeling and prediction of implied volatility [3–6].
For options on the same underlying asset, implied volatility varies with strike price and expiration. Plotting it against both moneyness and maturity produces the implied volatility surface (IVS). For a given maturity, implied volatility exhibits a "smile" or "skew" pattern across strikes. When different maturities are considered, this skew extends to a non-flat volatility surface [7].
To explore the dynamic characteristics of the IVS, a natural approach is to build time series models that capture, and then predict, the dependence of the IVS on strike price and maturity. In doing so, we assume that similar IVSs are generated by the same, but unobserved, state. If these states and the transitions between them can be identified, they provide additional information on how the IVS evolves over time, which is critical for understanding and predicting it.
To this end, this study models and predicts the time-varying characteristics of the IVS based on the regime-switching structure of the HMM. The contributions of this study can be summarized as follows:
Taking advantage of the HMM, we identify the states of the IVS and capture how these states, together with their characteristics, change over time. The estimated transition mechanism between the states of the IVS provides a way to understand changes in the options market. The proposed model delivers better prediction performance and can also describe the characteristics of the predicted IVS.
The remainder of this study is structured as follows. Section 2 reviews studies on IVS analysis and HMM applications. Section 3 introduces the methodology for IVS modeling and prediction, and Section 4 describes the data used in the empirical study. Section 5 presents the empirical analysis and compares the fitting results of the models. Section 6 assesses the out-of-sample forecasting performance of the models. Section 7 concludes.
Related Work
Research on the implied volatility surface (IVS) began with quantitative analyses of how implied volatility varies with strike price and maturity. These were followed by parametric specifications for modeling and calibrating the IVS, in which the implied volatility of an option is represented as a function of its strike price and expiration [8]. Moneyness, which describes the strike price relative to the price of the underlying asset, was then introduced into the parametric specification [9]. Later, the polynomial parametric model was successively improved by modifying the definition of moneyness and by introducing a new dependent variable, the logarithm of implied volatility [4, 10]. A polynomial model that relates an option's implied volatility to its strike price and time to maturity is called a cross-sectional model. Its regression coefficients (also referred to as the "latent factors") serve as the reference for the IVS when modeling or forecasting it. The fitting and prediction abilities of cross-sectional models have been well investigated, confirming that the IVS has predictable dynamics [8, 11]. Fitting the IVS to obtain the latent factors forms the first stage of the so-called "two-stage approach". The second stage models and predicts the latent factors in order to study the dynamics of the IVS indirectly.
As a representative multivariate time series model, the vector autoregression (VAR) model has been adopted to analyze the latent factors because it captures their dependence on past information [1, 2]. To improve modeling effectiveness, principal component analysis [9, 13], semi-parametric models [1, 14], and stochastic volatility models [15] have also been employed in the dynamic modeling of the IVS. These models capture the dynamic correlation of the latent factors. However, they do not analyze how the latent factors group into periods with similar financial characteristics over the sample period.
Basic notations used in this study

The proposed modeling process.
This section describes the modeling and prediction of the implied volatility surface (IVS). As shown in Fig. 1, the modeling process consists of three parts: the definition of the IVS, the cross-sectional models of the IVS, and the modeling of the latent factors of the cross-sectional models. Table 1 summarizes the basic notations used in this study.
The definition of implied volatility surface
Implied volatility is the volatility (the standard deviation of the underlying asset's returns) at which the theoretical option price equals the observed market price. A high implied volatility typically indicates that the market expects large price movements in the future, whereas a low implied volatility suggests that the market anticipates more stable prices. Assuming that the market price correctly reflects the volatility of the underlying asset, the implied volatility can be obtained by inverting the Black-Scholes pricing formula at the observed option price [27].
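Under the no-arbitrage assumption, options can be valued by risk-neutral pricing. The Black-Scholes closed-form prices of a European call ($C$) and put ($P$) are
$$C = S N(d_1) - K e^{-r\tau} N(d_2), \qquad P = K e^{-r\tau} N(-d_2) - S N(-d_1),$$
$$d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)\tau}{\sigma\sqrt{\tau}}, \qquad d_2 = d_1 - \sigma\sqrt{\tau},$$
where $S$ is the price of the underlying asset, $K$ the strike price, $r$ the risk-free interest rate, $\tau$ the time to maturity, $\sigma$ the volatility, and $N(\cdot)$ the standard normal cumulative distribution function.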
The Black-Scholes model assumes a constant volatility. Nevertheless, the implied volatilities it yields differ across options with the same underlying asset but different strikes and expirations, so they form a curved surface (the IVS) over strikes and expirations [7]. Because implied volatility is perceived as the market's expectation of future volatility [28], the following analysis is conducted on the IVS formed by these implied volatilities rather than on option prices directly.
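The reverse calculation of implied volatility from an observed option price can be sketched as follows. This is a minimal sketch that assumes a non-dividend-paying underlying, consistent with the forward price $F = Se^{r\tau}$ used in this section. It uses bisection as the root finder, which relies on the Black-Scholes price being strictly increasing in $\sigma$. The study does not state which numerical inversion it used, and the function names are illustrative.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price(S, K, r, tau, sigma, is_call=True):
    """Black-Scholes price of a European option on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    if is_call:
        return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)

def implied_vol(price, S, K, r, tau, is_call=True,
                lo=1e-4, hi=5.0, tol=1e-8, max_iter=200):
    """Invert bs_price for sigma by bisection on the bracket [lo, hi]."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_price(S, K, r, tau, mid, is_call) > price:
            hi = mid   # model price too high: volatility must be lower
        else:
            lo = mid   # model price too low: volatility must be higher
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For example, pricing a call with `sigma=0.22` and passing that price back to `implied_vol` recovers 0.22 up to the bisection tolerance. A production implementation would typically use a faster bracketing method such as Brent's, but bisection keeps the sketch dependency-free.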
To characterize the information in the IVS, a deterministic cross-sectional model of the IVS is constructed based on moneyness and time to maturity. Let $K$ and $F = Se^{r\tau}$ be the strike price and the implied forward price of the option, respectively. The moneyness $M$ is then computed from $K$ and $F$ as in Eq. (3).
Further, to account for the influence of the squared maturity term on the IVS, a second cross-sectional model of the IVS, Model (5), adds a maturity-squared term to Model (4).
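The daily cross-sectional fit can be sketched with ordinary least squares. The specification below is an assumption: a common quadratic form consistent with the factors discussed alongside Tables 4 and 5 (intercept, slope and curvature in moneyness, maturity, and the moneyness-maturity interaction). It reads $\sigma = \beta_0 + \beta_1 M + \beta_2 M^2 + \beta_3 \tau + \beta_4 M\tau + \varepsilon$, with Model (5) adding $\beta_5 \tau^2$. Whether the dependent variable is $\sigma$ or $\ln\sigma$ is also not assumed by the code; pass whichever is used. The function names are hypothetical.

```python
import numpy as np

def design_matrix(m, tau, with_tau_sq=False):
    """Regressors of the ASSUMED quadratic cross-sectional model."""
    m = np.asarray(m, float)
    tau = np.asarray(tau, float)
    cols = [np.ones_like(m), m, m ** 2, tau, m * tau]
    if with_tau_sq:              # Model (5): extra maturity-squared term
        cols.append(tau ** 2)
    return np.column_stack(cols)

def fit_day(iv, m, tau, with_tau_sq=False):
    """OLS coefficients (the day's latent factors) for one trading day."""
    X = design_matrix(m, tau, with_tau_sq)
    beta, *_ = np.linalg.lstsq(X, np.asarray(iv, float), rcond=None)
    return beta

def factor_series(daily_data, with_tau_sq=False):
    """daily_data: iterable of (iv, m, tau) arrays, one tuple per trading day.

    Returns a (days, factors) array: the observation sequence for the HMM.
    """
    return np.vstack([fit_day(iv, m, tau, with_tau_sq) for iv, m, tau in daily_data])
```

Each row of `factor_series` is one day's coefficient vector, which is what the second stage of the two-stage approach models over time.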
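For each day, the latent factors of Models (4) and (5) are estimated by OLS, and the resulting factor vectors form the observation sequence of the HMM. Let $i_t$ and $o_t$ denote the hidden state and the observation at time $t$. The homogeneous Markov assumption states that the state of the hidden Markov chain at any time $t$ depends only on the state at time $t-1$. It is independent of the states and observations at all other times:
$$P(i_t \mid i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(i_t \mid i_{t-1}).$$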
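The observation independence assumption states that the observation at any time $t$ depends only on the state of the Markov chain at time $t$. It is independent of all other observations and states:
$$P(o_t \mid i_T, o_T, \ldots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(o_t \mid i_t).$$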
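Under these two assumptions, an HMM with $N$ hidden states $q_1, \ldots, q_N$ is described by three components:
- the state transition matrix $A = [a_{ij}]_{N \times N}$, where $a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$;
- the observation probability distribution $B$ of each state;
- the initial state distribution $\pi = (\pi_1, \ldots, \pi_N)$, where $\pi_i = P(i_1 = q_i)$.

Compactly, the model is written $\lambda = (A, B, \pi)$.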
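Given the transition matrix, the per-state observation distributions, and the initial state distribution, the likelihood of an observed factor sequence is computed with the forward algorithm. This is the quantity that Baum-Welch increases at each iteration and that enters the BIC. The sketch below assumes multivariate Gaussian emissions, one mean vector and one covariance matrix per state, as the per-state tables in Section 5 suggest. It works in log space to avoid numerical underflow over several hundred trading days. All names are illustrative.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at each row of x (shape (T, d))."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)           # whitened residuals, shape (d, T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def emission_logprob(obs, means, covs):
    """(T, N) matrix of log p(o_t | state j)."""
    return np.column_stack([gaussian_logpdf(obs, m, c) for m, c in zip(means, covs)])

def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def forward_loglik(obs, pi, A, means, covs):
    """log P(o_1, ..., o_T | model) via the forward recursion in log space."""
    log_b = emission_logprob(obs, means, covs)
    log_a = np.log(A)
    log_alpha = np.log(pi) + log_b[0]
    for t in range(1, obs.shape[0]):
        # entry [i, j] = log alpha_{t-1}(i) + log a_ij; sum out the previous state i
        log_alpha = logsumexp(log_alpha[:, None] + log_a, axis=0) + log_b[t]
    return float(logsumexp(log_alpha, axis=0))
```

Normalizing `exp(log_alpha)` on the last day gives the filtered state probabilities, which are also what a one-step-ahead state forecast starts from.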
Since the observation sequence
This section presents the data used in the empirical analysis, the characteristics of IVS in moneyness and maturity dimensions, and the regression factors obtained based on Models (4) and (5).
Data selection and filtering
We use daily data on European options on the Shanghai 50ETF traded on the Shanghai Stock Exchange from January 2, 2018, to June 30, 2021, obtained from the Wind database. To exclude inactively traded contracts, three criteria are applied to filter the data. First, Shanghai 50ETF option contracts with less than 7 calendar days to maturity are dropped [29]. Second, options whose daily number of observations is less than 10 are eliminated. Third, based on the moneyness (Eq. (3)), in-the-money options, i.e., call options with M < 0 and put options with M > 0, are filtered out. In addition, only options with an absolute moneyness below 0.15 are retained [34]. The filtered data contain 37,795 observations (17,503 for call options and 20,292 for put options), covering 846 trading days for call options and 847 for put options. On average, about 45 options (calls and puts combined) are observed per day.
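The filtering rules can be sketched as follows. Moneyness is assumed here to be the log-moneyness $M = \ln(K/F)$ with $F = Se^{r\tau}$ and $\tau$ measured in years as calendar days divided by 365. This is one definition consistent with the sign convention above (in-the-money calls have $M < 0$, in-the-money puts $M > 0$); Eq. (3) itself may differ. The second rule is interpreted, again as an assumption, as dropping contracts with fewer than 10 daily records. The `Quote` record and all function names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Quote:
    """Hypothetical daily record of one option contract."""
    contract_id: str
    is_call: bool
    S: float    # underlying 50ETF price
    K: float    # strike price
    r: float    # continuously compounded risk-free rate
    days: int   # calendar days to expiry
    iv: float   # implied volatility

def moneyness(q: Quote) -> float:
    """ASSUMED form of Eq. (3): M = ln(K / F) with F = S * exp(r * tau)."""
    tau = q.days / 365.0
    return math.log(q.K / (q.S * math.exp(q.r * tau)))

def passes_rules(q: Quote, min_days: int, max_abs_m: float) -> bool:
    if q.days < min_days:                                   # rule 1: too close to expiry
        return False
    m = moneyness(q)
    if (q.is_call and m < 0) or (not q.is_call and m > 0):  # rule 3: in the money
        return False
    return abs(m) < max_abs_m                               # keep |M| < 0.15

def filter_quotes(quotes, min_days=7, min_obs=10, max_abs_m=0.15):
    # rule 2 (ASSUMED reading): drop contracts with fewer than min_obs daily records
    counts = Counter(q.contract_id for q in quotes)
    return [q for q in quotes
            if counts[q.contract_id] >= min_obs and passes_rules(q, min_days, max_abs_m)]
```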
Statistical analysis of implied volatility surface
According to moneyness, options are divided into five categories: 0 < M < 0.03, 0.03 ≤ M < 0.06, 0.06 ≤ M < 0.09, 0.09 ≤ M < 0.11, and 0.11 ≤ M < 0.15 for call options, and -0.15 < M < -0.11, -0.11 ≤ M < -0.09, -0.09 ≤ M < -0.06, -0.06 ≤ M < -0.03, and -0.03 ≤ M < 0 for put options. The options are also divided into three categories by days to maturity (DTM): τ < 60, 60 ≤ τ < 180, and τ ≥ 180. Tables 2 and 3 present the descriptive statistics of implied volatilities in each moneyness and maturity category for call and put options, respectively.
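The category boundaries listed above can be encoded directly, for instance to compute per-cell mean implied volatilities like those summarized in Tables 2 and 3. This is a minimal sketch: the function names and the `(M, days, iv)` record layout are illustrative assumptions. Bin indices run from 0 to 4 in order of increasing M.

```python
from bisect import bisect_right
from collections import defaultdict

CALL_EDGES = [0.03, 0.06, 0.09, 0.11]      # 0<M<0.03, ..., 0.11<=M<0.15
PUT_EDGES = [-0.11, -0.09, -0.06, -0.03]   # -0.15<M<-0.11, ..., -0.03<=M<0
DTM_EDGES = [60, 180]                      # tau<60, 60<=tau<180, tau>=180 (days)

def moneyness_bin(m: float, is_call: bool) -> int:
    """Index 0..4 of the moneyness category; lower edges are inclusive."""
    return bisect_right(CALL_EDGES if is_call else PUT_EDGES, m)

def maturity_bin(days: int) -> int:
    """Index 0..2 of the days-to-maturity category."""
    return bisect_right(DTM_EDGES, days)

def cell_means(records, is_call):
    """records: iterable of (M, days, iv); returns {(m_bin, dtm_bin): mean iv}."""
    sums = defaultdict(lambda: [0.0, 0])
    for m, days, iv in records:
        cell = sums[(moneyness_bin(m, is_call), maturity_bin(days))]
        cell[0] += iv
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```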
Table 2 shows the characteristics of the IVS for call options. Along the maturity dimension, the implied volatility of short-term options (less than 60 days) is higher than that of medium-term (60 to 180 days) and long-term (more than 180 days) options. Along the moneyness dimension, the deeper out of the money an option is (that is, the larger M is), the higher its implied volatility; the closer it is to at the money, the lower its implied volatility. As shown in Table 3, put options exhibit similar IVS characteristics along both the maturity and the moneyness dimensions.
Descriptive statistics for IVS of call options by moneyness and maturity
Descriptive statistics for IVS of put options by moneyness and maturity

Average IVS for (a) call options and (b) put options.
Figure 2 presents the average IVS of call and put options, obtained by cubic spline interpolation and polynomial fitting. Neither IVS shows a smile shape along the moneyness dimension.

For call options, the implied volatilities of short-term options decline more sharply than those of long-term options as moneyness decreases, and therefore vary over a wider range. Likewise, for put options, the skew of short-term options spans a wider range of implied volatilities than that of long-term options.

Along the maturity dimension, for call options with large moneyness, the implied volatilities change markedly and decrease as maturity lengthens, whereas the implied volatilities of at-the-money options vary little across maturities. For put options, the IVS of out-of-the-money options takes on a "smirk" shape: the implied volatilities of both short-term and long-term options are higher than those of medium-term options. Meanwhile, the implied volatilities of at-the-money put options increase with time to maturity.
Descriptive statistics of regression factors in Model (4)
Tables 4 and 5 present the descriptive statistics of the regression factors in Models (4) and (5), respectively. According to Table 4, the curvature in the moneyness dimension has the largest positive effect on implied volatility for both call and put options. The interaction between moneyness and maturity has a negative coefficient for call options and a positive coefficient for put options. Since M is positive for the call options and negative for the put options in the sample, in both cases the larger the absolute value of the product of moneyness and maturity, the lower the implied volatility. In addition, moneyness has a positive effect for call options and a negative effect for put options. As shown in Table 5, once the squared maturity term is included, the effect of maturity becomes negative for both call and put options.
Descriptive statistics of regression factors in Model (5)
The values of BIC when the number of states varies from 2 to 6

Time distribution for states on regression factors in (a) Model (4) and (b) Model (5) of call options (state 1: blue; state 2: red; state 3: yellow; state 4: green).
This section employs the HMM to extract the hidden states from the latent factors of Models (4) and (5) for both call and put options.
Fitting results on latent factors for call options
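To build an HMM, the number of hidden states must be selected. In the experiments, the number of hidden states is determined by the Bayesian information criterion (BIC) [35]:
$$\mathrm{BIC} = -2\ln \hat{L} + p\ln T,$$
where $\hat{L}$ is the maximized likelihood of the model, $p$ is the number of free parameters, and $T$ is the number of observations. The number of states with the smallest BIC is selected.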
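The state-count selection by BIC can be sketched as below. The sketch assumes a Gaussian HMM with full covariance matrices, with free parameters counted as follows for $N$ states and $d$ regression factors:
- $N-1$ initial probabilities;
- $N(N-1)$ transition probabilities;
- $Nd$ means;
- $Nd(d+1)/2$ covariance entries.

The parameter count actually used in the study is not stated, and the log-likelihood values in the usage example are hypothetical.

```python
import math

def n_params(n_states: int, dim: int) -> int:
    """Free parameters of an ASSUMED full-covariance Gaussian HMM."""
    initial = n_states - 1
    transition = n_states * (n_states - 1)
    means = n_states * dim
    covariances = n_states * dim * (dim + 1) // 2
    return initial + transition + means + covariances

def bic(loglik: float, n_states: int, dim: int, n_obs: int) -> float:
    """BIC = -2 ln L + p ln T."""
    return -2.0 * loglik + n_params(n_states, dim) * math.log(n_obs)

def select_n_states(logliks, dim, n_obs):
    """logliks: {n_states: maximized log-likelihood}; returns (best N, all scores)."""
    scores = {n: bic(ll, n, dim, n_obs) for n, ll in logliks.items()}
    return min(scores, key=scores.get), scores
```

For example, `select_n_states({2: -100.0, 3: -50.0}, dim=1, n_obs=100)` compares two hypothetical fits and keeps the one with the smaller score.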
The mean vector and covariance matrix of the regression factors in Model (4) under each state for call options
The mean vector and covariance matrix of the regression factors in Model (5) under each state for call options
For Shanghai 50ETF call options, the daily IVS states obtained from the regression factors of Models (4) and (5) are presented in Fig. 3. The corresponding mean vectors and covariance matrices estimated by the Baum-Welch algorithm are presented in Tables 7 and 8. Fig. 3 and Tables 7 and 8 show the following:
For state 1 in Models (4) and (5), implied volatilities are significantly positively affected by the curvature in moneyness. This state appears at the beginning of 2018 and during most of the first half of 2021, possibly owing to the adjustment of strike prices and to fluctuations in the RMB/USD exchange rate, respectively. The second half of 2018 is dominated by state 2 in Model (4) and state 4 in Model (5). In these two states, maturity and moneyness have only slight effects on implied volatility, so implied volatilities in this period are more stable than in other periods. Notably, state 2 in Model (4) is the only state in which implied volatility is negatively correlated with maturity. For state 3 in Model (4) and state 2 in Model (5), the slope and curvature in moneyness have similar effects on implied volatility, and the interaction between moneyness and maturity has a large negative effect. These two states appear frequently from late 2019 to the first half of 2021, possibly owing to the impact of COVID-19 on the global economy. For state 4 in Model (4) and state 3 in Model (5), implied volatility is significantly affected by the positive effect of the curvature in moneyness. In addition, state 3 in Model (5) appears in almost every month from the second half of 2018 to the first half of 2019 but does not appear in 2020.
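Assigning each trading day to a hidden state, as in Fig. 3, amounts to decoding the state sequence from the fitted model. One standard choice is the Viterbi algorithm, which returns the single most likely state path; it is assumed here because the decoding method is not stated. The inputs are log initial probabilities, log transition probabilities, and a `(T, N)` array of per-day log emission densities, for example from per-state Gaussian densities of each day's regression factors.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely hidden-state path.

    log_pi: (N,) log initial probabilities; log_A: (N, N) log transition
    probabilities; log_B: (T, N) log emission densities of each day's factors.
    """
    T, N = log_B.shape
    delta = log_pi + log_B[0]              # best log score of paths ending in each state
    back = np.zeros((T, N), dtype=int)     # best predecessor, used for backtracking
    for t in range(1, T):
        scores = delta[:, None] + log_A    # scores[i, j]: best path to i, then i -> j
        back[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):          # follow the stored predecessors backwards
        path[t - 1] = back[t, path[t]]
    return path
```

An alternative is to label each day with its most probable smoothed state (forward-backward). That maximizes per-day accuracy rather than the probability of the whole path.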
In the constructed HMM, the initial state distributions for Models (4) and (5) of call options are
These two matrices show that the probability of each state remaining in itself is much larger than the probability of transitioning to any other state. The state in which implied volatilities are only slightly affected by maturity and moneyness (state 2 in Model (4) and state 4 in Model (5)) has the largest probability of remaining in itself. The state in which implied volatilities are significantly affected by the positive effect of the curvature in moneyness (state 4 in Model (4) and state 3 in Model (5)) tends to transition to the state in which implied volatilities are only slightly affected by maturity and moneyness (state 2 in Model (4) and state 4 in Model (5)).
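The persistence described above can be quantified from a transition matrix. A state with self-transition probability $a_{ii}$ lasts on average $1/(1-a_{ii})$ consecutive trading days, since the sojourn time is geometric. The stationary distribution gives the long-run share of days spent in each state. The sketch applies these two calculations to a hypothetical matrix rather than the estimated ones.

```python
import numpy as np

def expected_durations(A):
    """Expected consecutive days spent in each state: 1 / (1 - a_ii)."""
    return 1.0 / (1.0 - np.diag(np.asarray(A, float)))

def stationary_distribution(A):
    """Left eigenvector of A for eigenvalue 1, normalised to sum to one."""
    vals, vecs = np.linalg.eig(np.asarray(A, float).T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()
```

For instance, with the hypothetical matrix `[[0.9, 0.1], [0.2, 0.8]]`, the expected durations are 10 and 5 trading days, and the long-run shares are 2/3 and 1/3.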
In addition, some differences exist in the transition matrices for Models (4) and (5):
In Model (4), two states most likely transition only to the state in which implied volatilities are most positively affected by the curvature in moneyness (state 1). These are the state in which implied volatilities are only slightly affected by maturity and moneyness (state 2) and the state in which implied volatilities are strongly affected by the interaction between moneyness and maturity (state 3). In Model (5), the state in which implied volatilities are strongly affected by the interaction between moneyness and maturity (state 2) is very unlikely to be reached from the state in which implied volatilities are only slightly affected by maturity and moneyness (state 4). It most likely transitions only to the state in which implied volatilities are most positively affected by the curvature in moneyness (state 1).
Similarly, for put options, the daily IVS states obtained from Models (4) and (5) are shown in Fig. 4, and the statistical distributions of the regression factors in each state are shown in Tables 9 and 10. In addition, the initial state distributions are

Time distribution for states on regression factors in (a) Model (4) and (b) Model (5) of put options (state 1: blue; state 2: red; state 3: yellow; state 4: green).
The mean vector and covariance matrix of the regression factors in Model (4) under each state for put options
In contrast to call options, the interaction between maturity and moneyness has a much larger impact on the IVS of put options. For put options, states with the same number in Models (4) and (5) are similar in their distribution and transitions:
The state in which implied volatilities are only slightly affected by the slope and curvature in maturity (state 2) has the smallest probability of remaining in itself. From the first half of 2018 to the first half of 2019, and from July 2020 to September 2020, it readily transitions to and from the state in which implied volatilities are only slightly affected by the slopes in maturity and moneyness (state 4). This indicates only slight fluctuations during these periods. From May 2019 to August 2019, and from September 2020 to June 2020, it readily transitions to the state in which implied volatilities are affected more by maturity than in the other states (state 1). This indicates that maturity strongly influences implied volatility during these periods. The state in which implied volatilities are strongly affected by maturity and moneyness, especially by the curvature in moneyness (state 3), has the largest probability of remaining in itself. It occurs in January 2018 and from October 2019 to January 2020. This may be due to the release of new strike rules for Shanghai 50ETF options at the beginning of 2018 and the emergence of COVID-19 at the end of 2019. Apart from remaining in itself, this state transitions only to the state whose IVS is slightly affected by maturity (state 2).
The mean vector and covariance matrix of the regression factors in Model (5) under each state for put options
The evaluation of predictive ability
This section examines the performance of the HMMs in predicting implied volatility. The sample is divided into a training set and a testing set. The full time series contain 846 trading days for call options and 847 for put options. Data from January 2, 2018, to May 31, 2021, form the training set, and the remaining month, from June 1, 2021, to June 30, 2021, forms the testing set.
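Using the state sequence obtained in Section 5, the latent factors of Models (4) and (5) are forecast by predicting the states, and the forecast factors are then substituted into the cross-sectional models to forecast implied volatilities. This procedure yields the predicted latent factors and implied volatilities, together with the predicted states that describe the characteristics of the predicted IVS. Prediction accuracy is measured by the root mean squared error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE, in percent):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}(\sigma_k - \hat{\sigma}_k)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left|\sigma_k - \hat{\sigma}_k\right|, \qquad \mathrm{MAPE} = \frac{100}{n}\sum_{k=1}^{n}\left|\frac{\sigma_k - \hat{\sigma}_k}{\sigma_k}\right|,$$
where $\sigma_k$ and $\hat{\sigma}_k$ denote the observed and predicted implied volatilities and $n$ is the number of predictions.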
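The factor forecast can be sketched as a one-step-ahead prediction:
1. Propagate the filtered state probabilities on the last observed day through the transition matrix.
2. Select the most likely next state.
3. Substitute that state's mean factor vector into the cross-sectional model.

Two choices here are assumptions. One is using the state mean rather than a probability-weighted average of the state means (available via `weighted=True`). The other is the quadratic cross-sectional form in `predict_iv`. The filtered probabilities would come from a normalized forward pass, and the function names are hypothetical.

```python
import numpy as np

def predict_next(filtered_prob, A, means, weighted=False):
    """One-step-ahead state probabilities, most likely next state, and factor forecast.

    filtered_prob: P(state on day T | factors up to day T), shape (N,)
    A: transition matrix, shape (N, N); means: per-state factor means, shape (N, k)
    """
    next_prob = np.asarray(filtered_prob, float) @ np.asarray(A, float)
    state = int(np.argmax(next_prob))
    means = np.asarray(means, float)
    factors = next_prob @ means if weighted else means[state]
    return next_prob, state, factors

def predict_iv(factors, m, tau):
    """Evaluate the ASSUMED form b0 + b1*M + b2*M^2 + b3*tau + b4*M*tau (+ b5*tau^2)."""
    m = np.atleast_1d(np.asarray(m, float))
    tau = np.atleast_1d(np.asarray(tau, float))
    cols = [np.ones_like(m), m, m ** 2, tau, m * tau, tau ** 2][: len(factors)]
    return np.column_stack(cols) @ np.asarray(factors, float)
```

Repeating the two steps day by day, updating the filtered probabilities with each new observation, gives rolling forecasts over the test month.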
For comparison, the autoregressive integrated moving average (ARIMA) model, the vector autoregression (VAR) model, and the support vector regression (SVR) model serve as benchmarks. Table 11 reports the RMSE, MAE, and MAPE of the ARIMAs, VARs, SVRs, and HMMs, together with the mean and standard deviation of these criteria for the SVRs and HMMs. The HMM predicts implied volatilities more accurately than the other models.
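The three accuracy criteria can be computed as follows. This is a minimal sketch in which `y` holds observed and `y_hat` predicted implied volatilities for the test month; the function names are illustrative.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    e = np.asarray(y, float) - np.asarray(y_hat, float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    e = np.asarray(y, float) - np.asarray(y_hat, float)
    return float(np.mean(np.abs(e)))

def mape(y, y_hat):
    """Mean absolute percentage error in percent; assumes no observed value is zero."""
    y = np.asarray(y, float)
    return float(100.0 * np.mean(np.abs((y - np.asarray(y_hat, float)) / y)))
```

Because implied volatilities are strictly positive, the division in `mape` is well defined.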
In this study, we employ the regime-switching framework of the hidden Markov model (HMM) to model the latent states of the implied volatility surface (IVS) of Shanghai 50ETF options. First, we fit the cross-sectional Models (4) and (5) to the IVS, and the regression factors estimated by OLS serve as proxies for the IVS. Then, an HMM is built on the estimated coefficients to extract the characteristics, the time-dependent structure, and the transition mechanism of the IVS. The empirical results indicate that the HMM successfully captures the latent factors of the IVS. The identified states reflect different financial characteristics, and some of their typical features and transitions are associated with specific events. In addition, the HMM outperforms the benchmark models in implied volatility prediction.
The proxies of the IVS used in this study are latent factors that capture the impact of option maturity and moneyness. Other factors, such as dividends, exchange rates, and liquidity costs, could be considered in future research. In addition, the first-order HMM used in this study could be extended to second-order or higher-order models in future work.
