Fuzzy support vector regression model for forecasting stock market volatility

Abstract

Stock market volatility exhibits characteristics such as clustering and time-varying fluctuations. This paper proposes a two-stage method for addressing these characteristics. First, a fuzzy system is used to analyze clustering regimes according to the size of fluctuations. Second, the clustering regimes of Stage I are used to establish a support vector regression (SVR) model, which reduces the time-varying complexity. However, the fuzzy-SVR model combines the parameters of the membership functions and the SVR models, further complicating the estimation problem. Thus, this paper presents a parallel search based on a genetic algorithm (GA) for estimating the parameters of the membership functions and SVR models. Data from four stock indices were analyzed to illustrate the performance of the proposed model: the Taiwan Stock Exchange weighted stock index (Taiwan), the NASDAQ Composite index, the Hang Seng index (Hong Kong), and the Shanghai Composite index (Shanghai). According to the simulation results, out-of-sample volatility forecasting performance improved significantly when the model accounted for the behavioral effects of both clustering and time-varying fluctuations.

Keywords

Support vector regression; forecasting volatility; fuzzy system; genetic algorithm; clustering

1 Introduction

Volatility is a measure of risk in financial markets [22, 38] and is known to be a major financial tool for describing stock markets. Although the autoregressive conditional heteroscedasticity (ARCH) model, developed by Engle in 1982 [8], has proven useful in several economic applications, some problems continue to affect its forecasting accuracy. In particular, the ARCH model is limited by the fixed lag structure of the conditional variance. Bollerslev used the autoregressive moving average to expand the ARCH model, creating the generalized autoregressive conditional heteroscedasticity (GARCH) model [4], which uses volatility clustering and the fat-tail features of financial time series to estimate volatility.

Volatility clustering is a common feature of various financial series and refers to the phenomenon in which large changes tend to be followed by large changes and small changes tend to be followed by small changes. Based on the weighted average of the past squared residuals, the GARCH model features declining weights that never reach zero. However, the leverage effect that frequently appears in financial markets decreases the forecasting accuracy of the GARCH model. The leverage effect indicates that volatility responds differently to positive and negative information, and negative returns of financial assets tend to result in greater volatility. Several studies have developed improved versions of the GARCH model to resolve this problem; for example, the exponential GARCH (EGARCH) model was introduced by Nelson [34], the fuzzy switch GARCH model was proposed in [2, 18], and the Glosten–Jagannathan–Runkle GARCH (GJR-GARCH) model was introduced by Glosten et al. [13]. These models can capture the leverage effect in financial markets; however, they do not effectively simulate the time-varying fluctuations of stock market volatility.

To address the time-varying characteristics of volatility, many studies have proposed using robust methods, such as neural networks and support vector regression (SVR), for financial forecasting [12, 42]. Xiao et al. [41] combined various volatility forecasting methods through simple averaging based on an artificial neural network. Gavrishchaka and Ganguli [11] forecast the volatility of foreign exchange data by using a support vector machine (SVM). A novel neural network technique with an SVM has been used in financial time-series forecasting [37]. Gestel [12] combined the Bayesian framework with least squares SVMs in a nonlinear regression method. These methods can efficiently capture volatility under small fluctuations; however, their ability to capture volatility degrades when the fluctuations are large.

SVR, a variation of the SVM proposed by Vapnik et al. [39], is a powerful machine-learning method based on statistical learning theory. The method involves a structural risk minimization principle, instead of the usual empirical risk, which aims at minimizing a bound on the generalization error of a model, rather than minimizing only the mean square error over the data set [16]. Thus, SVR is useful for constructing data-driven nonlinear empirical process models and is effective in forecasting the time-varying characteristics of volatility [42]; however, SVR loses its efficiency through overfitting when volatility fluctuates greatly, because the SVR model then describes random errors or noise instead of the underlying relationship. Therefore, this paper proposes a fuzzy-SVR model, in which a fuzzy system is used for analyzing the clustering regimes on the basis of the size of fluctuations and is combined with the robust characteristics of SVR to forecast the volatility of stock markets.

In recent decades, fuzzy systems have been extensively applied in a wide variety of industrial systems because of their model-free approach [1, 43] and their excellent approximation ability [10, 26]. In general, fuzzy systems are universal approximators of uncertain nonlinear systems (i.e., they can approximate any behavior related to complex dynamics within a predefined range of desired accuracy); examples include nonlinear discrete-time systems with backlash [27] and multi-input multi-output nonlinear systems [28, 29], in which an adaptive fuzzy controller is used to approach the desired control performance. Fuzzy systems combine the ease of implementation and the convenience of linear models with the ability to capture and approximate a wide range of functions. This paper proposes fuzzy systems as a judicious choice for analyzing the size of fluctuations that feature time-dependent variances. In the proposed method, the fuzzy-SVR model, SVR models and fuzzy systems are combined and applied to forecast the volatility of stock markets. The process of optimizing the parameters of fuzzy systems and SVR models is highly complex and nonlinear. Therefore, a genetic algorithm (GA)-based parameter estimation algorithm is proposed for deriving the optimal solution for the fuzzy-SVR model.

The GA is a parallel search method that emulates natural genetic operations such as reproduction, crossover, and mutation to obtain the global optimal solution of complex optimization problems [7, 14]. The GA applies operations to a population of binary strings that represent potential solutions. At each generation, the GA explores different areas of the solution space and then directs the search to the region in which improved performance is most probable. Because the GA simultaneously evaluates many points in the parameter space, it is more likely than a single-point search to converge to the global solution. In particular, the algorithm can iterate several times on each data point. Accordingly, it is suitable for addressing the parameter problem of the fuzzy-SVR model.

The remainder of this paper is organized as follows. Section 2 describes the SVR and fuzzy-SVR models. Section 3 presents the GA-based optimization of the fuzzy-SVR model and an adaptive forecasting algorithm. The experimental results that illustrate the effectiveness of the proposed method are explained in Section 4. Finally, Section 5 presents the conclusion.

2 Fuzzy-SVR model

Empirical and theoretical studies show that the fluctuations of financial market volatility are asymmetric: in general, negative shocks to returns (falling stocks) give rise to higher volatility than do equivalent positive shocks (rising stocks) [13]. Moreover, volatility exhibits clustering, in which large changes tend to follow large changes and small changes tend to follow small changes [13]. In this study, we used the SVR model to capture the clustering characteristic of volatility. The SVR model has gained widespread acceptance in data-driven nonlinear modeling applications [15]. The SVR model extracts from the training data a subset that serves as support vectors and therefore represents a stable characteristic of the data. Moreover, it entails a structural risk minimization principle (instead of the usual empirical risk) that aims at minimizing a bound on the generalization error of a model, rather than minimizing only the mean square error over the data set [30]. Thus, SVR is useful for forecasting time-varying clustering data [37]. To consider the differential effects of the propagation of volatility caused by a rising or falling stock market, we applied fuzzy systems to address the fluctuation asymmetries caused by falling and rising returns. General expert information can be represented by fuzzy terms (e.g., low, high); this representation is convenient when more precise knowledge is lacking. In this study, we used the IF-THEN rules of fuzzy systems to appropriately simulate the fluctuation asymmetries of the stock market. The following subsections describe the proposed method in detail.

2.1 SVR model

Consider an in-sample data set {(x (1), y (1)), …, (x (N), y (N))}, where x (i) ∈ R^D, y (i) ∈ R, x (i) is the input attribute value of the ith in-sample data point in D-dimensional space, y (i) represents the stock market volatility of the ith in-sample data point [36], and N is the size of the in-sample data. The linear regression function can be expressed as f (x) = w^T x + b, where w is normal to the hyperplane and b is a constant. Methods involving the SVR model [3, 5] have been developed for determining a function f (x) that deviates by at most ɛ from the actually obtained targets y (i) for all the in-sample data while being as flat as possible. In general, the input data are passed through a nonlinear map φ that takes them from the input space to a higher dimensional feature space. The estimate function f (x) is formulated as follows: $f (x) = w^{T} φ (x) + b,$ (1) where φ (•) is the nonlinear mapping function and flatness of f (x) means that one seeks a small weight vector w. The SVR involves minimizing the following risk function:

$\begin{matrix} \min_{w, b, ξ_{i}, ξ_{i}^{*}} \frac{1}{2} w^{T} w + C \sum_{i = 1}^{N} (ξ_{i} + ξ_{i}^{*}) \\ \text{subject to} \left\{ \begin{matrix} y (i) - (w^{T} φ (x (i)) + b) \leq ɛ + ξ_{i} \\ (w^{T} φ (x (i)) + b) - y (i) \leq ɛ + ξ_{i}^{*} \\ ξ_{i}, ξ_{i}^{*} \geq 0, \forall i = 1, 2, \dots, N \end{matrix} \right. \end{matrix}$ (2) where ɛ is the error tolerance, ξ_i and $ξ_{i}^{*}$ are the slack variables that respectively specify the upper and lower training errors subject to the error tolerance ɛ, and C is the cost parameter that controls the trade-off between the flatness of f (x) and the training error. The constraints in Equation (2) indicate that most of the input data x (i) are placed inside the ɛ-tube. If x (i) lies outside the tube, it is called a support vector, and a ξ_i or $ξ_{i}^{*}$ error exists (in this study, $ξ_{i}^{*}$ was primarily minimized in the objective function). The characteristics of the slack variables ξ_i and $ξ_{i}^{*}$, the error tolerance ɛ, and the support vectors are shown in Fig. 1.
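The constraints of Equation (2) can be read as an ɛ-insensitive loss: at the optimum, each slack variable equals the amount by which a residual leaves the tube. The following Python fragment is a minimal sketch of that reading; the function name `slacks` and its arguments are illustrative and do not come from the paper.

```python
def slacks(y, f, eps):
    """Smallest (xi, xi_star) satisfying the constraints of Equation (2) for one sample.

    xi      : amount by which the target y lies above f + eps
    xi_star : amount by which the target y lies below f - eps
    Both are zero when the sample falls inside the eps-tube.
    """
    residual = y - f
    xi = max(0.0, residual - eps)        # from y - f <= eps + xi
    xi_star = max(0.0, -residual - eps)  # from f - y <= eps + xi_star
    return xi, xi_star
```

For a target of 1.0, a prediction of 0.5, and a tolerance of 0.2, only the upper slack is active, and a prediction within the tolerance yields two zero slacks.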

Equation (2) is a quadratic optimization problem with inequality constraints; according to the Karush–Kuhn–Tucker optimality conditions [39], the SVR training procedure amounts to solving a convex quadratic problem by using Lagrange multipliers. Thus, Equation (2) can be translated into a dual problem as follows:

$\begin{matrix} \max_{α, α^{*}} - \frac{1}{2} \sum_{i = 1}^{N} \sum_{j = 1}^{N} (α_{i} - α_{i}^{*}) (α_{j} - α_{j}^{*}) K (x (i), x (j)) \\ - ɛ \sum_{i = 1}^{N} (α_{i} + α_{i}^{*}) + \sum_{i = 1}^{N} y (i) (α_{i} - α_{i}^{*}) \\ \text{subject to} \left\{ \begin{matrix} \sum_{i = 1}^{N} (α_{i} - α_{i}^{*}) = 0 \\ 0 \leq α_{i}, α_{i}^{*} \leq C \end{matrix} \right. \end{matrix}$ (3) where $α_{i}, α_{i}^{*}$ are Lagrange multipliers, samples with $α_{i} - α_{i}^{*} \neq 0$ are support vectors, and K (x (i), x (j)) is the kernel function, which equals the dot product 〈φ (x (i)), φ (x (j))〉 and makes it possible to map the training data implicitly into a feature space and train the SVR in that space without representing the feature vectors explicitly [3, 5]. All kernel functions must satisfy Mercer’s condition [33]. Commonly used kernel functions include linear kernels, polynomial kernels, and radial basis function (RBF) kernels [25]. In this study, the RBF was adopted and formulated as $K (x (i), x) = \exp (- \frac{{∥ x (i) - x ∥}^{2}}{σ^{2}}),$ (4) where σ is the kernel bandwidth parameter. The SVR model obtained by solving the dual problem in Equation (3) depends mainly on the settings of C, ɛ, and σ. In general, selecting these parameters is a highly nonlinear optimization problem in which many local extreme values may exist. Therefore, the GA was used in this study to specify C, ɛ, and σ and solve the SVR optimization problem.
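To make the role of σ concrete, the RBF kernel of Equation (4) and the resulting kernel expansion of a trained SVR (the form used later in Equation (5)) can be sketched in Python as follows. The function names and the argument `coef`, which stands for the differences α_i − α_i^*, are illustrative assumptions; the dual solver itself is not shown.

```python
import math

def rbf_kernel(xi, x, sigma):
    """RBF kernel of Equation (4): exp(-||xi - x||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-sq_dist / sigma ** 2)

def svr_predict(x, support_x, coef, b, sigma):
    """Evaluate f(x) = sum_i coef_i * K(x(i), x) + b, with coef_i = alpha_i - alpha_i^*."""
    return sum(c * rbf_kernel(sx, x, sigma) for sx, c in zip(support_x, coef)) + b
```

A smaller σ makes the kernel decay faster with distance, so each support vector influences a narrower neighborhood of the input space.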

2.2 Fuzzy-SVR model

Fuzzy systems are universal approximators [26, 43] that can approximate the behavior of a system in which analytic functions or numerical relations do not exist. The empirical and theoretical appeal of the SVR model stems from its minimizing the regression error based on the structural risk minimization principle and capturing the small fluctuations of time-varying data. However, the model uses an ɛ-insensitive error tolerance with a fixed and symmetrical margin, thus failing to accommodate large fluctuations and sign asymmetries: negative shocks to returns give rise to higher volatility than do equivalent positive shocks to returns. Ignoring this fact can lead to poor prognostic characteristics. According to Fama [9], stock market volatility exhibits the property that large changes tend to follow large changes and small changes tend to follow small changes. To account for the differential effects of the fluctuation size of stock market volatility, this study incorporated fuzzy systems into the proposed SVR model. The resulting fuzzy model is described using the IF-THEN rules and is used to ensure that the SVR model appropriately addresses the problem of forecasting stock market volatility. The basic configuration of fuzzy systems consists of three components: a fuzzy rule base, a fuzzy inference engine, and a defuzzifier. The fuzzy rule base consists of a collection of IF-THEN rules [18, 43]. The lth rule of the fuzzy system for SVR is described as follows:

$\begin{matrix} \text{Rule}^{(l)} : \\ \text{IF } z_{1} (i) \text{ is } F_{l 1} \text{ and } \dots \text{ and } z_{n} (i) \text{ is } F_{l n}, \text{ THEN} \\ f (x) = \sum_{i = 1}^{N} (α_{i, l} - α_{i, l}^{*}) K_{l} (x (i), x) + b_{l}, \\ \text{for } l = 1, 2, \dots, L, \end{matrix}$ (5) where f (x) is the system output; F_lj for j = 1, …, n and l = 1, …, L is the fuzzy set; L is the number of IF-THEN rules; n is the number of premise variables; and z₁ (i), z₂ (i), …, z_n (i) are the premise variables. This study used the stock market volatility property of large changes tending to follow large changes and small changes tending to follow small changes. The GARCH model proposed by Bollerslev [4] entails using prior conditional variances to estimate the degree of volatility transmission. The ability of the GARCH model to explain the transmission of volatility is a crucial advantage. Therefore, in this paper the premise variables, namely the previous values of the realized volatility, are defined as $z_{j} (i) = y (i - j), \text{ for } j = 1, 2, \dots, n$ (6)

For the functional fuzzy system, defuzzification [18] can be written as $\begin{matrix} f (x) & = & \frac{\sum_{l = 1}^{L} u_{l} (z (i)) [\sum_{i = 1}^{N} (α_{i, l} - α_{i, l}^{*}) K_{l} (x (i), x) + b_{l}]}{\sum_{l = 1}^{L} u_{l} (z (i))} \\ & = & \sum_{l = 1}^{L} g_{l} (z (i)) [\sum_{i = 1}^{N} (α_{i, l} - α_{i, l}^{*}) K_{l} (x (i), x) + b_{l}] \end{matrix}$ (7)

and

$\begin{matrix} u_{l} (z (i)) & = & \prod_{j = 1}^{n} F_{lj} (z_{j} (i)) \\ g_{l} (z (i)) & = & \frac{u_{l} (z (i))}{\sum_{k = 1}^{L} u_{k} (z (i))} \\ z (i) & = & {[z_{1} (i), z_{2} (i), \dots, z_{n} (i)]}^{t}, \end{matrix}$ (8) where F_lj (z_j (i)) is the grade of the membership of z_j (i) in F_lj. A Gaussian membership function is used, and u_l (z (i)) can be formulated as

$\begin{matrix} u_{l} (z (i)) & = & \prod_{j = 1}^{n} F_{lj} (z_{j} (i)) \\ & = & \prod_{j = 1}^{n} \exp (- \frac{1}{2} {(\frac{z_{j} (i) - m_{lj}}{δ_{lj}})}^{2}), \end{matrix}$ (9) where m_lj and δ_lj are respectively the center and spread of the lth rule membership function corresponding to the jth premise variable. This formula shows that the product is used to represent the premise. Because every Gaussian membership grade is strictly positive, each u_l (z (i)) is positive for l = 1, 2, …, L, and therefore $\sum_{l = 1}^{L} u_{l} (z (i)) > 0$ (10)

Thus, the following expression [18] is obtained: $g_{l} (z (i)) \geq 0 for l = 1, 2, \dots, L and \sum_{l = 1}^{L} g_{l} (z (i)) = 1$ (11)
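Equations (8) to (11) can be illustrated with a short Python sketch that computes the firing strengths u_l and the normalized weights g_l for one premise vector. The function names and the nested-list layout of the centers and spreads are assumptions made for illustration.

```python
import math

def firing_strengths(z, centers, spreads):
    """u_l(z) of Equation (9): product over premise variables of Gaussian grades.

    z       : premise vector [z_1, ..., z_n]
    centers : centers[l][j] = m_lj
    spreads : spreads[l][j] = delta_lj
    """
    u = []
    for m_l, d_l in zip(centers, spreads):
        grade = 1.0
        for z_j, m_lj, d_lj in zip(z, m_l, d_l):
            grade *= math.exp(-0.5 * ((z_j - m_lj) / d_lj) ** 2)
        u.append(grade)
    return u

def normalized_weights(u):
    """g_l(z) of Equation (8): each firing strength divided by the sum of all of them."""
    total = sum(u)
    return [u_l / total for u_l in u]
```

With one premise variable and two rules centered at 1.0 and 3.0, a premise value of 1.0 fires the first rule fully, and the two normalized weights are nonnegative and sum to one, as Equation (11) states.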

According to Equations (5)–(11), the system output can be expressed as

$\begin{matrix} f (x) & = & \frac{\sum_{l = 1}^{L} u_{l} (z (i)) [\sum_{i = 1}^{N} (α_{i, l} - α_{i, l}^{*}) K_{l} (x (i), x) + b_{l}]}{\sum_{l = 1}^{L} u_{l} (z (i))} \\ & = & \frac{\sum_{l = 1}^{L} \prod_{j = 1}^{n} \exp (- \frac{1}{2} {(\frac{z_{j} (i) - m_{lj}}{δ_{lj}})}^{2}) [\sum_{i = 1}^{N} (α_{i, l} - α_{i, l}^{*}) K_{l} (x (i), x) + b_{l}]}{\sum_{l = 1}^{L} \prod_{j = 1}^{n} \exp (- \frac{1}{2} {(\frac{z_{j} (i) - m_{lj}}{δ_{lj}})}^{2})} \end{matrix}$ (12)
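For the single-premise case (n = 1), Equation (12) reduces to a membership-weighted average of the L rule outputs. The sketch below assumes the per-rule SVR outputs have already been computed and are passed in as plain numbers; the function names and argument layout are illustrative only.

```python
import math

def gaussian_grade(z, center, spread):
    """Gaussian membership grade of a scalar premise value z."""
    return math.exp(-0.5 * ((z - center) / spread) ** 2)

def fuzzy_svr_output(z, rules, rule_outputs):
    """Equation (12) for n = 1: weighted average of per-rule SVR outputs.

    z            : scalar premise value, e.g. the previous realized volatility
    rules        : list of (center, spread) pairs, one per rule
    rule_outputs : list of f_l(x) values, one per rule
    Note: if z is extremely far from every center, the grades can underflow
    to zero in floating point and the division fails.
    """
    u = [gaussian_grade(z, m, d) for m, d in rules]
    return sum(u_l * f_l for u_l, f_l in zip(u, rule_outputs)) / sum(u)
```

When the premise value sits midway between two rule centers with equal spreads, the output is the plain average of the two rule outputs; when it sits on one center and far from the other, the output is dominated by that rule.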

A reasonable success criterion is the minimization of the mean square error, the objective function of which is defined as $E = \sum_{i = 1}^{N} {[y (i) - f (x (i))]}^{2},$ (13) where y (i) for i = 1, 2, …, N represents the target values of the stock market volatility, N is the number of in-sample data, and f (x) is the output of the fuzzy-SVR model in Equation (12). E is a highly nonlinear function of the fuzzy-system parameters (the centers m_lj and spreads δ_lj for l = 1, …, L and j = 1, …, n) and of the SVR parameters (C_l, ɛ_l, and σ_l for l = 1, …, L). This function may have several local minima. Locating the global minimum of E in Equation (13) by using conventional methods is typically extremely difficult. Therefore, in this study, the GA was used to specify the centers m_lj and spreads δ_lj together with the SVR parameters C_l, ɛ_l, and σ_l to solve the fuzzy-SVR problem in Equation (13).

3 GA-based fuzzy-SVR model and volatility forecasting

3.1 GA-based fuzzy-SVR model

The GA is composed of probabilistic heuristic search processes based on natural genetic systems. It is highly parallel in searching for the global optimal solution of complex optimization problems [24]. The daily return of the stock market generally ranges from approximately –10% to 10%; hence, the realized volatility in this paper is defined as $y (t) = \frac{1}{5} \sum_{i = 0}^{4} r^{2} (t - i)$, where r (i) is the daily return of the stock market price at time i. Therefore, the parameter ranges of the fuzzy-SVR model are clearly bounded, as shown in Table 1. We use a GA to solve the fuzzy-SVR problem in Equation (13). This method can explore several points in the search space simultaneously, thereby increasing the opportunity to locate new points in the search space with an expected improvement in performance.
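The realized-volatility definition above can be sketched in Python as follows. The log-return formula is the one given in Section 3.2.1; the function names and the list-based input are illustrative assumptions.

```python
import math

def daily_returns(prices):
    """r(i) = (log P(i) - log P(i-1)) * 100, in percent, from daily closing prices."""
    return [(math.log(p1) - math.log(p0)) * 100.0
            for p0, p1 in zip(prices, prices[1:])]

def realized_volatility(returns, window=5):
    """y(t) = (1/window) * sum_{i=0}^{window-1} r(t-i)^2 for every t with a full window."""
    return [sum(r * r for r in returns[t - window + 1:t + 1]) / window
            for t in range(window - 1, len(returns))]
```

A return series of length T yields T − 4 realized-volatility values with the five-day window, because the first four days do not have a complete window.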

The GA is composed of three operations: 1) selection, 2) crossover, and 3) mutation. These operations are implemented by performing the basic tasks of copying strings, exchanging portions of strings, and changing the state of bits from 1’s to 0’s or vice versa. These operations ensure that the “fittest” members of the population survive, and their information is preserved and combined to generate “fitter” offspring. The result is an improvement in the next generation’s performance. The GA is described in the following subsections.

3.1.1 Coding

The GA featuring a population of strings and binary coding was used in this study. According to the binary coding method, each parameter of the membership functions (m_lj, δ_lj) and each SVR parameter (C_l, ɛ_l, σ_l) has its own bit string, which consists of 0’s and 1’s. The choice of the bit number B_i for each parameter depends on the desired resolution R_i, which is calculated as $R_{i} = \frac{U_{i} - L_{i}}{2^{B_{i}} - 1},$ (14) where U_i and L_i are respectively the upper and lower bounds of the parameter. In this study, the fuzzy-SVR model parameters were coded as the following binary string:

$\begin{matrix} \underbrace{\overbrace{1 \dots 11}^{B_{1}} \dots \overbrace{1 \dots 10}^{B_{L \times n}}}_{m_{11} \dots m_{Ln}} \ \underbrace{\overbrace{0 \dots 11}^{B_{L \times n + 1}} \dots \overbrace{1 \dots 00}^{B_{2 L \times n}}}_{δ_{11} \dots δ_{Ln}} \ \underbrace{\overbrace{1 \dots 11}^{B_{2 L \times n + 1}} \dots \overbrace{0 \dots 10}^{B_{(2 n + 1) L}}}_{C_{1} \dots C_{L}} \\ \underbrace{\overbrace{1 \dots 10}^{B_{(2 n + 1) L + 1}} \dots \overbrace{1 \dots 10}^{B_{(2 n + 2) L}}}_{ɛ_{1} \dots ɛ_{L}} \ \underbrace{\overbrace{1 \dots 10}^{B_{(2 n + 2) L + 1}} \dots \overbrace{1 \dots 00}^{B_{(2 n + 3) \times L}}}_{σ_{1} \dots σ_{L}} \end{matrix}$ (15)
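The coding scheme can be sketched in Python as follows. The paper gives the resolution formula in Equation (14) but does not spell out the decoding rule, so the usual linear mapping from the integer value of a bit string onto the interval [L_i, U_i] is assumed here; all function names are illustrative.

```python
def resolution(lower, upper, bits):
    """R_i of Equation (14): step between adjacent representable parameter values."""
    return (upper - lower) / (2 ** bits - 1)

def decode(bit_string, lower, upper):
    """Map a binary string such as '101' to a real value in [lower, upper] (assumed linear)."""
    return lower + int(bit_string, 2) * resolution(lower, upper, len(bit_string))

def decode_chromosome(chromosome, bounds, bits):
    """Split a chromosome into equal-length genes and decode each against its own bounds."""
    return [decode(chromosome[k * bits:(k + 1) * bits], lo, hi)
            for k, (lo, hi) in enumerate(bounds)]
```

Under this mapping, the all-zero string decodes to the lower bound and the all-one string decodes to the upper bound, with every other string landing on an evenly spaced grid between them.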

3.1.2 Fitness function

The degree of fitness depends on the performance of the possible solution represented by that particular string. In this study’s design problem, locating the minimum in Equation (13) is equivalent to obtaining a maximal fitness value by using the genetic search process. A chromosome that has a lower objective function value should be assigned a higher fitness value. A simple linear relationship between a fitness function and an objective function is expressed as

$\begin{matrix} Fit = m E + h, \quad m = \frac{{Fit}_{best} - {Fit}_{worst}}{E_{smallest} - E_{largest}}, \\ h = {Fit}_{best} - m E_{smallest}, \end{matrix}$ (16) where Fit is the fitness function, and E_smallest and E_largest are the lowest and highest objective function values, respectively. Fit_best and Fit_worst are the corresponding fitness values.
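A minimal Python sketch of the linear scaling in Equation (16) follows. The default values chosen for Fit_best and Fit_worst are assumptions, since the paper does not report them, and the guard for a population with identical errors is an added convenience.

```python
def linear_fitness(errors, fit_best=1.0, fit_worst=0.1):
    """Map objective values E to fitness values with Fit = m * E + h (Equation (16)).

    The smallest error receives fit_best and the largest receives fit_worst.
    """
    e_small, e_large = min(errors), max(errors)
    if e_small == e_large:            # all chromosomes perform equally
        return [fit_best for _ in errors]
    m = (fit_best - fit_worst) / (e_small - e_large)   # negative slope
    h = fit_best - m * e_small
    return [m * e + h for e in errors]
```

For errors 1, 3, and 2 with Fit_best = 1 and Fit_worst = 0, the slope is −0.5 and the fitness values are 1, 0, and 0.5.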

3.1.3 Selection

Selection, based on the principle of survival of the fittest, is a process by which individual strings are copied and placed in a mating pool for further genetic operations consistent with their fitness value. The probability PR (j) of the jth string with a fitness value Fit (j) selected for mating and reproduction in the next generation is $PR (j) = \frac{Fit (j)}{\sum_{i = 1}^{Q} Fit (i)},$ (17) where Q is the population size specified by [7]. Once the strings are reproduced or copied for possible use in the next generation in a mating pool, they wait for the action of the other two operations: crossover and mutation.
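Equation (17) corresponds to fitness-proportionate (roulette-wheel) selection. The Python sketch below draws a mating pool of the same size as the population, with replacement; the function names and the optional `rng` argument are illustrative assumptions.

```python
import random

def selection_probabilities(fitness):
    """PR(j) of Equation (17): each fitness value divided by the population total."""
    total = sum(fitness)
    return [f / total for f in fitness]

def select_mating_pool(population, fitness, rng=random):
    """Draw len(population) strings with replacement, string j with probability PR(j)."""
    return rng.choices(population, weights=fitness, k=len(population))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when results are averaged over independent runs.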

3.1.4 Crossover

Crossover is the primary means of exchanging information between chromosomes. This study used the one-point crossover method [7], which is conducted in three steps. First, two newly reproduced strings are selected from the mating pool created through selection in the previous generation. Second, a position along the two strings is selected at random. Third, all the characters following the crossing site are exchanged. The crossover operation occurs only with a probability p_c (crossover probability). The choice of p_c is known to critically affect the performance of a GA. In general, the value of p_c ranges from 0.5 to 1.0. When combined with selection, crossover is an effective means of exchanging information and combining various elements of high-quality solutions.
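The three steps of one-point crossover can be sketched in Python as follows, assuming two parent strings of equal length. The function name, the default p_c, and the `rng` argument are illustrative choices.

```python
import random

def one_point_crossover(parent_a, parent_b, p_c=0.9, rng=random):
    """Exchange the tails of two equal-length bit strings with probability p_c."""
    if rng.random() >= p_c or len(parent_a) < 2:
        return parent_a, parent_b                  # no crossover this time
    site = rng.randint(1, len(parent_a) - 1)       # crossing site, never at the ends
    child_a = parent_a[:site] + parent_b[site:]
    child_b = parent_b[:site] + parent_a[site:]
    return child_a, child_b
```

Restricting the crossing site to interior positions ensures that a crossover always mixes material from both parents rather than returning copies of them.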

3.1.5 Mutation

Selection and crossover yield most of the processing power of GAs. However, mutation, the third operation, enhances the ability of GAs to search for the optimal solution. Mutation is the occasional flip, with a low probability, of a bit at a particular string position of a chromosome from 1 to 0, or vice versa. The mutation operation is used to change some elements in selected individuals with a probability p_m (the mutation probability). In general, mutation should be used sparingly because the GA degenerates into a random search when the mutation probability is high.
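Bit-flip mutation can be sketched in a few lines of Python. Each bit is tested independently against p_m; the function name and the `rng` argument are illustrative assumptions.

```python
import random

def mutate(chromosome, p_m=0.01, rng=random):
    """Flip each bit of a binary string independently with probability p_m."""
    flipped = []
    for bit in chromosome:
        if rng.random() < p_m:
            flipped.append('1' if bit == '0' else '0')
        else:
            flipped.append(bit)
    return ''.join(flipped)
```

With p_m = 0.01 and a 120-bit chromosome, slightly more than one bit is flipped per chromosome on average, which perturbs solutions without erasing what selection and crossover have assembled.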

3.2 Volatility forecasting

3.2.1 Adaptive volatility forecasting

The power of a model in forecasting volatility is crucial because volatility is a measure of risk in financial markets. A common approach involves using in-sample data to construct a model, and then making one-step-ahead predictions to obtain the future solution $\hat{y} (N + 1)$ [35, 45]. Realized volatility [31] was used to evaluate the performance of volatility forecasting. For improved forecasting results, the recursive least squares (RLS) [35] error formula was used in this study for forecasting volatility. First, the forecasting objective function is defined as $E_{f} = \sum_{t = N + 1}^{N + M} λ^{t - N} {[y (t) - \hat{y} (t)]}^{2},$ (18) where y (t) is the realized volatility, defined as $y (t) = \frac{1}{5} \sum_{i = 0}^{4} r^{2} (t - i)$; r (i) = {log P (i) - log P (i - 1)} × 100% is the daily stock return at time i; P (i) is the daily closing price at time i; λ ∈ (0, 1] is a forgetting factor; and M is the amount of out-of-sample data. According to Equation (12), Equation (18) can be expressed as

$\begin{matrix} E_{f} & = & \sum_{t = N + 1}^{N + M} λ^{t - N} {\left\{ y (t) - \sum_{l = 1}^{L} g_{l} (z (t)) \left[ \sum_{i = 1}^{N} (α_{i, l} - α_{i, l}^{*}) K_{l} (x (i), x) + b_{l} \right] \right\}}^{2} \\ & = & \sum_{t = N + 1}^{N + M} λ^{t - N} {\left\{ y (t) - Λ^{t} (t) φ (N) \right\}}^{2} \end{matrix}$ (19) where $\begin{matrix} Λ (t) & = & {[\begin{matrix} g_{1} (z (t)) & \dots & g_{L} (z (t)) \end{matrix}]}^{t} \\ φ (N) & = & {\left[ \sum_{i = 1}^{N} (α_{i, 1} - α_{i, 1}^{*}) K_{1} (x (i), x) + b_{1} \quad \dots \quad \sum_{i = 1}^{N} (α_{i, L} - α_{i, L}^{*}) K_{L} (x (i), x) + b_{L} \right]}^{t} . \end{matrix}$ (20)

The RLS formulas are expressed as $P (t) = \frac{1}{λ} \left\{ P (t - 1) - P (t - 1) φ (t) {(λ + φ^{t} (t) P (t - 1) φ (t))}^{- 1} φ^{t} (t) P (t - 1) \right\}$ (21) $Λ (t) = Λ (t - 1) + P (t) φ (t) (r^{2} (t) - Λ^{t} (t - 1) φ (t))$ (22) where t = N + 1, …, N + M indexes the out-of-sample data and the matrix P is initialized as P (N) = ϑI, where ϑ is a small positive constant and I is an identity matrix.
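One RLS update following Equations (21) and (22) can be sketched in plain Python with list-based vectors and matrices. The function name and argument names are illustrative, and the observed value is passed in as a generic `target` argument (Equation (22) uses r²(t) in that position).

```python
def rls_step(Lam, P, phi, target, lam=0.99):
    """One recursive least squares update.

    Lam    : current vector Lambda(t-1), list of L floats
    P      : current matrix P(t-1), L x L list of lists
    phi    : regressor vector phi(t), list of L floats
    target : observed value at time t
    lam    : forgetting factor in (0, 1]
    Returns (Lambda(t), P(t)).
    """
    n = len(phi)
    P_phi = [sum(P[r][c] * phi[c] for c in range(n)) for r in range(n)]   # P(t-1) phi
    phi_P = [sum(phi[r] * P[r][c] for r in range(n)) for c in range(n)]   # phi^t P(t-1)
    denom = lam + sum(phi[r] * P_phi[r] for r in range(n))
    # Equation (21): covariance-like matrix update.
    P_new = [[(P[r][c] - P_phi[r] * phi_P[c] / denom) / lam for c in range(n)]
             for r in range(n)]
    # Equation (22): correct Lambda by the gain P(t) phi times the a priori error.
    error = target - sum(Lam[r] * phi[r] for r in range(n))
    gain = [sum(P_new[r][c] * phi[c] for c in range(n)) for r in range(n)]
    Lam_new = [Lam[r] + gain[r] * error for r in range(n)]
    return Lam_new, P_new
```

Calling the function once per out-of-sample observation, feeding each returned pair back in as the next input, reproduces the recursion over t = N + 1, …, N + M.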

3.2.2 Forecasting performance evaluation

The mean square error measure of Equation (13) is used to derive the forecasting model from the in-sample data; however, it cannot be used alone as a conclusive measure for comparing different forecasting models [38]. Therefore, various statistics have been used to compare forecast errors; these include the mean square forecast error (MSFE), mean absolute forecast error (MAFE), and mean percentage forecast error (MPFE): $\begin{matrix} MSFE & = & \frac{1}{M} \sum_{t = N + 1}^{N + M} {(y (t) - \hat{y} (t))}^{2} \\ MAFE & = & \frac{1}{M} \sum_{t = N + 1}^{N + M} | y (t) - \hat{y} (t) | \\ MPFE & = & \frac{1}{M} \sum_{t = N + 1}^{N + M} \frac{| y (t) - \hat{y} (t) |}{y (t)}, \end{matrix}$ (23) where y (t) is the realized volatility at the forecasting period t and $\hat{y} (t)$ is the forecast volatility at the forecasting period t.
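The three statistics of Equation (23) can be computed with the short Python sketch below. The function name is illustrative, and the inputs are assumed to be equal-length lists of out-of-sample realized and forecast volatilities with strictly positive realized values (MPFE divides by y(t)).

```python
def forecast_errors(actual, forecast):
    """Return (MSFE, MAFE, MPFE) over the out-of-sample period."""
    m = len(actual)
    diffs = [y - y_hat for y, y_hat in zip(actual, forecast)]
    msfe = sum(d * d for d in diffs) / m
    mafe = sum(abs(d) for d in diffs) / m
    mpfe = sum(abs(d) / y for d, y in zip(diffs, actual)) / m
    return msfe, mafe, mpfe
```

Because MPFE scales each error by the realized value, it weights errors in calm periods more heavily than MSFE and MAFE do, which is one reason the three measures can rank models differently.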

3.3 Design procedure

According to the aforementioned analysis, the design procedure for the fuzzy-SVR model applied to forecast stock market volatility by using GA is divided into the following steps:

Step 0: Input N in-sample data and M out-of-sample data.

Step 1: Generate a random population of Q chromosomes.

Step 2: Decode each string into the corresponding parameter vector.

Step 3: Build the fuzzy-SVR model in Equation (12).

Step 4: Calculate the fitness values according to Equation (16).

Step 5: Keep the optimal chromosomes intact for the next generation.

Step 6: Use selection, crossover, and mutation to generate new chromosomes in the next generation.

Step 7: Repeat the procedure from Steps 2 to 6 until a suitable parameter set is obtained.

Step 8: Initialize the RLS algorithm with the data φ (N) in Equation (20), computed from the parameter set of Step 7, and with P (N) = ϑI, where ϑ is a small positive constant and I is the identity matrix.

Step 9: Repeat the RLS algorithm in Equations (21) and (22) to forecast the volatility until all M out-of-sample data have been processed.

Figure 2 shows the pseudocode of the fuzzy-SVR model, and Fig. 3 shows a flowchart of the design procedure for the fuzzy-SVR model.
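Steps 1 to 7 can be illustrated with a compact, generic binary-coded GA in Python. This is only a sketch under stated assumptions, not the pseudocode of Fig. 2: the objective is any user-supplied function of the decoded parameters rather than the fuzzy-SVR error of Equation (13), the fitness bounds 1.0 and 0.1 are arbitrary, and the function name `run_ga` and its arguments are hypothetical.

```python
import random

def run_ga(objective, bounds, bits=12, pop_size=20, generations=30,
           p_c=0.9, p_m=0.01, seed=0):
    """Minimize objective(params) with a binary-coded GA; returns (params, error)."""
    rng = random.Random(seed)
    length = bits * len(bounds)

    def decode(chrom):
        # Step 2: linear mapping of each gene onto its [lower, upper] interval.
        return [lo + int(chrom[k * bits:(k + 1) * bits], 2) * (hi - lo) / (2 ** bits - 1)
                for k, (lo, hi) in enumerate(bounds)]

    # Step 1: random initial population of bit strings.
    pop = [''.join(rng.choice('01') for _ in range(length)) for _ in range(pop_size)]
    best, best_err = pop[0], float('inf')
    for _ in range(generations):
        errors = [objective(decode(c)) for c in pop]            # Steps 2 and 3
        e_min, e_max = min(errors), max(errors)
        span = (e_max - e_min) or 1.0
        # Step 4: linear fitness, 1.0 for the best and 0.1 for the worst chromosome.
        fitness = [1.0 - 0.9 * (e - e_min) / span for e in errors]
        elite = pop[errors.index(e_min)]                         # Step 5: elitism
        if e_min < best_err:
            best, best_err = elite, e_min
        children = [elite]
        while len(children) < pop_size:                          # Step 6
            a, b = rng.choices(pop, weights=fitness, k=2)        # selection
            if rng.random() < p_c:                               # one-point crossover
                site = rng.randint(1, length - 1)
                a, b = a[:site] + b[site:], b[:site] + a[site:]
            for child in (a, b):                                 # bit-flip mutation
                children.append(''.join(
                    ('1' if bit == '0' else '0') if rng.random() < p_m else bit
                    for bit in child))
        pop = children[:pop_size]                                # Step 7: next generation
    return decode(best), best_err
```

For example, `run_ga(lambda p: (p[0] - 3.0) ** 2, [(0.0, 10.0)])` searches the interval [0, 10] for the minimizer of a simple quadratic. Because the elite chromosome is copied unchanged into each new generation, the best error found never increases from one generation to the next.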

4 Simulation

The data consisted of daily closing values for four stock indices from January 1, 2010, to December 31, 2013. This study focused on the Taiwan Stock Exchange weighted stock index (Taiwan), the NASDAQ Composite index, the Hang Seng index (Hong Kong), and the Shanghai Composite index (Shanghai) to illustrate the performance of the proposed method. The results were compared with those of other models, including the GARCH [4], GJR-GARCH [13], EGARCH [34], dynamic evolving neural-fuzzy inference system [21], SVR based on the GA [5], and various forecasting methods with multilayer perceptron (Combing MP) [41]. All the methods were implemented in Matlab and evaluated on an Asus desktop PC with a 3.4 GHz i7-4770 CPU, 16 GB of RAM, and Windows 8. The data were divided into two parts: the first part (January 1, 2010–December 31, 2012) comprised the in-sample data of the training set (N = 748, 762, 750, and 716 for Taiwan, NASDAQ, Hong Kong, and Shanghai, respectively), and the second part (January 1, 2013–December 31, 2013) comprised the out-of-sample data of the test set (M = 240, 259, 243, and 254 for Taiwan, NASDAQ, Hong Kong, and Shanghai, respectively). The GA optimizes the parameters of both the SVR model and the proposed model. In practice, the generalization capability and accuracy of the SVR model and the proposed model depend on the parameters found by the search. In this study, the simulated results were obtained by averaging the results of 20 independent Monte Carlo (MC) runs of training the SVR model and the proposed model to derive a more reliable result [15]. For time-series forecasting, a cross-validation statistic was obtained by using the in-sample data to construct a model, and one-step-ahead predictions of the out-of-sample data were then made to obtain the future solution [18].

Bollerslev [4] indicated that the GARCH (1,1) model accurately describes the volatility of financial data. Therefore, in this study, n = 1. Figure 4 presents histograms of the in-sample period volatility for the four financial markets, showing the concentrated clustering properties with few distribution regimes. The results indicated that volatility was concentrated in regimes with small fluctuations, with few values distributed in the regimes with larger fluctuations. The forecasting results according to various numbers of fuzzy rules are presented in Table 2. The results showed that L = 2 was more satisfactory than L = 3. The Shanghai index has more distribution regimes than the other three financial markets, which is consistent with Fig. 4; however, the performance with L = 3 and L = 2 is almost the same for the Shanghai index. Thus, L = 2 was used. In general, the value of the daily stock return r (t) is between –10% and 10%. Therefore, the upper and lower bounds of the parameters related to the fuzzy-SVR model are defined as

$\begin{matrix} 0 \leq m_{l 1} \leq 10 \%, \quad 0 \leq δ_{l 1} \leq 10 \%, \quad e^{- 10} \leq C_{l} \leq e^{10}, \\ e^{- 10} \leq ɛ_{l} \leq e^{5}, \quad e^{- 10} \leq σ_{l} \leq e^{5} \quad \text{for } l = 1, \dots, L \end{matrix}$ (24)

The GA parameters [7, 24] are defined as

$\begin{matrix} Q = 100, \; T = 20, \; p_{c} = 0.9, \; p_{m} = 0.01, \\ B_{i} = 12 \quad \text{for } i = 1, \dots, (2 n + 3) \times L, \end{matrix}$ (25)

where Q is the population size, T is the number of generations, p_c is the crossover rate, p_m is the mutation rate, and B_i is the number of bits used to encode the i-th parameter. As Table 1 shows, for L = 2 fuzzy rules the fuzzy-SVR model has 10 parameters, namely four parameters of the membership functions (mean and variance) and six parameters of the SVR models; therefore, the total chromosome length is 120 bits. The simulated results were obtained by averaging 20 independent MC runs.
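To make the chromosome layout concrete, the following Python sketch decodes a 120-bit binary chromosome into the 10 parameters bounded by Equation (24). It is an illustration under stated assumptions, not the authors' Matlab code: the gene ordering within the chromosome, the linear mapping from bits to values, and the choice to search C, ɛ, and σ on a logarithmic scale are all assumptions, and the names `RULE_BOUNDS`, `decode_gene`, and `decode_chromosome` are hypothetical.

```python
import math

BITS = 12      # B_i in Equation (25)
N_RULES = 2    # L, the number of fuzzy rules used in the text

# Per-rule bounds from Equation (24) for n = 1. The last field marks the
# parameters whose bounds are given as powers of e; decoding them on a log
# scale is an assumption of this sketch.
RULE_BOUNDS = [
    ("m", 0.0, 0.10, False),        # membership mean, 0 to 10%
    ("delta", 0.0, 0.10, False),    # membership spread, 0 to 10%
    ("C", -10.0, 10.0, True),       # SVR penalty, e^-10 to e^10
    ("epsilon", -10.0, 5.0, True),  # SVR tube width, e^-10 to e^5
    ("sigma", -10.0, 5.0, True),    # kernel width, e^-10 to e^5
]

CHROMOSOME_LENGTH = BITS * len(RULE_BOUNDS) * N_RULES


def decode_gene(bits, low, high, log_scale):
    """Map a list of 0/1 bits linearly onto [low, high]."""
    value = int("".join(map(str, bits)), 2)
    frac = value / (2 ** len(bits) - 1)
    x = low + frac * (high - low)
    return math.exp(x) if log_scale else x


def decode_chromosome(chromosome):
    """Return one parameter dictionary per fuzzy rule."""
    if len(chromosome) != CHROMOSOME_LENGTH:
        raise ValueError("unexpected chromosome length")
    params, pos = [], 0
    for _ in range(N_RULES):
        rule = {}
        for name, low, high, log_scale in RULE_BOUNDS:
            rule[name] = decode_gene(chromosome[pos:pos + BITS],
                                     low, high, log_scale)
            pos += BITS
        params.append(rule)
    return params


lower = decode_chromosome([0] * CHROMOSOME_LENGTH)
upper = decode_chromosome([1] * CHROMOSOME_LENGTH)
```

With 12 bits per gene, each parameter is resolved into 4096 levels between its bounds, so the all-zero and all-one chromosomes decode to the lower and upper bounds of Equation (24), respectively.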

Table 3 lists the parameter estimates associated with the Taiwan, NASDAQ, Hong Kong, and Shanghai indices for the proposed method. As shown in Table 3, the two SVR models are distinct. The empirical results show that the volatility of the four markets exhibits clustering and the leverage effect. Table 4 reports the forecast statistics of the models. The results indicate that the volatility forecasts of the asymmetric GARCH models (e.g., the GJR-GARCH and EGARCH models) are superior to those of the GARCH model for the Hong Kong and Shanghai indices. This is because, in addition to the influence of powers of past residuals on the realized volatility, negative shocks to returns give rise to greater volatility than equivalent positive shocks [32]. Furthermore, the empirical results indicate that the Hong Kong and Shanghai stock market data are asymmetric. Table 4 shows that, for the Taiwan, NASDAQ, and Hong Kong markets, the SVR based on the GA outperforms all the other models except the proposed method, because its data-driven construction captures time-varying characteristics, although training the model takes considerable time. The characteristic of the SVR is to minimize the generalization error bound for achieving generalized performance, rather than minimizing only the mean square error over the data set [30, 37]. The realized volatility of the Shanghai index has more distribution regimes than that of the other three financial markets. Therefore, for Shanghai, the SVR based on the GA performs worse than the volatility persistence models (GARCH, GJR-GARCH, EGARCH) in terms of the MAFE and MSFE, although its MPFE is generally better than theirs.
In addition, the MPFE performance of the proposed method is superior to that of the other models, and its MAFE and MSFE performance is also more satisfactory, despite the considerable time required to train the model. When training time is not taken into account, the proposed model improved the MPFE forecasting performance by more than 30% compared with the other models. Therefore, both clustering and adaptive methods were used to capture time-varying data.
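The comparison above can be made concrete with a short Python sketch of the three forecast error measures and of a relative improvement. The formulas below are the commonly used forms (mean absolute, mean squared, and mean percentage forecast error) and are an assumption of this sketch; Equation (23) remains the authoritative definition. The numbers are invented purely for illustration and are not results from Table 4.

```python
from statistics import mean


def mafe(actual, forecast):
    """Mean absolute forecast error (assumed form)."""
    return mean(abs(f - a) for a, f in zip(actual, forecast))


def msfe(actual, forecast):
    """Mean squared forecast error (assumed form)."""
    return mean((f - a) ** 2 for a, f in zip(actual, forecast))


def mpfe(actual, forecast):
    """Mean percentage forecast error relative to the realized value."""
    return mean(abs(f - a) / a for a, f in zip(actual, forecast))


def improvement(baseline_error, proposed_error):
    """Relative reduction of an error measure, in percent."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Invented realized and forecast volatilities (illustration only).
realized = [0.010, 0.020, 0.040]
predicted = [0.012, 0.018, 0.030]

errors = {
    "MAFE": mafe(realized, predicted),
    "MSFE": msfe(realized, predicted),
    "MPFE": mpfe(realized, predicted),
}
```

Because the MPFE scales each error by the realized volatility, it weights misses in calm periods more heavily than the MAFE or MSFE do, which is one reason a model can rank differently under the three measures.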

The convergence of the objective function E_f for a single run of the GA-based estimator is shown in Fig. 5. Note that the cost function of the GA-based estimator decreases rapidly, in an approximately exponential manner, during the early generations and converges within eight generations. Figure 6 illustrates the volatility forecasts for the four markets obtained with the proposed method. The model uses fuzzy rules to generate two SVR models and an adaptive method to forecast volatility. Therefore, the model could capture market volatility clustering and time-varying characteristics, but not large fluctuations, particularly for Shanghai. Figure 7 shows the forecast volatility, together with a zoomed part, for the Taiwan market obtained using the proposed method and the GA-based SVR model. According to the zoomed part, the proposed method captured irregular behavior and fluctuations more effectively than the GA-based SVR model did, especially rapidly changing fluctuations. The empirical results of this study indicate that the stock market data exhibit clustering and time-varying fluctuations. The proposed fuzzy-SVR model outperformed the other models, as shown by the various forecast error measures.
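The kind of convergence curve discussed above can be reproduced in outline with a minimal binary GA that uses the settings of Equation (25). The Python sketch below is illustrative only: `toy_cost` is a hypothetical stand-in for E_f, and tournament selection, single-point crossover, bit-flip mutation, and elitism are assumptions, since the operators used by the authors are not restated in this section.

```python
import random

Q, T, P_C, P_M = 100, 20, 0.9, 0.01  # settings from Equation (25)
BITS, N_PARAMS = 12, 10              # 120-bit chromosome


def decode(chrom):
    """Map each 12-bit gene to a real value in [0, 1]."""
    scale = 2 ** BITS - 1
    return [int("".join(map(str, chrom[i:i + BITS])), 2) / scale
            for i in range(0, len(chrom), BITS)]


def toy_cost(chrom):
    # Placeholder for E_f (assumption): squared distance to a fixed target.
    return sum((x - 0.3) ** 2 for x in decode(chrom))


def tournament(pop, costs, rng):
    """Binary tournament: return the better of two random individuals."""
    a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[a] if costs[a] <= costs[b] else pop[b]


def run_ga(cost, seed=0):
    """Return the best cost found in each of the T generations."""
    rng = random.Random(seed)
    length = BITS * N_PARAMS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(Q)]
    history = []
    for _ in range(T):
        costs = [cost(c) for c in pop]
        best = min(range(Q), key=costs.__getitem__)
        history.append(costs[best])
        new_pop = [pop[best][:]]  # elitism: carry the best over unchanged
        while len(new_pop) < Q:
            p1 = tournament(pop, costs, rng)
            p2 = tournament(pop, costs, rng)
            c1, c2 = p1[:], p2[:]
            if rng.random() < P_C:  # single-point crossover
                cut = rng.randrange(1, length)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(length):  # bit-flip mutation
                    if rng.random() < P_M:
                        child[i] ^= 1
                if len(new_pop) < Q:
                    new_pop.append(child)
        pop = new_pop
    return history


history = run_ga(toy_cost)
```

Because the best individual is copied unchanged into the next generation, the recorded best cost can never increase from one generation to the next; how quickly it flattens depends on the objective and is not meant to reproduce Fig. 5.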

5 Conclusion

Volatility exhibits clustering and time-varying characteristics; moreover, many complex factors influence volatility. Therefore, this paper proposes a fuzzy-SVR artificial intelligence method. The predictive abilities of the volatility models were examined by comparing the forecast volatility measures defined in Equation (23). The proposed method appeared to be advantageous when we attempted to model both clustering and time-varying volatility. A GA-based design method was used to estimate parameters for the fuzzy-SVR model, and an adaptive algorithm was employed to forecast the volatility of various financial markets. The simulation results indicate that the proposed method afforded substantial improvements in forecasting performance. Moreover, the GA simultaneously evaluates many points in the search space. Therefore, specifying initial conditions is unnecessary for achieving improved results. The main disadvantage of the proposed method is that it cannot effectively capture large fluctuations of market volatility. The persistence models (GARCH, GJR-GARCH, EGARCH) involving conditional mean and conditional variance are frequently used to investigate the volatility in financial time series. They demonstrate the ability to capture the persistence of volatility but do not effectively capture clustering and time-varying characteristics. Future research could involve designing a multifuzzy-SVR model or using fuzzy rules to switch between the persistence models and the SVR model, in order to manage the effects of varying fluctuations on volatility forecasts. Moreover, optimizing the designed model is a complex problem, and many local optima may exist. In a future study, we will evaluate the performance of alternative metaheuristic algorithms.

References

1. Abbasbandy and Asady, Ranking of fuzzy numbers by sign distance, Information Sciences 176 (2006), 2405–2416.

2. Almeida R.J., Baştürk, Kaymak, and Sousa J.M.C., Estimation of flexible fuzzy GARCH models for conditional density estimation, Information Sciences 267 (2014), 252–266.

3. Bloch, Lauer, Colin, and Chamaillard, Support vector regression from simulation data and few experimental samples, Information Sciences 178 (2008), 3813–3827.

4. Bollerslev, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31 (1986), 307–327.

5. Ceperic, Ceperic, and Baric, A strategy for short-term load forecasting by support vector regression machines, IEEE Transactions on Power Systems 28 (2013), 4356–4364.

6. Chen S.M. and Chang Y.C., Multi-variable fuzzy forecasting based on fuzzy clustering and fuzzy rule interpolation techniques, Information Sciences 180 (2010), 4773–4783.

7. Grefenstette J.J., Optimization of control parameters for genetic algorithms, IEEE Transactions on Systems, Man and Cybernetics 16 (1986), 122–128.

8. Engle, Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982), 987–1007.

9. Fama E.F., The behavior of stock market price, Journal of Business 38 (1965), 34–105.

10. Gao and Liu Y.J., Adaptive fuzzy optimal control using direct heuristic dynamic programming for chaotic discrete-time system, Journal of Vibration and Control (2014), 1–9.

11. Gavrishchaka V.V. and Ganguli S.B., Volatility forecasting from multiscale and high-dimensional market data, Neurocomputing 55 (2003), 285–305.

12. Van Gestel, Suykens J.A.K., Baestaens D.E., Lambrechts, Lanckriet, Vandaele, De Moor, and Vandewalle, Financial time series prediction using least squares support vector machines within the evidence framework, IEEE Transactions on Neural Networks 12 (2001), 809–821.

13. Glosten L.R., Jagannathan, and Runkle D.E., On the relation between the expected value and the volatility of the nominal excess return on stocks, Journal of Finance 48 (1993), 1779–1801.

14. Goldberg D.E., Genetic algorithms in search, optimization, and machine learning, Addison-Wesley, MA, 1989.

15. Gunn S.R., Support Vector Machines for Classification and Regression, University of Southampton, 1998.

16. Hao P.Y. and Chiang J.H., A fuzzy model of support vector regression machine, International Journal of Fuzzy Systems 9 (2007), 45–50.

17. Huang C.F., A hybrid stock selection model using genetic algorithms and support vector regression, Applied Soft Computing 12 (2012), 807–818.

18. Hung J.C., Adaptive Fuzzy-GARCH model applied to forecasting volatility of stock markets using particle swarm optimum algorithm, Information Sciences 181 (2011), 4673–4683.

19. Hung K.C. and Lin K.P., Long-term business cycle forecasting through a potential intuitionistic fuzzy least-squares support vector regression approach, Information Sciences 224 (2013), 37–48.

20. Kan L.J., Chiu C.C., Lu C.J., and Yang J.L., Integration of nonlinear independent component analysis and support vector regression for stock price forecasting, Neurocomputing 99 (2013), 534–542.

21. Kasabov N.K. and Song, DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction, IEEE Transactions on Fuzzy Systems 10 (2002), 144–153.

22. Klaassen, Improving GARCH volatility forecasts with regime-switching GARCH, Empirical Economics 27 (2002), 363–394.

23. Kodogiannis and Lolis, Forecasting financial time series using neural network and fuzzy system-based techniques, Neural Computing & Applications 11 (2002), 90–102.

24. Kozek, Roska, and Chua L.O., Genetic algorithm for CNN template learning, IEEE Transactions on Circuits and Systems I 40 (1993), 392–402.

25. Lin T.C., Huang H.C., Liao B.Y., and Pan J.S., An optimized approach on applying genetic algorithm to adaptive cluster validity index, International Journal of Computer Sciences and Engineering Systems 1 (2007), 253–257.

26. Liu Y.J., Gao, Tong, and Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Transactions on Fuzzy Systems 24 (2016), 16–28.

27. Liu Y.J. and Tong, Adaptive fuzzy control for a class of nonlinear discrete-time systems with backlash, IEEE Transactions on Fuzzy Systems 22 (2014), 1359–1365.

28. Liu Y.J. and Tong, Adaptive fuzzy identification and control for a class of nonlinear pure-feedback MIMO systems with unknown dead-zones, IEEE Transactions on Fuzzy Systems 23 (2015), 1387–1398.

29. Liu Y.J. and Tong, Adaptive fuzzy control for a class of unknown nonlinear dynamical systems, Fuzzy Sets and Systems 263 (2015), 49–70.

30. Maciel, Gomide, and Ballini, Enhanced evolving participatory learning fuzzy modeling: An application for asset returns volatility forecasting, Evolving Systems 5 (2014), 75–88.

31. Martens and Dijk D.V., Measuring volatility with the realized range, Journal of Econometrics 138 (2007), 181–207.

32. Mercer, Functions of positive and negative type and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society A 209 (1909), 415–446.

33. Merton R.C., On estimating the expected return on the market: An exploratory investigation, Journal of Financial Economics 8 (1980), 323–361.

34. Nelson, Conditional heteroskedasticity in asset returns: A new approach, Econometrica 59 (1991), 347–370.

35. San José-Revuelta L.M., A new adaptive genetic algorithm for fixed channel assignment, Information Sciences 177 (2007), 2655–2278.

36. Tashman L.J., Out-of-sample tests of forecasting accuracy: An analysis and review, International Journal of Forecasting 16 (2000), 437–450.

37. Tay F.E.H. and Cao, Application of support vector machines in financial time series forecasting, Omega: The International Journal of Management Science 29 (2001), 309–317.

38. Tsay R.S., Analysis of financial time series, John Wiley & Sons, USA, 2002.

39. Vapnik, Golowich, and Smola, Support vector method for function approximation, regression estimation, and signal processing, Neural Information Processing Systems 9 (1997), 281–287.

40. Wang, Huang, and Wang, A support vector machine based MSM model for financial short-term volatility forecasting, Neural Computing and Applications 22 (2013), 21–28.

41. Xiao, Fei, and Chen, Forecasting Chinese Stock Markets Volatility Based on Neural Network Combining, Fourth International Conference on Natural Computation IEEE-ICNC (2008), pp. 23–27.

42. Yang, Chan, and King, Support Vector Machine Regression for Volatile Stock Market Prediction, Intelligent Data Engineering and Automated Learning: IDEAL (2002), pp. 391–396.

43. Zadeh L.A., Fuzzy set, Information and Control 8 (1965), 338–353.

44. Zheng and Chen B.M., Identification of stock market forces in the system adaptation framework, Information Sciences 265 (2014), 105–122.

45. Zhiqiang, Huaiqing, and Quan, Financial time series forecasting using LPP and SVM optimized by PSO, Soft Computing 17 (2013), 805–818.