Financial asset yield series forecasting based on risk-neutral fuzzy bilinear regression and probabilistic neural network

Abstract

Quantitative forecasting methods for financial markets have attracted significant attention from researchers and managers in recent years, because conventional time series forecasting models can hardly capture the inherent rules of complex, nonlinear, dynamic financial systems. In this paper, a new three-stage hybrid forecasting system that integrates fuzzy techniques with statistical tools and an artificial neural network is constructed to improve the forecasting accuracy of financial asset prices. In the first stage, a fuzzy autoregression model is fitted by minimizing the sum of squared errors, and its results are used to form sample groups so that data containing outliers can be handled. In the second stage, a fuzzy bilinear regression model that introduces a risk view is developed; it is solved by a quadratic programming algorithm and reflects the properties of both the least squares and the possibility approaches without expert knowledge. The main idea of the model is to use sub-models that track the possible relations between the spread and the center, and to link the estimation deviation with the risk degree of fitness of the model. In the third stage, the fuzzy bilinear regression forecasts are combined with probabilistic neural network classifiers of optimal architecture, which helps to narrow over-wide intervals of financial data at a certain confidence level. Statistical validation and performance analysis on the historical yield series of the Shanghai Stock Exchange composite index show the effectiveness and stability of the proposed hybrid forecasting formulation compared with other forecasting methods.

Keywords

Financial asset yield series forecasting; fuzzy bilinear regression; probabilistic neural network; symmetrical triangular fuzzy number; risk-neutral

1 Introduction

In recent years, quantitative methods have played a vital role in extracting information on data fluctuations in financial markets for forecasting purposes as well as for decisions and investments. Time series models, as forecasting tools in financial markets, investigate relations among past observations of the same variable and have become more crucial in business practice than ever before. This kind of modeling is particularly useful when little knowledge is available on the data generating process or when there is no satisfactory model that relates the dependent variable to other variables. Improving time series forecasting accuracy is an important yet difficult task facing forecasters. Over the past four decades, many methods have been introduced to address it, ranging from linear and nonlinear models to artificial intelligence algorithms.

One of the most popular statistical models is the autoregressive integrated moving average (ARIMA) model, which has the advantages of accurate short-term forecasting and easy implementation. Statistical models are widely applicable and quick to build, but their results are often inaccurate when forecasting the nonlinearity and high volatility of financial data. Artificial neural networks (ANNs) are flexible computing frameworks and universal approximators that were developed to overcome some of these limitations and can be applied to a wide range of forecasting problems with a high degree of accuracy. Although ANN models can achieve good results, they have some shortcomings, such as difficulty in parameter determination and computational complexity.

Because of the possibly unstable or changing patterns in financial data, a hybrid model can overcome the limitations of its component models, especially when the models in the ensemble are quite different. Ince and Trafalis [1] proposed a two-stage forecasting model that incorporates ARIMA and co-integration with ANN techniques to forecast exchange rates. Ni and Yin [2] described a hybrid model formed by a mixture of various regressive neural network models for modelling foreign exchange rate time series. Patel et al. [3] presented a two-stage fusion approach involving support vector regression (SVR) and random forest to predict future values of stock market indices. Rather [4] constructed a robust hybrid model including exponential smoothing (ES) and a recurrent neural network for the prediction of stock returns. Yang and Lin [5] proposed an approach that combines ARIMA with SVR to aid financial time series forecasting. Liu and Long [6] established an improved hybrid forecasting model based on a deep learning network to predict the daily stock closing price series.

However, these models use single-point data, which may lead to a large standard deviation, and they are not tolerant of imprecision and approximation. To address these problems, fuzzy techniques built on statistics have been suggested for problems involving linguistic terms. Watada [7] applied fuzzy regression to time series analysis. Fuzzy regression is an extension of conventional regression analysis to fuzzy environments, which can be employed as an efficient and useful tool for analyzing complex systems in fuzzy situations, such as economics, engineering, hydrology, military and education systems. Fuzzy regression techniques are usually classified into two distinct categories, the linear programming (LP) method [8] and the least squares (LS) method [9], which are not competing but complementary. In addition to these two classes, more approaches to fuzzy regression analysis have followed these pioneering papers, for instance, those studied by Wang et al. [10], Roh et al. [11], Ciavolino and Calcagni [12], Gong et al. [13], Jiang et al. [14], Chen and Nien [15], Wang et al. [16], and Khammar et al. [17].

In recent research, fuzzy techniques have been integrated into financial time series forecasting approaches in order to improve the prediction accuracy and reduce the fuzzy uncertainty of the model. Tseng et al. [18] developed a FARIMA model that combines ARIMA with a fuzzy regression model and applied it to exchange rate forecasting. Yu and Huarng [19] applied neural networks to implement a fuzzy time series model, which includes the various degrees of membership in establishing fuzzy relationships, to forecast the stock index in Taiwan. Chen and Kao [20] presented a method for forecasting the Taiwan Stock Exchange Capitalization Weighted Stock Index based on fuzzy time series, particle swarm optimization techniques and the support vector machine (SVM). Soto et al. [21] presented an approach to multiple time series prediction using many-inputs many-outputs fuzzy aggregation models with modular neural networks, with experiments on the Mexican Stock Exchange, National Association of Securities Dealers Automated Quotation and Taiwan Stock Exchange time series. Hanapi et al. [22] proposed a novel forecasting model, dubbed a fuzzy linear regression sliding window GARCH model, to forecast the Stock and Price Index. Hao et al. [23] developed a twin support vector machine with a fuzzy hyperplane to predict stock price trends based on financial news articles. Dong and Ma [24] established a hybrid fuzzy time series forecasting model consisting of several components and processes to forecast three typical stock index datasets.

The existing literature on financial time series forecasting has illustrated various combination models, which integrate statistical models, artificial neural networks and fuzzy techniques in order to overcome the deficiencies of a single model. However, as observed in the literature, these models do not include the concept of the fuzzy bilinear regression (FBR) model for financial time series forecasting. Moreover, owing to the uncertainty of the financial environment and the rapid development of new technology, we usually have to forecast future situations relying on little expert knowledge. It is therefore preferable to integrate into the forecasting framework a simultaneous regression analysis that employs the risk degree of fitness and accommodates data containing outliers, together with an intelligent method, which could produce better performance from both the predictive and the computational point of view. In this paper, our objective is to design a systematic fuzzy financial time series forecasting procedure using symmetrical triangular fuzzy numbers, based on the risk-neutral FBR obtained by quadratic programming and the probabilistic neural network (PNN) model, to yield more stable and more accurate forecasting results. The proposed method is based on a decomposition of the relationship between a fuzzy dependent variable and the independent variables into two components. It is quite flexible, because its weights and thresholds can be changed with a reasonable justification when emphasizing the central and spread tendency in conjunction with possibilistic procedures. The performance of the formulation is compared with other forecasting methods to show its appropriateness and robustness. The empirical results of Shanghai Stock Exchange (SSE) composite index forecasting indicate that the proposed processes improve forecasting accuracy and reduce the model uncertainty.

The remainder of this paper is organized as follows. Section 2 briefly reviews the fundamentals of fuzzy financial time series, the fuzzy regression model and PNN. Section 3 forms the core of the paper: it explains the derivation of the presented approach to parameter identification, introduces the risk-neutral FBR solved by quadratic programming for the case of little expert knowledge, and uses PNN to obtain a more accurate prediction range. In Section 4, the proposed model is applied to SSE composite index forecasting and its performance is evaluated in comparison with other forecasting methods. Concluding remarks are given in the final section of this paper.

2 Fundamental theories

In what follows some basic concepts and definitions of fuzzy financial time series, fuzzy regression model and PNN will be recalled briefly, which serve as a reference in setting up the financial time series forecasting procedure discussed in the proposed system.

2.1 Fuzzy financial time series

Fuzzy theory has various applications in improving forecasting models because of its capability to bridge the gap between numerical data and linguistic statements. In several substantive applications, the most utilized class of fuzzy variables is the symmetrical fuzzy number [25]. Usually, a symmetrical fuzzy number is denoted by $A = (a, d)_L$, where a and d denote the center and the spread, respectively, with the following membership function: $A(x) = L\left(\frac{x - a}{d}\right), \quad d > 0,$ (1) where L(·) is a membership function of a fuzzy number from R¹ to [0, 1] satisfying (i) L(x) = L(-x), (ii) L(0) = 1, (iii) L is decreasing on [0, +∞). If d = 0, the symmetrical fuzzy number A degenerates into the crisp value a.

The most common membership function for the symmetrical fuzzy number is provided by the symmetrical triangular fuzzy number, denoted by A = (a, d), where L has the form L (x) = max (0, 1 - |x|).
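As an illustration, the triangular membership function and its h-level interval can be sketched in Python as follows. This is a minimal sketch; the function names `triangular_membership` and `h_level_interval` are hypothetical and are introduced here only for illustration.

```python
def triangular_membership(x, a, d):
    """Membership degree of x in the symmetrical triangular fuzzy number (a, d)."""
    if d < 0:
        raise ValueError("spread d must be non-negative")
    if d == 0:
        # Degenerate case: the fuzzy number collapses to the crisp value a.
        return 1.0 if x == a else 0.0
    # A(x) = L((x - a) / d) with L(z) = max(0, 1 - |z|)
    return max(0.0, 1.0 - abs(x - a) / d)


def h_level_interval(a, d, h):
    """Closed interval on which the membership degree is at least h (0 < h <= 1).

    For h = 0 the function returns the closed support [a - d, a + d].
    """
    half_width = (1.0 - h) * d  # |L^{-1}(h)| = 1 - h for the triangular shape
    return a - half_width, a + half_width


if __name__ == "__main__":
    print(triangular_membership(2.0, a=1.0, d=2.0))
    print(h_level_interval(a=1.0, d=2.0, h=0.5))
```

The half-width (1 - h) d is the quantity written as |L⁻¹(h)| d in the possibilistic constraints of Section 2.2.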

Definition 1. Assume that $\{[x_t^L, x_t^U],\ t = 1, 2, \ldots, n\}$ is an interval series of the financial product price. Its corresponding fuzzy description {(x_t, s_t), t = 1, 2, . . . , n} is referred to as the fuzzy financial time series, where $x_t = (x_t^L + x_t^U) / 2$ and $s_t = (x_t^U - x_t^L) / 2$.

Definition 2. Assume that $\{[x_t^L, x_t^U],\ t = 1, 2, \ldots, n\}$ is an interval series of the financial product price. The fuzzy financial yield series $\{\tilde{r}_t,\ t = 2, \ldots, n\}$ is defined as follows: $\begin{aligned} \tilde{r}_t &= [\ln x_t^L, \ln x_t^U] - [\ln x_{t-1}^L, \ln x_{t-1}^U] \\ &= \left[\ln \frac{x_t^L}{x_{t-1}^U},\ \ln \frac{x_t^U}{x_{t-1}^L}\right]. \end{aligned}$ (2) Accordingly, the corresponding fuzzy description is {(c_t, u_t), t = 2, . . . , n}, where $c_t = \left(\ln \frac{x_t^L}{x_{t-1}^U} + \ln \frac{x_t^U}{x_{t-1}^L}\right) / 2$ reflects the concentration trend of the financial asset yield and $u_t = \left(\ln \frac{x_t^U}{x_{t-1}^L} - \ln \frac{x_t^L}{x_{t-1}^U}\right) / 2$ reflects the nonrandom volatility of the financial asset yield.

Definition 3. When the central series {c_t} is stationary, ${{\tilde{r}}_{t}}$ is called conditional stationary fuzzy financial yield series.
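Definitions 1 and 2 can be illustrated with a short Python sketch. The function names and the sample price intervals below are made up for illustration; only the formulas for (x_t, s_t) and (c_t, u_t) come from the definitions above.

```python
import math


def fuzzy_series(intervals):
    """Definition 1: map price intervals [xL, xU] to (center, spread) pairs."""
    return [((lo + hi) / 2.0, (hi - lo) / 2.0) for lo, hi in intervals]


def fuzzy_yield_series(intervals):
    """Definition 2: (c_t, u_t) for t = 2, ..., n from consecutive price intervals."""
    result = []
    for (lo_prev, hi_prev), (lo, hi) in zip(intervals, intervals[1:]):
        lower = math.log(lo / hi_prev)  # ln(x_t^L / x_{t-1}^U)
        upper = math.log(hi / lo_prev)  # ln(x_t^U / x_{t-1}^L)
        result.append(((lower + upper) / 2.0, (upper - lower) / 2.0))
    return result


if __name__ == "__main__":
    # Hypothetical interval prices, not taken from the paper's data set.
    prices = [(99.0, 101.0), (100.0, 103.0), (98.0, 102.0)]
    for c_t, u_t in fuzzy_yield_series(prices):
        print(f"c_t = {c_t:.6f}, u_t = {u_t:.6f}")
```

Because x_t^U ≥ x_t^L at every t, the spread u_t is non-negative by construction, and it vanishes only when both consecutive observations are crisp.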

The transformation of interval data into symmetrical triangular fuzzy number is shown in Fig. 1.

Fig. 1

Demonstration of transformation from interval data into symmetrical triangular fuzzy number.

2.2 Fuzzy regression model

Let us consider a fuzzy response variable Y and m crisp explanatory variables X₁, . . . , X_m observed on M units. Data are expressed by (Y_j, x_j), j = 1, . . . , M, where $Y_j = (y_j, e_j)_L$ and $x_j = (x_{j1}, \ldots, x_{jm})'$ (with x_j1 = 1). We have the following fuzzy linear regression (FLR) model: $Y_j = A_1 x_{j1} + \dots + A_m x_{jm} = A x_j, \quad j = 1, \ldots, M,$ (3) where A = (A₁, . . . , A_m), and A₁, . . . , A_m are symmetrical fuzzy parameters with $A_i = (a_i, d_i)_L$, i = 1, . . . , m. Then, by the extension principle, the symmetrical fuzzy output variable Y is given by:

$\begin{aligned} Y_j &= (a_1, d_1)_L x_{j1} + \dots + (a_m, d_m)_L x_{jm} \\ &= (a' x_j,\ d' |x_j|)_L, \quad j = 1, \ldots, M, \end{aligned}$ (4) where $a = (a_1, \ldots, a_m)'$, $d = (d_1, \ldots, d_m)'$ and $|x_j| = (|x_{j1}|, \ldots, |x_{jm}|)'$.

Tanaka [25] suggested minimizing the estimated spreads of the fuzzy outputs over the entire data set, so the estimation problem is formulated as follows: $\min_{a, d} J = \sum_{j=1}^{M} d' |x_j|$ $\text{s.t.} \begin{cases} a' x_j + |L^{-1}(h)|\, d' |x_j| \geq y_j + |L^{-1}(h)|\, e_j, \\ a' x_j - |L^{-1}(h)|\, d' |x_j| \leq y_j - |L^{-1}(h)|\, e_j, \\ d_i \geq 0, \quad i = 1, \ldots, m, \quad j = 1, \ldots, M, \end{cases}$ (5) where h is a threshold, chosen in the interval [0, 1], representing the degree to which the estimated fuzzy outputs should include the observed outputs.

Diamond [9] developed a least squares method directly using the distance between the level compact sets of triangular fuzzy numbers. The estimates are obtained by minimizing the following squared Euclidean distance in the least squares regression:

$Q = \sum_{j=1}^{M} (y_j - a' x_j)^2 + \sum_{j=1}^{M} (e_j - d' |x_j|)^2.$ (6) Then, the least squares estimates of a and d can be obtained as follows: $\begin{cases} \hat{a} = (X' X)^{-1} X' y, \\ \hat{d} = (|X|' |X|)^{-1} |X|' e, \end{cases}$ (7) where $X = (x_1, x_2, \ldots, x_M)'$, $|X| = (|x_1|, |x_2|, \ldots, |x_M|)'$, $y = (y_1, \ldots, y_M)'$ and $e = (e_1, \ldots, e_M)'$.
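Since the two sums in (6) share no parameters, (7) amounts to two ordinary least squares fits: the centers on x and the spreads on |x|. The Python sketch below illustrates this for the simplest case m = 2 (an intercept plus one regressor), where the normal equations have a closed form. The function names and the toy data are hypothetical.

```python
def least_squares_line(x, y):
    """Closed-form least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X for the design (1, x)
    if det == 0:
        raise ValueError("regressor is constant; X'X is singular")
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    return intercept, slope


def diamond_fit(x, centers, spreads):
    """Eq. (7) for m = 2: centers regressed on x, spreads regressed on |x|."""
    a_hat = least_squares_line(x, centers)
    d_hat = least_squares_line([abs(v) for v in x], spreads)
    return a_hat, d_hat


if __name__ == "__main__":
    # Toy data generated from centers = 1 + 2x and spreads = 0.5|x|.
    x = [1.0, 2.0, 3.0, 4.0]
    print(diamond_fit(x, [3.0, 5.0, 7.0, 9.0], [0.5, 1.0, 1.5, 2.0]))
```

Note that the minimization of (6) is unconstrained, so nothing in this fit forces the estimated spread coefficients to be non-negative.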

LP and LS methods are not robust in the sense that even a small percentage of observations that deviate considerably from massive datasets can produce a distortion in the parameter estimates. Subsequently, there has been a growing literature that formalizes the FLR model providing sufficiently accurate and stable results [26 –32].

2.3 Probabilistic neural network

Specht [33] first introduced the probabilistic neural network and demonstrated how the Bayes-Parzen classifier could be broken up into a large number of simple processes implemented in a multilayer neural network, each of which could be run independently in parallel. PNN is a feed-forward artificial neural network based on a supervised learning algorithm; it is developed from the Bayesian minimum risk criterion and performs well in shortening the computing time of the training process and in handling nonlinear problems. It generally consists of four layers, namely the input layer, pattern layer, summary layer and output layer.

Assume that an input vector X = (x₁, x₂, . . . , x_n) from the input layer is to be classified into a class y, y ∈ {y₁, y₂, . . . , y_s}. A simple network architecture is shown in Fig. 2, containing n input variables and two population classes, where N₁ training examples belong to class 1 and N₂ training examples belong to class 2. The probability density function (PDF) of each class y_j (j = 1, 2, . . . , s) for the input vector X, $P_{y_j}(X)$, is as follows: $P_{y_j}(X) = \frac{1}{(2\pi)^{n/2} \sigma^n} \times \frac{1}{n_j} \times \sum_{i=1}^{n_j} \exp\left[-\frac{(X - X_{ji})^T (X - X_{ji})}{2\sigma^2}\right],$ (8) where n_j is the number of training samples that belong to class y_j, X_ji is the i-th training vector in class y_j, and σ is a smoothing parameter. The output layer classifies the input X based on the summary layer outputs as: $\mathrm{Class}(X) = \arg\max \{P_{y_j}(X)\},$ (9) where Class(X) is the estimated class of the input vector X [34].
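Equations (8) and (9) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the toy training set and the value σ = 0.5 are assumptions, and, as in (9), no class priors or misclassification losses are applied.

```python
import math


def class_density(x, samples, sigma):
    """Eq. (8): Gaussian Parzen estimate of the density of one class at x."""
    n = len(x)  # input dimension
    norm = (2.0 * math.pi) ** (n / 2.0) * sigma ** n
    total = 0.0
    for s in samples:  # one pattern-layer unit per training vector
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / (norm * len(samples))  # summary layer: class average


def pnn_classify(x, training, sigma=0.5):
    """Eq. (9): return the label whose estimated density at x is largest."""
    return max(training, key=lambda label: class_density(x, training[label], sigma))


if __name__ == "__main__":
    # Hypothetical two-class training set with two-dimensional inputs.
    training = {
        "class 1": [(0.0, 0.0), (0.2, 0.1)],
        "class 2": [(3.0, 3.0), (2.8, 3.1)],
    }
    print(pnn_classify((0.1, 0.0), training))
```

There is no iterative training in this scheme: the training vectors are stored as they are, and the smoothing parameter σ is the only quantity that has to be tuned.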

Fig. 2

A simple probabilistic neural network architecture.

3 The proposed method in financial asset yield series forecasting

In this section, we introduce the linear regression models that link the components of the output variable with the input variables, devise the hybrid criteria without expert knowledge from the point of view of risk, and additionally take the neural network into consideration for financial asset yield series forecasting.

3.1 Fuzzy bilinear regression model

Given n observations on the financial product price $\{[x_t^L, x_t^U],\ t = 1, 2, \ldots, n\}$, we can obtain the fuzzy financial yield series {(c_t, u_t), t = 2, . . . , n}. It is natural to think that the vagueness in the measure of the financial product price is related to its intensity. For this reason, Li et al. [35] proposed a fuzzy bilinear regression model for financial asset yield series in which the dynamics of the spreads depends on the magnitude of the centers; that model is the reference for our approach. Following Li et al., we simultaneously model the centers and the spreads of the symmetrical triangular fuzzy variables by means of two linear regression models in matrix form as follows: $\begin{cases} c = c_p \alpha + \varepsilon_c, \\ u = u_q \beta + \gamma c + \varepsilon_u, \end{cases}$ (10) where k = max(p, q), $c = (c_{k+2}, c_{k+3}, \ldots, c_n)'$ and $u = (u_{k+2}, u_{k+3}, \ldots, u_n)'$ are the (n - k - 1) × 1 vectors of observed centers and spreads, respectively, and $\varepsilon_c$ and $\varepsilon_u$ are the errors of the centers and spreads, respectively. The design matrices are $c_p = (\mathbf{1}, c_{-1}, c_{-2}, \ldots, c_{-p})$ and $u_q = (\mathbf{1}, u_{-1}, u_{-2}, \ldots, u_{-q})$, where $c_{-i} = (c_{k+2-i}, c_{k+3-i}, \ldots, c_{n-i})'$, i = 1, 2, . . . , p, $u_{-j} = (u_{k+2-j}, u_{k+3-j}, \ldots, u_{n-j})'$, j = 1, 2, . . . , q, and $\mathbf{1}$ is the (n - k - 1) × 1 vector of ones. $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_p)'$ is the vector of coefficients for the regression model on the centers, and $\beta = (\beta_0, \beta_1, \ldots, \beta_q)'$ and γ are the corresponding coefficients for the regression model on the spreads.

Remark 1. Notice that, when p = q, α = β and γ = 0, the fuzzy bilinear regression model coincides with the fuzzy autoregression (FAR) model: $\begin{cases} c_t = \alpha_0 + \alpha_1 c_{t-1} + \dots + \alpha_p c_{t-p} + \varepsilon_c, \\ u_t = \alpha_0 + \alpha_1 u_{t-1} + \dots + \alpha_p u_{t-p} + \varepsilon_u, \end{cases} \quad t = k + 2, \ldots, n.$ (11) Therefore, the fuzzy bilinear regression model can be regarded as a suitable extension of the FAR model.

Remark 2. Setting γ = 0, $u_q = (c_p, \mathbf{1})$ and $\beta = \begin{pmatrix} b\alpha \\ d \end{pmatrix}$, where b and d are constants, we obtain: $\begin{cases} c = c_p \alpha + \varepsilon_c, \\ u = c_p \alpha b + \mathbf{1} d + \varepsilon_u. \end{cases}$ (12) The suggested model then reduces to the fuzzy regression model proposed by D'Urso and Gastaldi [36].

3.2 Risk-neutral fuzzy bilinear regression model without expert knowledge

Modarres et al. [37] formulated the risk-neutral FLR model based on the degree of possibility of equality between the estimated model and the given response variable, in order to improve the predictive ability of the LP approach and decrease the computational complexity of the LS method: $\min \sum_{j=1}^{M} \left\{ (\hat{Y}_{j,R}(0) - Y_{j,R}(0)) + (\hat{Y}_{j,L}(0) - Y_{j,L}(0)) \right\}^2$ $\text{s.t.} \begin{cases} Y_{j,L}(h) \leq \hat{Y}_{j,R}(h), \quad Y_{j,R}(h) \geq \hat{Y}_{j,L}(h), \\ \hat{Y}_{j,R}(0) - \hat{Y}_{j,L}(0) \geq 0, \quad j = 1, \ldots, M, \end{cases}$ (13) where $Y_j$ and $\hat{Y}_j$ are the fuzzy observations and estimations, respectively; $Y_{j,L}(h)$, $Y_{j,R}(h)$ and $\hat{Y}_{j,L}(h)$, $\hat{Y}_{j,R}(h)$ are the lower and upper values of the h-level intervals of the fuzzy numbers $Y_j$ and $\hat{Y}_j$, respectively.

The constraint inequalities for some fixed h in (13) can be transformed to $f^{RN} \geq h$, where $f^{RN}$ denotes the risk-neutral fitting degree of the estimated FLR model, which is defined by $f^{RN} = \min_{j} \{\mathrm{Pos}(Y_j = \hat{Y}_j)\}$.
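For symmetrical triangular fuzzy numbers, the possibility of equality is the height of the intersection of the two membership functions, which equals max(0, 1 - |a₁ - a₂| / (d₁ + d₂)) for the fuzzy numbers (a₁, d₁) and (a₂, d₂). The Python sketch below, restricted to this triangular case and using hypothetical function names, computes the fitting degree and checks the h-level overlap conditions of (13) directly.

```python
def possibility_of_equality(y, y_hat):
    """Pos(Y = Y_hat) for symmetrical triangular fuzzy numbers (center, spread)."""
    (a1, d1), (a2, d2) = y, y_hat
    total_spread = d1 + d2
    if total_spread == 0:
        return 1.0 if a1 == a2 else 0.0  # both numbers are crisp
    return max(0.0, 1.0 - abs(a1 - a2) / total_spread)


def risk_neutral_fitting_degree(observed, estimated):
    """f^RN: the smallest possibility of equality over all observations."""
    return min(possibility_of_equality(y, yh) for y, yh in zip(observed, estimated))


def h_levels_overlap(y, y_hat, h):
    """The two h-level constraints of (13) for one observation."""
    (a1, d1), (a2, d2) = y, y_hat
    y_lo, y_hi = a1 - (1.0 - h) * d1, a1 + (1.0 - h) * d1
    e_lo, e_hi = a2 - (1.0 - h) * d2, a2 + (1.0 - h) * d2
    return y_lo <= e_hi and y_hi >= e_lo


if __name__ == "__main__":
    y, y_hat = (0.0, 1.0), (1.0, 1.0)  # made-up observation and estimate
    print(possibility_of_equality(y, y_hat))
    print(h_levels_overlap(y, y_hat, 0.4), h_levels_overlap(y, y_hat, 0.6))
```

In this example the possibility of equality is 0.5, so the h-level intervals overlap for h = 0.4 but not for h = 0.6, in line with the transformation of the constraints into f^RN ≥ h.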

Remark 3. The constraint relations concerning the threshold holding for the risk-neutral FLR model are consistent with those proposed by Tanaka et al. [8] when studying a crisp dependent variable.

Tanaka et al. [38] introduced quadratic programming (QP) based fuzzy regression for the case of insufficient knowledge, tracking the central tendency and minimizing the spreads of the estimated fuzzy outputs for better prediction. This fuzzy regression model deals with the outlier problem by dividing the dataset into a reliable group (R) and a suspicious group (S) and setting different error values on each group to neutralize the effect of outliers. The model is given by: $\min_{a, d} \ k_1 \sum_{j=1}^{M} h_j (y_j - a' x_j)^2 + k_2 \sum_{j=1}^{M} h_j\, d' |x_j| |x_j|' d + k_3 e^2$ $\text{s.t.} \begin{cases} a' x_j + (1 - h_j)\, d' |x_j| \geq y_j, \\ a' x_j - (1 - h_j)\, d' |x_j| \leq y_j, \quad j \in R, \\ a' x_j + (1 - h_j)\, d' |x_j| + e \geq y_j, \\ a' x_j - (1 - h_j)\, d' |x_j| - e \leq y_j, \quad j \in S, \\ d_i \geq 0, \quad i = 1, \ldots, m. \end{cases}$ (14)

Notice that the response variable considered by Tanaka et al. [38] is numeric and that A_i = (a_i, d_i). Here e is an error term, with e = 0 for group R and e > 0 for group S, and k₁, k₂ and k₃ are positive weights given by an analyst, which take into account the corresponding variation.

This paper suggests a risk-neutral FBR model without expert knowledge, which extends the risk-neutral FLR model to multiple models, combines the central and spread tendency, and also considers possibilistic properties and robustness. The risk-neutral FBR model without expert knowledge can be formalized in the following compact matrix notation: $\min_{\alpha, \beta, \gamma} \ k_1 \| c - c_p \alpha \|_H^2 + k_2 \| u - u_q \beta - \gamma c \|_H^2 + k_3 \| u_q \beta + \gamma c \|_H^2 + k_4 e^2$ $\text{s.t.} \begin{cases} c_p \alpha + (I - H)(u_q \beta + \gamma c) + e F \geq c - (I - H) u, \\ c_p \alpha - (I - H)(u_q \beta + \gamma c) + e F \leq c + (I - H) u, \end{cases}$ (15) where I is the identity matrix, $\| \cdot \|_H$ is the weighted norm, and H is a diagonal matrix whose diagonal elements are the thresholds $h_{k+2}, \ldots, h_n$. k₁, k₂, k₃ and k₄ are positive weights. $F = (I_S(k + 2), \ldots, I_S(n))'$, where $I_S(\cdot)$ is the indicator function of the set S.

The weights of the objective function allow an analyst to predict from various angles, and the thresholds express an optimistic or pessimistic analysis, in which only intervals of data having high or low possibilities are considered by assigning a large or small value to the threshold.

3.3 Financial asset yield series forecasting procedure

There are three primary stages in building a financial asset yield series forecasting model: model identification, model estimation, and model enhancement. The aim of the model identification stage is to determine the order of the FAR model from the stationarity property with the help of the autocorrelation and partial autocorrelation functions.

Based on the pre-forecasting by the FAR model, the second stage estimates the parameters using the risk-neutral FBR model. It introduces the error term to discard the outliers when little expert knowledge is available, tracks the central and spread tendency, and minimizes the spreads of the estimated fuzzy outputs for better prediction.

The risk-neutral FBR model can produce estimated responses whose spreads become much wider as more data are included. Consequently, the last stage employs PNN to identify nonlinear patterns (the more probable spaces) and to determine narrower intervals. The spaces that have a lower probability of containing the actual value are then deleted from the intervals obtained by FBR, according to the PNN results.

The steps of the algorithm for the suggested methodology in the financial asset yield series forecasting procedure are synthesized in the following scheme:

Step 1: Given n observations on the financial product price in the interval format $\{[x_t^L, x_t^U],\ t = 1, 2, \ldots, n\}$, write them in the symmetrical triangular fuzzy number format {(c_t, u_t), t = 2, . . . , n} based on Definition 2.

Step 2: Determine the FAR model (11) by the least squares method, in which the (local) optimal solution of the following minimization problem is obtained by setting to zero the partial derivatives with respect to each parameter: $\arg\min_{\alpha} \left\{ \| c - c_p \alpha \|^2 + \| u - u_p \alpha \|^2 \right\} = (c_p' c_p + u_p' u_p)^{-1} (c_p' c + u_p' u).$ (16)

Step 3: Divide the data into two groups, the reliable group and the suspicious group, using the estimated standard deviation $\hat{\sigma}$ given by $\hat{\sigma} = \sqrt{\frac{\sum_{t=k+2}^{n} (c_t - \hat{c}_t)^2}{n - k - p - 2}},$ (17) where $\hat{c}_t$ is the estimated value of the center. If c_t satisfies the condition $\hat{c}_t - l \hat{\sigma} \leq c_t \leq \hat{c}_t + l \hat{\sigma}, \quad t = k + 2, \ldots, n,$ (18) then the observation belongs to group R; otherwise it belongs to group S. Here l is a positive number given by an analyst.

Step 4: Regression parameters are computed by using (15), denoted as Proposal 1, in which the results with different combinations of weights and thresholds are given, thus allowing an analyst to determine the most acceptable estimated coefficients.

We obtain the estimates $\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$ and the corresponding vectors of estimated values of the centers and spreads of the fuzzy financial yield series, $\hat{c}$ and $\hat{u}$, using the expressions: $\begin{cases} \hat{c} = c_p \hat{\alpha}, \\ \hat{u} = u_q \hat{\beta} + \hat{\gamma} c. \end{cases}$ (19)

Step 5: The prediction interval obtained in the previous step is divided into v equal subintervals; every w consecutive subintervals are considered as a class and assigned a number. The target value(s) of PNN is (are) the class(es) whose subinterval(s) include the actual value(s). The effective input variables for the target value of PNN are the observed and estimated centers at time t and their lagged values, the lagged values of the observed and estimated spreads, and the estimated lower and upper bounds of the time series at time t and their lagged values. The result of this step is an interval whose width is w/v of that obtained by Proposal 1, together with a confidence degree α that is related to the number of subintervals. Proposal 1 combined with PNN is termed Proposal 2.
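The class construction of Step 5 can be sketched in Python as follows. The text does not state whether the classes are overlapping or disjoint groups of w subintervals; the sketch assumes overlapping (sliding) windows, which gives v - w + 1 classes, and this reading as well as the function names are assumptions made only for illustration.

```python
def subinterval_windows(lower, upper, v, w):
    """All windows of w consecutive subintervals among v equal parts of [lower, upper]."""
    if not 1 <= w <= v:
        raise ValueError("require 1 <= w <= v")
    step = (upper - lower) / v
    # Assumption: classes are sliding windows, numbered 0, ..., v - w.
    return [(lower + i * step, lower + (i + w) * step) for i in range(v - w + 1)]


def target_classes(actual, lower, upper, v, w):
    """Numbers of the classes whose window contains the actual value (may be empty)."""
    windows = subinterval_windows(lower, upper, v, w)
    return [i for i, (lo, hi) in enumerate(windows) if lo <= actual <= hi]


if __name__ == "__main__":
    # Made-up prediction interval [0, 1] split into v = 4 parts, w = 2 per class.
    print(subinterval_windows(0.0, 1.0, v=4, w=2))
    print(target_classes(0.3, 0.0, 1.0, v=4, w=2))
```

Each window has w/v of the width of the original prediction interval, which is the narrowing effect described in Step 5. An actual value lying outside the prediction interval receives no class in this sketch.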

4 Applicative example

In order to check the prediction accuracy of the method described in the previous section, we run an experiment in which the Shanghai Stock Exchange composite index is taken from http://quotes.money.163.com/trade/lsjysj_zhishu_000001.html, comprising 292 daily observations from August 17, 2018 through November 1, 2019. The fluctuation of these observations can be seen in Fig. 3(a), which captures the nonstationarity in the pattern of stock behavior that is also reported in the {x_t} column of Table 1. The proposed model relies on the fact that the central series of the fuzzy financial yield is conditionally stationary, which is shown in Fig. 3(b) and reported in the {c_t} column of Table 1. In this case, the last 11 observations are used as testing data for model validation, and the remaining observations are used as training data to estimate the coefficients of the model.

Fig. 3

Trace of the time series of SSE composite index. (a)Interval observation data[ $x_{t}^{L}, x_{t}^{U}$ ]. (b)Fuzzy financial yield data (c_t, u_t).

Table 1

Run test for the central series of financial interval time series/fuzzy financial yield series

	{x_t}	{c_t}
Test value	2829.8	0.000294
Number of sample less than test value	160	140
Number of sample exceeding test value	132	151
Sample size	292	291
Run number	4	132
Z-value	-16.7038	-1.6221
p-value	0.0000	0.1047
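The Z-values in Table 1 can be checked with the standard large-sample runs test. The Python sketch below assumes a continuity correction of 0.5 toward the expected number of runs and a two-sided p-value from the normal approximation; both choices are assumptions, since the text does not state how the statistic was computed, and the function name is hypothetical.

```python
import math


def runs_test(n_below, n_above, runs):
    """Large-sample runs test; returns (z, two-sided p-value)."""
    n = n_below + n_above
    prod2 = 2.0 * n_below * n_above
    mean = prod2 / n + 1.0                             # expected number of runs
    var = prod2 * (prod2 - n) / (n ** 2 * (n - 1.0))   # variance of the number of runs
    # Assumed continuity correction: shift the run count by 0.5 toward the mean.
    if runs < mean:
        corrected = runs + 0.5
    elif runs > mean:
        corrected = runs - 0.5
    else:
        corrected = float(runs)
    z = (corrected - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))       # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    for name, args in (("x_t", (160, 132, 4)), ("c_t", (140, 151, 132))):
        z, p = runs_test(*args)
        print(f"{name}: z = {z:.4f}, p = {p:.4f}")
```

With the counts from Table 1, this computation gives Z-values that agree with the reported -16.7038 and -1.6221 to about three decimals, which suggests that a corrected statistic of this form was used. Randomness is thus rejected for {x_t} and not rejected at the 5% level for {c_t}.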

Table 2

Forecasting performance comparison for the time series of SSE composite index

Model	RMSE (training)	RMSE (testing)	MAPE (training)	MAPE (testing)	DA (training)	DA (testing)	Run time mean (s)	Run time std. dev. (s)
FAR-LS	0.0177	0.0101	3.3024	3.3629	1.2014	1.3000	0.1899	0.0672
FAR-LP	0.0188	0.0109	2.8920	2.7029	1.4389	1.3000	0.2098	0.0701
FBR-LS	0.0144	0.0082	1.5014	1.5496	1.4676	1.6000	0.2003	0.0744
FBR-LP	0.0664	0.0383	5.7138	5.4338	1.4892	1.4000	0.2403	0.0764
FBR-RA	0.0418	0.0307	7.2656	11.4107	1.0647	0.9545	0.2074	0.0694
FBR-RN	0.0159	0.0091	2.1740	2.1713	1.4173	1.6000	0.2056	0.0727
Proposal 1	0.0144	0.0081	1.4152	1.3721	1.4460	1.6000	0.2192	0.0692

Note 1: LS and LP denote the least squares method and the linear programming method, respectively. RA and RN denote the risk-averse and risk-neutral methods at h = 0, whose results are used for comparison.

Note 2: The run times are obtained from 1000 repetitions of each model procedure.

4.1 Forecasting the risk-neutral FBR model

Because little is known about the observed data, a FAR model must first be built on the training data, before the risk-neutral FBR model is presented, in order to divide the given data into two groups. The FAR model is expressed as:

$\begin{cases} \hat{c}_t = 0.7040\, c_{t-1}, \\ \hat{u}_t = 0.7040\, u_{t-1}, \end{cases} \quad t = 3, \ldots, 281.$ (20)
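The pre-forecasting of Step 2 and the grouping rule of Step 3 can be sketched in Python for a model of the form (20), that is, p = 1 with no intercept, in which case (16) reduces to a ratio of sums. The function names and the toy series are hypothetical; the degrees of freedom follow (17).

```python
import math


def far1_coefficient(c, u):
    """Eq. (16) for p = 1 without intercept: one lag coefficient shared by c and u."""
    num = sum(a * b for a, b in zip(c[:-1], c[1:])) + sum(a * b for a, b in zip(u[:-1], u[1:]))
    den = sum(a * a for a in c[:-1]) + sum(a * a for a in u[:-1])
    return num / den


def split_groups(c, alpha, l=2.0, p=1):
    """Eqs. (17)-(18): residual scale and reliable/suspicious positions in c."""
    residuals = [obs - alpha * prev for prev, obs in zip(c[:-1], c[1:])]
    dof = len(residuals) - p - 1  # n - k - p - 2, since there are n - k - 1 residuals
    sigma = math.sqrt(sum(r * r for r in residuals) / dof)
    reliable, suspicious = [], []
    for i, r in enumerate(residuals):
        # Position i + 1 in c is the observation being fitted.
        (reliable if abs(r) <= l * sigma else suspicious).append(i + 1)
    return sigma, reliable, suspicious


if __name__ == "__main__":
    # Made-up centers and spreads, not the SSE data.
    c = [0.010, -0.020, 0.015, 0.300, -0.010, 0.005, 0.000, 0.010]
    u = [0.020, 0.018, 0.022, 0.025, 0.030, 0.021, 0.019, 0.020]
    alpha = far1_coefficient(c, u)
    print(alpha, split_groups(c, alpha, l=2.0))
```

A larger l makes the band in (18) wider and moves observations from the suspicious group to the reliable group, so the choice of l controls how many observations receive the error term e in (15).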

By (17), the sample standard deviation is $\hat{\sigma} = 0.0119$. With l = 2, the samples are divided into two groups based on (18), and a different possibility model is applied to each group. Thus, based on (15), a risk-neutral FBR model is derived:

$\begin{cases} \hat{c}_t = 0.0001 + 0.1213\, c_{t-1}, \\ \hat{u}_t = 0.0035 + 0.7480\, u_{t-1} - 0.0271\, c_t, \end{cases} \quad t = 3, \ldots, 281.$ (21) Eq. (21) shows that the proposed model has identified the dynamics of the financial asset yield series through the training process, with the weight combination k₁ = 0.1, k₂ = 7, k₃ = 0.6, k₄ = 0.01 and with the threshold h set to 0.4 for the suspicious group and 0.1 for the remaining data. It can be interpreted as follows: an increase in the center value at time t - 1 causes a small increase in the estimated center value at time t. An increase in the center value at time t results in a slight decrease in the fluctuation at time t, whereas the estimated spread of the fuzzy financial asset yield at time t is higher when the spread at time t - 1 increases. Eq. (21) thus yields results that are economically consistent with those of Li et al. [35].
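As a worked example, the fitted model (21) can be evaluated for a single time step. In the Python sketch below the coefficients are those of (21), while the function name, the input values, and the use of the estimated center in place of c_t when the observed center is not yet available are assumptions made for illustration.

```python
def fbr_forecast(c_prev, u_prev, c_now=None):
    """Evaluate Eq. (21); returns (c_hat, u_hat, (lower, upper))."""
    c_hat = 0.0001 + 0.1213 * c_prev
    # Assumption: out of sample, the estimated center stands in for c_t.
    center_for_spread = c_hat if c_now is None else c_now
    u_hat = 0.0035 + 0.7480 * u_prev - 0.0271 * center_for_spread
    # Support of the estimated symmetrical triangular fuzzy yield.
    return c_hat, u_hat, (c_hat - u_hat, c_hat + u_hat)


if __name__ == "__main__":
    # Made-up previous center and spread.
    c_hat, u_hat, (lower, upper) = fbr_forecast(c_prev=0.01, u_prev=0.02)
    print(f"c_hat = {c_hat:.6f}, u_hat = {u_hat:.6f}, interval = [{lower:.6f}, {upper:.6f}]")
```

The interval returned here is the one that the PNN stage subsequently narrows.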

We have implemented a set of fuzzy regression models for comparison. The root mean square error (RMSE), mean absolute percentage error (MAPE), direction accuracy (DA) values and computational performance of these models are given in Table 2 (the calculation of the performance measures is shown in the Appendix).

The proposed method attains the minimum RMSE and MAPE values on both the training and testing data sets, and its DA values are generally higher than those of the other fuzzy regression models. In terms of fit, the proposed method may be preferable because it is resistant to outliers when little expert knowledge is available and because its weights and thresholds can be adjusted flexibly. In terms of computational performance, the FBR-LP model has a slightly higher run time, while the proposed model takes approximately the same computing time as the other five fuzzy regression models. These results indicate that the proposed method is capable of estimating appropriate parameters for reaching the desired outputs.

The distribution of the forecasting errors for the center and the spread is shown as box plots in Fig. 4. The errors of the proposed model are more compact, contain fewer abnormal values and deviate less from the mean than those of the other models. This is primarily because the identification of outliers, the flexibility in weights and thresholds, and the risk-neutral formulation together enable the proposed model to produce superior results.

Fig. 4

Boxplot comparison of forecasting errors in the time series of SSE composite index. (a) Error of center. (b) Error of spread.

Remark 4. In the risk-neutral fuzzy bilinear regression, the parameters are determined under the constraint that, for a given h level, the support of each estimated value and the support of the corresponding observed value have a non-empty intersection. This property gives risk-neutral fuzzy linear regression a higher capacity for forecasting. However, the results show that some center observations fall outside the estimated intervals. They account for 11.5% of the training data, and nearly half of them lie only slightly outside the estimated intervals, as can be seen in Fig. 5.

4.2 Model comparison according to interval forecasting

The estimated intervals of the previous model may become somewhat fuzzier when the data include high volatility or outliers. It is therefore sensible to apply an intelligent tool to identify nonlinear patterns and improve the forecasting accuracy. Several probabilistic neural network structures were designed to find the optimal structure, and the final network consists of four inputs and one output neuron.
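For readers unfamiliar with probabilistic neural networks, the following is a minimal sketch of the classification rule: a Gaussian kernel is evaluated at every training pattern, the kernel values are averaged per class, and the class with the largest average is chosen. It is a generic illustration, not the network trained in this study. The four toy features, the class labels "lower" and "upper" (standing for subintervals), and the smoothing parameter sigma are all hypothetical.

```python
import math


def pnn_classify(x, train_x, train_y, sigma=0.5):
    """Classify x with a basic probabilistic neural network.

    Pattern layer: one Gaussian kernel per training pattern.
    Summation layer: average kernel value per class.
    Decision layer: class with the largest average.
    """
    sums = {}
    counts = {}
    for xi, yi in zip(train_x, train_y):
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        kernel = math.exp(-dist_sq / (2.0 * sigma ** 2))
        sums[yi] = sums.get(yi, 0.0) + kernel
        counts[yi] = counts.get(yi, 0) + 1
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get)


# Toy patterns with four inputs each (hypothetical values and labels).
train_x = [
    (0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.1, 0.0),
    (1.0, 1.0, 1.0, 1.0),
    (0.9, 1.0, 1.0, 0.9),
]
train_y = ["lower", "lower", "upper", "upper"]
print(pnn_classify((0.05, 0.0, 0.0, 0.05), train_x, train_y))
```

In this kind of network the only tunable quantity is sigma, so a search over structures largely amounts to choosing the inputs and the smoothing parameter.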

There are significant differences in the forecasting interval widths, for both the training and testing data, between Proposal 1 (Fig. 5) and Proposal 2 with a confidence coefficient of 100% (Fig. 6). The forecasting intervals of the testing data obtained by Proposal 1 and Proposal 2 are listed in Table 3.

Fig. 5

Forecasting intervals obtained by Proposal 1 in the time series of SSE composite index. (a) Training data. (b) Testing data.

Fig. 6

Forecasting intervals obtained by Proposal 2(α = 100%) in the time series of SSE composite index. (a) Training data. (b) Testing data.

Table 3

Forecasting intervals of testing data

Data number	Center observation	Proposal 1		Proposal 2(α=86%)		Proposal 2(α=100%)
		Lower bound	Upper bound	Lower bound	Upper bound	Lower bound	Upper bound
1	-0.0061	-0.0106	0.0097	-0.0106	-0.0005	-0.0106	-0.0025
2	-0.0106	-0.0134	0.0122	-0.0134	-0.0006	-0.0134	-0.0032
3	0.0053	-0.0142	0.0120	-0.0011	0.0120	0.0015	0.0120
4	0.0005	-0.0082	0.0098	-0.0082	0.0008	-0.0046	0.0026
5	-0.0020	-0.0088	0.0092	-0.0088	0.0002	-0.0052	0.0020
6	-0.0002	-0.0098	0.0096	-0.0098	-0.0001	-0.0059	0.0018
7	0.0099	-0.0113	0.0115	0.0001	0.0115	0.0024	0.0115
8	-0.0007	-0.0099	0.0126	-0.0099	0.0013	-0.0054	0.0036
9	-0.0076	-0.0097	0.0099	-0.0097	0.0001	-0.0097	-0.0019
10	-0.0030	-0.0099	0.0083	-0.0099	-0.0008	-0.0062	0.0010
11	0.0011	-0.0090	0.0086	-0.0090	-0.0002	-0.0020	0.0051
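The interval widths and the share of center observations covered can be recomputed directly from Table 3. The sketch below does so for the three interval columns; the data are transcribed from the table, while the function and variable names are hypothetical. Under this calculation the average widths of Proposal 1 and Proposal 2 (α = 100%) come out close to the 0.0208 and 0.0083 reported in Table 4.

```python
# Rows of Table 3: center observation, then (lower, upper) bounds for
# Proposal 1, Proposal 2 (alpha = 86%) and Proposal 2 (alpha = 100%).
TABLE3 = [
    (-0.0061, -0.0106, 0.0097, -0.0106, -0.0005, -0.0106, -0.0025),
    (-0.0106, -0.0134, 0.0122, -0.0134, -0.0006, -0.0134, -0.0032),
    (0.0053, -0.0142, 0.0120, -0.0011, 0.0120, 0.0015, 0.0120),
    (0.0005, -0.0082, 0.0098, -0.0082, 0.0008, -0.0046, 0.0026),
    (-0.0020, -0.0088, 0.0092, -0.0088, 0.0002, -0.0052, 0.0020),
    (-0.0002, -0.0098, 0.0096, -0.0098, -0.0001, -0.0059, 0.0018),
    (0.0099, -0.0113, 0.0115, 0.0001, 0.0115, 0.0024, 0.0115),
    (-0.0007, -0.0099, 0.0126, -0.0099, 0.0013, -0.0054, 0.0036),
    (-0.0076, -0.0097, 0.0099, -0.0097, 0.0001, -0.0097, -0.0019),
    (-0.0030, -0.0099, 0.0083, -0.0099, -0.0008, -0.0062, 0.0010),
    (0.0011, -0.0090, 0.0086, -0.0090, -0.0002, -0.0020, 0.0051),
]


def summarize(rows, lower_idx):
    """Return (average width, coverage of the center observation)."""
    widths = [row[lower_idx + 1] - row[lower_idx] for row in rows]
    covered = sum(1 for row in rows if row[lower_idx] <= row[0] <= row[lower_idx + 1])
    return sum(widths) / len(widths), covered / len(rows)


for name, idx in (("Proposal 1", 1), ("Proposal 2 (86%)", 3), ("Proposal 2 (100%)", 5)):
    avg_width, coverage = summarize(TABLE3, idx)
    print(f"{name}: average width {avg_width:.4f}, coverage {coverage:.2%}")
```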

For comparison, the baseline models are several traditional time series methods and neural network methods: ARIMA, single exponential smoothing (SES), double exponential smoothing (DES), SVM, multilayer perceptron (MLP), back propagation (BP) neural network and Elman neural network. Table 4 gives a pairwise comparison of the forecasting interval widths. It reveals significant differences between the average forecasting interval width attained by Proposal 2 with a confidence coefficient of 100% and those obtained by the other forecasting methods at a 95% confidence level. The average width of the forecasting intervals obtained from Proposal 2 is 0.0083, an improvement of 80.52%, 79.84%, 78.49%, 78.03%, 76.91%, 76.26%, 74.24% and 60.00% over the intervals of DES, ARIMA, MLP, SES, SVM, Elman, BP and Proposal 1, respectively. Proposal 2 therefore yields substantially narrower intervals than its counterparts, which is attributed to the incorporation of the neural network.

Table 4

Pairwise comparison of interval width for the forecasting methods in the time series of SSE composite index

Method	Interval width of yield	Improvement
		DES	ARIMA	MLP	SES	SVM	Elman	BP	Proposal 1	Proposal 2
DES	0.0426	0
ARIMA	0.0411	3.37%	0
MLP	0.0386	9.41%	6.25%	0
SES	0.0378	11.33%	8.23%	2.11%	0
SVM	0.0360	15.61%	12.66%	6.84%	4.83%	0
Elman	0.0350	17.92%	15.06%	9.39%	7.44%	2.74%	0
BP	0.0322	24.37%	21.73%	16.51%	14.71%	10.38%	7.86%	0
Proposal 1	0.0208	51.29%	49.59%	46.23%	45.07%	42.28%	40.66%	35.59%	0
Proposal 2	0.0083	80.52%	79.84%	78.49%	78.03%	76.91%	76.26%	74.24%	60.00%	0
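The improvement entries in Table 4 appear consistent with a relative reduction in interval width, (baseline width - candidate width) / baseline width. The sketch below applies this assumed formula to the widths listed in the table; small differences from the tabulated percentages are to be expected because the widths are rounded to four decimals. The names used are hypothetical.

```python
# Average interval widths as listed in Table 4.
WIDTHS = {
    "DES": 0.0426,
    "ARIMA": 0.0411,
    "MLP": 0.0386,
    "SES": 0.0378,
    "SVM": 0.0360,
    "Elman": 0.0350,
    "BP": 0.0322,
    "Proposal 1": 0.0208,
    "Proposal 2": 0.0083,
}


def improvement(baseline_width, candidate_width):
    """Relative reduction in width of the candidate versus the baseline.

    Assumption: this is the definition behind the Improvement columns.
    """
    return (baseline_width - candidate_width) / baseline_width


for method, width in WIDTHS.items():
    if method != "Proposal 2":
        gain = improvement(width, WIDTHS["Proposal 2"])
        print(f"Proposal 2 vs {method}: {gain:.2%}")
```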

The results for the risk-neutral FBR model show that the proposed model is applicable to financial asset price forecasting: it has the lowest error measures and higher direction accuracy, and it requires no more run time than the other fuzzy regression models. The proposed model inherits the features and advantages of previously suggested fuzzy regression forecasting models, which also holds for the distribution of the forecasting errors. Combined with the neural network, the risk-neutral FBR model yields the minimum interval widths and offers clear improvements in interval estimation.

5 Conclusion

Financial time series forecasting has been a popular research field for the last few decades. Because financial asset prices are subject to fluctuations and uncertainties that may be caused by subjective human perceptions or incomplete knowledge, traditional time series models do not provide suitable predictions.

We have formulated a risk-neutral FBR model that requires no expert knowledge. Its optimal coefficients, which reflect the properties of both the least squares and the possibility approaches, are estimated by a quadratic programming algorithm, and forecasting intervals with narrower width are achieved by using a PNN. In this formulation, different possibility models can be applied to distinct groups of data, and a risk-neutral fitting degree expressing the possibility of equality between a pair of fuzzy numbers is used. Multiple objectives are considered: minimizing the distances between the estimated outputs and the observed outputs, and minimizing the spreads of the estimated outputs and the error term. The purpose of the PNN is to identify the subintervals that have a greater probability of containing the actual values, with the aim of achieving optimal solutions at lower cost. The approach is flexible, since the weights and thresholds can be changed, and it is especially effective for data containing outliers. A simulation experiment using a data set from the SSE has been carried out to analyze the effect of the proposed approach. As a new approach in the financial time series forecasting literature, the proposed method is a useful reference: it is a robust version of the FBR model without expert knowledge that introduces the risk view and utilizes neural networks.

However, several limitations of this paper remain for future study. Fuzzy set theory provides appropriate tools for predicting the dynamic changes in financial asset prices, but it cannot replace the large body of research based on randomness; fuzzy random variables may therefore need to be taken into account in applications. After the reliable group and the suspicious group have been identified through the FAR model, how to determine an acceptable combination of weights and thresholds will be addressed in a subsequent paper, because there is a trade-off between these values and the objective function. In addition, some center observations of the fuzzy financial yield series may fall outside the intervals estimated by the risk-neutral FBR model, and the developed fuzzy regression uses AR variables while disregarding MA variables; both issues remain to be investigated. Finally, it would be useful to extend the proposed method to allow for nonlinearity in the relation between spreads and centers, and to apply the model to more data sets to demonstrate its performance.

Acknowledgment

This work was supported by the first-class discipline (system science) construction project of Shanghai [Grant No. XTKX2012]; the priority discipline project of Shanghai [Grant No. T0502]; and the Foundation of Hujiang [Grant No. B14005].

Conflict of interest

None.

Appendix

Three kinds of performance measure are adopted in this study, as described below:

(1) Root mean square error (RMSE)

$RMSE = \sqrt{\frac{\sum_{t = k + 2}^{n} (c_{t} - \hat{c}_{t})^{2}}{n - k - 1}} + \sqrt{\frac{\sum_{t = k + 2}^{n} (u_{t} - \hat{u}_{t})^{2}}{n - k - 1}}.$ (A.1)

(2) Mean absolute percentage error (MAPE)

$MAPE = \frac{1}{n - k - 1} \sum_{t = k + 2}^{n} \left| \frac{c_{t} - \hat{c}_{t}}{c_{t}} \right| + \frac{1}{n - k - 1} \sum_{t = k + 2}^{n} \left| \frac{u_{t} - \hat{u}_{t}}{u_{t}} \right|.$ (A.2)

(3) Direction accuracy (DA)

$DA = \frac{1}{n - k - 2} \sum_{t = k + 2}^{n - 1} (a_{t} + b_{t}),$ (A.3)

where

$a_{t} = \left\{\begin{matrix} 1, & (c_{t + 1} - c_{t}) (\hat{c}_{t + 1} - c_{t}) > 0, \\ 0, & \text{otherwise}, \end{matrix}\right. \quad b_{t} = \left\{\begin{matrix} 1, & (u_{t + 1} - u_{t}) (\hat{u}_{t + 1} - u_{t}) > 0, \\ 0, & \text{otherwise}, \end{matrix}\right. \quad t = k + 2, \ldots, n - 1.$
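The three measures can be computed as in the following minimal sketch. It assumes that the four input lists are already restricted to the evaluation window t = k + 2, ..., n and aligned element by element, so their common length plays the role of n - k - 1. The function names and the toy values are hypothetical.

```python
import math


def rmse(c, c_hat, u, u_hat):
    """Eq. (A.1): RMSE of the centers plus RMSE of the spreads."""
    m = len(c)
    center_part = math.sqrt(sum((a - b) ** 2 for a, b in zip(c, c_hat)) / m)
    spread_part = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, u_hat)) / m)
    return center_part + spread_part


def mape(c, c_hat, u, u_hat):
    """Eq. (A.2): MAPE of the centers plus MAPE of the spreads.

    Observed values must be non-zero, otherwise the ratio is undefined.
    """
    m = len(c)
    center_part = sum(abs((a - b) / a) for a, b in zip(c, c_hat)) / m
    spread_part = sum(abs((a - b) / a) for a, b in zip(u, u_hat)) / m
    return center_part + spread_part


def direction_accuracy(c, c_hat, u, u_hat):
    """Eq. (A.3): average of a_t + b_t over consecutive time steps."""
    m = len(c)
    total = 0
    for t in range(m - 1):
        a_t = 1 if (c[t + 1] - c[t]) * (c_hat[t + 1] - c[t]) > 0 else 0
        b_t = 1 if (u[t + 1] - u[t]) * (u_hat[t + 1] - u[t]) > 0 else 0
        total += a_t + b_t
    return total / (m - 1)


# Toy values (hypothetical): spreads are predicted exactly and never move.
c, c_hat = [1.0, 2.0, 3.0], [1.0, 2.5, 2.5]
u, u_hat = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
print(rmse(c, c_hat, u, u_hat), mape(c, c_hat, u, u_hat), direction_accuracy(c, c_hat, u, u_hat))
```

Because each measure adds a center term and a spread term, DA as defined here ranges from 0 to 2 rather than from 0 to 1.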

References

1. Ince and Trafalis T.B., A hybrid model for exchange rate prediction, Decision Support Systems 42 (2006), 1054–1062.

2. and Yin, Exchange rate prediction using hybrid neural networks and trading indicators, Neurocomputing 72 (2009), 2815–2823.

3. Patel, Shah and Thakkar, Predicting stock market index using fusion of machine learning techniques, Expert Systems with Applications 42 (2015), 2162–2172.

4. Rather A.M., Agarwal and Sastry V.N., Recurrent neural network and a hybrid model for prediction of stock returns, Expert Systems with Applications 42 (2015), 3234–3241.

5. Yang H.L. and Lin H.C., An integrated model combined ARIMA, EMD with SVR for stock indices forecasting, International Journal on Artificial Intelligence Tools 25 (2016), 1650005.

6. Liu and Long, An improved deep learning model for predicting stock market price time series, Digital Signal Processing 102 (2020), 102741.

7. Watada, Fuzzy time-series analysis and its forecasting of sales volume, Fuzzy Regression Analysis (1992), 211–227.

8. Tanaka, Uejima and Asai, Linear regression analysis with fuzzy model, IEEE Transactions on Systems Man and Cybernetics 12 (1982), 903–907.

9. Diamond, Fuzzy least squares, Information Sciences 46 (1988), 141–157.

10. Wang, Zhang W.X. and Mei C.L., Fuzzy nonparametric regression based on local linear smoothing technique, Information Sciences 177 (2007), 3882–3900.

11. Roh S.B., Ahn T.C. and Pedrycz, Fuzzy linear regression based on polynomial neural networks, Expert Systems with Applications 39 (2012), 8909–8928.

12. Ciavolino and Calcagni, A generalized maximum entropy (GME) estimation approach to fuzzy regression model, Applied Soft Computing 38 (2016), 51–63.

13. Gong, Yang, Ma and Ge, Fuzzy regression model based on geometric coordinate points distance and application to performance evaluation, Journal of Intelligent & Fuzzy Systems 34 (2018), 395–404.

14. Jiang, Kwong C.K., Chan C.Y. and Yung K.L., A multi objective evolutionary approach for fuzzy regression analysis, Expert Systems with Applications 130 (2019), 225–235.

15. Chen L.H. and Nien S.H., A new approach to formulate fuzzy regression models, Applied Soft Computing 86 (2020), 105915.

16. Wang, Reformat, Yao, Zhao and Chen, Fuzzy Linear regression based on approximate Bayesian computation, Applied Soft Computing 97 (2020), 106763.

17. Khammar A.H., Arefi and Akbari M.G., A general approach to fuzzy regression models based on different loss functions, Soft Computing 25 (2021), 835–849.

18. Tseng F.M., Tzeng G.H. and Yu H.C., Fuzzy ARIMA model for forecasting the foreign exchange market, Fuzzy Sets and Systems 118 (2001), 9–19.

19. T.H.K. and Huarng K.H., A neural network-based fuzzy time series model to improve forecasting, Expert Systems with Applications 37 (2009), 3366–3372.

20. Chen S.M. and Kao P.Y., TAIEX forecasting based on fuzzy time series, particle swarm optimization techniques and support vector machines, Information Sciences 247 (2013), 62–71.

21. Soto, Castillo and Melin, A New Approach to Multiple Time Series Prediction Using MIMO Fuzzy Aggregation Models with Modular Neural Networks, International Journal of Fuzzy Systems 21(5) (2019), 1629–1648.

22. Hanapi A.L.M., Othman, Sokkalingam, Ramli and Vasant, A novel fuzzy linear regression sliding window garch model for time-series forecasting, Applied Sciences 10 (2020), 1949.

23. Hao P.Y., Kung C.F., Chang C.Y. and Ou J.B., Predicting stock price trends based on financial news articles and using a novel twin support vector machine with fuzzy hyperplane, Applied Soft Computing 98 (2021), 106806.

24. Dong and Ma, Enhanced fuzzy time series forecasting model based on hesitant differential fuzzy sets and error learning, Expert Systems with Applications 166 (2021), 114056.

25. Tanaka, Fuzzy data analysis by possibilistic linear model, Fuzzy Sets and Systems 24 (1987), 363–375.

26. D’Urso, Massari and Santoro, Robust fuzzy regression analysis, Information Sciences 181 (2011), 4154–4174.

27. Zeng, Feng and Li, Fuzzy least absolute linear regression, Applied Soft Computing 52 (2017), 1009–1019.

28. Chachi, A Weighted Least Squares Fuzzy Regression for Crisp Input-Fuzzy Output Data, IEEE Transactions on Fuzzy Systems 27 (2019), 739–748.

29. Hesamian and Akbari M.G., A robust varying coefficient approach to fuzzy multiple regression model, Journal of Computational and Applied Mathematics 371 (2020), 112704.

30. Hesamian and Akbari M.G., A robust multiple regression model based on fuzzy random variables, Journal of Computational and Applied Mathematics 388 (2020), 113270.

31. Khammar, Arefi and Akbari M.G., A robust least squares fuzzy regression model based on kernel function, Iranian Journal of Fuzzy Systems 17 (2020), 105–119.

32. Taheri S.M. and Chachi, A robust variable-spread fuzzy regression model, Recent Developments and the New Direction in Soft-Computing Foundations and Applications (2021), 309–320.

33. Specht D.F., Probabilistic neural networks, Neural Networks 3 (1990), 109–118.

34. Chung T.Y., Chen Y.M. and Tang S.C., A hybrid system integrating signal analysis and probabilistic neural network for user motion detection in wireless networks, Expert Systems with Applications 39 (2012), 3392–3403.

35. Z.Y., Liu W.Y. and Wang T.J., Fuzzy bilinear regression of yields series, Statistical Research 26 (2009), 68–73.

36. D’Urso and Gastaldi, A least-squares approach to fuzzy linear regression analysis, Computational Statistics and Data Analysis 34 (2000), 427–440.

37. Modarres, Nasrabadi and Nasrabadi M.M., Fuzzy linear regression analysis from the point of view risk, International Journal of Uncertainty Fuzziness and Knowledge Based Systems 12 (2004), 635–649.

38. Tanaka and Lee, Fuzzy linear regression combining central tendency and possibilistic properties, in: Proceedings of 6th International Fuzzy Systems Conference (1997), 63–68.