Abstract
The application of quantitative methods to forecasting in financial markets has attracted significant attention from researchers and managers in recent years, as conventional time series forecasting models can hardly capture the inherent rules of complex nonlinear dynamic financial systems. In this paper, a new three-stage hybrid forecasting system is constructed by integrating fuzzy techniques with statistical tools and an artificial neural network, with the aim of improving the forecasting accuracy of financial asset prices. In the first stage, a fuzzy autoregression model is fitted by minimizing the sum of squared errors, and the fit is used to form sample groups so that data containing outliers can be handled. In the second stage, a fuzzy bilinear regression model that introduces a risk view is developed; it is solved by a quadratic programming algorithm and reflects the properties of both the least squares and possibility approaches without requiring expert knowledge. The main idea of the model is to use sub-models that track the possible relations between the spread and the center, and to link the estimation deviation with the risk degree of fitness of the model. In the third stage, fuzzy bilinear regression forecasting is combined with an optimally structured probabilistic neural network classifier, which helps to control over-wide intervals for financial data at a given confidence level. Statistical validation and performance analysis using the historical yield series of the Shanghai Stock Exchange composite index show the effectiveness and stability of the proposed hybrid forecasting formulation compared with other forecasting methods.
Keywords
Introduction
In recent years, quantitative methods have played a vital role in capturing information on data fluctuations in financial markets for forecasting purposes as well as for decisions and investments. Time series models, as tools for forecasting in financial markets, investigate relations among past observations of the same variable, and they have become more crucial in business practice than ever before. This kind of modeling is particularly useful when little knowledge is available on the data generating process or when there is no satisfactory model that relates the dependent variable to other variables. Improving time series forecasting accuracy is an important yet difficult task facing forecasters. Over the past four decades, many methods have been introduced to address these problems, ranging from linear and nonlinear models to artificial intelligence algorithms.
One of the most widely used statistical models is the autoregressive integrated moving average (ARIMA) model, which has the advantages of accurate short-term forecasting and ease of implementation. Statistical models are widely applicable and quick to build, but their results are often inaccurate when forecasting the nonlinearity and high volatility of financial data. Artificial neural networks (ANNs) are flexible computing frameworks and universal approximators that were developed to overcome some of these limitations and can be applied to a wide range of forecasting problems with a high degree of accuracy. Although ANN models can achieve good results, they have shortcomings such as difficulty in parameter determination and computational complexity.
Because of the possibly unstable or changing patterns in financial data, a hybrid model can overcome the limitations of its component models, especially when the models in the ensemble are quite different. Ince and Trafalis [1] proposed a two-stage forecasting model that incorporates ARIMA and co-integration with ANN techniques to forecast exchange rates. Ni and Yin [2] described a hybrid model formed by a mixture of various regressive neural network models for modelling foreign exchange rate time series. Patel et al. [3] presented a two-stage fusion approach involving support vector regression (SVR) and random forest to predict future values of stock market indices. Rather [4] constructed a robust hybrid model including exponential smoothing (ES) and a recurrent neural network for the prediction of stock returns. Yang and Lin [5] proposed an approach combining ARIMA with SVR to aid financial time series forecasting. Liu and Long [6] established an improved hybrid forecasting model based on a deep learning network to predict daily stock closing price series.
However, these models rely on single-point data, which may lead to a large standard deviation, and they are not tolerant of imprecision and approximation. To address these problems, fuzzy techniques based on statistics have been suggested to deal with problems involving linguistic terms. Watada [7] presented an application of fuzzy regression to time series analysis. Fuzzy regression is an extension of conventional regression analysis to fuzzy environments and can be employed as an efficient and useful tool for analyzing complex systems in fuzzy situations, such as economic, engineering, hydrological, military and educational systems. Fuzzy regression techniques are usually classified into two distinct categories, the linear programming (LP) method [8] and the least squares (LS) method [9], which are complementary rather than competing. In addition to these two classes, further approaches to fuzzy regression analysis have followed these pioneering papers, for instance those studied by Wang et al. [10], Roh et al. [11], Ciavolino and Calcagni [12], Gong et al. [13], Jiang et al. [14], Chen and Nien [15], Wang et al. [16], Khammar et al. [17].
In recent research, fuzzy techniques have been integrated into financial time series forecasting approaches in order to improve the prediction accuracy and reduce the fuzzy uncertainty of the model. Tseng et al. [18] developed a FARIMA model, combining ARIMA and the fuzzy regression model, and applied it to exchange rate forecasting. Yu and Huarng [19] applied neural networks to implement a fuzzy time series model, which includes the various degrees of membership in establishing fuzzy relationships, to forecast the stock index in Taiwan. Chen and Kao [20] presented a method for forecasting the Taiwan Stock Exchange Capitalization Weighted Stock Index based on fuzzy time series, particle swarm optimization techniques and the support vector machine (SVM). Soto et al. [21] presented an approach to multiple time series prediction using many-inputs many-outputs fuzzy aggregation models with modular neural networks, with experiments including the Mexican Stock Exchange, National Association of Securities Dealers Automated Quotation and Taiwan Stock Exchange time series. Hanapi et al. [22] proposed a novel forecasting model, dubbed a fuzzy linear regression sliding window GARCH model, to forecast the Stock and Price Index. Hao et al. [23] developed a twin support vector machine with a fuzzy hyperplane to predict stock price trends based on financial news articles. Dong and Ma [24] established a hybrid fuzzy time series forecasting model consisting of several components and processes to forecast three typical stock index datasets.
The existing literature on financial time series forecasting has presented various combination models that integrate statistical models, artificial neural networks and fuzzy techniques in order to overcome the deficiencies of any single model. However, as far as the literature shows, these models do not include the fuzzy bilinear regression (FBR) model for financial time series forecasting. Moreover, owing to uncertainty in the financial environment and the rapid development of new technology, we usually have to forecast future situations relying on little expert knowledge. It is therefore preferable to integrate into the forecasting framework a simultaneous regression analysis, which employs the risk degree of fitness and accommodates data containing outliers, together with an intelligent method; this could produce better performance from both the predictive and the computational point of view. The objective of this paper is to design a systematic fuzzy financial time series forecasting procedure using symmetrical triangular fuzzy numbers, based on the risk-neutral FBR obtained by quadratic programming and the probabilistic neural network (PNN) model, to yield more stable and more accurate forecasting results. The proposed method is based on a decomposition of the relationship between a fuzzy dependent variable and the independent variables into two components; it is quite flexible, since its weights and thresholds can be changed with reasonable justification to emphasize the central or spread tendency in conjunction with possibilistic procedures. The performance of the formulation is compared with other forecasting methods to show its appropriateness and robustness. The empirical results of Shanghai Stock Exchange (SSE) composite index forecasting indicate that the proposed procedure improves forecasting accuracy and reduces model uncertainty.
The remainder of this paper is organized as follows. Section 2 briefly reviews the fundamentals of fuzzy financial time series, the fuzzy regression model and PNN. Section 3 forms the core of the paper: it explains the derivation of the presented approach to parameter identification, introduces the risk-neutral FBR solved by quadratic programming for the case of little expert knowledge, and uses PNN to obtain a more accurate prediction range. In Section 4, the proposed model is applied to SSE composite index forecasting and its performance is evaluated in comparison with other forecasting methods. Concluding remarks are given in the final section.
Fundamental theories
In what follows some basic concepts and definitions of fuzzy financial time series, fuzzy regression model and PNN will be recalled briefly, which serve as a reference in setting up the financial time series forecasting procedure discussed in the proposed system.
Fuzzy financial time series
Fuzzy theories have various applications in improving forecasting models owing to their capability of bridging the gap between numerical data and linguistic statements. In several substantive applications, the most utilized class of fuzzy variable is the symmetrical fuzzy number [25]. Usually, a symmetrical fuzzy number is denoted by A = (a, d)_L, where a and d denote the center and spread respectively, with the following membership function: μ_A(x) = L((x - a)/d), d > 0.
The most common membership function for the symmetrical fuzzy number is provided by the symmetrical triangular fuzzy number, denoted by A = (a, d), where L has the form L (x) = max (0, 1 - |x|).
The transformation of interval data into symmetrical triangular fuzzy number is shown in Fig. 1.

Fig. 1. Demonstration of the transformation from interval data into a symmetrical triangular fuzzy number.
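As a minimal illustration of this transformation, the Python sketch below maps an interval to a symmetrical triangular fuzzy number and evaluates its membership degree. It assumes that the center is the interval midpoint and the spread is the half-width, which is the usual convention but is not stated explicitly in the text; the function names are hypothetical.

```python
def interval_to_fuzzy(low, high):
    """Map an interval [low, high] to a symmetrical triangular fuzzy
    number (center, spread). Assumption: center = midpoint and
    spread = half-width of the interval."""
    if high < low:
        raise ValueError("high must not be smaller than low")
    center = (low + high) / 2.0
    spread = (high - low) / 2.0
    return center, spread


def membership(x, center, spread):
    """Membership degree of x, using the shape function
    L(u) = max(0, 1 - |u|) evaluated at u = (x - center) / spread."""
    if spread == 0:
        # Degenerate (crisp) number: only the center itself belongs to it.
        return 1.0 if x == center else 0.0
    return max(0.0, 1.0 - abs((x - center) / spread))
```

For example, the interval [2, 4] becomes the fuzzy number (3, 1); the value 3.5 then has membership degree 0.5, and any value outside [2, 4] has degree 0.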
Let us consider a fuzzy response variable Y and m crisp explanatory variables X_1, ..., X_m observed on M units. Data are expressed by (Y_j,
Tanaka [25] suggested minimizing the estimated spreads of the fuzzy outputs over the entire data set, so the estimation problem is formulated as follows:
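A commonly cited form of this minimum-spread problem is sketched below. It is given under assumptions rather than as the exact formulation of [25]: the coefficients are symmetrical triangular fuzzy numbers A_i = (a_i, d_i), the observed outputs are Y_j = (y_j, e_j), x_{j0} = 1 carries the intercept, and h in [0, 1) is the threshold level.

```latex
\begin{aligned}
\min_{a,\,d}\quad & \sum_{j=1}^{M}\sum_{i=0}^{m} d_i\,|x_{ji}| \\
\text{s.t.}\quad
& \sum_{i=0}^{m} a_i x_{ji} + (1-h)\sum_{i=0}^{m} d_i\,|x_{ji}| \;\ge\; y_j + (1-h)\,e_j, \\
& \sum_{i=0}^{m} a_i x_{ji} - (1-h)\sum_{i=0}^{m} d_i\,|x_{ji}| \;\le\; y_j - (1-h)\,e_j, \\
& d_i \ge 0, \qquad i = 0,\dots,m, \quad j = 1,\dots,M.
\end{aligned}
```

The two constraint families require the h-level set of each estimated output to contain the h-level set of the corresponding observation, while the objective keeps the total estimated spread as small as possible.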
Diamond [9] developed a least squares method directly using the distance between the level compact sets of triangular fuzzy numbers. The estimates are obtained by minimizing the following squared Euclidean distance in the least squares regression:
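For symmetrical triangular fuzzy numbers, a squared distance of this kind can be sketched in Python as below. The sketch assumes the commonly used Diamond-type distance, which sums the squared differences of the left endpoints, centers and right endpoints; the function names are hypothetical and the exact weighting used in [9] may differ.

```python
def diamond_sq_distance(a, b):
    """Squared distance between two symmetrical triangular fuzzy
    numbers, each given as (center, spread). Assumption: the distance
    sums squared differences of left endpoint, center and right endpoint."""
    (ca, da), (cb, db) = a, b
    left = (ca - da) - (cb - db)
    mid = ca - cb
    right = (ca + da) - (cb + db)
    return left ** 2 + mid ** 2 + right ** 2


def sum_sq_distance(observed, estimated):
    """Least squares criterion: total squared distance between the
    observed and estimated fuzzy outputs."""
    return sum(diamond_sq_distance(o, e) for o, e in zip(observed, estimated))
```

A least squares fit would choose the regression coefficients that minimize `sum_sq_distance` over the data set; the minimization itself is not shown here.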
The LP and LS methods are not robust, in the sense that even a small percentage of observations deviating considerably from the bulk of the data can distort the parameter estimates. Subsequently, a growing literature has formalized fuzzy linear regression (FLR) models that provide sufficiently accurate and stable results [26–32].
Specht [33] first introduced the probabilistic neural network and demonstrated how the Bayes-Parzen classifier could be broken up into a large number of simple processes implemented in a multilayer neural network, each of which could be run independently in parallel. PNN is a feed-forward artificial neural network based on a supervised learning algorithm; it is built on the Bayesian minimum risk criterion and performs well in shortening the training time and handling nonlinear problems. It generally consists of four layers, namely the input layer, pattern layer, summation layer and output layer.
It assumed that an input vector

Fig. 2. A simple probabilistic neural network architecture.
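The four-layer structure can be sketched in plain Python as follows. This is a simplified illustration, assuming Gaussian Parzen kernels with a common smoothing parameter sigma, equal prior probabilities and equal misclassification losses; the function name, the default sigma and any data passed to it are hypothetical and are not taken from the paper.

```python
import math


def pnn_classify(x, training, sigma=0.5):
    """Classify input vector x with a minimal probabilistic neural network.

    training: list of (vector, label) pairs.
    Returns (best_label, class_scores).
    Assumptions: Gaussian kernel, shared sigma, equal priors and losses.
    """
    sums, counts = {}, {}
    for vec, label in training:
        # Pattern layer: one Gaussian kernel unit per training sample.
        sq_dist = sum((xi - vi) ** 2 for xi, vi in zip(x, vec))
        activation = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + activation
        counts[label] = counts.get(label, 0) + 1
    # Summation layer: average the kernel activations within each class.
    scores = {label: sums[label] / counts[label] for label in sums}
    # Output layer: choose the class with the largest estimated density.
    best = max(scores, key=scores.get)
    return best, scores
```

Because each pattern unit depends only on its own training sample, the kernel evaluations can be computed independently, which is the source of the parallelism noted above.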
In this section, we introduce the linear regression model linking the components of the output variable with the input variables, devise the hybrid criteria without expert knowledge from the risk point of view, and additionally take the neural network into consideration for forecasting the financial asset yield series.
Fuzzy bilinear regression model
Given n observations on the financial product price
Modarres et al. [37] formulated the risk-neutral FLR model based on the degree of possibility for the equality of the estimated model with respect to the given response variable, in order to improve the predictive ability of the LP approach and decrease the computational complexity of the LS method.
The constraint inequalities for some fixed h in (13) can be transformed to f_RN ≥ h, where f_RN denotes the risk-neutral fitting degree of the estimated FLR model, which is defined by
Tanaka et al. [38] introduced quadratic programming (QP) based fuzzy regression for the case of insufficient knowledge, which tracks the central tendency and minimizes the spreads of the estimated fuzzy outputs for better prediction. This fuzzy regression model deals with the outlier problem by dividing the dataset into a reliable group (R) and a suspicious group (S) and setting different error values on each group to neutralize the effect of outliers. The model is given by:
Notice that, as suggested by Tanaka et al. [38], the response variable is numeric and A_i = (a_i, d_i). e is an error term, where e = 0 for R and e > 0 for S. k1, k2 and k3 are positive weights given by an analyst, which take the corresponding variation into account.
This paper suggests a risk-neutral FBR model without expert knowledge, which extends the risk-neutral FLR model to multiple models, combining central and spread tendency, also considering possibilistic properties and robustness. Risk-neutral FBR model without expert knowledge can be formalized in the following compact matrix notation:
The weights of the objective function allow an analyst to predict from various angles, and the thresholds express an optimistic or pessimistic analysis, in which only intervals of data having high or low possibilities are considered by setting a large or small threshold value.
There are three primary stages in building a forecasting model for a financial asset yield series: model identification, model estimation and model enhancement. The aim of the model identification stage is to determine the order of the fuzzy autoregression (FAR) model from the stationarity property with the help of the autocorrelation and partial autocorrelation functions.
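The identification step can be illustrated with a short Python sketch of the sample autocorrelation and partial autocorrelation functions. It assumes the standard sample estimators and the Durbin-Levinson recursion; how the resulting values are turned into an order for the FAR model (for example, the lag after which the partial autocorrelations become negligible) is a conventional rule of thumb and an assumption here. The function names are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a numeric series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def sample_pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(series, max_lag)
    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf
```

In this reading, the functions would be applied to the center series of the fuzzy yield data, and a slowly decaying autocorrelation function would be taken as a sign of nonstationarity.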
Based on the pre-forecasting of the FAR model, the second stage is parameter estimation using the risk-neutral FBR model, which introduces the error term to discard the outliers in the case of little expert knowledge, tracks the central and spread tendency, and minimizes the spreads of the estimated fuzzy outputs for better prediction.
The risk-neutral FBR model may produce spreads of the estimated responses that become much wider as more data are included. Consequently, the last stage employs PNN to identify nonlinear patterns (the more probable spaces) and determine narrower intervals. The spaces that have a lower probability of occurrence are then deleted from the intervals obtained by FBR, according to the PNN results.
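The narrowing step of the third stage can be pictured with the following Python sketch. It is a hypothetical illustration, not the authors' procedure: it assumes the FBR interval is split into equal-width subintervals, that a classifier such as PNN supplies one probability per subinterval, and that the most probable subintervals are kept until a chosen confidence mass is covered. The function name and the confidence rule are assumptions.

```python
def prune_interval(lower, upper, probabilities, confidence=1.0):
    """Narrow [lower, upper] by dropping low-probability subintervals.

    probabilities: one non-negative value per equal-width subinterval.
    Subintervals are kept in order of decreasing probability until their
    mass reaches `confidence` times the total; zero-probability
    subintervals are never kept. Returns the hull of the kept ones.
    """
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    if not probabilities or min(probabilities) < 0 or sum(probabilities) <= 0:
        raise ValueError("probabilities must be non-negative with a positive sum")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must lie in (0, 1]")
    k = len(probabilities)
    width = (upper - lower) / k
    # Small relative tolerance guards against floating-point round-off.
    target = confidence * sum(probabilities) * (1.0 - 1e-12)
    order = sorted(range(k), key=lambda i: probabilities[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        if probabilities[i] <= 0 or mass >= target:
            break
        kept.append(i)
        mass += probabilities[i]
    return lower + min(kept) * width, lower + (max(kept) + 1) * width
```

With probabilities [0, 0.2, 0.5, 0.3, 0] on the interval [0, 1], a confidence of 1.0 removes only the two empty end subintervals, while a confidence of 0.8 keeps just the two most probable ones.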
The steps of the algorithm for the suggested methodology in the financial asset yield series forecasting procedure are synthesized in the following scheme:
We obtain the estimates
In order to check the prediction accuracy of the method described in the previous section, we run an experiment on the Shanghai Stock Exchange composite index, taken from http://quotes.money.163.com/trade/lsjysj-zhishu-000001.html and comprising 292 daily observations in the period from August 17, 2018 through November 1, 2019. The fluctuation of these observations can be seen in Fig. 3(a), which captures the nonstationarity in the pattern of stock behavior also reported in the {x_t} column of Table 1. The proposed model is based on the fact that the series of actual values in the fuzzy financial yield is conditionally stationary, which is shown in Fig. 3(b) and reported in the {c_t} column of Table 2. The last 11 observations are used as testing data for model validation, and the remaining observations are used as training data to estimate the coefficients of the model.

Fig. 3. Trace of the time series of SSE composite index. (a) Interval observation data[
Run test for the central series of financial interval time series/fuzzy financial yield series
Forecasting performance comparison for the time series of SSE composite index
Note 1: The least squares and linear programming methods are abbreviated as LS and LP, respectively. The results of the risk-averse and risk-neutral methods at h = 0, denoted RA and RN respectively, are used for comparison. Note 2: The run times are obtained from 1000 repetitions of each model procedure.
It is important to build a FAR model with the training data before presenting the risk-neutral FBR model, in order to separate the given data into two groups, because knowledge about the observed data is lacking. The FAR model is expressed as:
By (17), sample standard deviation is calculated,
We have implemented a set of fuzzy regression models. The root mean square error (RMSE), mean absolute percentage error (MAPE), direction accuracy (DA) values and computational performance used to compare the results of the fuzzy regression models are given in Table 1 (the calculation of the performance measures is shown in the Appendix).
The proposed method attains the minimum RMSE and MAPE values on both the training and testing data sets, which indicates the best performance among the compared models. Note that the DA values of the proposed method are generally higher than those of the other fuzzy regression models. In terms of fitness, the proposed method is arguably better because of its resistance to the presence of outliers in the case of little expert knowledge and its flexibility in changing the weights and thresholds. Regarding computational performance, we observe that the FBR-LP model has a slightly higher run time, and the proposed model consumes approximately the same amount of computing time as the other five fuzzy regression models. As these results clarify, the proposed method is capable of estimating appropriate parameters for reaching the desired outputs.
The statistical distribution of the forecasting error indicators for center and spread is illustrated by box charts, as shown in Fig. 4. We can clearly observe that the errors of the proposed model are more compact, with fewer abnormal values, and have a smaller deviation from the mean than those of the other models. This is primarily because the identification of outliers, the flexibility in weights and thresholds, and the risk-neutral analysis enable the proposed model to produce superior results.

Fig. 4. Comparison for boxplot description of forecasting errors in the time series of SSE composite index. (a) Error of center. (b) Error of spread.
The estimated intervals of the previous model may tend to become somewhat fuzzier when the data include high volatility or outliers; hence, it is wise to apply intelligent tools to identify nonlinear patterns and improve the forecasting accuracy. Different probabilistic neural network structures are designed in search of the optimal structure, and the final network consists of four input neurons and one output neuron.
There are significant differences between the forecasting interval widths for the training data or testing data attained by Proposal 1 method shown in Fig. 5 and Proposal 2 with a confidence coefficient of 100% seen in Fig. 6. Additionally, the forecasting intervals of testing data obtained by Proposal 1 and Proposal 2 are shown in Table 3.

Fig. 5. Forecasting intervals obtained by Proposal 1 in the time series of SSE composite index. (a) Training data. (b) Testing data.

Fig. 6. Forecasting intervals obtained by Proposal 2 (α = 100%) in the time series of SSE composite index. (a) Training data. (b) Testing data.
Forecasting intervals of testing data
In a comparison study, the baseline models are several traditional time series methods and neural network methods, namely ARIMA, single exponential smoothing (SES), double exponential smoothing (DES), SVM, multilayer perceptron (MLP), back propagation (BP) neural network and Elman neural network. A pairwise comparison of the forecasting interval widths is performed in Table 4, which reveals that there are significant differences between the average forecasting interval width attained by Proposal 2 with a confidence coefficient of 100% and those obtained by the other forecasting methods with a confidence level of 95%. The average width of the forecasting intervals obtained from Proposal 2 is 0.0083, which represents an improvement of 80.52%, 79.84%, 78.49%, 78.03%, 76.91%, 76.26%, 74.24% and 60% over the intervals of the DES, ARIMA, MLP, SES, SVM, Elman and BP methods and Proposal 1, respectively. It is concluded that Proposal 2 outperforms its counterparts significantly owing to the incorporation of the neural network.
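The relation between the reported percentages and the interval widths can be checked with a short Python sketch. It assumes that "improvement" means the relative reduction in average interval width, (w_other - w_P2) / w_other, which the text does not define explicitly; the function names are hypothetical, and any implied baseline width is a back-calculation under this assumption rather than a value quoted from Table 4.

```python
def improvement_pct(baseline_width, reduced_width):
    """Relative reduction in interval width, in percent (assumed definition)."""
    return 100.0 * (baseline_width - reduced_width) / baseline_width


def implied_width(reduced_width, improvement):
    """Baseline width implied by a reduced width and an improvement in percent."""
    return reduced_width / (1.0 - improvement / 100.0)


# Average Proposal 1 width implied by the reported 60% improvement.
proposal_1_width = implied_width(0.0083, 60.0)
```

Under this assumption, an average width of 0.0083 together with a 60% improvement corresponds to an average Proposal 1 width of about 0.02075.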
Pairwise comparison of interval width for the forecasting methods in the time series of SSE composite index
The forecasting results of the risk-neutral FBR model show that the proposed model is applicable to financial asset price forecasting, with the lowest error measures, higher direction accuracy and no more run time than the other fuzzy regression models. The features and advantages of previously suggested fuzzy regression forecasting models are inherited by the proposed model, which also holds for the statistical distribution of the forecasting error indicators. The risk-neutral FBR model combined with the neural network displays a significant capability of yielding the minimum widths and offering improvements in the interval estimation case.
Financial time series forecasting has been a popular research field for the last few decades. Since financial asset prices exhibit fluctuations and uncertainties that may be caused by subjective human perceptions or incomplete knowledge, traditional time series models do not provide suitable predictions.
We have formulated a risk-neutral FBR model without expert knowledge, in which the optimal coefficients, reflecting the properties of both the least squares and possibility approaches, are estimated by a quadratic programming algorithm, and forecasting intervals of narrower width are achieved by using PNN. In the formulation of the risk-neutral fuzzy regression model without expert knowledge, different possibility models can be applied to distinct data. The risk-neutral fitting degree, which expresses the possibility of equality between a pair of fuzzy numbers, is applied in the proposed approach. Multiple objectives are considered in this paper: minimizing the distances between the estimated outputs and the observed outputs, and minimizing the spreads of the estimated outputs and the error term. The purpose of PNN is to identify the subintervals that have a greater probability of containing the actual values, with the aim of achieving optimal solutions at lower cost. Our approach is quite flexible, since the weights and thresholds can be changed, and it is especially effective in dealing with data containing outliers. A simulation experiment using a data set from SSE has been carried out in order to analyze the effect of the proposed approach. Consequently, as a new approach in the financial time series forecasting literature, the proposed method is an important reference in that it considers a robust version of the FBR model without expert knowledge, which introduces the risk view and utilizes neural networks.
However, a few limitations of this paper remain to be studied in the future. Fuzzy set theory provides appropriate tools for predicting the dynamic changes in financial asset prices, but it cannot replace the large body of research based on randomness. In this respect, it may be necessary to take fuzzy random variables into account. After the reliable group and suspicious group have been identified through the FAR model, how to determine an acceptable combination of weights and thresholds will be developed in a subsequent paper, because there is obviously a trade-off between these values and the objective function. Besides, some center observations of the fuzzy financial yield series may fall outside the estimated intervals obtained by the risk-neutral FBR model, and the developed fuzzy regression makes use of AR variables while disregarding MA variables; both issues remain of interest for further investigation. Finally, it could be useful to extend the proposed method to allow for nonlinearity in the relation between spreads and centers and to apply our model to more datasets to show its performance.
Acknowledgment
This work was supported by the first-class discipline (system science) construction project of Shanghai [Grant No. XTKX2012]; the Priority discipline project of Shanghai [Grant No. T0502]; and the Foundation of Hujiang [Grant No. B14005].
Conflict of interest
None.
Appendix
Three kinds of performance measure are adopted in this study, as described below:
(1)Root mean square error (RMSE)
(2)Mean absolute percentage error (MAPE)
(3)Direction accuracy (DA)
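The three measures can be sketched in Python under their usual textbook definitions, which are assumed here. In particular, DA is taken as the percentage of periods in which the predicted change from the previous actual value has the same sign as the actual change; the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def direction_accuracy(actual, predicted):
    """Percentage of periods whose predicted direction of change matches
    the actual one (assumed definition; a zero product counts as a hit)."""
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move >= 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)
```

Lower RMSE and MAPE values and a higher DA value indicate better forecasting performance.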
