Abstract
Based on double hierarchy linguistic term sets (DHLTS), a novel forecasting model is proposed that considers both the internal fluctuation rules and the external correlation of different time series. The innovative aspects of this model are as follows: (i) It can express more internal fluctuation and external correlation information, which supports better predictive performance. (ii) The equivalent transformation function of DHLTS reduces the fuzzy granularity and improves the prediction accuracy. (iii) Similarity measures based on the distance operators of DHLTS extract the closest rules from historical states. In addition, experiments on TAIEX that consider the impact of the U.S. stock market, together with experiments on other data, show that the model has good predictive performance.
Introduction
Accurate financial data forecasting can help investors avoid risks and obtain higher returns. However, due to the complexity, uncertainty and nonlinear volatility of the financial market, improving prediction accuracy remains a central challenge for scholars. Like other time series data, financial data has a strong statistical dependence on its historical values. Based on this internal historical correlation of time series, some scholars have designed regression analysis models [1], ARIMA [2], seasonal ARIMA [3] and GARCH [4] models. These regression methods first extract rules from historical data and then use the extracted rules to extrapolate future trends. However, because of the many random factors and the noise in the data, rules that rely too heavily on historical data are prone to overfitting.
To better reduce the influence of random factors and noise, researchers have turned to the fuzzy set theory proposed by Zadeh [5]. Song and Chissom [6, 7] proposed a fuzzy time series (FTS) prediction model based on fuzzy set theory in 1993 and constructed a max-min composition operator model. Based on this model, Chen [8, 9] proposed a more effective simplified algorithm and further extended the method to higher-order fuzzy time series in order to express the information contained in the historical time series in more detail and obtain more accurate forecast results. Since then, an increasing number of researchers have proposed new high-order fuzzy forecast models and successfully applied them in various fields. The innovation of these prediction models mainly focuses on three levels: fuzzification [10–12], establishing fuzzy logic relationships [13–16], and defuzzification [17].
Numerous studies have shown that the division of fuzzy intervals during fuzzification plays an important role in the prediction results. Huarng et al. [18] first proposed two methods for fuzzy interval partitioning, verifying that interval partitioning based on distribution and average has better predictive performance. Chen et al. [19] used their proposed automatic clustering algorithm to optimize the partitioning of the universe of discourse and improve the predictive performance of the model. Egrioglu et al. [20] applied univariate constrained optimization to determine the optimal length of fuzzy intervals and optimized the fuzzy prediction model based on this method. In the research on establishing fuzzy relationships and on defuzzification, new combinations of information forms and calculation methods continue to appear. Yu [21] proposed a weighted model based on recurrent fuzzy relationships to predict the Taiwan stock index. Askari et al. [22] proposed a prediction algorithm based on fuzzy clustering (CFTS), which is not based on FLRs but uses a linear combination of input variables for prediction. Yu et al. [23] used neural networks to extract rules from fuzzy time series, improved the extraction efficiency of fuzzy relations, and successfully forecasted the stock price index. Guan et al. [24] combined fuzzy time series with a BP neural network to learn the constructed fuzzy information, and conducted experiments on the closing prices of TAIEX and SHSECI. Overall, the innovations in these three key parts of the fuzzy model provide ideas for subsequent research, but the above methods have difficulty incorporating other factors into the model in a reasonable way when dealing with multifactor problems.
Recently, the multifactor fuzzy time series model has become a hot research topic. Considering complex relationships within or between markets, many researchers have introduced other strongly correlated time series into the forecast model to improve its performance. For instance, Wang and Chen [25] proposed a temperature prediction model that is based on automatic clustering technology and considers two factors. Li et al. [26] proposed another temperature prediction model based on different logical structures, which uses an annealing algorithm to adjust the length of fuzzy intervals to achieve more accurate prediction. Singh and Borah [27] combined fuzzy set theory with neural networks, clustering historical fluctuations into different groups based on the learning ability of artificial neural networks to determine the intervals of historical time series. These methods provide ideas for the development and improvement of multifactor fuzzy prediction. The application of fuzzy theory in time series prediction has developed further, especially for financial data, where many studies have established multifactor prediction models using relevant market factors. For example, Kumar [28] combined weighted methods to construct fuzzy logic relationships and used the opening and highest prices of financial markets to construct a multifactor prediction model. This method divides different intervals based on the main factor and the second factor, and considers membership values to extract logical rules based on triangular fuzzy numbers, thereby significantly improving the prediction output. Guan et al. [29] expressed the upward, flat, and downward trends in historical fluctuations using multivalued linguistic neutrosophic sets, and extracted the optimal logical relationship based on similarity. Zhao et al. [30] applied probabilistic linguistic term sets to express fluctuation probability, constructed an information set by combining logic rules with similar histories, and made predictions using distance measurement as the standard for logical extraction. The existing research shows that introducing other relevant time series into the prediction model can reduce the deviation caused by randomness. However, the introduction of multiple time series often makes the model too complex. New theories are therefore required to express historical states with multiple factors.
Zadeh first introduced the concept of computing with words and proposed the linguistic term set (LTS). Many scholars have extended the LTS in its syntactic and semantic aspects, for example with the interval type-2 fuzzy set [31], HFLTS [32], PLTS [33], and FLE [34]. These information forms can simplify the expression of complex information. Introducing them into predictive models helps address problems such as the difficulty of embedding complex information into computational methods and of constructing reasonable logical relationships. Pinto et al. [35] combined the interval type-2 fuzzy set with the self-organizing direction-aware data partitioning algorithm (SODA) to build a prediction model with adaptive uncertainty processing capability; Dong et al. [36] modeled the predicted time series by combining multiple relevant factors through hesitant expressions based on the hesitant fuzzy set; Pattanayak et al. [37] processed multiple fluctuation uncertainties in the historical state of time series based on the probability expression and membership relationship of the probabilistic intuitionistic fuzzy set; Yolcu et al. [38] combined the membership relationship of intuitionistic fuzzy sets and the adaptive characteristics of neural networks to build a prediction model that can capture both the linear and nonlinear relationships in financial time series. The above research simplified the expression of historical information by using suitable information forms to establish better prediction models. However, these models still have shortcomings in the expression or fusion of complex information.
Recently, Gou et al. [39] defined the double hierarchy linguistic term set (DHLTS) to adapt linguistic term expression to more complex environments. Compared with other linguistic term sets, the DHLTS can express the real meaning in more detail and eliminate the influence of fuzzy granularity in the calculation. Double hierarchy linguistic term sets and their extensions are widely used in multi-attribute decision-making. For example, Li and Xu et al. [40] used DHLTS to characterize the decision-making attitudes of different decision-makers in the context of enterprise cooperation during the COVID-19 epidemic. They built a new theoretical model of decision-making and verified the practicability of the proposed method. Fu et al. [41] compared and quantitatively aggregated linguistic information from different experts by using the semantic model of UDHLTS and applied it to an engineering example of green mine selection. Although DHLTS have gained attention in decision-making, they are rarely used in the field of prediction. A DHLTS has two linguistic term sets with different meanings. It can express information along multiple dimensions, such as information content and degree of matching. These characteristics of DHLTS provide a strong guarantee for expressing the key information in the time series as fully as possible. Meanwhile, DHLTS is a concise and effective quantification method, so that obtaining prediction experience and rules no longer has to be a “black box” process.
This article introduces DHLTS to construct a prediction model. It takes the fluctuation information of the main time series as the first hierarchy linguistic term set, and the fuzzy gap between the main time series and the related time series as the second hierarchy linguistic term set, which describes the correlation between them. Then, a new fuzzy logical relationship is defined according to the concept of DHLTS, and a new fuzzy forecasting model is constructed by combining the new fuzzy logical relationship with the equivalent transformation function and the measurement of similarity. The contributions of our paper are as follows: (i) This paper introduces the DHLTS to express, from a novel perspective, the joint impact of internal volatility rules and external factors. (ii) The linguistic characteristics of the DHLTS reduce the granularity effect brought by fuzziness and give the model higher prediction performance. (iii) The model avoids a “black box” process of experience and rule acquisition through concise and effective quantification, making transparency in the prediction process possible.
Considering the impact of the US stock market on other stock markets, the TAIEX and SSE Composite index were used for experimental analysis to verify the effectiveness of the model. Experiments prove that the model has better universality and accuracy.
The remaining sections of this paper are arranged as follows. The theories of fuzzy time series and DHLTS are introduced in the second part. In the third part, the constructions of double hierarchy linguistic logic rules and the prediction method are described. The empirical test of the forecasting model and the test results are presented in the fourth part. In the fifth part, the conclusion is summarized.
Preliminaries
In this section, we illustrate relevant theories and definitions through some examples.
Definition of fuzzy-fluctuation time series
Let S be an LTS. According to its meaning, the LTS S = {S-2, S-1, S0, S1, S2} = {sharply down, down, equal, up, sharply up} can describe the dynamic trend of the time series, where the term Sz increases strictly monotonically with its subscript z.
The FFLR of the FFTS Y (t) is made up of two parts: the left, Y (t - n), …, Y (t - 2), Y (t - 1), and the right, Y (t). The left and the right are denoted as LHS and RHS, respectively.
In 2017, on the basis of numerous studies, a double hierarchy linguistic term set (DHLTS) was designed by Gou et al.
In addition, under different α values, the second hierarchy LTS needs to be further explained in different cases. For the first hierarchy LTS

The two parts of the second hierarchy LTS.
To express the rise and fall of the stock market, minor adjustments to f are needed. According to the above definition and the division of the fuzzy interval, the value range of the equivalent transformation function is shifted to the symmetrical distribution on both sides of the origin, and the dividing point coefficient with the largest absolute value is uniformly expanded according to the fuzzy interval division. The conversion function f’ is obtained as follows:
The left-hand side of DHLLR DL (t) is
We develop a new forecast model according to the forecast scenario based on DHLLRs. The method shown in Fig. 2 has 5 parts, and its steps are described in detail in the following paragraphs. To consider the strong correlation between stock markets and compare model performance with existing works, we use the TAIEX in 2004 and the S&P index in 2004 as initial data. Taking November 1 as the demarcation point, the data are divided into two parts: the former is the training set, and the latter is the testing set.

Flow chart of the new prediction model.
Step 1. Generation of FFTS
Let X (t) (t = 1, 2, …, T) be an original time series, and let F (t) = X (t) - X (t - 1) (t = 2, 3, …, T) be its corresponding FTS. Then, we calculate the mean len of the absolute values of all elements in the FTS F (t) to define the fuzzy intervals. Define the intervals as e-2 = (- ∞ , -1.5len] , e-1 = (- 1.5len, - 0.5len] , e0 = (- 0.5len, 0.5len) , e1 = [0.5len, 1.5len) , e2 = [1.5len, + ∞). Corresponding to the intervals, an LTS can be defined as S = {S-2 : sharply down, S-1 : down, S0 : equal, S1 : up, S2 : sharply up}. In this way, the time series of TAIEX can be fuzzified into an FFTS Y I (t), and the time series of the S&P index can be fuzzified into an FFTS Y II (t), according to Definition 2.
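The fuzzification in Step 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name fuzzify_fluctuations is hypothetical, len is taken as the mean absolute fluctuation, and each linguistic term Sα is represented by its integer subscript α.

```python
from statistics import mean


def fuzzify_fluctuations(series):
    """Map a numeric series X(t) to subscripts of S = {S-2, ..., S2}.

    Returns the list of subscripts (one per fluctuation F(t)) and the
    interval unit `length` (called len in the text; renamed here to
    avoid shadowing Python's built-in len).
    """
    # F(t) = X(t) - X(t - 1)
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Assumption: len is the mean of the absolute fluctuations.
    length = mean(abs(d) for d in diffs)
    labels = []
    for d in diffs:
        if d <= -1.5 * length:      # e-2 = (-inf, -1.5len]
            labels.append(-2)
        elif d <= -0.5 * length:    # e-1 = (-1.5len, -0.5len]
            labels.append(-1)
        elif d < 0.5 * length:      # e0 = (-0.5len, 0.5len)
            labels.append(0)
        elif d < 1.5 * length:      # e1 = [0.5len, 1.5len)
            labels.append(1)
        else:                       # e2 = [1.5len, +inf)
            labels.append(2)
    return labels, length


labels, length = fuzzify_fluctuations([100, 110, 100, 130, 130, 95])
```

In this toy series the fluctuations are 10, -10, 30, 0 and -35, so len is 17 and each fluctuation falls into a different interval.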
Step 2. Generation of TDTS and LTS G
Considering the impact of the US stock market on the Taiwan stock market, we use the difference between Y II (t) (fuzzy fluctuation of the S&P index) and Y I (t) (fuzzy fluctuation of TAIEX) as the trend bias of the TAIEX fluctuation. For these two fuzzy fluctuation time series, the trend difference time series (TDTS) is obtained from the subscripts of the linguistic terms Sα as D (t) = Y II (t) - Y I (t). We express this information in fuzzy form according to the magnitude of the trend differences to construct an LTS G, as shown in Fig. 3, and further obtain an FFTS G (t) based on D (t).

Generation of LTS G.
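The subscript arithmetic behind D (t) can be illustrated as follows. This is only a sketch: the function name trend_difference is hypothetical, terms are represented by their integer subscripts, and the mapping from D (t) to the terms of the LTS G is defined in Fig. 3 and is not reproduced here, so the code stops at D (t).

```python
def trend_difference(y_main, y_related):
    """Return D(t) = Y_II(t) - Y_I(t), computed on term subscripts.

    y_main    -- subscripts of the main series Y_I(t) (e.g. TAIEX)
    y_related -- subscripts of the related series Y_II(t) (e.g. S&P)
    """
    if len(y_main) != len(y_related):
        raise ValueError("both fuzzy fluctuation series must be aligned")
    return [r - m for m, r in zip(y_main, y_related)]


# Toy subscripts chosen for illustration only.
d = trend_difference([0, -2, 0, 1], [-1, 0, 1, 0])
```

A positive D (t) means the related market moved more strongly upward than the main market on that day, and a negative value means the opposite.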
Step 3. Establishment of DHLTS
Based on the description of the DHLTS in Definition 4, the second hierarchy LTS is a further description of the first hierarchy LTS. We take the LTS S corresponding to the FFTSI as the first hierarchy LTS and take the LTS G corresponding to the TDTS as the second hierarchy LTS. On this basis, taking into account the symmetry of stock market fluctuations and the similarity of fluctuation ranges, we construct an appropriate DHLTS for stock market forecasting based on the two different parts contained in the DHLTS in Fig. 1, as shown in Fig. 4.

Construction of DHLTS.
Step 4. Establishment of DHLLRs
In the previous steps we have obtained the two main FFTSs (Y I (t) and G (t) (t = n, n + 1, …, T, n ⩾ 2)) and described the construction of the DHLTS. According to Definition 4, the two fuzzy linguistic terms corresponding to each time point can be expressed as a DHLT. Arranging these strictly in chronological order yields a DHLT time series, which integrates an expression of volatility with the impact of external stock market trends. According to Definition 6, the DHLLR corresponding to each time point can then be obtained in chronological order for a predetermined order n, as shown in Fig. 5.

Construction of DHLLRs.
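The sliding-window construction of nth-order rules in Step 4 can be sketched as below. The representation is an assumption made for illustration: each DHLT is a pair (first hierarchy subscript, second hierarchy subscript), each DHLLR is a pair (LHS, RHS), and the function name build_rules is hypothetical.

```python
def build_rules(s_labels, g_labels, order):
    """Build nth-order rules (LHS -> RHS) from two aligned label series.

    s_labels -- first hierarchy subscripts (fluctuation of the main series)
    g_labels -- second hierarchy subscripts (trend difference terms)
    order    -- number of past time points on the left-hand side
    """
    # One DHLT per time point: (first hierarchy, second hierarchy).
    dhlts = list(zip(s_labels, g_labels))
    rules = []
    for t in range(order, len(dhlts)):
        lhs = tuple(dhlts[t - order:t])  # the `order` preceding DHLTs
        rhs = dhlts[t]                   # the DHLT at time t
        rules.append((lhs, rhs))
    return rules


# Subscripts of the six terms used as an example in the experiments.
rules = build_rules([0, -2, 0, 1, 0, -1], [-1, 2, 1, -1, 0, 1], order=5)
```

With six time points and order 5, exactly one rule is produced: its LHS holds the first five DHLTs and its RHS is the sixth.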
Step 5. Forecasting of future
DL′ (t) (t = 2 + order, 3 + order, …, ρ) is used to represent the relevant historical state of each observation point X (t) in the testing set. Then, we can calculate the distance between DL (k) and each DL (t) (t = 2 + order, 3 + order, …, ρ), as described in Definition 7. After that, the ϑ closest DL (t)s are selected, and each corresponding DR (t) is found through the DHLLRs. As described in Definition 8, all the found RHSs are grouped into
The fuzzy prediction model constructed based on the above five steps is logically rigorous. Firstly, DHLTS is applied to express the fusion of the two factors, and then the logical relationship information base of historical data is constructed. When data of a certain date need to be predicted, fluctuation states with historical similarity are found from the logical relationship information base according to their corresponding logical relationship LHS to make predictions for the future. For example, Figs. 5 and 6 clearly express the process of generating the prediction results. All prediction results can be interpreted based on historical experience, and the prediction process is transparent, traceable and highly interpretable.

Rule extraction and prediction.
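The retrieval of the ϑ most similar historical states can be sketched as a nearest-neighbour lookup over the rule base. The distance below is a stand-in and not the DHLTS distance of Definition 7: dhlt_distance simply averages the absolute subscript gaps over both hierarchies, and both function names are hypothetical. Rules use the same assumed (LHS, RHS) pair representation, with each DHLT given as a pair of subscripts.

```python
def dhlt_distance(lhs_a, lhs_b):
    """Stand-in distance between two LHSs of equal order.

    Assumption: mean absolute subscript gap over both hierarchies.
    The paper's Definition 7 should replace this in a real model.
    """
    total = sum(
        abs(sa - sb) + abs(ga - gb)
        for (sa, ga), (sb, gb) in zip(lhs_a, lhs_b)
    )
    return total / (2 * len(lhs_a))


def closest_rhs(rules, query_lhs, k):
    """Return the RHSs of the k rules whose LHS is closest to query_lhs.

    sorted() is stable, so ties keep their chronological order.
    """
    ranked = sorted(rules, key=lambda rule: dhlt_distance(rule[0], query_lhs))
    return [rhs for _, rhs in ranked[:k]]


# Three toy 2nd-order rules: (LHS, RHS).
toy_rules = [
    (((0, 0), (1, 0)), (1, 0)),
    (((0, 1), (1, 1)), (-1, 0)),
    (((2, 2), (-2, -2)), (2, 1)),
]
neighbours = closest_rhs(toy_rules, ((0, 0), (1, 1)), k=2)
```

The selected RHSs would then be aggregated and defuzzified as described in Definition 8 and the score function of the DHLTS, which this sketch does not attempt to reproduce.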
This section consists of three parts: (I) Using stock index datasets to validate the constructed model. (II) Exploring whether different numbers of selected similar LHSs and different orders can improve the model performance. (III) Using gold price datasets to further validate the constructed model.
Prediction of TAIEX considering stock market correlation
TAIEX is an internationally well-known stock index, and it has been used by many studies. For the sake of evaluating our model, we use TAIEX in 2004 as the dataset to be predicted. At the same time, based on the correlation between stock markets and the similarity of the index calculation formula, the S&P index in 2004 is the second dataset to be studied. Taking November 1 as the dividing point, we divide the dataset into a training set and a testing set.
Step 1: First, the fluctuation values of the TAIEX and the S&P index in 2004 are calculated. Second, the overall absolute average of each of the two FTSs is calculated over the training set. The overall absolute averages of the fluctuation values in the historical training datasets of TAIEX and the S&P index in 2004 are lenI = 66.87 and lenII = 6.27, respectively. The fuzzy intervals are divided according to these two values. Then, based on the division of fuzzy intervals, the two FTSs can be fuzzified into two FFTSs.
Step 2: Considering the influence of the US stock market on the Taiwan stock market, the difference in the fluctuation trend between the S&P index and TAIEX is calculated and taken as a partial adjustment of the fluctuation trend of TAIEX. In this experiment, the trend difference time series (TDTS) is calculated as D (t) = Y II (t) - Y I (t).
Step 3: To consider the correlation between stock markets in the forecast, we combine TDTS and FFTSI in DHLTS by taking FFTSI as the first hierarchy LTS and TDTS as the second hierarchy LTS, and thus the DHLTS can be obtained. Then, FFLR can be converted into DHLLR.
Step 4: According to the fluctuation of each time point and its corresponding historical fluctuation, the 5th-order FFLR S0, S-2, S0, S1, S0 → S-1 of TAIEX can be built. For instance, the 5th-order fuzzy fluctuation and next fuzzy fluctuation are S0, S-2, S0, S1, S0, S-1, and their corresponding trend differences are G-1, G2, G1, G-1, G0, G1. Because the former can be further described by the latter, they can be converted to a DHLTS
Step 5: All DHLLRs are collated from the training dataset and used to predict the test dataset. To further explain this step, we elaborate on the prediction of TAIEX on December 1. First, we find the LHS

RHSs combination with similar LHSs to the forecast date.
On this basis, in accordance with the score function of DHLTS, the fuzzy forecasting fluctuation can be obtained as:
According to the above steps, we can obtain the predicted value and the error for each time point in the testing set. Table 1 and Fig. 8 show the predicted results in a more intuitive fashion.
Forecast result and error from 1 November to 31 December

Forecasting results from 1 November 2004 to 31 December 2004.
To evaluate the overall performance of the model, we examine the difference between the forecasted values and the actual values. Several widely accepted indicators are used for this comparison, such as RMSE, MAPE and MdRAE. They are defined as follows:
The order of the model was set to 5, and the number of similar DHLLRs selected, ϑ, was also set to 5. For the 2004 data, the MSE, RMSE, MPE and MAPE were 2752.65, 52.46, 0.65%, and 0.66%, respectively.
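For reference, the standard forms of two of these indicators, RMSE and MAPE, can be computed as in the sketch below. These are the textbook definitions, given here under the assumption that the paper uses them unchanged; MdRAE is left out because it depends on a benchmark forecast that is not specified in this text. The function names are illustrative.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: sqrt(mean((actual - forecast)^2))."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero.
    """
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


# Toy values for illustration only (not data from the experiments).
err_rmse = rmse([100, 200], [110, 190])
err_mape = mape([100, 200], [110, 190])
```

For these toy values both errors are 10 in absolute terms, so the RMSE is 10, while the MAPE averages 10% and 5% to give 7.5%.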
Based on the above analysis, more data were tested. The detailed forecast results of TAIEX from 1998 to 2006 are shown in Fig. 9 and Table 2.

The actual value and forecast value of the TAIEX (1998-2006).
RMSEs and MAPEs of forecasting results in TAIEX from 1998 to 2006
We compared the constructed model with other models in accordance with the prediction results of TAIEX from 1999 to 2004. Table 3 shows a preliminary comparison of RMSE for each model. In the prediction of TAIEX for several consecutive years, the RMSE of the constructed model is the lowest in five years, and the result shows the stability and accuracy of the constructed model. The compared models include a single-factor high-order fuzzy prediction model and a multifactor high-order fuzzy prediction model, some of which use other methods to improve the model performance. Under the preliminary comparison of the predicted results of different models for 6 consecutive years, the excellent performance of the proposed model in predicting TAIEX can be clearly demonstrated.
A comparison of RMSEs for forecasting TAIEX 1999–2004
Bold indicates the lowest RMSE for the same time series.
To further analyze the performance of different methods, we follow Derrac et al. [43] and Li et al. [44] and use the Wilcoxon signed-rank test and the Friedman test to compare the prediction accuracy of different methods. These two tests are used to check whether the improvement offered by the proposed model is statistically significant. In both tests, the null hypothesis is that there is no significant difference in prediction accuracy between the compared models. When the p-value is less than 0.05, the null hypothesis is rejected.
In Table 4, the Friedman test shows that the proposed model differs significantly from the other models in predicting TAIEX, and the Wilcoxon signed-rank test shows that its predictive performance is generally superior to that of the compared models. Although the comparison with Cheng et al.’s method is not significant, the RMSEs for 6 consecutive years indicate that the proposed method outperforms Cheng et al.’s method.
Results of Friedman test and Wilcoxon signed-rank test
To test whether the model performs well in other markets, we conducted further experiments on the SSE Composite Index. We took the SSE Composite Index as the dataset to be predicted and selected the Dow Jones Industrial Average to represent the trend interference from the US stock market. The data of each year were divided into two parts at the cut-off point of November 1 and forecasted by our model. Table 5 shows the RMSEs and MdRAEs in the predicted results for each year from 2013 to 2018.
RMSEs and MAPEs of forecasting results in SSE Composite Index from 2010 to 2018
In further analysis, we compared the prediction results of different methods using two indicators: average MAPE and average MdRAE. MAPE makes it easy to determine the proportion of prediction error relative to the actual data and to evaluate the impact of prediction results on subsequent analysis. MdRAE evaluates the stability and accuracy of model predictions from a median perspective.
In the comparison in Table 6, the average MAPE of the proposed model is 0.85%, which ranks second, behind Dong’s method [47], while its average MdRAE is the best among the compared models. Overall, in the prediction of SSEC, although the evaluation indicators of the proposed model are not all optimal, the results still indicate that the model is accurate and stable.
A comparison of RMSEs and MdRAE for forecasting SSE Composite Index
The proposed model can be used not only to predict stock price indices but also to forecast other datasets affected by external trends. In this part, based on the correlation between the gold price and the US dollar index, we use the proposed model to forecast the gold price. Unlike stock price indices in different markets, which tend to fluctuate in the same direction, the gold price and the US dollar index show opposite fluctuation trends, so the US dollar index can be used as the second factor to predict the gold price. Based on the definition of DHLTS, we integrate the fluctuation of the gold price and the trend impact brought by the fluctuation of the US dollar index into the model. We examine the model’s performance on Comex gold price and US dollar index data from 2015 to 2019.
The prediction results in Table 7 show that although RMSE has fluctuated from 2015 to 2019, it is still at a relatively high level of accuracy, basically stable below 10, and the average RMSE of the predicted results for 5 consecutive years is 8.75. In addition, from the perspective of MAPE, it has fluctuated around 0.50% for five consecutive years, and the MAPE for three consecutive years from 2016 to 2018 has been below 0.50%. Overall, the model can also make good predictions about gold prices.
RMSEs and MAPEs of forecasting results in gold price from 2015 to 2019
Considering the inherent connections between stock markets and their importance for forecasting, this paper proposes a forecasting model based on DHLTS. The model incorporates the dynamic relationship between stock markets as a second factor and proposes a DHLLR that extends the traditional FFLR. The application of DHLTS further refines the expression of influencing factors in historical states. For example, in stock price prediction, it expresses the dynamic relationship with other stock markets together with the state of the stock market itself, and it achieves good prediction results in experiments on multiple datasets.
Overall, the model proposed in this article has many advantages. Firstly, the introduction of DHLTS in this model can effectively express the fluctuation state of time series affected by external trends; secondly, DHLLR is constructed so that the model can take into account the impact of external trends and make predictions based strictly on historical experience; at the same time, the model has a good theoretical foundation and is easy to implement through computer programming. In addition, since the model is based on information databases (training sets) and distance measures to extract logical rules, the selection of parameters will have an impact on prediction performance. Therefore, based on this model, our future research will explore the combination with some parameter optimization algorithms and how to reasonably consider more influencing factors to meet the needs of more accurate prediction in the future.
