A new fuzzy time series forecasting method based on clustering and weighted average approach

Abstract

A time series is a sequence of observations of a variable ordered in time. Forecasting is a central task in time series analysis, alongside the study of other characteristics of the data, and it is widely applied to decision-making and prediction in economics, agriculture, medicine, industry, the energy sector and other sciences. Fuzzy time series have emerged as a robust tool for handling historical data expressed in linguistic values. This paper proposes a new fuzzy time series forecasting method that combines fuzzy clustering and information granules with a weighted average approach to deal with the uncertainty in a data series. Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are used as criteria to assess modeling and prediction performance. The findings show that the proposed fuzzy time series approach produces more accurate estimates than the conventional statistical models considered.

Keywords

Fuzzy set, fuzzy c-means clustering, information granules, fuzzy time series, forecasting

1 Introduction

In reliability and performance analysis, few would dispute the need for and significance of prediction, since policy makers are concerned with assessing future system failures for capital planning, stock management, creating reasonable opportunities for age replacement, and logistic support. In fact, most real systems are repairable, and their performance measures change with time. Treating this behavior as a time series process, the “development” or “decline” of the system can be assessed.

The theory of fuzzy time series was first introduced by Song and Chissom [1], building on the fuzzy sets proposed by Zadeh [2] as an extension of classical set theory. Fuzzy time series models do not require the assumptions that conventional statistical models rely on. In addition, environmental data such as temperature, financial data sets from stock exchanges, energy consumption in different sectors and data series from numerous other domains can be treated as fuzzy time series because of the ambiguity and uncertainty they contain. For these reasons, research interest in fuzzy time series is growing.

In recent years, alongside new soft computing mechanisms applied in fields such as privacy and security [3–6], optimization with genetic algorithms [7, 8] and the medical domain [9], fuzzy time series prediction has been widely used in various domains, including the study of enrollments [10–12], temperature prediction [13, 14] and stock indices [15–18]. Because of the usefulness of fuzzy time series (FTS) for real-world problems, many models of different types have been developed, most of them related to the algorithm of Chen’s model [10]. These investigations address various issues to improve forecasting accuracy, including the partitioning of the data set into intervals [19–22], the number of variables [23, 24] and the order of the model [25, 26].

The outcome of defuzzification in FTS forecasting is strongly influenced by interval partitioning. In earlier algorithms, fuzzification relied on partitioning the universe of discourse into equal-length intervals [1], average-based partitioning [19] or ratio-based partitioning [20]. Later, to improve forecasting accuracy, many other approaches were introduced in the literature, such as genetic algorithms [27], particle swarm optimization (PSO) [28], a single-variable optimization model [29], a clustering approach [22], entropy-based partitioning [30] and the fuzzy k-medoid clustering method [31]. FTS algorithms based on clustering do not use intervals; instead, they use cluster centers to fuzzify the data. Fuzzy c-means clustering (FCM) [32], a soft form of the K-means algorithm, is widely used in forecasting algorithms. Because of the uncertainty and vagueness in data series, the standard algorithm can produce less accurate outcomes, and some modifications have been made to address this and increase forecasting accuracy [33, 34]. Handling the problem of optimal partitioning therefore requires a robust methodology that is adequate to improve forecasting accuracy.

In this paper, the historical data series is clustered using FCM [32] and information granules [35], following the concept introduced in [36]. Intervals of unequal length are established, and the input data are then fuzzified using a triangular membership function. Finally, the weighted average approach is employed in defuzzification to obtain crisp forecasts.

The rest of the paper is organized as follows. Section 2 reviews the basic concepts of fuzzy sets, fuzzy time series, the triangular membership function, fuzzy c-means and information granules. Section 3 introduces the proposed fuzzy c-means clustering method integrated with the weighted average approach. Section 4 applies the proposed method to a historical data set and evaluates it with the statistical accuracy measures RMSE and MAE. Finally, Section 5 presents the conclusion and recommendations of the present study.

2 Preliminaries

This section reviews the basic definitions of fuzzy sets [2], fuzzy time series [1] and related concepts.

2.1 Fuzzy time series

Definition 1. Let Y (t) (t = …, 0, 1, 2, …) be the universe of discourse on which fuzzy sets f_i (t) (i = 1, 2, …) are defined. Let F (t) be the collection of the sets f₁ (t), f₂ (t), …. Then F (t) is called a fuzzy time series on Y (t) (t = …, 0, 1, 2, …).

Definition 2. Suppose there exists a fuzzy relation R (t - 1, t) such that F (t) = F (t - 1) ∘ R (t - 1, t), where “∘” denotes the composition operator; then F (t) is said to be caused by F (t - 1).

Definition 3. If F (t - 1) = A_i and F (t) = A_j, then the relationship between F (t - 1) and F (t) can be expressed by a fuzzy logical relationship (FLR) [10] A_i → A_j. If there are fuzzy logical relationships A_i → A_j1, A_i → A_j2, …, A_i → A_jn with the same left-hand side, they can be grouped into a fuzzy logical relationship group (FLRG) A_i → A_j1, A_j2, …, A_jn.

Definition 4. Let F be a fuzzy set on Y. A triangular membership function h_F : Y → [0, 1] assigns each element y of Y a grade between 0 and 1. For parameters a < b < c, where b is the peak of the triangle, it can be expressed as $h_{F} (y) = \left\{ \begin{matrix} 0, & y ⩽ a \\ \frac{y - a}{b - a}, & a < y ⩽ b \\ \frac{c - y}{c - b}, & b < y < c \\ 0, & y ⩾ c \end{matrix} \right.$ (1)
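A minimal Python sketch of the triangular membership function may help fix the notation. It assumes the usual convention a < b < c with the peak at b; the function name `triangular_membership` is illustrative and not part of the paper. The sample call uses the triple of F_u6 from Section 4.

```python
def triangular_membership(y: float, a: float, b: float, c: float) -> float:
    """Grade of y in a triangular fuzzy set with support [a, c] and peak b.

    Assumes a < b < c (illustrative convention, see the lead-in).
    """
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if y <= a or y >= c:
        return 0.0                    # outside the support
    if y <= b:
        return (y - a) / (b - a)      # rising edge
    return (c - y) / (c - b)          # falling edge


# Production of 1987 (493) against F_u6 = [477.050, 486.187, 495.325]
grade_1987 = triangular_membership(493.0, 477.050, 486.187, 495.325)
```

The grade is 1 at the peak and falls linearly to 0 at both ends of the support, so observations near an interval midpoint receive grades close to 1.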

2.2 Fuzzy c-means (FCM)

The FCM clustering technique was first presented by Bezdek [32]. It is the most widely applied clustering method for dealing with uncertainty in variables. The approach minimizes a weighted least-squares objective over fuzzy clusters. Let h_ji denote the membership value of observation y_j in cluster i, n the number of observations and c the number of clusters. The objective function to be minimized in the fuzzy clustering setting is: $J_{β} (Y, V) = \sum_{j = 1}^{n} \sum_{i = 1}^{c} {(h_{ji})}^{β} d^{2} (y_{j}, ν_{i})$ (2)

where β is the fuzzy index, which satisfies β > 1, and d (y_j, ν_i) is the distance between observation y_j and cluster center ν_i. The function J_β (Y, V) is minimized subject to the constraints $\begin{matrix} 0 ⩽ h_{ji} ⩽ 1 \\ 0 ⩽ \sum_{j = 1}^{n} h_{ji} ⩽ n \\ \sum_{i = 1}^{c} h_{ji} = 1 \end{matrix}$
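To make the objective in Equation (2) concrete, the following is a minimal one-dimensional fuzzy c-means sketch in plain Python. It uses the standard alternating updates of memberships and centers; the deterministic, evenly spaced initial centers, the stopping tolerance and the function name `fuzzy_c_means_1d` are assumptions made for illustration and are not taken from the paper.

```python
def fuzzy_c_means_1d(data, c, beta=2.0, max_iter=200, tol=1e-6):
    """Minimal 1-D fuzzy c-means; returns (centers, memberships h[j][i])."""
    lo, hi = min(data), max(data)
    # Assumption: start from centers evenly spaced over the data range.
    centers = [lo + (i + 0.5) * (hi - lo) / c for i in range(c)]
    power = 2.0 / (beta - 1.0)
    h = []
    for _ in range(max_iter):
        # Membership update: h_ji = 1 / sum_k (d_ji / d_jk)^(2 / (beta - 1))
        h = []
        for y in data:
            d = [abs(y - v) for v in centers]
            if 0.0 in d:
                # y coincides with one or more centers: share membership among them
                row = [1.0 if dist == 0.0 else 0.0 for dist in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d[i] / d[k]) ** power for k in range(c))
                       for i in range(c)]
            h.append(row)
        # Center update: mean of the data weighted by h_ji^beta
        new_centers = []
        for i in range(c):
            w = [h[j][i] ** beta for j in range(len(data))]
            new_centers.append(sum(wj * y for wj, y in zip(w, data)) / sum(w))
        shift = max(abs(n - o) for n, o in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, h


# Toy data with two obvious groups (hypothetical values)
toy_centers, toy_h = fuzzy_c_means_1d([1.0, 1.1, 0.9, 5.0, 5.1, 4.9], c=2)
```

Each row of the returned membership matrix sums to 1, which is the constraint that distinguishes FCM from hard K-means assignment.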

2.3 Information granules

The theory of information granules was first addressed by Zadeh [37]. It considers the decomposition of a whole into smaller parts, each of which is regarded as a granule. Bargiela and Pedrycz [35] presented the idea of recursive information granulation, addressing the issues of aggregation and interpretation. This development can be expressed in a specific formal structure of granular computing. In this paper, the aim is to build a single information granule from time series data. In constructing an information granule, justifiable granularity and semantic adequacy are the two intuitively compelling requirements. Justifiable granularity is achieved by including as much data as possible within the bounds of Ω, while the semantic requirement is controlled by the length of the granule. Consider a data series y₁, y₂, …, y_n of a variable observed at times t₁, t₂, …, t_n; information granules Ω₁, Ω₂, …, Ω_n can then be obtained by applying the idea of justifiable granularity to the data set y₁, y₂, …, y_n.

3 Proposed fuzzy time series method

This section presents the complete procedure of the proposed algorithm for forecasting time series data. Clusters and effective intervals [36] are developed on the basis of FCM and information granules, a triangular membership function is used to fuzzify the time series data, fuzzy logical relationships are established, and the weighted average defuzzification rule is applied to obtain crisp forecasts. The proposed fuzzy time series algorithm is as follows:

Step 1. First, check whether the data are stationary, because highly fluctuating data can reduce forecasting accuracy.

Step 2. After confirming stationarity, determine the universe of discourse. Let A_max and A_min be the maximum and minimum observed values of the time series data, that is, A_max = max{ y_i | y_i ∈ A } and A_min = min{ y_i | y_i ∈ A }. The universe of discourse is then defined as U = [A_min - A₁, A_max + A₂] = [U₁, U₂], where A₁ and A₂ are two suitable positive numbers.

Step 3. Partition the universe of discourse into intervals of effective length. Establishing effective interval lengths for the universe of discourse is of great importance in fuzzy time series, because it determines the accuracy of the forecasts. Here the procedure consists of partitioning the universe of discourse, computing prototypes using FCM, and then computing information granules of the resulting subsets to obtain suitable effective interval lengths. The process of partitioning, clustering and obtaining the effective interval lengths is as follows:

Determine the number of clusters c and the prototypes v_i. If k intervals are required, let c = k/2, so that the number of clusters does not exceed k/2.

Compute the prototypes {v₁, v₂, …, v_c} using the FCM approach. For the c clusters with ordered prototypes ν₁ < ν₂ < … < ν_c, the midpoints are calculated as $x_{g} = \frac{v_{g} + v_{g + 1}}{2}$, where g = 1, 2, …, c - 1. The subsets of the data series are then constructed as $\begin{matrix} A_{1} = \{y \in A | A_{\min} ⩽ y ⩽ x_{1}\}, \\ A_{2} = \{y \in A | x_{1} < y ⩽ x_{2}\}, …, \\ A_{c} = \{y \in A | x_{c - 1} < y ⩽ A_{\max}\} \end{matrix}$

At this stage, the information granules are constructed. If v_i is the prototype of subset A_i, such that med (A_i) = v_i, then the lower and upper bounds of the information granule Ω_i are denoted by l_i and m_i, respectively.

Obtain the optimized intervals of unequal length for the universe of discourse, [ue₁, ue₂, …, ue_k].
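The construction of the unequal-length intervals can be sketched as follows. The pattern is inferred from the worked example in Section 4: the first cluster is split at the midpoint between U₁ and med (A₁), each prototype is a boundary, neighbouring granules are cut at the midpoint of m_i and l_(i+1), and the last interval runs to U₂. The function name `effective_interval_bounds` is hypothetical, and the input values are those reported in Section 4.

```python
def effective_interval_bounds(u1, u2, prototypes, granules):
    """Boundaries of the k = 2c unequal-length intervals.

    prototypes: ordered prototypes v_1 < ... < v_c, taken as med(A_i).
    granules:   list of (l_i, m_i) lower/upper granule bounds, one per cluster.
    """
    if len(prototypes) != len(granules):
        raise ValueError("one granule is required per prototype")
    bounds = [u1, (u1 + prototypes[0]) / 2.0]
    for i in range(len(prototypes) - 1):
        bounds.append(prototypes[i])
        # Cut between granule i and granule i + 1: midpoint of m_i and l_(i+1)
        bounds.append((granules[i][1] + granules[i + 1][0]) / 2.0)
    bounds.append(u2)
    return bounds


# Values reported in Section 4 (prototypes and Table 2)
V = [341.301, 434.575, 495.325, 553.225, 611.300, 717.306, 824.467]
OMEGA = [(284.304, 371.5), (397.0, 474.6), (479.5, 512.8), (531.0, 561.9),
         (564.5, 675.2), (679.6, 751.22), (767.1, 868.3)]
bounds = effective_interval_bounds(260.0, 890.0, V, OMEGA)
intervals = list(zip(bounds, bounds[1:]))   # 14 consecutive (lower, upper) pairs
```

With c = 7 prototypes this yields 15 boundaries and therefore k = 14 intervals, consistent with c = k/2.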

Step 4. Compute the membership grades for fuzzifying the historical data series, using the optimized intervals generated in the previous step. For this purpose, employ the triangular membership function described in Definition 4.

Step 5. After computing the membership grades, establish the fuzzy logical relationships (FLRs) and fuzzy logical relationship groups (FLRGs) explained in Definition 3.

Step 6. Although exponential smoothing methods have been used in fuzzy time series, there is still room to apply modified forms of them to obtain accurate forecasts. In such models, the initial smoothed observation and the weighting constant determine the forecast error. Although extensive work and surveys exist on the choice of the initial smoothed value, there is no consensus on the choice of these constants. In this step, on the basis of the fuzzy logical relationships, select the weighting constant corresponding to each observed value in the historical data, and calculate the crisp forecast with the following formula: ${\hat{y}}_{t} = h_{ji} c_{z} + \sum_{r = 1}^{p} \frac{(1 - h_{ji})}{p} c_{zr}$ (3)

where h_ji is the smoothing (weighting) constant for the current year of the observed data series, ${\hat{y}}_{t}$ is the forecasted value for the current year, p is the number of fuzzy logical relationships in the corresponding group, c_z (z = 1, 2, …, k) is the center value of the fuzzy set of the current observation, and c_zr (r = 1, 2, …, p) are the center values of the fuzzy sets on the right-hand side of its fuzzy logical relationship group. The complete architecture of the proposed model is illustrated in Fig. 1.

Fig. 1

Architecture of the proposed fuzzy time series model.
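Equation (3) can be sketched in a few lines of Python. The reading of the symbols follows the worked example for 1987 in Section 4: the membership grade acts as the weight h, c_z is the midpoint of the current fuzzy set, and the remaining weight is shared equally among the midpoints on the right-hand side of the FLRG. The function name `weighted_average_forecast` is illustrative.

```python
from typing import Sequence


def weighted_average_forecast(h: float, c_z: float,
                              rhs_centers: Sequence[float]) -> float:
    """Crisp forecast: h * c_z + sum over r of (1 - h) / p * c_zr."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("the weight h must lie in [0, 1]")
    if not rhs_centers:
        raise ValueError("the FLRG needs at least one right-hand side")
    p = len(rhs_centers)
    return h * c_z + sum((1.0 - h) / p * c for c in rhs_centers)


# 1987: F6 with grade 0.254 and FLRG F6 -> F3, F14 (midpoints from Section 4)
forecast_1987 = weighted_average_forecast(0.254, 486.187, [362.775, 824.58])
# 1989: F5 with grade 0.991 and FLRG F5 -> F9, F13
forecast_1989 = weighted_average_forecast(0.991, 455.810, [558.210, 738.230])
```

Because the weights h and (1 - h)/p sum to 1, the forecast is a convex combination of interval midpoints and always stays inside the universe of discourse.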

3.1 Performance measures

3.1.1 Mean absolute error (MAE)

It is the average absolute difference between the actual and forecasted values. It can be expressed as $MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |$ (4)

3.1.2 Root Mean Squared Error (RMSE)

Root Mean Squared Error is a measure of accuracy used to compare different forecasting models through their forecasting errors on a given data set. It is written as $RMSE = {[\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}]}^{\frac{1}{2}}$ (5) where y_i is the actual value and ${\hat{y}}_{i}$ is the forecasted value.

4 Forecasting and model evaluation

The proposed model is applied to production data for a food crop. Crop production data were chosen because estimating food production is a real problem, owing to vagueness in known parameters and to parameters that are unknown. For this purpose, time series data on gram (chickpea) production in Pakistan were taken from the website of the Food and Agriculture Organization of the United Nations (http://www.fao.org/faostat/en/#data/TP).

4.1 Results and discussion

Descriptive statistics for the production of gram pulse in Pakistan are presented in Table 1. Over the period 1987–2013, the maximum production was 868.3 thousand tons and the minimum was 284.304 thousand tons. The time series of gram pulse production is plotted in Fig. 2.

Table 1
Descriptive statistics regarding production of gram pulse

Variable	Mean	S.E. Mean	Standard Deviation	Median	Kurtosis	Skewness
Production (000 tons)	488.9	29.6	153.8	456.4	–0.83	0.36

Fig. 2

Production of gram pulse crop in Pakistan.

Step 1. Stationarity test: To check whether the data are stationary, the Augmented Dickey-Fuller (ADF) test is applied to the gram production data. The software-generated results and the graphical presentation (Fig. 3) indicate that the data series is stationary.

Fig. 3

Plot of ACF for production.

Step 2. After confirming stationarity, the universe of discourse is determined. With A_max = 868.3 and A_min = 284.304 as the maximum and minimum observed values of the time series, the universe of discourse is defined as U = [260, 890], where A₁ = 24.304 and A₂ = 21.7.

Step 3. The process of partitioning, clustering and determining the effective interval lengths is as follows:

The number of effective unequal intervals to be estimated is 14, so c = 7 clusters are used. The prototypes v₁, …, v₇ are given as $\begin{matrix} v_{1} = 341.301, v_{2} = 434.575, \\ v_{3} = 495.325, v_{4} = 553.225, \\ v_{5} = 611.300, v_{6} = 717.306, \\ v_{7} = 824.467 \end{matrix}$

On the basis of the computed prototypes v₁, …, v₇, the midpoints x_g are calculated as follows.

$\begin{matrix} x_{1} = \frac{v_{1} + v_{2}}{2} = 387.938, x_{2} = \frac{v_{2} + v_{3}}{2} = 464.95, \\ x_{3} = \frac{v_{3} + v_{4}}{2} = 524.275, x_{4} = \frac{v_{4} + v_{5}}{2} = 582.263, \\ x_{5} = \frac{v_{5} + v_{6}}{2} = 664.303, x_{6} = \frac{v_{6} + v_{7}}{2} = 770.88 \end{matrix}$

Following the procedure given in Section 3, the subsets are established. $\begin{matrix} A_{1} = \{y \in A | 284.304 ⩽ y ⩽ 387.938\}, \\ A_{2} = \{y \in A | 387.938 < y ⩽ 464.950\}, \\ A_{3} = \{y \in A | 464.950 < y ⩽ 524.275\}, \\ A_{4} = \{y \in A | 524.275 < y ⩽ 582.263\}, \\ A_{5} = \{y \in A | 582.263 < y ⩽ 664.303\}, \\ A_{6} = \{y \in A | 664.303 < y ⩽ 770.880\}, \\ A_{7} = \{y \in A | 770.880 < y ⩽ 868.300\} \end{matrix}$
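As an illustration of this partitioning step, the sketch below computes the midpoints between consecutive prototypes and assigns an observation to a subset. It assumes that a value lying exactly on a midpoint belongs to the lower subset, in line with the definition of A_c in Section 3; the helper names `midpoints` and `subset_index` are hypothetical.

```python
from bisect import bisect_left


def midpoints(prototypes):
    """Midpoints x_g between consecutive ordered prototypes."""
    return [(a + b) / 2.0 for a, b in zip(prototypes, prototypes[1:])]


def subset_index(y, cuts):
    """1-based index i of the subset A_i with x_(i-1) < y <= x_i."""
    # bisect_left counts the cut points that are strictly smaller than y
    return bisect_left(cuts, y) + 1


# Prototypes reported above for the gram production data
prototypes = [341.301, 434.575, 495.325, 553.225, 611.300, 717.306, 824.467]
cuts = midpoints(prototypes)

subset_1987 = subset_index(493.0, cuts)     # production of 1987
subset_min = subset_index(284.304, cuts)    # smallest observation
subset_max = subset_index(868.3, cuts)      # largest observation
```

Six cut points split the data into seven subsets, one per prototype.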

If v_i represents the prototypes of subset A_i (i = 1, 2, …, 7) then $\begin{matrix} med (A_{1}) = 341.301, med (A_{2}) = 434.575, \\ med (A_{3}) = 495.325, med (A_{4}) = 553.225, \\ med (A_{5}) = 611.300, med (A_{6}) = 717.306, \\ med (A_{7}) = 824.467 . \end{matrix}$

Using the above information, the lower and upper bounds of the information granules Ω_i, denoted l_i and m_i, are presented in Table 2.

The 14 optimized intervals of unequal length are obtained as follows.

$\begin{matrix} U_{e 1} = [U_{1}, \frac{U_{1} + med (A_{1})}{2}] = [260.00, 300.65], \\ U_{e 2} = [\frac{U_{1} + med (A_{1})}{2}, med (A_{1})] = [300.65, 341.30], \\ U_{e 3} = [med (A_{1}), \frac{m_{1} + l_{2}}{2}] = [341.30, 384.25], \\ U_{e 4} = [\frac{m_{1} + l_{2}}{2}, med (A_{2})] = [384.25, 434.58], \\ U_{e 5} = [med (A_{2}), \frac{m_{2} + l_{3}}{2}] = [434.58, 477.05], \\ U_{e 6} = [\frac{m_{2} + l_{3}}{2}, med (A_{3})] = [477.05, 495.33], \\ U_{e 7} = [med (A_{3}), \frac{m_{3} + l_{4}}{2}] = [495.33, 521.90], \\ U_{e 8} = [\frac{m_{3} + l_{4}}{2}, med (A_{4})] = [521.90, 553.23], \\ U_{e 9} = [med (A_{4}), \frac{m_{4} + l_{5}}{2}] = [553.23, 563.20], \\ U_{e 10} = [\frac{m_{4} + l_{5}}{2}, med (A_{5})] = [563.20, 611.30], \\ U_{e 11} = [med (A_{5}), \frac{m_{5} + l_{6}}{2}] = [611.30, 677.40], \\ U_{e 12} = [\frac{m_{5} + l_{6}}{2}, med (A_{6})] = [677.40, 717.31], \\ U_{e 13} = [med (A_{6}), \frac{m_{6} + l_{7}}{2}] = [717.31, 759.16], \\ U_{e 14} = [\frac{m_{6} + l_{7}}{2}, U_{2}] = [759.16, 890.00] \end{matrix}$

Table 2

Information Granules

Ω₁ [l₁, m₁]	Ω₂ [l₂, m₂]	Ω₃ [l₃, m₃]	Ω₄ [l₄, m₄]	Ω₅ [l₅, m₅]	Ω₆ [l₆, m₆]	Ω₇ [l₇, m₇]
[284.304, 371.5]	[397, 474.6]	[479.5, 512.8]	[531, 561.9]	[564.5, 675.2]	[679.6, 751.22]	[767.1, 868.3]

The fuzzy sets established from the unequal-length intervals, each given as [lower bound, midpoint, upper bound], are $\begin{matrix} F_{u 1} = [260.000, 280.325, 300.650], \\ F_{u 2} = [300.650, 320.975, 341.301], \\ F_{u 3} = [341.301, 362.775, 384.250], \\ F_{u 4} = [384.250, 409.410, 434.575], \\ F_{u 5} = [434.575, 455.810, 477.050], \\ F_{u 6} = [477.050, 486.187, 495.325], \\ F_{u 7} = [495.325, 508.610, 521.900], \\ F_{u 8} = [521.900, 537.560, 553.225], \\ F_{u 9} = [553.225, 558.210, 563.200], \\ F_{u 10} = [563.200, 587.250, 611.300], \\ F_{u 11} = [611.300, 644.350, 677.400], \\ F_{u 12} = [677.400, 697.350, 717.306], \\ F_{u 13} = [717.306, 738.230, 759.160], \\ F_{u 14} = [759.160, 824.580, 890.000] \end{matrix}$

Step 4. The membership grades for fuzzifying the historical data series are computed from the optimized intervals generated in the previous step, using the triangular membership function of Definition 4. The membership grades are presented in Table 3.

Table 3

Fuzzified Production

Year	Production (000 tons)	Fuzzified Production	Membership Grade
1987	493	F6	0.254
1988	371.5	F3	0.594
1989	456	F5	0.991
1990	561.9	F9	0.26
1991	531	F8	0.581
1992	512.8	F7	0.685
1993	347.3	F3	0.279
1994	410.7	F4	0.948
1995	558.5	F9	0.959
1996	679.6	F12	0.11
1997	594.4	F10	0.297
1998	767.1	F14	0.121
1999	697.9	F12	0.97
2000	564.5	F10	0.05
2001	397	F4	0.506
2002	362.1	F3	0.968
2003	675.2	F11	0.07
2004	611.1	F10	0.008
2005	868.3	F14	0.332
2006	479.5	F6	0.268
2007	838	F14	0.795
2008	474.6	F5	0.114
2009	740.5	F13	0.892
2010	561.5	F9	0.341
2011	496	F7	0.05
2012	284.304	F1	0.804
2013	751.223	F13	0.38
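The fuzzification step can be sketched as follows, assuming that each observation is assigned to the interval that contains it and graded by a triangle whose peak is the interval midpoint. The boundary list is taken from the fuzzy sets F_u1 to F_u14 above; the function name `fuzzify` and the rule that a value on a boundary goes to the lower interval are assumptions. Exact midpoints are used, so grades can differ from hand calculations in the third decimal.

```python
from bisect import bisect_left

# Interval boundaries of F_u1 ... F_u14 (Section 4)
BOUNDS = [260.0, 300.65, 341.301, 384.25, 434.575, 477.05, 495.325, 521.9,
          553.225, 563.2, 611.3, 677.4, 717.306, 759.16, 890.0]


def fuzzify(y, bounds=BOUNDS):
    """Return (k, grade): 1-based fuzzy set index and triangular grade of y."""
    if not bounds[0] <= y <= bounds[-1]:
        raise ValueError("observation lies outside the universe of discourse")
    # Interval k is (bounds[k-1], bounds[k]]; the left end of U goes to k = 1
    k = max(1, bisect_left(bounds, y))
    lo, hi = bounds[k - 1], bounds[k]
    mid = (lo + hi) / 2.0
    if y <= mid:
        grade = (y - lo) / (mid - lo)   # rising edge
    else:
        grade = (hi - y) / (hi - mid)   # falling edge
    return k, grade


k_1987, g_1987 = fuzzify(493.0)
k_1988, g_1988 = fuzzify(371.5)
k_1989, g_1989 = fuzzify(456.0)
k_2005, g_2005 = fuzzify(868.3)
```

For these four years the sketch returns the fuzzy sets F6, F3, F5 and F14 with grades that round to the corresponding entries of Table 3.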

Step 5. After assigning membership grades to the observed data series, the FLRs and FLRGs are established; they are shown in Tables 4 and 5.

Table 4

Fuzzy logical relationships of the fuzzified production

Fuzzy Logical Relationships (FLRs)
F₆ → F₃, F₃ → F₅, F₅ → F₉, F₉ → F₈, F₈ → F₇, F₇ → F₃, F₃ → F₄, F₄ → F₉, F₉ → F₁₂, F₁₂ → F₁₀,
F₁₀ → F₁₄, F₁₄ → F₁₂, F₁₂ → F₁₀, F₁₀ → F₄, F₄ → F₃, F₃ → F₁₁, F₁₁ → F₁₀, F₁₀ → F₁₄, F₁₄ → F₆,
F₆ → F₁₄, F₁₄ → F₅, F₅ → F₁₃, F₁₃ → F₉, F₉ → F₇, F₇ → F₁, F₁ → F₁₃

Table 5

Fuzzy logical relationship groups of the fuzzified production

Fuzzy Logical Relationship Groups (FLRGs)
F₁ → F₁₃
F₃ → F₄, F₅, F₁₁
F₄ → F₃, F₉
F₅ → F₉, F₁₃
F₆ → F₃, F₁₄
F₇ → F₁, F₃
F₈ → F₇
F₉ → F₇, F₈, F₁₂
F₁₀ → F₄, F₁₄
F₁₁ → F₁₀
F₁₂ → F₁₀
F₁₃ → F₉
F₁₄ → F₅, F₆, F₁₂
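The relationships in Tables 4 and 5 can be derived mechanically from the fuzzified series of Table 3. The sketch below pairs each fuzzy set with its successor to form the FLRs and then groups them by left-hand side, dropping repeated right-hand sides as Table 5 does. The function names `build_flrs` and `build_flrgs` are illustrative, and fuzzy sets are represented by their integer indices.

```python
from collections import defaultdict

# Fuzzy set index per year, 1987-2013 (third column of Table 3)
LABELS = [6, 3, 5, 9, 8, 7, 3, 4, 9, 12, 10, 14, 12, 10,
          4, 3, 11, 10, 14, 6, 14, 5, 13, 9, 7, 1, 13]


def build_flrs(labels):
    """First-order FLRs: (F(t-1), F(t)) for consecutive observations."""
    return list(zip(labels, labels[1:]))


def build_flrgs(flrs):
    """Group FLRs by left-hand side; each right-hand side is kept once."""
    groups = defaultdict(set)
    for lhs, rhs in flrs:
        groups[lhs].add(rhs)
    return {lhs: sorted(rhs) for lhs, rhs in sorted(groups.items())}


flrs = build_flrs(LABELS)
flrgs = build_flrgs(flrs)
```

Here `flrgs[6]` holds the right-hand sides used in the 1987 forecast, namely F₃ and F₁₄.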

Step 6. On the basis of the FLRs and FLRGs, the weighting constant corresponding to each observed value in the historical data is selected, and the forecast is computed with Equation (3). As an example, the calculation for the year 1987 is as follows.

For the production in 1987, the fuzzy set is F₆ with membership grade 0.254, that is, F₆ = < 493, {0.254} >. The FLRG corresponding to F₆ is F₆ → F₃, F₁₄. The midpoints of the optimized unequal-length intervals F_u3, F_u6 and F_u14 are 362.775, 486.187 and 824.58, respectively, so the forecast for 1987 using Equation (3) is $\begin{matrix} {\hat{y}}_{1} = 0.254 (486.187) + \frac{(1 - 0.254)}{2} (362.775) + \frac{(1 - 0.254)}{2} (824.58) = 566.37 \end{matrix}$

The other forecasts are computed in the same way and are presented in Table 7. The comparison graph (Fig. 4) shows that the forecasted values track the actual values. The proposed model is also compared with conventional statistical models on the basis of the accuracy measures MAE and RMSE, using Equations (4) and (5). The results in Table 6 show that the proposed fuzzy time series model attains the lowest RMSE (90.9103) of the models compared, ahead of simple exponential smoothing (117.9466).

Fig. 4

Comparison of observed values with forecasted values.

Table 6

Model comparison with Statistical Models

Model	RMSE	Rank
ARIMA	139.0211	3
Damped Holt’s Method	139.2732	4
Simple Exponential Smoothing	117.9466	2
Holt’s Method (Exponential Trend)	156.1272	5
Proposed Method	90.9103	1

Table 7

Forecasted Production (000 tonnes)

Year	Actual Production	Forecasted production	Year	Actual production	Forecasted Production
1987	493	566.37	2001	397	434.65
1988	371.5	419.28	2002	362.1	366.25
1989	456	457.54	2003	675.2	591.24
1990	561.9	574.04	2004	611.1	616.76
1991	531	525.43	2005	868.3	639.34
1992	512.8	448.08	2006	479.5	564.86
1993	347.3	463.63	2007	838	737.33
1994	410.7	412.06	2008	474.6	630.40
1995	558.5	557.98	2009	740.5	718.78
1996	679.6	599.36	2010	561.5	572.18
1997	594.4	608.16	2011	496	330.90
1998	767.1	580.09	2012	284.304	370.07
1999	697.9	694.05	2013	751.223	626.62
2000	564.5	615.51
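The accuracy measures of Equations (4) and (5) can be applied directly to the values in Table 7. The sketch below is a plain-Python illustration; the function names `mae` and `rmse` are illustrative, and the two lists simply transcribe the actual and forecasted columns for 1987 to 2013. On these values the RMSE should come out close to the figure reported for the proposed method in Table 6.

```python
from math import sqrt


def mae(actual, forecast):
    """Mean absolute error, Equation (4)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, Equation (5)."""
    return sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Table 7, years 1987-2013 in order
ACTUAL = [493, 371.5, 456, 561.9, 531, 512.8, 347.3, 410.7, 558.5, 679.6,
          594.4, 767.1, 697.9, 564.5, 397, 362.1, 675.2, 611.1, 868.3, 479.5,
          838, 474.6, 740.5, 561.5, 496, 284.304, 751.223]
FORECAST = [566.37, 419.28, 457.54, 574.04, 525.43, 448.08, 463.63, 412.06,
            557.98, 599.36, 608.16, 580.09, 694.05, 615.51, 434.65, 366.25,
            591.24, 616.76, 639.34, 564.86, 737.33, 630.40, 718.78, 572.18,
            330.90, 370.07, 626.62]

table7_mae = mae(ACTUAL, FORECAST)
table7_rmse = rmse(ACTUAL, FORECAST)
```

RMSE penalizes large individual errors more heavily than MAE, so it is never smaller than MAE on the same data.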

4.2 Residuals analysis

Residuals are useful for checking model adequacy, that is, whether a time series model captures enough information for accurate forecasting. A useful time series prediction model should have residuals with the following properties.

Residuals are uncorrelated

Have zero mean

Have constant variance

Normally distributed

The analysis illustrated in Fig. 5, produced with R software, indicates that the proposed fuzzy time series model uses the information in the data set and that the residuals satisfy all of these properties.

Fig. 5

Plot of residuals for production data.

5 Conclusion

In this paper, a new fuzzy time series method based on FCM, information granules and a new weighted average approach is proposed. The most significant advance of the study is that the relationship among the data series is considered in the clustering environment. First, extreme observations that may prevent accurate forecasts are eliminated, and the process continues once the data series is stationary. Optimized interval lengths are obtained using FCM and information granule theory. A weighted average formula is then applied that uses all the fuzzy logical relationships to obtain crisp forecasts. Future research is needed to extend the current work to a multivariate FTS model.

Acknowledgments

We would like to thank the referees and the journal editorial team for providing valuable advice that improved the quality of the original manuscript. This work is supported by the National Natural Science Foundation of China (11671104).

References

1. Song and Chissom B.S., Fuzzy time series and its models, Fuzzy Sets and Systems 54(3) (1993), 269–277.
2. Zadeh L.A., Fuzzy sets, Information and Control 8(3) (1965), 338–353.
3. Arif, et al., A survey on security attacks in VANETs: Communication, applications and challenges, Vehicular Communications (2019), 100–179.
4. Arif, Wang and Balas V.E., Secure VANETs: Trusted communication scheme between vehicles and infrastructure based on fog computing, Stud Inform Control 27(2) (2018), 235–246.
5. Arif, Wang and Peng, Track me if you can? Query Based Dual Location Privacy in VANETs for V2V and V2I, in 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), IEEE (2018), pp. 1091–1096.
6. Arif, et al., SDN-based secure VANETs communication with fog computing, in International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, Springer (2018), pp. 46–59.
7. Olteanu and Paraschiv, The influence of random numbers generators upon genetic algorithms, in 2013 IEEE INISTA, IEEE (2013), pp. 1–5.
8. Nicoara E.S., Filip F.G. and Paraschiv, Simulation-based optimization using genetic algorithms for multi-objective flexible JSSP, Studies in Informatics and Control 20(4) (2011), 333–344.
9. Arif, et al., Band Segmentation and Detection of DNA by Using Fast Fuzzy C-mean and Neuro Adaptive Fuzzy Inference System, in International Conference on Smart City and Informatization, Springer (2019).
10. Chen S.-M., Forecasting enrollments based on fuzzy time series, Fuzzy Sets and Systems 81(3) (1996), 311–319.
11. Song and Chissom B.S., Forecasting enrollments with fuzzy time series - part I, Fuzzy Sets and Systems 54(1) (1993), 1–9.
12. Song and Chissom B.S., Forecasting enrollments with fuzzy time series - part II, Fuzzy Sets and Systems 62(1) (1994), 1–8.
13. Chen S.-M. and Hwang J.-R., Temperature prediction using fuzzy time series, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 30(2) (2000), 263–275.
14. Cheng S.-H., Chen S.-M. and Jian W.-S., Fuzzy time series forecasting based on fuzzy logical relationships and similarity measures, Information Sciences 327 (2016), 272–287.
15. Chen S.-M. and Tanuwijaya, Fuzzy forecasting based on high-order fuzzy logical relationships and automatic clustering techniques, Expert Systems with Applications 38(12) (2011), 15425–15437.
16. Huarng and Yu H.-K., A type 2 fuzzy time series model for stock index forecasting, Physica A: Statistical Mechanics and its Applications 353 (2005), 445–462.
17. Jia, Zhao and Guan, Forecasting based on high-order fuzzy-fluctuation trends and particle swarm optimization machine learning, Symmetry 9(7) (2017), 124.
18. Rubio, Bermúdez J.D. and Vercher, Improving stock index forecasts by using a new weighted fuzzy-trend time series method, Expert Systems with Applications 76 (2017), 12–20.
19. Huarng, Effective lengths of intervals to improve forecasting in fuzzy time series, Fuzzy Sets and Systems 123(3) (2001), 387–394.
20. Huarng and Yu T.H.-K., Ratio-based lengths of intervals to improve fuzzy time series forecasting, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 36(2) (2006), 328–340.
21. et al., Using interval information granules to improve forecasting in fuzzy time series, International Journal of Approximate Reasoning 57 (2015), 1–18.
22. Wang, et al., Determination of temporal information granules to improve forecasting in fuzzy time series, Expert Systems with Applications 41(6) (2014), 3134–3142.
23. Huarng K.-H., Yu T.H.-K. and Hsu Y.W., A multivariate heuristic model for fuzzy time-series forecasting, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 37(4) (2007), 836–846.
24. Yu T.H.-K. and Huarng K.-H., A bivariate fuzzy time series model to forecast the TAIEX, Expert Systems with Applications 34(4) (2008), 2945–2952.
25. Chen S.-M., Forecasting enrollments based on high-order fuzzy time series, Cybernetics and Systems 33(1) (2002), 1–16.
26. Chen S.-M. and Chen C.-D., Handling forecasting problems based on high-order fuzzy logical relationships, Expert Systems with Applications 38(4) (2011), 3857–3864.
27. Cai, et al., A novel stock forecasting model based on fuzzy time series and genetic algorithm, Procedia Computer Science 18 (2013), 1155–1162.
28. Kuo I.-H., et al., Forecasting TAIFEX based on fuzzy time series and particle swarm optimization, Expert Systems with Applications 37(2) (2010), 1494–1502.
29. Egrioglu, et al., A new approach based on the optimization of the length of intervals in fuzzy time series, Journal of Intelligent & Fuzzy Systems 22(1) (2011), 15–19.
30. Chen M.-Y., A high-order fuzzy time series forecasting model for internet stock trading, Future Generation Computer Systems 37 (2014), 461–467.
31. Dincer N.G. and Akkuş Ö., A new fuzzy time series model based on robust clustering for forecasting of air pollution, Ecological Informatics 43 (2018), 157–164.
32. Bezdek J.C., Ehrlich and Full, FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences 10(2-3) (1984), 191–203.
33. Askari, et al., Generalized entropy based possibilistic fuzzy c-means for clustering noisy data and its convergence proof, Neurocomputing 219 (2017), 186–202.
34. Pal N.R., et al., A possibilistic fuzzy c-means clustering algorithm, IEEE Transactions on Fuzzy Systems 13(4) (2005), 517–530.
35. Bargiela and Pedrycz, Recursive information granulation: aggregation and interpretation issues, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 33(1) (2003), 96–112.
36. Wang, Liu and Pedrycz, Effective intervals determined by information granules to improve forecasting in fuzzy time series, Expert Systems with Applications 40(14) (2013), 5673–5679.
37. Zadeh L.A., Klir G.J. and Yuan, Fuzzy sets, fuzzy logic, and fuzzy systems: Selected papers. 6 (1996), World Scientific.
