Abstract
Agricultural price forecasting has become a promising area of research in recent times. The ARIMA model has been the most widely used technique for this purpose during the last few decades. When the assumption of homoscedastic error variance is violated, ARCH/GARCH models are applied to capture changes in the conditional variance of the time-series data. The ANN approach can also be applied successfully to forecasting real time-series data as an alternative to the traditional forecasting models. Real-world time-series data are rarely purely linear or purely nonlinear in nature and sometimes contain both patterns together. In this situation a hybrid approach that combines the forecasts from a linear time-series model (ARIMA) with those from a nonlinear time-series model (GARCH, ANN) can give better forecasting performance. Two hybrid methodologies, namely ARIMA-GARCH and ARIMA-ANN, have been applied for modelling and forecasting the wholesale potato price in the Agra market of India. The hybrid models and their individual counterparts have been compared in terms of mean absolute percentage error (MAPE) and root mean square error (RMSE). It is observed that the ARIMA-ANN hybrid model outperforms the other combination and the individual counterparts for the data under consideration. The R software package has been used for the data analysis.
Introduction
Time-series forecasting is an important statistical technique used as a basis for manual and automatic planning in many application domains (Gooijer & Hyndman, 2006). Past observations of a variable are collected and analyzed to develop a model describing the underlying relationship. Forecasting plays a crucial role in business, industry, government and institutional planning. Often little is known about the underlying data-generating process, and in this situation a modelling approach becomes useful. Much effort has been devoted over the last few decades to developing and improving time-series forecasting models. Several linear time-series models are available in the literature. One of the most important and widely used techniques for the analysis of univariate time-series data is the Box-Jenkins autoregressive integrated moving average (ARIMA) methodology (Box et al., 2007). The ARIMA model owes its popularity to its statistical properties. It is a flexible class of models that includes pure autoregressive (AR) models, pure moving average (MA) models and combined AR and MA (ARMA) models. In addition to ARIMA, various exponential smoothing models can also be used to forecast a linear time-series process. A major limitation of these models, however, is their pre-assumed linear form, which limits their application to real time-series data. Some applications of the ARIMA model can be found in Paul and Das (2010, 2013), Paul et al. (2013, 2014) and Paul (2015).
Linear models are not able to describe changes in the conditional variance present in real data. To tackle this situation, Engle (1982) introduced the autoregressive conditional heteroscedastic (ARCH) model, which accounts for significant autocorrelation in the squared residual series. However, ARCH models give satisfactory forecasts only with a large number of parameters, which necessitated a more parsimonious version, the generalized ARCH (GARCH) model (Bollerslev, 1986). In GARCH models the unconditional autocorrelation function decays slowly.
Unlike the traditional model-based methods, the artificial neural network (ANN) is a data-driven, self-adaptive, nonlinear, nonparametric method of forecasting. Many nonlinear processes with an unknown functional relationship can be modelled by ANNs. There is empirical evidence that nonlinear models perform well for long-term forecasting whereas linear models are suitable for short-range forecasting, so there is a need to combine linear and nonlinear models to obtain more accurate forecasts. In practice it is often difficult to decide whether a time-series is generated by a linear or a nonlinear process, or whether one forecasting method is more appropriate than another for out-of-sample forecasting. In this situation a number of models are generally tried and the best one is selected on the basis of an information criterion such as Akaike's information criterion (AIC), the Bayesian information criterion (BIC) or the Hannan and Quinn (HQ) criterion. However, there is no guarantee that the selected model will give the best forecasts if influential factors such as sampling variation or structural change are present in the data. As a solution to this problem, different forecasting methods can be combined to obtain the final forecast. Real time-series data are rarely purely linear or purely nonlinear; they often contain both linear and nonlinear components, which makes it necessary to combine linear and nonlinear models to capture the existing pattern more accurately. It is almost universally agreed that no single forecasting method is the best choice in every situation. Most real-world problems are complex, and no single model captures all their patterns equally well. Therefore, combining different models increases the chance of capturing different patterns and improves forecasting performance.
Paul (2015) combined the ARIMAX model with GARCH and wavelet techniques and showed an improvement in forecasting accuracy over the individual counterparts. Paul and Sinha (2016) compared the performance of ARIMAX and NARX models for forecasting crop yield. In the present investigation the hybrid ARIMA-GARCH and ARIMA-ANN models, along with the individual ARIMA and GARCH models, have been applied to a real data set.
Time-series forecasting models
There are several approaches to time-series modelling. Traditional linear time-series models include moving average, exponential smoothing and ARIMA. To overcome the deficiencies of linear models and to capture nonlinear patterns in time-series data, several nonlinear time-series models are available in the literature. The most commonly used are the bilinear model, the threshold autoregressive (TAR) model, the exponential autoregressive (ExpAR) model and the ARCH model. In the present investigation ARIMA, GARCH and its family of models, the ANN model and some hybrid models are applied. A brief description of the models is given below.
The ARIMA model
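In an ARIMA model, it is assumed that the future value of a variable is a linear function of its own past values and of past random errors. It is a linear univariate time-series model which expresses a time-series process, say, $y_t$, as

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$

where $y_t$ and $\varepsilon_t$ are the actual value and the random error at time $t$, $\phi_i$ ($i = 1, 2, \ldots, p$) and $\theta_j$ ($j = 0, 1, \ldots, q$) are the model parameters, and $p$ and $q$ are the orders of the autoregressive and moving average parts. The random errors $\varepsilon_t$ are assumed to be independently and identically distributed with mean zero and constant variance $\sigma^2$. For a non-stationary series the same representation is applied after differencing the series $d$ times.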
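As a small illustration of how a forecast is built once a series has been differenced, the following Python sketch forecasts an ARIMA(1,1,0) process without a drift term. The function names, the toy series and the value of `phi` are hypothetical and are not taken from the model fitted in this study, which was estimated in R.

```python
def difference(series):
    """First difference: d_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def arima110_forecast(series, phi, steps):
    """Forecast an ARIMA(1,1,0) process without drift.

    The differenced series follows an AR(1) with coefficient phi, so each
    forecast difference is phi times the previous one, and forecasts of the
    level are obtained by cumulating those differences.
    """
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff          # AR(1) forecast of the next difference
        level = level + diff       # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical values, for illustration only.
prices = [10.0, 12.0]
print(difference([1.0, 4.0, 9.0]))            # [3.0, 5.0]
print(arima110_forecast(prices, 0.5, 3))      # [13.0, 13.5, 13.75]
```

With `phi` between -1 and 1 the forecast differences shrink geometrically, so the forecast path flattens out as the horizon grows.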
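Linear Gaussian models are not able to describe the non-constant conditional error variance that is present in many real time-series data. To handle such situations, Engle (1982) introduced the ARCH model, which accounts for significant autocorrelation in the squared residuals. The ARCH($q$) model for the series $\{\varepsilon_t\}$ is defined by specifying the conditional distribution of $\varepsilon_t$ given the information available up to time $t-1$, denoted by $\psi_{t-1}$:

$$\varepsilon_t \mid \psi_{t-1} \sim N(0, h_t), \qquad h_t = a_0 + \sum_{i=1}^{q} a_i \varepsilon_{t-i}^2$$

where $a_0 > 0$ and $a_i \ge 0$ for all $i$, and $\sum_{i=1}^{q} a_i < 1$ is required for the unconditional variance to be finite.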
The ARCH model has a drawback, however: when its order is very large, a large number of parameters must be estimated, which is cumbersome and makes the model non-parsimonious.
To overcome these difficulties, Bollerslev (1986) proposed the GARCH model, in which the conditional variance is also a linear function of its own lags.
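A GARCH($p$, $q$) process for the residual series $\{\varepsilon_t\}$ is given by

$$\varepsilon_t = \xi_t \sqrt{h_t}, \qquad h_t = a_0 + \sum_{i=1}^{q} a_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} b_j h_{t-j}$$

where $\xi_t$ is independently and identically distributed with zero mean and unit variance, $a_0 > 0$, $a_i \ge 0$ and $b_j \ge 0$. The process is weakly stationary if and only if $\sum_{i=1}^{q} a_i + \sum_{j=1}^{p} b_j < 1$.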
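The way a GARCH(1,1) conditional variance evolves can be sketched in a few lines of Python. This is only an illustration: the parameter names `a0`, `a1`, `b1`, their values and the residuals are hypothetical, and starting the recursion at the unconditional variance is an assumption of the sketch rather than something stated in the study.

```python
def garch11_variance(residuals, a0, a1, b1):
    """Conditional variances h_t of a GARCH(1,1) recursion.

    h_t = a0 + a1 * e_{t-1}^2 + b1 * h_{t-1}
    h[t] is the conditional variance associated with residuals[t].
    """
    if a0 <= 0 or a1 < 0 or b1 < 0:
        raise ValueError("need a0 > 0, a1 >= 0 and b1 >= 0")
    if a1 + b1 >= 1:
        raise ValueError("need a1 + b1 < 1 for a finite unconditional variance")
    # Assumption: initialise at the unconditional variance a0 / (1 - a1 - b1).
    h = [a0 / (1.0 - a1 - b1)]
    for e in residuals[:-1]:
        h.append(a0 + a1 * e * e + b1 * h[-1])
    return h


# Hypothetical residuals and parameters, for illustration only.
resid = [0.5, -1.2, 0.3, 2.0, -0.1]
h = garch11_variance(resid, a0=0.1, a1=0.2, b1=0.7)
print([round(v, 4) for v in h])
```

A large residual raises the next conditional variance through `a1`, and that increase dies out gradually through `b1`, which is how the model reproduces volatility clustering.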
The first step in fitting a GARCH model is to check the squared residual series for conditional heteroscedasticity.
The working principle of an ANN is modelled on the human brain, which learns by making the right connections. Like a network of neurons, an ANN comprises several layers: an input layer that receives external information, one or more hidden layers that perform mathematical operations on the data, and an output layer that produces the results. All the layers are connected through acyclic arcs (Khashei & Bijari, 2010).
ANNs are flexible computing systems for modelling a wide variety of nonlinear problems. There are two ANN topologies: feed-forward and feedback. In the feed-forward topology the flow of information is unidirectional and there is no feedback path, whereas the feedback topology contains feedback paths. The two topologies are shown in Fig. 1.
Artificial Neural Network topologies – feed-forward (left) and feedback (right).
Applying a neural network to a particular time-series problem involves determining the number of layers and the total number of nodes in the structure, which is done by experimentation. It is established that a single hidden layer with a sufficient number of nodes, given adequate data for training, can approximate any nonlinear function to a desired accuracy. The choice of the number of input nodes, which are lagged observations of the same variable, plays an important role in model building. Determining the number of output nodes is relatively easy. It is suggested that a model with a small number of hidden nodes gives better out-of-sample forecasting performance.
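A single hidden layer feed-forward network is the most widely used form for modelling and forecasting time-series data. The model is organized as three layers of processing units connected by acyclic arcs. The relationship between the output ($y_t$) and the inputs ($y_{t-1}, y_{t-2}, \ldots, y_{t-p}$) is given by

$$y_t = w_0 + \sum_{j=1}^{q} w_j \, g\!\left(w_{0j} + \sum_{i=1}^{p} w_{ij} y_{t-i}\right) + \varepsilon_t$$

where $w_{ij}$ ($i = 0, 1, \ldots, p$; $j = 1, 2, \ldots, q$) and $w_j$ ($j = 0, 1, \ldots, q$) are the connection weights, $p$ is the number of input nodes, $q$ is the number of hidden nodes and $g$ is the hidden-layer activation function, commonly the logistic function $g(x) = 1/(1 + e^{-x})$.
Hence the ANN model performs a nonlinear functional mapping from the past observations ($y_{t-1}, y_{t-2}, \ldots, y_{t-p}$) to the future value $y_t$, i.e.

$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}, \mathbf{w}) + \varepsilon_t$$

where $\mathbf{w}$ is the vector of all parameters and $f$ is a function determined by the network structure and the connection weights.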
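To make the mapping from lagged observations to a one-step forecast concrete, here is a minimal Python sketch of the forward pass of a single hidden layer feed-forward network with a logistic activation. The weights below are hypothetical and untrained; in practice they would be estimated from the data, and the function names are illustrative only.

```python
import math


def logistic(x):
    """Logistic activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ann_forecast(lags, hidden_weights, hidden_bias, output_weights, output_bias):
    """One-step forecast from a single hidden layer feed-forward network.

    lags           : past observations used as input nodes
    hidden_weights : one list of input weights per hidden node
    hidden_bias    : one bias per hidden node
    output_weights : one weight per hidden node
    output_bias    : bias of the output node
    """
    hidden = [
        logistic(b + sum(w * y for w, y in zip(ws, lags)))
        for ws, b in zip(hidden_weights, hidden_bias)
    ]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Hypothetical weights: with all hidden weights zero each hidden node outputs 0.5.
forecast = ann_forecast(
    lags=[1.0, 2.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_bias=[0.0, 0.0],
    output_weights=[2.0, 4.0],
    output_bias=1.0,
)
print(forecast)  # 4.0
```

The output node is linear here, which is the usual choice when the network is used for forecasting a continuous series.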
There are some similarities between ARIMA and ANN models. Both of them include a variety of models with different orders. Data transformation is sometimes needed to obtain best forecasts. A relatively large sample is necessary to fit a suitable model.
The presence of nonlinearity pattern in a time-series data can be tested using BDS test. After making the data stationary by differencing, suitable linear model (e.g. ARMA (
The hybrid methodology
Zhang (2001) proposed a hybrid approach that decomposes a time-series process into its linear and nonlinear components. The hybrid model considers the time-series $y_t$ as the sum of a linear and a nonlinear component:

$$y_t = L_t + N_t$$

where $L_t$ and $N_t$ denote the linear and nonlinear components, respectively.
First, a linear time-series model, say, ARIMA, is fitted to the data and the residuals are obtained from the fitted model. If the linear model is adequate, the residuals contain only the nonlinear component. Let $e_t$ denote the residual at time $t$ from the linear model; then

$$e_t = y_t - \hat{L}_t$$

where $\hat{L}_t$ is the forecast value for time $t$ from the estimated linear model. The residuals are diagnosed to check whether any linear correlation structure is left in them and are then tested for nonlinearity using the BDS test. Once nonlinearity is confirmed, the residuals are modelled using a nonlinear model, say, ARCH, and the forecast values $\hat{N}_t$ are obtained. Finally the forecasted linear and nonlinear components are combined to obtain the aggregated forecast values as

$$\hat{y}_t = \hat{L}_t + \hat{N}_t$$
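The two-stage logic of the hybrid approach can be sketched in Python as follows. This is a deliberately simplified stand-in, not the procedure used in the study: the linear stage is a least-squares AR(1) rather than a full ARIMA model, and the nonlinear stage regresses each residual on the square of the previous residual instead of fitting a GARCH or ANN model. All function names and the toy series are hypothetical.

```python
def fit_ar1(x):
    """Least-squares AR(1) with intercept: x_t = c + phi * x_{t-1} + e_t."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def fit_quadratic_residual_model(e):
    """Stand-in nonlinear stage: e_t = g * e_{t-1}^2 + noise (no intercept)."""
    z = [p * p for p in e[:-1]]
    return sum(zi * ci for zi, ci in zip(z, e[1:])) / sum(zi * zi for zi in z)


def hybrid_one_step(y):
    """Return (linear forecast, nonlinear forecast, combined forecast)."""
    c, phi = fit_ar1(y)                                  # stage 1: linear model
    fitted = [c + phi * p for p in y[:-1]]
    resid = [a - f for a, f in zip(y[1:], fitted)]       # e_t = y_t - L_hat_t
    g = fit_quadratic_residual_model(resid)              # stage 2: residual model
    linear_fc = c + phi * y[-1]
    nonlinear_fc = g * resid[-1] ** 2
    return linear_fc, nonlinear_fc, linear_fc + nonlinear_fc


# Toy series, for illustration only.
y = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0]
linear_fc, nonlinear_fc, total_fc = hybrid_one_step(y)
print(linear_fc, nonlinear_fc, total_fc)
```

Swapping either stage for a richer model leaves the structure unchanged: the second model is always trained on the residuals of the first, and the final forecast is the sum of the two component forecasts.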
The hybrid approaches can be graphically represented by Figs 2 and 3.
Schematic representation of ARIMA-GARCH hybrid methodology.
Descriptive statistics of potato prices in Agra market
Note: SD: standard deviation; CV: coefficient of variation.
Seasonal factors for potato prices in the Agra market
Schematic representation of ARIMA-ANN hybrid methodology.
For the present study potato price data belonging to Agra market in India for the period January, 2005 to May, 2017, collected from National Horticulture Research and Development Foundation (NHRDF) (
Descriptive statistics and seasonal indices:
The descriptive statistics of potato price for the Agra market are reported in Table 1. A perusal of Table 1 indicates that the average potato price in the Agra market is 652. Since the CV is more than 50%, it can be concluded that the variability of price in the Agra market is on the higher side. The series under consideration is positively skewed and leptokurtic. The original data are seasonally adjusted to eliminate the influence of seasonality in price. Table 2 shows the seasonal index values. Relatively higher values of the seasonal indices are found from June to November. Being a rabi crop, the planting time of potato is 15
The first step in time-series analysis is to plot the data and inspect the time-series components present. Figure 4a and b show the time-series plots of the average monthly price of potato in the Agra market from January 2005 to May 2017 for the original and the seasonally adjusted series, respectively. A perusal of the figure indicates that the price attains its higher values during June to December every year, with the highest price observed in October 2014. The time plot of the original price data also indicates that a seasonal pattern is present in the dataset, so some kind of seasonal adjustment is required.
ADF and PP test for stationarity
Monthly price of Potato from January 2005 to May 2017 for (a) Original Series (b) Seasonally Adjusted Series in Agra Market.
Phillips-Perron (PP) and Augmented Dickey-Fuller (ADF) tests have been applied to test for a non-seasonal unit root in the seasonally adjusted series, and the results are given in Table 3. The null hypothesis of a unit root is not rejected by either test at the 5% level of significance, indicating that the seasonally adjusted series is non-stationary and that differencing is required to make it stationary. Rejection of null hypothesis of stationarity test for 1
Fitting of forecasting models
Fitting of ARIMA model
After confirming the stationarity of the price series after first differencing, a suitable ARMA model was selected on the basis of minimum AIC and BIC and the significance of the autocorrelation and partial autocorrelation functions. Accordingly, the ARIMA(1,1,0) model is selected for the seasonally adjusted price series of potato in the Agra market. The parameter estimates of the fitted ARIMA model are furnished in Table 4 along with their significance level (
Parameter estimates of the ARIMA (1,1,0) of Agra market
The presence of autocorrelation in the squared residuals of the best fitted ARIMA model was investigated and the results are reported in Table 5. The squared residuals were found to be autocorrelated at least up to 12 lags, indicating the possible presence of an ARCH effect. To test for conditional heteroscedasticity, the ARCH-LM test was performed, and the ARCH effect was found to be significant up to 5 lags.
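A check for autocorrelation in squared residuals can be sketched in Python with a Ljung-Box type statistic computed on the squared series (often called the McLeod-Li test). This is related to, but not the same as, the ARCH-LM test; the function names and the sample residuals are illustrative assumptions and do not reproduce the values in Table 5.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numer / denom


def ljung_box_squared(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k) on squared residuals.

    Under the null of no autocorrelation in the squared residuals, Q is
    approximately chi-square distributed with max_lag degrees of freedom.
    """
    squared = [e * e for e in residuals]
    n = len(squared)
    return n * (n + 2) * sum(
        autocorrelation(squared, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical residuals with two bursts of high volatility.
resid = [0.2, -0.1, 0.3, -2.5, 2.8, -3.1, 0.2, 0.1, -0.3, 0.2, 2.9, -2.7]
q_stat = ljung_box_squared(resid, max_lag=3)
print(round(q_stat, 3))
```

A value of `q_stat` that is large relative to the chi-square critical value for the chosen number of lags points to volatility clustering and hence to a possible ARCH effect.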
Test for ARCH effects for seasonally adjusted series
BDS test for nonlinearity of residuals
The BDS test has been employed to test for any remaining structure in the residuals obtained from the fitted ARIMA model for the market under consideration. The results of the test, given in Table 6, indicate the possible presence of a nonlinear pattern in the residuals of the ARIMA model.
Fitting of GARCH model
Accordingly, to capture the nonlinearity and the heteroscedasticity in the conditional variance, a GARCH model is applied for modelling and forecasting the price series. The parameter estimates of the best fitted ARIMA and GARCH model are furnished in Table 7 along with their significance level.
Parameter estimates of the ARIMA (1,1,0)-GARCH (1,1) model for Agra market
Once it was confirmed that the residuals of the fitted ARIMA model contain a nonlinear part and that a significant ARCH effect is present, the hybrid models, namely ARIMA-ANN and ARIMA-GARCH, as discussed in Section 4, were employed to investigate the improvement in forecast accuracy over the individual ARIMA and GARCH models.
Evaluation of forecasting performances
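The prediction abilities of the ARIMA and GARCH models and of the hybrid models, i.e. ARIMA-ANN and ARIMA-GARCH, are compared with respect to mean absolute percentage error (MAPE) and root mean squared error (RMSE) for the last twelve observations (i.e. the last twelve months). The formulae for computing MAPE and RMSE are

$$\text{MAPE} = \frac{100}{h} \sum_{t=1}^{h} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \qquad \text{RMSE} = \sqrt{\frac{1}{h} \sum_{t=1}^{h} \left(y_t - \hat{y}_t\right)^2}$$

where $h$ is the number of observations in the forecast period (twelve here), $y_t$ is the actual value and $\hat{y}_t$ is the forecast value at time $t$.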
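The two accuracy measures can be computed with a few lines of Python. The sketch below uses the standard definitions of MAPE and RMSE; the function names and the sample values are illustrative and are not the prices or forecasts from this study.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical hold-out values and forecasts.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mape(actual, forecast))            # approximately 10.0
print(round(rmse(actual, forecast), 4))  # 15.8114
```

MAPE is scale-free and so is convenient for comparing models across series, while RMSE penalizes large errors more heavily because the errors are squared before averaging.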
Comparison of prediction performance of different models
The accuracy of a statistical model is fundamental both to selecting that model and to the decisions based on it. The Box-Jenkins ARIMA methodology is the most popular method for forecasting a linear time-series process. In many practical situations, linearity and homoscedastic error variance, the two most crucial assumptions of the ARIMA model, are violated. In such cases nonlinear time-series models are called for, and the GARCH family is the most widely used class of such models in the literature. The hybrid methodology, which decomposes a time-series into its linear and nonlinear parts and models each part separately before combining them into the final forecast, is described in detail here. This methodology has been applied to forecasting the wholesale price of potato in the Agra market. The residuals obtained from the fitted ARIMA model were tested using the BDS test, which reveals that a nonlinear pattern exists in the residual series. The forecast performance of the ARIMA, GARCH, ARIMA-GARCH and ARIMA-ANN models has been compared. The hybrid models perform better than their individual counterparts, i.e. the ARIMA and GARCH models, in terms of MAPE and RMSE. The residuals from the finally fitted hybrid model were examined and found to be independent and normally distributed, ensuring the adequacy of the selected model.
