Abstract
In this paper, we model the monthly number of dengue cases in Thailand using univariate Box-Jenkins (seasonal ARIMA) and GARCH models. Dengue cases are of three types: dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS). The series are fitted with adjustments for population size and seasonal index. For each type, the best model is chosen by Akaike’s Information Criterion (AIC) and Schwarz’s Bayesian Criterion (SBC). The fitted Box-Jenkins and GARCH models are compared using root mean square error (RMSE) and mean absolute percentage error (MAPE). The results showed that the best-fitting univariate Box-Jenkins models for DF, DHF and DSS cases are seasonal ARIMA(0, 1, 1)
Introduction
Dengue fever, also known as breakbone fever, is an infectious disease caused by the dengue virus. The dengue viruses are members of the genus Flavivirus in the family Flaviviridae and are divided into 4 serotypes, i.e. DEN-1, DEN-2, DEN-3 and DEN-4. Dengue illness is classified into three groups: dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS). Dengue infection is spread to people by the mosquito Aedes aegypti. The symptoms of the disease are fever, headache, and muscle and joint pain, all of which are very similar to those of other fever-causing illnesses. Recently, dengue has emerged as a substantial global health problem, with increased incidence in new countries and tropical areas (Nimmanpipaknit, 1997).
Dengue virus outbreaks have occurred in Thailand since 1958 (Gubler & Kuno, 1997), and the disease has been a major health problem in the country ever since. Outbreaks of dengue fever (DF) and dengue hemorrhagic fever (DHF) occurred every 2–3 years until the severe outbreaks of 1987 and 1998 (Punthana, 1999). In 1998, there were about 120,000 cases of dengue with 400 deaths; DF accounted for 85 percent, DHF for 13 percent and DSS for 1.86 percent (Bureau of the vector-borne diseases, 2014). In 2013, the total number of cases was 154,444 (an incidence rate of 241.03 per hundred thousand) and the number of deaths was 136 (a fatality rate of 0.09 percent). In response, the Ministry of Public Health set up emergency response centers (Bureau of the vector-borne diseases, 2014).
The Box-Jenkins approach was described by the statisticians George Box and Gwilym Jenkins in 1970. It is used to model autoregressive integrated moving average (ARIMA) processes, a class of mathematical time series models used for forecasting. Time series data are frequently collected at weekly or monthly intervals; notifications of diseases, hospital admissions, mortality rates, etc. often arise in the form of time series (Catalano et al., 1987; Helfenstein, 1991). ARIMA models have been used successfully in epidemiology to monitor and predict infectious diseases, such as malaria and hepatitis A incidence, influenza and pneumonia deaths, and the incidence of other infectious diseases, as well as the use of health facilities or isolation beds (Earnest et al., 2005). In particular, ARIMA models have been used to forecast dengue (Promprou et al., 2006; Paula et al., 2008; Siriwan et al., 2008).
Moreover, the seasonal autoregressive integrated moving average (seasonal ARIMA) model is useful when the time series exhibits seasonality, i.e. periodic fluctuations that recur with about the same intensity each year, and its forecasts have been shown to be more accurate than those obtained by other statistical methods (Martinez et al., 2011). (See also Choudhury et al., 2008; Silawan et al., 2008; Martinez et al., 2011; Bhatnagar et al., 2012; Kavinga et al., 2013).
Further, the generalized autoregressive conditional heteroskedasticity (GARCH) model is used to characterize and model observed time series whose variance is not constant but depends on time, i.e. is heteroscedastic (Bollerslev, 1986). Several authors have applied this model to series with time-dependent variance in many fields, e.g. Reinaldo et al. (2005); Ranjit et al. (2009); Kiatekarunya (2011); Siti et al. (2011); Pung et al. (2013). In our study, the number of dengue cases has nonconstant variance. Hence, the GARCH procedure could be used to forecast dengue incidence in Thailand.
Therefore, the aims of this paper are to develop univariate Box-Jenkins and GARCH models to forecast dengue incidence in Thailand and to compare the results of the two approaches. The paper is organized as follows. Section 1 is this introduction. The methodology is presented in Section 2. Results of the Box-Jenkins and GARCH models are shown in Section 3. A comparison of the fitted models is presented in Section 4, while forecasting is presented in Section 5. Finally, Section 6 concludes.
Methodology
Datasets
The monthly number of dengue cases in Thailand was obtained from the Department of Disease Control, Ministry of Public Health. The dataset was divided into two parts: the data observed from January 2003 to June 2013 were used to develop the time series models, and the monthly number of dengue cases from July to December 2013 was used to validate the models.
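The split described above can be sketched as follows. This is only an illustration: the helper names `month_range` and `split_series` are hypothetical, and the case counts are placeholders rather than the actual data.

```python
from datetime import date


def month_range(start, end):
    """Return the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def split_series(months, values, last_fit_month):
    """Split paired (month, value) data into fitting and validation parts."""
    pairs = list(zip(months, values))
    fit = [(m, v) for m, v in pairs if m <= last_fit_month]
    validation = [(m, v) for m, v in pairs if m > last_fit_month]
    return fit, validation


months = month_range(date(2003, 1, 1), date(2013, 12, 1))
placeholder_cases = [0] * len(months)  # placeholder values, NOT the real counts
fit_part, validation_part = split_series(
    months, placeholder_cases, date(2013, 6, 1)
)
print(len(fit_part), len(validation_part))  # 126 6
```

Under this split, 126 monthly observations are available for model development and 6 for validation.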
In this study, we consider the monthly number of dengue cases in Thailand adjusted by population size because the population of Thailand changes from month to month (Bureau of registration administration, 2014). Let
Moreover, for the GARCH method we also consider the monthly number of dengue cases in Thailand adjusted by the seasonal components, since the GARCH model does not have a seasonal component but our data show a seasonal effect. The seasonal indices are computed by the ratio-to-moving-average method. The
Models
Seasonal ARIMA
Let
GARCH model
The GARCH, or generalized autoregressive conditional heteroscedastic, model is used to fit series whose variance is not constant but depends on time. This section describes the ARCH model and the traditional GARCH model.
ARCH model
The ARCH(
where
To extend the model, Engle (1982) assumed that the mean of conditional density function of
where
To ensure that the conditional error variance,
Traditional GARCH model
Bollerslev (1986) introduced a new general class of ARCH models, named generalized autoregressive conditional heteroscedastic (GARCH) models, which allows for both a long memory and a more flexible lag structure. In ARCH models, the conditional variance is a linear function of past sample variances (squared errors) only. GARCH models also include past conditional variances in the formulation. The GARCH model with the assumption of normality can be expressed as
where
The sample ACF and PACF for the square root transformed monthly number of DF cases.
The sample ACF and PACF for the first differencing of the square root transformed monthly number of DF cases.
The sample ACF and PACF for the differencing at period 12 months of the square root transformed monthly number of DF cases.
The sample ACF and PACF for the differencing at periods 1 and 12 months of the square root transformed monthly number of DF cases.
The residual ACF and PACF for the ARIMA(0, 1, 0) 
This section presents the analysis of three time series, the monthly numbers of DF, DHF, and DSS cases, by the univariate Box-Jenkins and GARCH models, with the techniques of adjustment by population size and removal of seasonal components. We used SAS
The univariate Box-Jenkins model for DF, DHF, and DSS cases
We investigate whether each series is stationary in the variance by the Box-Cox power transformation. The results showed that the series of the monthly number of DF, DHF, and DSS cases need a square root transformation since
The sample ACF and sample PACF for the square root transformed monthly number of DF cases are shown in Fig. 1. The sample ACF shows a damped sine-cosine wave and the sample PACF has large spikes at lags 1, 2 and 13. The sample ACF and sample PACF for the first difference of the square root transformed monthly number of DF cases are shown in Fig. 2. The sample ACF shows a damped sine-cosine wave with large spikes at lags 12 and 24, so we next consider differencing at a period of 12 months. The sample PACF has large spikes at lags 1, 12 and 13.
The sample ACF and sample PACF for the difference at a period of 12 months of the square root transformed monthly number of DF cases are shown in Fig. 3. The sample ACF decays slowly, which implies that first differencing is needed, and the sample PACF has large spikes at lags 1 and 2.
To confirm that both the first and the twelfth differences are needed, we consider the sample ACF and PACF for the differences at periods 1 and 12 months of the square root transformed monthly number of DF cases, which are shown in Fig. 4. The sample ACF has significant spikes at lags 1 and 12. Hence we identify the following ARIMA(0, 1, 0)
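The identification steps above (square root transformation, differencing at lags 12 and 1, and inspection of the sample ACF) can be illustrated with a short sketch. It assumes a synthetic series with trend, seasonality and noise in place of the actual DF counts, and the function names `difference` and `sample_acf` are hypothetical; the analysis in the paper itself was carried out in SAS.

```python
import math
import random


def difference(series, lag):
    """Return series[t] - series[t - lag] for t = lag, ..., n - 1."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


# Synthetic monthly counts (an assumption, NOT the dengue data):
# trend + period-12 seasonality + noise on the square root scale.
rng = random.Random(0)
cases = [
    (10 + 0.05 * t + 4 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 1)) ** 2
    for t in range(132)
]

root = [math.sqrt(c) for c in cases]   # variance-stabilizing transformation
d12 = difference(root, 12)             # seasonal difference (period 12)
d1_d12 = difference(d12, 1)            # first difference of the seasonal difference

acf = sample_acf(d1_d12, 24)
bound = 2 / math.sqrt(len(d1_d12))     # approximate 95% significance bound
for lag, r in enumerate(acf, start=1):
    flag = "*" if abs(r) > bound else ""
    print(f"lag {lag:2d}: {r:+.3f} {flag}")
```

Lags marked with an asterisk exceed the approximate two-standard-error bound and are the candidates for moving average terms at the nonseasonal and seasonal lags.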
To identify the nonseasonal components of the ARIMA model the residual ACF and PACF of the ARIMA(0, 1, 0)
Parameters of the tentative models are estimated at
The best models for monthly DF, DHF and DSS cases
In the same manner, ARIMA models were fitted to the data sets adjusted by population size (
The best models for the monthly number of DF, DHF, DSS cases adjusted by population size
The GARCH models are fitted on monthly number of DF, DHF and DSS cases via the SAS
To check whether each time series is stationary, we applied augmented Dickey-Fuller unit root tests to the logarithms of the monthly numbers of DF, DHF and DSS cases. The results showed that the model with constant mean and the model with trend for the logarithm transformed monthly number of DF, DHF, and DSS cases have
To determine the order of the AR process, we fit the models AR(1)–(12) to the logarithm transformed monthly number of DF, DHF, and DSS cases because these series have seasonality with period 12. The results of model fitting are given below.
The monthly number of DF, DHF, DSS cases
After the order of the AR process was obtained, we tested for the presence of ARCH effects in the logarithm of the monthly number of DF cases using Portmanteau Q-statistics. The results showed that the p-values at all 12 lags are less than 0.05. This implies that a higher order ARCH or a GARCH(1, 1) model might be a good choice (Donna, 1995). Since the series has seasonality with period 12, we chose the ARCH(12) and GARCH(1, 1) specifications. When we fitted the models AR(1)–(12) with GARCH(1, 1), the GARCH(1, 1) coefficients were not significant. Hence, the tentative models are AR(1)–(12) with ARCH(12).
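The idea behind the Portmanteau Q test for ARCH effects can be sketched as follows. The sketch assumes the Ljung-Box form of the statistic applied to squared residuals and uses simulated ARCH(1) residuals rather than the residuals of the fitted AR models; the names `sample_acf` and `q_statistic` are hypothetical, and the SAS output used in the paper may differ in detail.

```python
import math
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def q_statistic(residuals, max_lag):
    """Ljung-Box-type Q statistic computed on the squared residuals."""
    squared = [e * e for e in residuals]
    n = len(squared)
    acf = sample_acf(squared, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# 5% critical value of the chi-square distribution with 12 degrees of freedom.
CHI2_95_DF12 = 21.026

# Simulated ARCH(1) residuals (an assumption, NOT residuals from the paper).
rng = random.Random(1)
residuals = []
previous = 0.0
for _ in range(126):
    sigma2 = 0.2 + 0.7 * previous * previous   # conditional variance
    previous = math.sqrt(sigma2) * rng.gauss(0, 1)
    residuals.append(previous)

q12 = q_statistic(residuals, 12)
print(f"Q(12) = {q12:.2f}, exceeds 5% critical value: {q12 > CHI2_95_DF12}")
```

A Q statistic above the critical value (equivalently, a p-value below 0.05) indicates autocorrelation in the squared residuals, i.e. an ARCH effect.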
To check white noise, the chi-square statistic for
In the same manner, we obtained the best models for DHF and DSS cases. The best models for DF, DHF, and DSS cases are shown in Table 3.
The best models fitted for the monthly number of DF, DHF, DSS cases
In this section, we consider the monthly number of DF, DHF, and DSS cases adjusted by population size. Proceeding in the same manner, we obtained the best models for DF, DHF, and DSS cases, which are shown in Table 4.
The best models fitted for the monthly number of DF, DHF, DSS cases adjusted by population
In this section, the GARCH models for the monthly number of DF, DHF and DSS cases with seasonal components removed are presented. Because our data have seasonality with period 12 and the GARCH model does not have a seasonal term, we fit the GARCH model after removing the seasonal components. The best fitted models for the logarithm of the monthly number of DF, DHF, and DSS cases with seasonal components removed are presented in Table 5.
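The removal of seasonal components by the ratio-to-moving-average method can be sketched as follows. This is a minimal version of the classical procedure (centered 12-month moving average, monthly averages of the ratios, normalization so that the indices sum to 12), applied to a synthetic series; the function names and the data are assumptions, and the exact variant used in the paper may differ.

```python
def centered_ma12(series):
    """Centered 12-month moving average; None where it is undefined."""
    n = len(series)
    cma = [None] * n
    for t in range(6, n - 6):
        total = (
            0.5 * series[t - 6]
            + sum(series[t - 5:t + 6])
            + 0.5 * series[t + 6]
        )
        cma[t] = total / 12.0
    return cma


def seasonal_indices(series):
    """Seasonal index for each of the 12 months; series starts in month 1."""
    cma = centered_ma12(series)
    ratios = [[] for _ in range(12)]
    for t, average in enumerate(cma):
        if average is not None:
            ratios[t % 12].append(series[t] / average)
    raw = [sum(r) / len(r) for r in ratios]
    scale = 12.0 / sum(raw)          # normalize so the indices sum to 12
    return [value * scale for value in raw]


def deseasonalize(series, indices):
    """Divide each observation by the seasonal index of its month."""
    return [y / indices[t % 12] for t, y in enumerate(series)]


# Synthetic data (an assumption, NOT the dengue counts): a constant level
# of 100 multiplied by a fixed monthly pattern whose values sum to 12.
pattern = [0.5, 0.5, 0.6, 0.8, 1.0, 1.5, 1.8, 1.7, 1.3, 1.0, 0.7, 0.6]
series = [100.0 * pattern[t % 12] for t in range(132)]

indices = seasonal_indices(series)
adjusted = deseasonalize(series, indices)
print([round(i, 3) for i in indices])
print(round(min(adjusted), 3), round(max(adjusted), 3))
```

In this idealized example the estimated indices recover the monthly pattern and the adjusted series is flat; with real data the adjusted series retains trend and irregular variation, and its logarithm is what the GARCH models would then be fitted to.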
The best fitted models for the monthly number of DF, DHF, DSS cases with seasonal components removed
The best fitted models for the monthly number of DF, DHF, DSS cases with seasonal components removed and adjusted by population size
In this section, we consider the combination of the two techniques, i.e. removing seasonal components and adjusting by population size. Proceeding in the same manner, after obtaining the order of the AR process we tested for the presence of ARCH effects in the logarithm of the monthly number of DF, DHF, and DSS cases with seasonal components removed and adjusted by population size. The best fitted models for all three series are shown in Table 6.
For comparison, we calculated the mean absolute percentage error (MAPE) and root mean square error (RMSE) of all models for the monthly number of DF, DHF, and DSS cases, which are shown in Table 7. The mean absolute percentage error and root mean square error are evaluated as follows:
where
Mean absolute percentage error (MAPE) and root mean square error (RMSE) of all models for the monthly number of DF, DHF, DSS cases
For the monthly number of DF cases, the best fitted model is AR(1)–GARCH(1, 1) with seasonal components removed, since it gave the minimum MAPE (0.56%) and RMSE (19.1008). The models adjusted by population size do not outperform the general ones. For the monthly number of DHF cases, the best fitted model is AR(8)–ARCH(1) with seasonal components removed, since it gave the minimum MAPE (3.92%) and RMSE (193.2934). The ARIMA model adjusted by population size slightly outperforms the general one, while the GARCH model adjusted by population size gave less accurate forecasts. Finally, for the monthly number of DSS cases, the best fitted model is AR(1)–ARCH(1) with seasonal components removed, since it gave the minimum MAPE (1.57%) and RMSE (2.4518). The general ARIMA model slightly outperforms the one adjusted by population size, while, in contrast, the GARCH model adjusted by population size outperforms the general one.
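The two error measures used in this comparison can be computed as in the following sketch, which assumes their standard definitions (MAPE as the mean of absolute errors relative to the actual values, in percent, and RMSE as the square root of the mean squared error). The numbers are made up for illustration and are not results from this study.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def rmse(actual, forecast):
    """Root mean square error, in the units of the data."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only (NOT data or forecasts from the paper).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(f"MAPE = {mape(actual, forecast):.2f}%")
print(f"RMSE = {rmse(actual, forecast):.4f}")
```

MAPE is scale-free, which makes it comparable across the DF, DHF and DSS series, whereas RMSE depends on the magnitude of each series and is best compared between models for the same series.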
Actual and forecasted values for AR(1)–GARCH(1, 1) of the monthly number of DF cases with seasonal components removed, January 2003 to December 2013.
Actual and forecasted values for AR(8)–ARCH(1) of the monthly number of DHF cases with seasonal components removed, January 2003 to December 2013.
Actual and forecasted values for AR(1)–ARCH(1) of the monthly number of DSS cases with seasonal components removed, January 2003 to December 2013.
The GARCH model fits the dengue series better than the univariate Box-Jenkins models because it includes a variance equation, which suits data whose variance depends on time, as ours does. Modeling the variance explicitly appears to give a better fit than only transforming the data to make them stationary in variance before building the model, as is done in the Box-Jenkins procedure.
Moreover, the GARCH model with seasonal components removed outperforms both the general GARCH model and the GARCH model adjusted by population size, because our data have an obvious seasonal effect but the GARCH model does not have a seasonal term. After the seasonal components were removed, the model gave a better fit to the data.
There are six models for each series. We therefore also compared the results of the ARIMA and GARCH models with ensemble methods (Sanchez, 2008; Suhartono, 2013). The results from both the ensemble averaging and the ensemble stacking methods coincide with our best models in the previous section. Therefore, the DF, DHF, and DSS cases are forecast using the best models, as shown in Figs 6–8, respectively.
Conclusion
This paper proposed univariate Box-Jenkins (seasonal ARIMA) and GARCH models to forecast dengue incidence in Thailand for its 3 types, i.e. dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS). The GARCH model outperforms the seasonal ARIMA model for all 3 types.
The best fitted models for DF, DHF and DSS are AR(1)–GARCH(1, 1), AR(8)–ARCH(1) and AR(1)–ARCH(1) with seasonal components removed, respectively. The MAPEs of the best fitted models are 0.56%, 3.92% and 1.57%, respectively. In this study, the technique of adjustment by population size does not clearly improve model fitting and forecasting.
The population size may not have varied as much as expected. In future work, we would like to apply the vector ARIMA model and the vector ARMA model with seasonal components removed to forecast dengue incidence in Thailand, and then compare them with the GARCH models. Further, although the GARCH model outperforms both the ARIMA and seasonal ARIMA models, it does not have a seasonal component, so it may be useful to propose a GARCH model that takes the seasonal component into account.
Acknowledgments
We would like to thank the Centre of Excellence in Mathematics, the Commission on Higher Education, Thailand, the Thailand Research Fund, and Mahidol University for the financial support.
