Abstract
This study focuses on modelling dengue cases in northeastern Thailand through two meteorological covariates: cumulative rainfall and average maximum temperature. We propose two nonlinear integer-valued GARCHX models (Markov switching and threshold specification) with a negative binomial distribution, as they take into account the stylized features of weekly dengue haemorrhagic fever cases, which contain nonlinear dynamics, lagged dependence, overdispersion, consecutive zeros and asymmetric effects of meteorological covariates. We conduct parameter estimation and one-step-ahead forecasting for two proposed models based on Bayesian Markov chain Monte Carlo (MCMC) methods. A simulation study illustrates that the adaptive MCMC sampling scheme performs well. The empirical results offer strong support for the Markov switching integer-valued GARCHX model over its competitors via Bayes factor and deviance information criterion. We also provide one-step-ahead forecasting based on the prediction interval that offers a useful early warning signal of outbreak detection.
Keywords
Introduction
Dengue is a mosquito-borne viral disease that is widespread throughout the tropics. Female mosquitoes of the species Aedes aegypti are the main vector of dengue, and Aedes albopictus is the secondary vector of dengue virus in Asia (De Melo et al., 2012). Dengue fever causes flu-like symptoms lasting from two days to one week and usually occurs after an incubation period of 4–10 days after infected by mosquito bites. When severe dengue is developing (also known as dengue haemorrhagic fever [DHF]), the critical phase occurs 3–7 days after the first onset of the disease (A detailed clinical guidance can be found on the website of Centers for Disease Control and Prevention [CDC] USA.).
The literature in recent decades has extensively studied meteorological variables such as temperature and rainfall for their potential as early warning tools against dengue, an infectious disease (e.g., Hii et al., 2012). Numerous studies have reported the incidences of dengue in northeastern Thailand, with the tendency for high dengue transmission to rise during the rainy season (Chareonsook et al., 1999; Limkittikul et al., 2014) since the amount of rainfall may offer breeding sites for the mosquitos. This study models weekly DHF cases in Thailand associated with environmental conditions. We suspect that a nonlinear relationship exists between dengue cases and cumulative rainfall/average maximum temperature, which would allow us to look at dengue cases and two meteorological variables rather than just a simple linear relationship.
Many studies focus on analysing the dengue cases associated with meteorological variables via diverse statistical models, for example, regression analysis, autoregressive integrated moving average models generalized linear models and Poisson regression model (Halide and Ridd, 2008; Silawan et al., 2008; Wongkoon et al., 2012; Lowe et al., 2011, 2013).
In practice, such data are usually non-normally distributed and overdispersed (the mean is less than the variance). There are also a large proportion of zeros and consecutive zeros. Several articles model time series of counts using a Poisson distribution. For example, Ferland et al. (2006) propose an integer-valued GARCH (INGARCH) model to deal with overdispersion. Xu et al. (2012) apply a dispersed INARCH model to analyse weekly dengue cases in Singapore. Wang et al. (2014) suggest a self-excited threshold Poisson autoregression. Chen and Lee (2016) introduce the generalized threshold Poisson INGARCH model to cope with overdispersion, dynamic nonlinearity and structural change. Troung et al. (2017) and Chen et al. (2019) further propose some nonlinear Poisson INGARCH models on various time series of counts.
The Poisson model provides the main instrument for describing time series of counts, but other distributional assumptions can be employed instead—the most natural among other candidates being the negative binomial distribution (Fokianos, 2012). Some studies recommend using a negative binomial distribution to cope with overdispersion such as Zhu (2011), Chen et al. (2016) and Chen and Lee (2017). Since we frequently observe dynamic nonlinearity of infectious diseases, we introduce two nonlinear negative binomial INGARCH models (Markov switching and threshold specifications) along with the exogenous covariates in the conditional mean equation (named ‘INGARCHX’ models hereafter) to handle asymmetric effects, overdispersion, consecutive zeros and lagged dependence for the weekly time series of DHF cases. Chen et al. (2019) propose two types of Markov switching (MS) INGARCHX models with the Poisson distribution to analyse weekly DHF cases. The novelty of this study's contribution with respect to that of Chen et al. (2019) is that we incorporate a negative binomial distribution with a greater variance than its Poisson mean to describe time series of counts. The negative binomial with two parameters (shape and scale) makes our proposed model very flexible.
The Markov switching model of Hamilton (1989) and the threshold model of Tong (1978, 1983), known as the regime switching models, are two of the most popular nonlinear time-series models in the literature. The Markov switching models are considered to be dynamic nonlinear with a distinct state of disease incidences, while a threshold model captures the dynamic behaviour of time series by switching the regimes. The general idea for both models is that a process behaves differently when applying a different model. These two classes of models share many common features, such as nonlinear effects. Chen et al. (2011) and Chan et al. (2017) emphasize the similarities and differences for threshold and Markov switching models in the economics and finance domain.
This study conducts parameter estimation and one-step-ahead forecasting for two proposed models based on Bayesian methods. Several authors successfully employ Bayesian approaches to estimate parameters for time series of counts (see Malyshikina et al., 2009; Manitz and Höhle, 2013; Chen and Lee, 2016; Martínez-Bello et al., 2017). Furthermore, we consider a numerical approximation to Bayesian model selection and forecasting, in which we heavily utilize Markov chain Monte Carlo (MCMC) techniques to overcome the difficulty arising in the frequentist method. We provide one-step-ahead forecasting via the predictive distribution, which is a by-product of the sampling algorithm. Our study develops a dengue forecasting model that provides early warning of a dengue outbreak one week in advance via the Bayesian method.
The rest of this article runs as follows. Section 2 introduces negative binomial two-state Markov switching and threshold INGARCHX models, which take into account the Bayesian inference and MCMC methods to evaluate model parameters. We consider model comparison, prediction and diagnostic checking for the proposed models. Section 3 presents simulation studies to evaluate and compare with the two models. Section 4 applies the analytical results and discussion based on the proposed models to the four provinces’ weekly DHF cases. We then make comparisons between competing models. Section 5 gives concluding remarks.
Nonlinear negative binomial INGARCH models
We consider two nonlinear specifications: Markov switching and threshold mechanism. The major difference between Markov switching models and threshold models is that the former assume that the underlying state process resulting in the nonlinear dynamics is latent, whereas the latter usually allow the nonlinear dynamics to be driven by observable variables. Both nonlinear specifications are widely applied across diverse fields of endeavour through time-series analysis applicable for capturing regime variation (Hamilton, 1989; Chen et al., 2009).
More precisely, we introduce two nonlinear INGARCHX models with negative binomial variables. One is the MS INGARCHX model, and the other is the threshold INGARCHX model where ‘X’ refers to two meteorological covariates: weekly cumulative rainfall and weekly mean maximum temperature. Suppose that
where
Let
where
We next conduct the threshold INGARCH model by incorporating covariates. Assume that
where
This indicates that
(See Remark 3 of Zhu, 2011; Chen and Lee, 2017). Both models in Equations (2.3) and (2.4) suppose that the initial observation
We next discuss how to make inferences of the model parameters and join together unobserved state components for the Markov switching model with a Bayesian framework.
Assume that
where
In practice, we believe that the coefficients of two climatic factors should have a positive relationship with weekly DHF cases, that is,
We consider
To solve a label switching problem for Markov switching models, we use the condition of the first-order stationary for an INGARCH(1,1) as a restriction to relabel the MCMC outputs to have a higher intensity in state 2 as
where
We subsequently use parameter groups (i)
where The posterior conditional distribution of The posterior conditional distribution of the transition probability where The posterior conditional distribution for The posterior conditional distribution of The posterior conditional distribution of state where Herein, if a random number generated is less than
All conditional posterior distributions are non-standard forms except for
To account for the threshold INGARCHX model, we denote the vector of all unknown parameters in Equation (2.4) by
where
It is very important to evaluate forecasting performance based on the fitted model. We can forecast
where
For comparison of different models, we use a formal Bayesian approach. Let there be two models
The RHS of Equation (2.18) expresses the Bayes factor (BF) and the prior odds ratio, respectively. In this case, assuming that we have equal preferences of these models, their prior probabilities are
where
A key part for the analysis of the time series is to check the adequacy of the specified model. Jung et al. (2006) offer the standardized Pearson residuals in order to check a dynamic structure of the mean and variance by
Clearly, the mean and variance of these residual should be close to zero and unit, respectively, without any significant serial correlation in
We now conduct a simulation study to illustrate the performance of the MCMC methods, in terms of parameter estimation and model selection. We simulate samples of size
where
Model 2 is the same as the previous set-up except for the true value of
Table 1 reports simulation results containing the averages of posterior means, medians and 95% credible intervals over the 200 replications for Models 1 and 2. The coverage probabilities indicate the percentages of the 95% posterior intervals that cover the true values generated by MCMC iterations. Overall, the posterior means of Models 1 and 2 are near the true values with coverage rates of at least 83%.
Simulation results for Models 1 and 2 obtained from 200 replications
Comparisons of Models 1 and 2 based on 200 replications
This study designs retrospective data analysis of time series for the weekly DHF cases in the four provinces of Maha Sarakham, Kalasin, Sakon Nakhon and Mukdahan in northeastern Thailand, covering approximately 12 years from January 2006 to October 2017 (612 weeks) as collected by the Bureau of Epidemiology, Department of Disease Control (DDC), Ministry of Public Health (MOPH), Thailand. For meteorological covariates, we take weekly cumulative rainfall (mm) and weekly average maximum temperature (C) from Thai Meteorological Department (TMD), Thailand. Figure 1 specifies the geographic locations in order of the four provinces.
We consider the time series of the weekly DHF cases (total sample size 612) for the four provinces, denoted by DS1, DS2, DS3 and DS4, in northeastern Thailand as shown in Table 3. All datasets of weekly DHF cases appear to be overdispersed. Table 3 reports the summary statistics for all DS1–DS4, including the percentage of zeros and consecutive zeros (at least two consecutive zeros), and also presents weekly cumulative rainfall and weekly mean maximum temperature. We observe that DS1–DS4 have at least 7% and 5% zeros and consecutive zeros, respectively. Figure 2 shows time-series plots for DS1–DS4. Obviously, these datasets include consecutive zeros, especially Sakon Nakhon (DS3) and Mukdahan (DS4).
Geographic map of four provinces in northeastern Thailand
Geographic map of four provinces in northeastern Thailand
Summary statistics for the weekly DHF cases, cumulative rainfall (mm) and mean maximum temperature (C)
Weekly DHF cases in northeastern Thailand, DS1: Maha Sarakham; DS2: Kalasin; DS3: Sakon Nakhon; DS4: Mukdahan
ACF plots of weekly DHF cases, DS1–DS4
Figure 3 depicts the ACF plots for the DHF cases in the four provinces. It shows that there exists series dependence in each DHF cases. Here
Figure 4 plots the meteorological covariates’ time series of cumulative rainfall and temperature for each dataset, respectively, which exhibit a repeated pattern. For computational convenience, we transform
Time-series plots of meteorological covariates. Weekly cumulative rainfall (upper panel) and weekly mean maximum temperature (lower panel) for DS1–DS4, respectively
Model comparison between the three models for DS1-DS4
Bayesian estimates of the two INGARCHX models for DS1
Bayesian estimates of the two INGARCHX models for DS2
Bayesian estimates of the two INGARCHX models for DS3
Bayesian estimates of the two INGARCHX models for DS4
Estimated
is based on the Markov switching INGARCHX model for DS1, DS2 (upper panel), DS3 and DS4 (lower panel), respectively
Tables 5–8 report Bayesian estimates of the Markov switching and threshold INGARCHX models for DS1–DS4, including posterior means, medians and 95% credible intervals (25 and 97.5 percentiles), respectively. We provide some evidence of convergence of the Markov chains in the Appendix, which includes some trace plots and ACF plots of parameter iterates by fitting the Markov switching negative binomial INGARCHX model. The results are convincing becuase convergence is clear in each dataset. We illustrate via ACF plots that the adaptive method for
Moreover,
To assess model adequacy, we employ standardized residuals through the posterior estimates of the target model. By using the Ljung—Box
Figure 5 provides the average probabilities of state 2,
DS1–DS2: One–step ahead forecast from the Markov switching INGARCHX models
DS3–DS4: One–step ahead forecast from the Markov switching INGARCHX models
This study provides two nonlinear integer-valued GARCHX (Markov switching and threshold) models with negative binomial distribution for modelling weekly DHF cases that consist of meteorological covariates. Applying Bayesian inferences of these models demonstrates some fascinating statistical features, consisting of overdispersion, lagged dependence, consecutive zeros, nonlinear dynamics and switching coefficients of meteorological covariates.
This article demonstrates the benefits of forecasting using the posterior predictive distribution via the joint modelling of unknown parameters and state variables. We employ a Bayesian framework through the MCMC method for making inferences. Parameter estimation methods of simulation studies illustrate that they are appropriately estimated with sufficient coverage properties. Comparisons between the three competing models based on the BF and DIC confirm that the MS INGARCHX model outperforms the threshold INGARCHX model. The evidence herein offers strong support for the proposed MS INGARCHX model over its competitors. Nevertheless, some fundamental properties exists for the threshold INGARCHX model, which might properly describe other datasets. We use predictive credible intervals that take into account ‘upper threshold’ for monitoring and providing early warning signals of outbreaks. The posterior probabilities convey clearly the captured state changing in the modelled datasets.
Finally, the proposed models allow for a warning several weeks in advance of dengue epidemics, which involve forecasting weekly cumulative rainfall and average maximum temperature as well.
Appendix
We provide some evidence of convergence of the Markov chains in Section 4. Some trace plots and ACF plots of parameter iterates by fitting the Markov switching negative binomial INGARCHX model.
DS2: Trace and ACF plots of MCMC estimates
DS3: Trace and ACF plots of MCMC estimates
DS4: Trace and ACF plots of MCMC estimates
Footnotes
Acknowledgments
We thank the editor, the associate editor and the anonymous referees for their valuable time and careful comments on our article, which have led to an improved version of it.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
Cathy WS Chen’s research is funded by the Ministry of Science and Technology, Taiwan (MOST107-2118-M-035-005-MY2).
