Abstract
In this article a non-homogeneous martingale model is proposed to model volatility in a stochastic time series of count data with constant mean. The approach is derived from a general non-homogeneous birth-and-death process, in which the mean and the variance of population size can vary as a function of time. This model can be important in modelling early warning signals that there is going to be a change of state in a complex system. The net reproduction ratio obtained from fitting a non-homogeneous birth–death model can be used as an additional tool to compare this model with a model where there is no change in the mean over the observation period. These models and procedures are illustrated with quarterly Methicillin resistant staphylococcus aureus prevalence data registered since 2001 from three Acute Trusts of hospitals of the National Health Service in Great Britain.
Keywords
Introduction
Count data, such as the population size of some biological species, or the number infected by some infectious disease in a population, are often regulated by rich underlying mechanisms, and as a consequence can reveal complex behaviour. There are, for example, situations in which it is difficult to determine in a time series of counts whether there is upward or downward change in the mean, or whether there is just some random fluctuation around a constant value, which only seems to show a trend. In situations where there appears to be curvature, in the case of population data for example because of density dependence, distinguishing between the situations might be even more difficult. This ‘turbidity’ can occur because the variance of the process might not be constant over time so that, for instance, an increase in variance might generate several relatively high or low values in a row, suggesting erroneously an upward or downward change of the mean. This variability in a measurement of interest over time is often, especially in the economic literature, referred to as volatility.
Volatility can have different meanings in the literature. In the economic literature volatility is used for the constant diffusion coefficient in a geometric Brownian motion when modelling the (log) outcome of a financial instrument over time, such as with the famous Black-Scholes model.
Secondly, the idea of constant volatility in such models is usually replaced by one where volatility can change in time, since in many cases this was what the data suggests (Lewis 2000). Stochastic volatility models were developed that deal with this changing volatility. So volatility in this literature refers to the variability, usually measured in terms of standard deviation, of time series that have a continuous state space. Modelling volatility is seen in the economic literature as being of vital importance to obtain reasonable models for financial instruments.
Volatility is also defined as the occurrence of very extreme values of a certain outcome over a period of time (Lindsey 2004). The term then refers to a change in variance over time. This might be modelled with heavy tail distributions and by allowing the dispersion to depend on the previous values of the variability. This ‘increase in variance’ interpretation of volatility is also seen in the environmental literature, for instance in climate research. There it is used as an indication that more extreme outcomes in climate measurements occur (Ahmed et al. 2009). In climate studies it has also been recognized, for some time now, that volatility plays an important role in understanding climate change (Katz and Brown 1992).
Volatility in this article is defined as the possible non-homogeneous variability in a measurement of interest over time.
Modelling volatility usually means modelling the change in variability in time. For count models with a time changing mean, the model implicitly accounts for changing volatility, since with these models there is a relationship between the mean and the variance, and if the mean is changing in time (e.g., drift), then so is the variance, and thus the volatility. A good example is the much used Poisson model for count data, for which the standard deviation is
This leaves open the case for stochastic processes that have counts as their state space, where the mean of the process is constant, but where there might nevertheless be a change in volatility. The case of a model with only a time trend for the mean and not for the variance is usually not a valid case for count data.
It might be important to detect a change in volatility for processed data with constant mean because this change in volatility might be a signal that in the near future a change in the mean might occur. An increase in variance as a signal that there is going to be a change of state is known in complex dynamical systems. As discussed in (Scheffer et al. 2009), in such systems these changes can be early warning signals that the system is approaching a state transition. As discussed there, a system approaching critical points becomes increasingly slow in recovering from small perturbations. This critical slowing down leads to three possible early warning signals, among which an increase in variance. Examples that an increase in variance can be an early warning signal can be observed in such diverse areas as climate change, ecosystems, asthma attacks and epileptic seizures (Scheffer et al. 2009).
In foregoing modelling studies of data from a stochastic process with constant mean, the population size or the number of infected-and-immune individuals in a population in the case of infectious diseases, tended to be described by deterministic models and, to estimate the parameters of the model, rather ad hoc distributional assumptions are made. But, it is not always realized that an ad hoc choice of a distribution also means an ad hoc choice for the formula describing the variance of the data.
This article proposes a new non-homogeneous martingale model for (changing) volatility in count data with constant mean. As said, volatility is used here for the (homogeneous) variability of the outcomes of a stochastic process. In Section 2 the non-homogeneous birth-and-death process is used to derive this non-homogeneous martingale model. A description of a likelihood-based approach to fit a conditional version of the model is explained in Section 3. In Section 4 it is described how a change in variance in a model with constant mean might be observed. Also the choice of a survival curve is explained there. In section 5 the models are applied to methicillin-resistant Staphylococcus aureus (MRSA) data obtained from the National Health Service (NHS) Acute Trusts of hospitals from Great Britain.
Non-homogeneous martingale process
As time evolves, a stochastic process can generate count data from a distribution, with the same mean and variance. Such a process can be ‘destabilized’ by a change in volatility only, in the absence of a change in the mean: the counts in time remain scattered around a certain level but the amount of scatter might increase or decrease with time. As explained above, this kind of disturbance might be a signal that in the near future the mean might change over time. This change in the mean can manifests itself as a smooth—possibly non-linear—change in the mean over time, or it might be a jump in the average count, upwards or downwards, after which the process might ‘stabilize’ again at a new level. To study ‘destabilization’, the model of the process that generates the count data must be able to distinguish between the state in which only the volatility changes and the state in which there is a change in the mean over time, and thus also a change in volatility. In order to achieve this, the ‘change-in-volatility-only’ model should be a special case of the ‘change-in-mean and change-in-volatility model’. Besides this, the model should take the dependence between the counts at different time points into account.
In the non-homogeneous birth–death models the rates are allowed to change over time. This can be very useful in modelling the dynamics of a process, for example in infectious disease models. A possible interpretation for the time-varying rates is given in (Becker and Yip 1989). The non-homogeneous birth–death model with counts as its state-space can be reduced to a model that models volatility only, as will be shown below by taking the birth rate—or reproductive power—and the death rate equal.
In the non-homogeneous birth–death model the probability of a birth or a death at a given point in time will depend on the population size at the previous point in time only, i.e., it is a Markov model.
So, the general non-homogeneous birth-and-death process is an example of a stochastic process that is suitable for our purpose. It is used in this article to derive a new model that only describes the variance as a function of time and that is shown to be a non-homogeneous martingale process. We will use the language of infectious disease epidemiology as our motivating practical example.
First a brief discussion of the non-homogeneous birth–death process is given. Details can be found in (Van den Broek and Nishiura 2009). A random count variable, measuring population size at calendar time
From the birth rate it can be seen that
The non-homogeneous martingale process is a special case of the above mentioned non-homogeneous birth-and-death process. In a population with a constant mean the birth and death rates are equal (i.e.,
The expected value of the population size at time
According to the model described in Section 2, the data are generated by the stochastic non-homogeneous birth-and-death process where the rate at which a birth (new case, for the ‘infectious-disease interpretation’) occurs is equal to the rate at which a death (recovery/removal) occurs. To fit the model to empirical data, it can be noted that the observed data represent just a single sample path of all possible sample paths that can be generated by such a model. The population size at time
In addition, empirical observations of such a process are usually made in discrete time
Because the probability mass function (2.2) is actually a conditional probability (i.e., conditioned on the observed population size at
Using
In this variance,
Using the net reproduction ratio
To compare the martingale model derived above with a model with a change in the mean, the general non-homogeneous birth–death process seems the most obvious candidate since it has the martingale model as a special case. The process is described by the stochastic non-homogeneous birth-and-death differential equations, shown in section two, the solution and properties of which are given elsewhere (Van den Broek and Nishiura 2009). The process involves time-dependent variation both in the mean and in the variance of the population size. The expectation and the variance of the population size are:
The mean population count in this model can be thought of as consisting of two parts. The first is the expected number present at
Note that the expected value of the process obeys the differential equation for the deterministic birth–death process, that is
The rates in this model, the birth rate
Since in the present work a model with changing volatility is compared to a model with a changing mean and thus, because the variance is a function of the mean, also changing volatility, model comparison might be difficult using only AIC. For instance, in the first part of the observation period the martingale model may seems to fit well whereas in the last part it may not. In that case the net reproduction ratio might be an additional tool to compare models because the net reproduction ratio for the martingale model is one, since there the birth rate equals the death rate. For the non-homogeneous birth–death model, the net reproduction ratio can be an increasing or a decreasing function or both. In this way it can be used to decide which model is better: if the net reproduction ratio is constant around one, then the martingale model might be appropriate, if not, the non-homogeneous model is preferred. But it can also show the changing behaviour mentioned: in the first part the net reproduction ratio can be constant around one but it can increase or decrease in a latter part of the observation period or the other way around. That is, it might show a change when time evolves.
The conditional version of the net reproduction ratio is:
Survival function
Given that the population dynamics involve complex mechanisms, on the one hand, the family of survival functions used in the model needs to be flexible. On the other hand, one generally should aim to use the simplest model, e.g., exponentially distributed survival functions for homogeneous processes, where rates are constant. The generalized gamma is a rich family of distributions that include the exponential as a special case but also the Weibull and the gamma distribution. As a consequence, by checking the estimated parameters of the generalized gamma one can check if the data supports constant rates.
The survival function is then given by:
Gamma distributions for Weibull distribution for Exponential distribution for Log-normal, Pareto and power function distributions for appropriate limits.
Further details of the generalized gamma distribution are given elsewhere (Kleiber and Kotz 2003).
Prevalence of MRSA in three NHS trust in Great Britain
Methicillin-resistant Staphylococcus aureus (MRSA) is a bacterial infection that causes infections in different parts of the body and is resistant to several antibiotics such as methicillin, amoxicillin, penicillin and oxacillin. MRSA infections are an important risk for people who have weakened immune systems and are in hospital intensive care units, nursing homes and other health care centres. There are many risk factors for acquiring MRSA like surgery, duration of hospitalization, compliance with hand disinfection procedures and antibiotic exposure, among others (Tacconelli et al. 2008).
Because hospital patients have an increased risk of being infected with MRSA it is important for hospitals to monitor their MRSA-prevalence and to take measures to prevent spread. Are the (precautionary) measures taken effective and is the prevalence decreasing in time, or is the volatility increasing as a consequence of the measures? It is also possible that the measures taken, hardly have influence and that the prevalence of MRSA is randomly fluctuating around some fixed level. For a discussion of the problems that can arise with MRSA infection data, including the variability of the rates (see Spiegelhalter, 2005).
Other stochastic models for hospital outbreak data have been used. For instance, Pelupessy et al. (2009) describe a model for different colonization routes of pathogens within a hospital. They derive a stationary probability distribution for the colonized patients that takes the different routes of colonization into account and need data on these routes. Yet other modelling approaches can be found in Cooper et al. (2004), Grundmann et al. (2002) and in Grundmann and Hellriegel (2006). In contrast the present article describes a model that uses a probability distribution for which the expected value is not changing over time but which is nevertheless depending on time through the variance of the process and only relays on infection prevalence data. The model is a special case of the non-homogeneous birth–death model to which it can be compared in order to determine whether or not there is a change in mean over time. This comparison can be done in two ways: by using Akaike's information criterium (or the bias corrected version if the sample size is small) and by use of the net reproduction ratio.
An important point is that the rate at which MRSA reproduces, depends on the duration of hospitalization (Beyersman et al. 2011). The above model deals with this by taking the reproduction power to be time dependent. As is explained by Becker and Yip (1989), a rate parameter might behave in a time dependent manner because there is a difference in susceptibility among the susceptibles. Those with higher susceptibility tend to be infected earlier while those with low-risk susceptibility will evade infection for a longer period. Hence heterogeneity among susceptibles (related to duration of hospitalization) can make the rate behave in a time dependent manner. Furthermore, as explained, in a SIR interpretation, the reproductive power can be thought of as a product of the infection rate parameter (
Because of the importance to monitor the MRSA prevalence, there is a surveillance of MRSA among NHS Acute Trust hospitals in Great Britain. This surveillance has been mandatory since April 2004. Quarterly data from April–June 2001 until January–March 2010 and monthly tables from February 2010 until February 2011 can be obtained from the web site of Public Health Enland (https://www.gov.uk/government/statistics/mrsa-bacteraemia-annual-data).
Positive blood cultures from the same patient within 14 days of the initial culture were considered to be part of a single infected episode. Duplicate reports, more than 14 days apart are considered to be separate episodes of infection of the same patient. That is, the data consist of 14-day-episodes prevalence data for patients who are MRSA positive. New cases in a hospital are reproduced from existing cases usually through health care workers. The rate at which this happens is the reproductive power in the model. After discovery of MRSA a case is usually removed or isolated so that it is not possible for this person to reproduce any longer. The rate at which this removal happens is the death rate in the model.
The models and methods described in this article are used for the data of three of the NHS Trust hospitals to illustrate the following specific data issues:
The data from the Leeds Teaching Hospitals Trust show that the non-homogeneous martingale model fits the data best, based on the bias corrected Akaike's information criterium. There is a change in volatility. The data from the Guy's & St. Thomas’ Trust do not show a clear choice between some birth–death models and a martingale model, based on the bias corrected Akaike's information criterium. Inspection of the curve of the net reproduction ratio shows that there is evidence for a birth–death model and thus a change in mean. Also the data for the King's College Hospital Trust is not conclusive between some birth–death models and the martingale model, based on the bias corrected Akaike's information criterium. Here, however, the net reproduction ratio does not show evidence of a change in mean, indicating that the martingale model is to be preferred. The data also shows that there is a constant rate at which an event (new MRSA infection or MRSA removal) occurs, meaning that volatility is changing linearly in time.
We assumed that the characteristics of the process that generates the data within a Trust is approximately the same for all hospitals of that Trust and that thus the data within a Trust can be aggregated.
Using the data of these three Trusts, 14 different models were fitted. Four martingale models were fitted in which the survival distribution (of the reproduction and removal times) is taken to be the generalized gamma, the gamma, the Weibull and the exponential distribution. Ten birth–death models were fitted: birth–death models where the reproduction times and the removal times had different generalized gamma, gamma, Weibull and exponential distributions, birth–death models where the reproduction times had a exponential distribution, constant birth rate, and the removal times had a generalized gamma, gamma or a Weibull distribution and finally birth–death models where the removal times had an exponential distribution (constant death rate) and the reproduction times had a generalized gamma, gamma or a Weibull distribution. Table 1 compares the bias corrected AIC values (AICc) between the different models for each of the three Trusts mentioned above. The bias corrected version of the AIC is used since the sample sizes are not that large.
Figure 1 shows the data (upper part) for Leeds Teaching Hospital. The number of cases per quarter seems to stay reasonably stable for a long period (about 30 quarters). Only at the end of the observation period there seems to be a disturbance in volatility or in the mean (and thus also in volatility). As can be seen from Table 1 the martingale model with the gamma distribution gives the best fit according to AICc, indicating that the volatility of the process is changing. This martingale model has an AICc
Bias corrected AIC's (AICc's) for 3 Trusts using 14 different models
Bias corrected AIC's (AICc's) for 3 Trusts using 14 different models


In this figure, the line for the four-period moving standard deviation is also shown (dashed line), as an empirical measure of the standard deviation of this process. This is of course different from the conditional standard deviation of the martingale model, but shows approximately the same pattern, although in a more ‘zig-zag’-style: an increase in the beginning and a decrease in the end of the observation period.
If the choice for the martingale model in the case of the Leeds Teaching Hospital seems a clear one, such is not the case with Guy's & St. Thomas. As can be seen from Table 1– The martingale model with a Weibull and the birth–death model with an exponential distribution for the reproduction times (a constant birth rate) and a gamma distribution for the removal times are close. Figure 2 shows a barplot of the data in the upper part and in the lower part the four-period moving standard deviation (dashed line) together with the conditional standard deviation (solid line) from the martingale model with the Weibull distribution. These are both decreasing. One can then pick the martingale model because its number of parameters is smaller but one may also try to see if there is some other information available. As mentioned previously, Akaike's information criterion—and also the bias corrected one—judges the fit over the whole observation range. There can be evidence in the data, however, that the birth and death rates are approximately equal in the first part of the observation period but not for a later part and one can then, as discussed earlier, use the net reproduction ratio
Similar observations can be made for King's College Hospital. Five models are very close in fit as can be seen from Table 1: the martingale model with exponential distribution; the birth–death model with exponential survival functions; the birth–death model with exponential birth rate and generalized gamma or Weibull removal times and the birth–death model with exponential removal times and gamma reproduction times. Figure 4 shows the data in the top graph and in the lower graph the four-period moving standard deviation (dashed line), with the conditional standard deviation from the martingale model with the exponential distribution (solid line). Again, one can have a look at the net reproduction ratio which is shown in Figure 5, together with the net reproduction ratio's from 250 sample paths and their 95% percentile lines. Here, the line



Modelling volatility is becoming an important issue, not only in the economic literature where the Black-Scholes model has drawn much attention, but also in areas as ecology, and more recently in the environmental sciences where climate change is thought to cause an increase in volatility. Modelling volatility can be done separately from modelling the mean of a process when the normal distribution is used. When models for counts are used, things are slightly more complicated since there a change in mean has a change in variance as a consequence because of the relationship between the mean and the variance. If there is more (or less) variance as there must be according to the model, then models for over (or under) dispersion can be used.
This leaves the important case where there is no change in the mean but there is a change in volatility. It is important to be able to distinguish this case from the ‘changing-mean’ case, because a change in volatility can give the erroneous impression that there is a change in the mean. Consider for instance the case that the variance is increasing, then by coincidence a few large (or small) values in a row might be observed, suggesting a change in the mean. A change in volatility only can also indicate a disturbance in a stable process, after which a change in the mean can occur. This change can be smooth, or it can manifest as a jump, after which the process can stabilize again.
This article proposes a new non-homogeneous martingale model for count data, derived from a non-homogeneous birth–death process, to study the changes in volatility without change in the mean. This model is capable of modelling a change in volatility while the mean stays the same, but can also deal with a process that shows no change at all, a stable process.
The net reproduction ratio plays an important role in choosing between a volatility-only model and a model with different non-homogeneous birth and death rates. It might be that in a part of the observation period the net reproduction ratio is approximately one, indicating an equal birth and death rate and thus pointing to the non-homogeneous martingale model, where in another part there might be evidence from the data that this is not the case.
As an illustration the models were fitted to MRSA prevalence data from three Trusts in Great Britain: Leeds Teaching Hospital, Guy's & St. Thomas and King's college Hospital. The procedure described above was able to reveal different aspects of the data: a changing volatility only (Leeds Teaching Hospital), evidence using the bias corrected AIC and the net reproduction ratio that there was a change in mean in the later part of the observation period (Guy's & St. Thomas) and no evidence for change, not in volatility and not in mean (King's college Hospital). For the hospital Guy's & St. Thomas the data showed that the net reproduction ratio dropped below one, indicating that the MRSA measures taken in that hospital are effective. For King's college the data did not show that the net reproduction ratio was different from one indicating that MRSA is persisting there.
As is clear from the models used here and the data of the three hospitals, it might not at all be obvious, whether there is a changing mean in the data or change in volatility or that the time series as a whole stays reasonably stable. This is especially the case in data that shows a large variance. This would lead to the policy implication for these kind of surveillance that rather long time series are needed in order to be able to identify a changing mean.
Footnotes
Acknowledgments
I would like to thank Hiroshi Nishiura for his help on an early draft of the article.
