Abstract
In this study we develop a change point methodology to identify and estimate changes in the parameter of a Poisson distribution. The proposed methodology considers the case in which the Poisson parameter changes abruptly at an unknown point in time. For this case, the maximum likelihood estimate (mle) of the change point and its asymptotic distribution are pursued. We carry out a large-scale simulation study to evaluate how well the asymptotic distribution of the mle approximates its finite-sample distribution, and how close the distributions are when the parameters are known and when they are estimated. The simulation study also compares the mle with a Bayesian estimate. The methodology is then applied to three examples. First, we uncover changes in the number of homicides in California using monthly data from January 2002 until December 2020. Second, data on deaths of females caused by stomach cancer are examined to detect possible changes in the numbers recorded from 1930 to 2011. Third, British coal mining disasters from 1851 to 1962 in which more than 10 men were killed are analyzed.
Introduction
A common problem for scientists who analyze time series data is that the data may not be homogeneous in time. Many models rely on the assumption of homogeneity to explain the observed stylized facts. However, the parametric structure of the process may not remain constant throughout the whole sampling period. Statistically speaking, this means that the characteristics of the data may have changed over time, perhaps multiple times. Such changes, which induce instability in the parameters of the initial model, lead to the classical change point problem. In this scenario, using stationary methods to model non-stationary processes can lead to spurious results.
The goal of a change point problem is to first determine whether changes exist in the parametric structure of the statistical model, and then to estimate the time points where a transition from one set of parameters to another set has occurred. Deriving distribution theory of the change detection statistics as well as deriving the distribution of the change point estimates is an integral part of a change point problem. Furthermore, a proper understanding of the underlying phenomena can help in figuring out what might have triggered these changes.
A change point model begins by postulating that the parameter vector of the model may have changed at an unknown time point. One then carries out a statistical test to determine whether a change has occurred, with the null hypothesis being that there is no change in the parameters. The methodology goes back to Page (1951), who introduced the classical change point problem. In a change point problem, the change may occur either abruptly or smoothly. One should understand the underlying phenomenon well enough to ensure that the model formulation is consistent with how changes may occur in the process. In this article, we are concerned only with the abrupt change formulation. Page (1951) used cumulative sums to detect abrupt changes in an independent, time-ordered sequence of values in which the first set of observations may have come from one distribution and the remaining set from another. In subsequent years, change detection methodology has become a key tool in quality control and in many fields of science and business. Consequently, a large number of articles and monographs deal with the change point detection problem, for example, Page (1951), Bassiville et al. (1998), Csörgö and Horváth (1997), and Tartakovsky et al. (2020).
When estimating a change point, it is always preferable to find a confidence interval for the true location of the change point rather than merely a point estimate. This of course requires that the distribution of the change point estimate be available. Several years after Page (1951) formulated the detection problem, Hinkley (1970) considered likelihood-based inference for a single change point and obtained the asymptotic distribution of the maximum likelihood estimator (mle) under the assumption that the parameters characterizing the distribution were known. Later, Hinkley (1972) demonstrated that the asymptotic distributions of the mle are equivalent when the parameters are known and when they are estimated. Based on the work of Hinkley (1972), researchers can replace the unknown parameters before and after the change point with their maximum likelihood estimates, and the asymptotic distribution of the change point mle will remain the same as if the parameters were known. The asymptotic distribution of the mle allows one to obtain confidence intervals of any desired level.
The asymptotic distributional form derived by Hinkley (1972) is computationally intractable. Subsequently, several authors provided approximations or bounds for the distribution of the change point mle. Yao (1987) derived an alternative asymptotic form assuming the amount of change to be small and computed the quantiles for this case. Jandhyala and Fotopoulos (1999) suggested two easily computed approximations for the distribution. They also derived sharp upper and lower bounds for the asymptotic probabilities and computed these bounds for change point estimation under the normal and exponential distributions. Fotopoulos et al. (2010) developed exact computable expressions for the case of a change in the mean vector of a sequence of Gaussian variables.
This study considers inferential methods for the abrupt change point problem when the data follow a Poisson distribution. The Poisson distribution, which is discrete, is often used to model the number of events occurring within a given time interval. The methodology applied is likelihood-based for both detection and estimation. The proposed methodology can be applied to time series data from medicine, finance, queueing theory, climatology, networks, and many other fields. The change point problem for the Poisson distribution has been considered previously in the literature (see, e.g., West & Ogden, 1997; Ghorbanzadeh et al., 2017; Nyambura et al., 2016; Worsley, 1986). In this article, we first provide computable expressions for an approximation of the asymptotic distribution of the change point mle. We then perform large-scale simulations to investigate the closeness of the derived asymptotic distribution to the empirical distribution of the change point mle when the parameters are known and when they are estimated. Furthermore, the simulations are performed for reasonable finite sample sizes to ascertain the closeness of the theoretical and empirical distributions. Our simulation results show almost complete agreement, indicating that the methodology works very well for reasonable sample sizes. We also carry out simulations to compare the performance of the change point mle with the Bayesian methodology of West and Ogden (1997). The simulations show that the confidence intervals constructed from the change point mle perform as well as or better than the intervals computed by West and Ogden (1997). From the applications perspective, the methodology is applied to identify changes in monthly homicides in the state of California, yearly deaths caused by stomach cancer in females, and British coal mining disasters.
The paper is organized as follows. In Section 2, a methodology to detect changes in the characteristics of Poisson data is described. In Section 3, computable expressions are provided for an approximation of the asymptotic distribution of change point mle for Poisson distributed random variables when a change occurs in the rate parameter. Section 4 illustrates the results of large-scale simulations. Section 5 contains the change point analysis for the number of homicides in California, the yearly deaths caused by stomach cancer, and British coal mining disasters. Finally, Section 6 concludes the paper with a discussion.
Detection of changes in the Poisson parameter
Change point methods involve two fundamental inferential problems, detection and estimation. Under the likelihood-based approach, the detection part is addressed through likelihood ratio statistics and their asymptotic sampling distributions. The estimation part begins with the point estimate of the change point from the detection part.
In this section, we present the double exponential distribution for the log likelihood ratio statistic for detecting unknown changes in Poisson random variables. Yao (1987) showed that the likelihood ratio, under the null hypothesis of no change point in a sequence of independent normal random variates, converges in distribution to the double exponential extreme value distribution. Haccou et al. (1988) proved that the asymptotic distribution of the likelihood ratio statistic for the change point problem with exponentially distributed random variables is a double exponential type extreme value distribution. Gombay and Horváth (1990) extended this asymptotic result to testing for a change point in the mean of a sequence of independent random variables having a general distribution. Gombay and Horváth (1996) showed that the double exponential type extreme value distribution applies well to a sequence of gamma random variables.
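The likelihood ratio scan that underlies the detection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `poisson_lr_scan` is hypothetical, and the statistic is the generic form of twice the log likelihood ratio for a single change in a Poisson rate, with the rates before and after each candidate change point replaced by the corresponding sample means. The normalizing constants needed to compare the maximum against the double exponential limit are not reproduced here, so the sketch stops at the statistic and the point estimate.

```python
import numpy as np

def poisson_lr_scan(x):
    """Scan all candidate change points for a single change in a Poisson rate.

    Returns (k_hat, max_stat, stats), where stats[k-1] is twice the log
    likelihood ratio for a change after observation k, k = 1, ..., n-1.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.cumsum(x)
    total = csum[-1]
    k = np.arange(1, n, dtype=float)   # candidate change points
    s1 = csum[:-1]                     # sum of the first k observations
    s2 = total - s1                    # sum of the remaining n - k observations

    def s_log_mean(s, m):
        # s * log(s / m), using the convention 0 * log(0) = 0
        out = np.zeros_like(s)
        pos = s > 0
        out[pos] = s[pos] * np.log(s[pos] / m[pos])
        return out

    # Maximized log likelihoods; terms that cancel in the ratio are dropped.
    ll_alt = s_log_mean(s1, k) + s_log_mean(s2, n - k)
    ll_null = total * np.log(total / n) if total > 0 else 0.0
    stats = 2.0 * (ll_alt - ll_null)
    best = int(np.argmax(stats))
    return best + 1, float(stats[best]), stats
```

For example, `poisson_lr_scan([2] * 20 + [10] * 20)` places the change after the 20th observation; a large value of the maximum is evidence against the null hypothesis of no change.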
Let
The null hypothesis for no change in the Poisson parameter is formulated as
versus the alternative hypothesis that there is an unknown change point
where
When the change point is known
where
where
If the
In the previous section, we only pointed out how to obtain the point estimate of the change point. However, finding a confidence interval for the change point is an essential part of the inferential procedure, and it requires distribution theory for the change point mle. In this section, we lay out the procedure for obtaining the asymptotic distribution of the maximum likelihood estimate, mainly by applying the general method derived by Jandhyala and Fotopoulos (1999) to the specific case of the Poisson distribution.
Let
and the mle of
Hinkley (1972) showed that the asymptotic distributions of the change point mle when parameters are known and unknown are identical. In practice, this means that when the parameters
and the change point mle can be found by
To establish the distribution of the change point mle, it is more convenient to work with the centered mle
The differences
where the random walk
Establishing the explicit functional relationship of
We let
In deriving the asymptotic theory, both
Since both of the approximations proposed in Jandhyala and Fotopoulos (1999) provide similar level of accuracy, we implement only their second approximation. The asymptotic distribution of
Additionally,
where
Finally, we need to compute
In this section, we investigate the closeness of the derived asymptotic distribution to the empirical distribution of the change point mle when the parameters are known and when they are estimated. An additional goal is to show empirically that the asymptotic distribution developed in Section 3 is applicable for finite samples.
The total variation distance measure given in Eq. (9) below is used to evaluate the closeness of two distributions.
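As a minimal sketch of how such a distance can be computed for two discrete distributions, the helper below uses the common convention of half the sum of absolute differences between the probability mass functions. Both the function names and the factor of one half are assumptions made for illustration; the paper's own Eq. (9) may use a different but equivalent normalization.

```python
from collections import Counter

def empirical_pmf(samples):
    """Relative frequencies of integer-valued samples, as {value: probability}."""
    counts = Counter(samples)
    n = len(samples)
    return {value: c / n for value, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two pmfs given as dicts.

    Assumed convention: 0.5 * sum over the joint support of |p(s) - q(s)|,
    which lies between 0 (identical) and 1 (disjoint supports).
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)
```

In the setting of this section, `p` would hold the asymptotic probabilities of the centered mle and `q` the relative frequencies of the simulated deviations.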
Hinkley (1972) demonstrated that the asymptotic distribution of the change point mle remains the same even for unknown parameters. However, empirical considerations need to support this proposition. For this reason, we run simulations for varying sample sizes and true change point locations. Specifically, we consider the following cases:
Total variation distances between asymptotic distribution (known parameters) and empirical distribution under known, and estimated parameters (based on 500,000 simulations)
It is also clear that the sample size and the true change point location do not have big impact on the performance of the distributions since the total variation distances differ only slightly for several combinations of those quantities when the parameters are either known or estimated. It is of importance to mention here that the sample sizes of
To further support the closeness between the empirical and theoretical distributions, histograms were constructed. To save space, we omit the plots and provide only a description of them. Comparing the histograms of the change point mle for the empirical distribution with known and estimated parameters against the asymptotic approximate distribution, we observed that they were quite close for different values of the Poisson parameters before and after the change point. Further, the location of the change point does not affect the distribution of the mle: for a fixed sample size, the histograms for various change point locations overlap for the theoretical distribution and the empirical distribution, whether the parameters are estimated or assumed to be known. Additionally, we do not notice any differences in the distributions when both the sample size and the change point location change.
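The empirical side of this comparison can be sketched as a small Monte Carlo experiment. The code below is an illustration under stated assumptions, not the authors' simulation code: the function names, the seed, and the parameter values in the usage note are hypothetical. It simulates Poisson sequences with one abrupt change, computes the change point mle once with the true rates plugged in and once with the rates estimated by segment means, and records the deviation of each estimate from the true change point.

```python
import numpy as np

def mle_known(x, lam1, lam2):
    """Change point mle when both rates are known.

    Maximizes the partial sums of log f(x_i; lam1) - log f(x_i; lam2),
    which for the Poisson pmf equal x_i * log(lam1 / lam2) - (lam1 - lam2).
    """
    inc = x * np.log(lam1 / lam2) - (lam1 - lam2)
    walk = np.cumsum(inc)[:-1]          # candidate change points k = 1..n-1
    return int(np.argmax(walk)) + 1

def mle_estimated(x):
    """Change point mle when both rates are replaced by segment means."""
    n = x.size
    csum = np.cumsum(x)
    k = np.arange(1, n, dtype=float)
    s1 = csum[:-1]
    s2 = csum[-1] - s1
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(s1 > 0, s1 * np.log(s1 / k), 0.0)
        t2 = np.where(s2 > 0, s2 * np.log(s2 / (n - k)), 0.0)
    return int(np.argmax(t1 + t2)) + 1

def simulate_deviations(n, tau, lam1, lam2, reps, seed=0):
    """Deviations (estimate minus true change point) over `reps` samples."""
    rng = np.random.default_rng(seed)
    dev_known = np.empty(reps, dtype=int)
    dev_est = np.empty(reps, dtype=int)
    for r in range(reps):
        x = np.concatenate(
            [rng.poisson(lam1, tau), rng.poisson(lam2, n - tau)]
        ).astype(float)
        dev_known[r] = mle_known(x, lam1, lam2) - tau
        dev_est[r] = mle_estimated(x) - tau
    return dev_known, dev_est
```

Calling, say, `simulate_deviations(100, 50, 4.0, 6.0, reps=10000)` yields two arrays of deviations whose relative frequencies can be tabulated and compared with each other and with the asymptotic probabilities.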
Next, we compare our approach to methods in the literature. Specifically, four articles are considered: West and Ogden (1997), Ghorbanzadeh et al. (2017), Nyambura et al. (2016), and Worsley (1986).
West and Ogden (1997) considered the problem of estimating change points in a sequence of Poisson random variables by allowing the change point to range over a continuous time interval. The authors calculated interval estimators by using Bayesian confidence intervals and carried out simulations to calculate coverage probabilities and mean widths of 95% confidence intervals to show the efficacy of their methodology. To compare our results with theirs, we used the same parameter values before and after the change and all the scenarios the authors considered. In our simulations, only discrete points for the location of the change point were considered. Specifically, we generated 1,000 samples for each scenario, and for each sample we applied the detection methodology and the theoretical distribution described in Sections 2 and 3 to calculate the 95% confidence interval. Based on the confidence intervals, the coverage and the mean width were calculated.
Coverage probabilities and mean widths of 95% confidence interval for change point mle for different combinations of sample sizes and change point locations. In the top table we present the results based on the derived asymptotic distribution. In the bottom part, we present the results from West and Ogden (1997)
In the bottom part of Table 2, the coverage probabilities range from 93.2% to 96.1%. As seen in the top part of the same table, in all cases the coverage probabilities are greater than the nominal 95%. Further, the average width is around 3.2 for all scenarios.
Moreover, in West and Ogden (1997), the authors used a sample size of 50, the rate parameter before the change equal to 10 and after the change to vary and take values 12, 16, 20, 24, 28, 32, 36 and 40. The authors showed that the coverage probability for all scenarios is close to 95% and that the mean interval width decreases as the change between the rate parameters increases. We repeated the simulations for the aforementioned combinations and in Table 3 we present the results including coverage probability, mean width, empirical standard error, theoretical standard error, and also
Coverage and Interval width when
Based on Table 3, we can see that our methodology provides coverage probabilities greater than those in West and Ogden (1997) when the rate parameter after the change is greater than 16. Also, the average width obtained using our methodology is smaller when
Furthermore, Ghorbanzadeh et al. (2017) proposed a methodology to estimate the change points for Poisson observations. To compare our method with the one developed by Ghorbanzadeh et al. (2017), we consider the same values
In Nyambura et al. (2016), the authors present a table with different locations of the change point and concluded that their methodology is powerful when
Evaluation of the California Public Safety Realignment
Do prisons prevent violent crimes? There are many reasons to believe they do. First, people who are in detention cannot be involved in any unlawful activity in the community, an effect criminologists call “incapacitation”. Additionally, the fear of imprisonment can deter someone from committing crimes. Furthermore, time spent in incarceration can successfully rehabilitate inmates and prepare them to be productive members of society.
For many decades, confidence in the effectiveness of incarceration led every jurisdiction in the United States to increase sentence lengths and to punish a broader range of crimes with imprisonment (Simon, 2014; Clear, 1994; Travis et al., 2014). Perhaps more than any other state, California followed this trend, experiencing an increase of 572 percent in its prison population between 1980 and 2010 (Lawrence, 2012).
In 2009, a federal three-judge panel ordered the state of California to reduce the prison population from close to 180 percent to 137.5 percent of design capacity. On 23 May 2011, the US Supreme Court upheld the ruling. Prior to the initiative the state’s prison population had risen to roughly 180 percent of its design capacity, and prisoners had become unable to receive routine medical or mental health care. In response, the governor and the state legislature passed two bills, California Assembly Bill 109 (2011), which became law and went into effect on October 1
As of 2011, California’s state prisons were designed to house approximately 85,000 inmates. At that time, the prison system housed nearly twice that many, approximately 156,000 inmates.
Expanding prison capacity was not financially feasible. To comply with the federal court’s order and reduce the prison population, lower-level offenders were sentenced to county jails rather than state prison, and the counties were given most of the responsibility for parolees. Lower-level offenders are those whose offenses are non-violent and non-serious and who are not required to register as sex offenders. Furthermore, parolees who violated the terms of their release but had not been convicted of a new felony were no longer sent to prison; they either served short terms in county jails or were sanctioned locally. Inmates already in state prison at the time were not moved to county jails.
Consequently, counties sent fewer offenders to prison for probation failures and non-revocable parole. The reform reduced the total number of people incarcerated in California. Even though the county jail population rose, it did not rise by as much as the prison population fell.
Sundt et al. (2016) raised the question “Can prison populations be reduced without endangering the public?”. The authors examined the effect of California’s dramatic efforts to reduce prison overcrowding. California’s Public Safety Realignment initiative, motivated by a federal mandate, represents an attempt by the state of California to reduce the state’s prison population.
This reform raised public safety concerns and the question of whether crime rates would rise, since thousands of offenders who would otherwise have been incarcerated would be on the street. It was this fear that led Justice Alito to worry that ordering California to reduce its prison population would create “a grim roster of victims” (Brown, 2011). From this perspective, one would predict that California’s Realignment Act will increase crime.
Lofstrom and Raphael (2015) employed the synthetic control method to identify a convex combination of untreated states that matches the pre-intervention characteristics of the treated state. The authors showed that the realignment did not lead to an increase in violent crimes, although property crime increased slightly. Additionally, the study revealed that returns to prison, a major cause of overcrowding, decreased. Before the new rule, more than 40 percent of released offenders were back in prison within a year, the highest rate in the nation. In realignment’s first year, the return rate dropped by 33 percent and the state’s rate fell below the national average.
In our analysis, we use the change point methodology to evaluate the implementation of the California Public Safety Realignment Act. Our goal is to test whether the number of homicides changed significantly in the period from 2002 to 2020 and to identify the ramifications that followed the implementation of the Realignment. First, we apply the detection methodology described in Section 2. If we identify an abrupt change in the number of homicides, we then establish the asymptotic distribution of the change point mle as described in Section 3. The asymptotic distribution of the mle allows us to construct a confidence interval of any desired level for the unknown change point.
We study the data provided by the Department of Justice (DOJ) and the Criminal Justice Statistics Center (CJSC) about homicides reported in California for the Federal Uniform Crime Reporting (UCR) program. We use monthly data from January 2002 through December 2020. Figure 1 displays the time series data.
First, we conducted a Chi-square goodness of fit test (
We find the statistic
The parameter estimates before and after the change point mle are obtained as
Probability distribution of the asymptotic distribution for the applications in 5.1 and 5.2
Monthly time series for the number of homicides in California.
Recognizing that the above table gives probabilities associated with deviations from the true change point, we find that the 99% confidence interval contains the months of {August, September, October} of 2008. Mainly, from this study we find that the Public Safety Realignment did not change the downward trend in the number of homicides that started around September of 2008.
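Reading a confidence set off a table of deviation probabilities can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, the deviation is taken to be the estimate minus the true change point, and the set is built by accumulating the most probable deviations until the requested level is reached, which is one reasonable construction but not necessarily the one used by the authors. The probabilities in the usage note are made-up values, not those of the table above.

```python
def confidence_set(tau_hat, dev_pmf, level=0.95):
    """Smallest set of time points whose deviation probabilities reach `level`.

    dev_pmf maps each deviation d = (estimate - true change point) to its
    probability; a deviation d corresponds to the candidate time tau_hat - d.
    Returns (sorted list of candidate times, accumulated probability).
    """
    # Most probable deviations first; ties broken by |d|, then by d.
    ranked = sorted(dev_pmf.items(), key=lambda kv: (-kv[1], abs(kv[0]), kv[0]))
    chosen, mass = [], 0.0
    for dev, prob in ranked:
        chosen.append(tau_hat - dev)
        mass += prob
        if mass >= level - 1e-12:
            break
    return sorted(chosen), mass
```

With a hypothetical table `{-1: 0.03, 0: 0.90, 1: 0.07}` and an estimate at time index 100, the 95% set is `[99, 100]` and the 99% set is `[99, 100, 101]`.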
As a second example, we consider the yearly recorded number of deaths in females in the United States caused by stomach cancer. We use the data published by the Centers for Disease Control and Prevention (CDC) from 1930 to 2011. The data series is presented in Fig. 2.
Time series of stomach cancer deaths in females in USA.
In contrast to the time series plot presented in Section 5.1, Fig. 2 illustrates a downward trend. However, the plot does not make clear when the characteristics of the data changed significantly, so the methodology used in the previous example is also applied to this dataset.
First, upon ascertaining that the Poisson distribution assumption was reasonable (
The ACF and PACF plots for data before and after the change point estimate did not reveal any significant dependencies. Further application of change detection methodology on data prior to the mle yielded
Building upon the above results, we then carried out the next aspect of change point analysis by finding confidence intervals for the two change points. The Poisson parameter estimates were found to be
A plausible reason for the first change, which occurred around 1959, seems to be the prevalence of refrigeration. Refrigeration, which began in the early 1900s, gained widespread use by the 1950s. As a result, the US diet began to include more fresh fruits and vegetables (high in anti-carcinogenic antioxidants) and fewer preserved meats (high in nitrites and carcinogenic nitrosamines). To understand the factors that might have significantly reduced the number of stomach cancer deaths in females around 1980, we looked at many studies, including Siegel et al. (2017) and Murphy et al. (2018), and at material from the American Cancer Society. These indicate that the number of deaths from stomach cancer has dropped since the 1980s mainly because more people were getting screened and changing their lifestyle-related risk factors. Screening helps to find colorectal polyps more often, so that they can be removed before they develop into cancers, and to find cancers earlier, when they are easier to treat.
Finally, a much-analyzed dataset of intervals between British coal mining disasters from 1851 to 1962, in which more than 10 men were killed, is considered. The dataset was first gathered by Maguire et al. (1952) and was subsequently extended and corrected by Jarrett (1979). A change point investigation appears in Worsley (1986). We converted the data table in Jarrett (1979) to observed annual counts and applied our proposed method. We found the rate parameter to be equal to 3 before the change point and 1 after it. We found the change point to be in 1890 and the 95% confidence interval to be {1888, 1889, 1890, 1891, 1892}. Worsley (1986) also estimated the change point to be in 1890, but the confidence interval obtained there was much wider, containing the years {1884, …, 1895}.
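Converting a table of intervals between events into annual counts can be sketched as below. This is only an illustration of the bookkeeping: the function name, the starting date, and the interval values in the usage note are hypothetical and are not taken from Jarrett (1979), and the sketch assumes that the intervals are measured in days and that the starting date is itself an event.

```python
from collections import Counter
from datetime import date, timedelta

def annual_counts(first_event, intervals_days, first_year, last_year):
    """Turn inter-event intervals (in days) into counts of events per year.

    Assumes `first_event` is the date of the first event and each interval
    is the number of days from one event to the next.
    """
    current = first_event
    years = [current.year]
    for gap in intervals_days:
        current = current + timedelta(days=gap)
        years.append(current.year)
    counts = Counter(years)
    # Years without any event are reported explicitly as zero.
    return {y: counts.get(y, 0) for y in range(first_year, last_year + 1)}
```

For instance, `annual_counts(date(1851, 1, 1), [100, 200, 400], 1851, 1853)` returns `{1851: 3, 1852: 1, 1853: 0}`; the resulting sequence of counts is the input to the detection and estimation steps.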
Conclusions
In this article, we have constructed an efficient and complete methodology to detect and estimate an unknown change point, and to compute the asymptotic distribution of the change point mle, when chronological data follow the Poisson distribution. The essential idea for determining the asymptotic distribution of the change point estimate goes back to Jandhyala and Fotopoulos (1999) and Fotopoulos et al. (2010), where the authors combine ladder epochs and ladder heights and then implement the Spitzer-Baxter identities. The resulting distribution has properties that are important for conducting inference and for identifying important events that were probably not known earlier. The analysis was pursued under the assumption that the parameters before and after the change were known. However, empirical analysis has shown that the error introduced by replacing the parameters with their estimated values at the estimated change point is negligible. To quantify the degree of accuracy, we use the total variation distance. Although many distances exist for this purpose, we found the total variation distance between the empirical distribution of the change point mle and its asymptotic counterpart to be effective and informative.
The examples considered for analysis were chosen carefully. In all cases, the Poisson distribution seems to be a reasonable assumption for the data, and the detection methodology indicated the existence of a change point. The asymptotic distribution of the change point was then constructed. The inferential results indicate that the method works very well.
Acknowledgments
The authors are thankful to the referee(s) for their comments and suggestions that made the paper better both in its contents and readability.
