Maximum likelihood estimation of a change point for Poisson distributed data

Abstract

In this study we develop a change point methodology to identify and estimate changes in the parameter of a Poisson distribution. The proposed methodology considers the case in which the Poisson parameter changes abruptly at an unknown point in time. For this case, we obtain the maximum likelihood estimate (mle) of the change point and its asymptotic distribution. Mainly, we carry out a large-scale simulation study to evaluate how well the asymptotic distribution of the mle describes finite samples, and how close the asymptotic and empirical distributions are when the parameters are known and when they are estimated. The simulation study also compares the mle with a Bayesian estimate. The methodology is then applied to three examples. First, we uncover changes in the number of homicides in California using monthly data from January 2002 until December 2020. Second, data on deaths of females caused by stomach cancer, recorded from 1930 to 2011, are examined for possible changes. Third, we analyze British coal mining disasters from 1851 to 1962 in which more than 10 men were killed.

Keywords

Change point estimation; detection; two-sided random walk; total variation distance; Poisson distribution; maximum likelihood estimate

1. Introduction

A common problem for scientists who analyze time series data is that the data may not be homogeneous in time. Many models assume homogeneity in order to explain the observed stylized facts. However, the parametric structure of the process may not remain constant throughout the whole sampling period. Statistically speaking, the characteristics of the data may have changed over time, perhaps multiple times. Changes that induce instability in the parameters of the initial model give rise to the classical change point problem. In this scenario, using stationary methods to model non-stationary processes can lead to spurious results.

The goal of a change point analysis is first to determine whether changes exist in the parametric structure of the statistical model, and then to estimate the time points at which a transition from one set of parameters to another has occurred. Deriving the distribution of the change detection statistics, as well as that of the change point estimates, is an integral part of the problem. Furthermore, a proper understanding of the underlying phenomena can help in identifying what might have triggered these changes.

A change point model begins by postulating that the parameter vector of the model may have changed at an unknown time point. One then carries out a statistical test of the null hypothesis that there is no change in the parameters. The methodology goes back to Page (1951), who introduced the classical change point problem. In a change point problem, the change may occur either abruptly or smoothly, and one should understand the underlying phenomenon well enough to ensure that the model formulation is consistent with how changes may occur in the process. In this article, we are concerned only with the abrupt change formulation. Page (1951) used cumulative sums to detect abrupt changes in an independent, time-ordered sequence of values in which the first set of observations may have come from one distribution and the remaining set from another. In subsequent years, change detection methodology has become a key tool in quality control and in many fields of science and business. Consequently, there is a voluminous literature of articles and monographs on the change point detection problem; see, for example, Page (1951), Bassiville et al. (1998), Csörgö and Horváth (1997), and Tartakovsky et al. (2020).

When estimating a change point, it is always preferable to find a confidence interval for the true location of the change point rather than merely a point estimate. This, of course, requires the distribution of the change point estimate. Nearly two decades after Page (1951) formulated the detection problem, Hinkley (1970) considered likelihood-based inference for a single change point and obtained the asymptotic distribution of the maximum likelihood estimator (mle) under the assumption that the parameters characterizing the distribution were known. Later, Hinkley (1972) demonstrated that the asymptotic distribution of the mle is the same whether the parameters are known or estimated. Based on this work, researchers can replace the unknown parameters before and after the change point with their maximum likelihood estimates, and the asymptotic distribution of the change point mle remains the same as if the parameters were known. The asymptotic distribution of the mle allows one to obtain confidence intervals at any desired level.

The asymptotic distributional form derived by Hinkley (1972) is computationally intractable. Consequently, several authors have provided approximations or bounds for the distribution of the change point mle. Yao (1987) derived an alternative asymptotic form assuming the amount of change to be small and computed the quantiles for this case. Jandhyala and Fotopoulos (1999) proposed two readily computable approximations for the distribution. They also derived sharp upper and lower bounds for the asymptotic probabilities and computed these bounds for change point estimation under the normal and exponential distributions. Fotopoulos et al. (2010) developed exact computable expressions when there is a change in the mean vector of a sequence of Gaussian variables.

This study considers inferential methods for the abrupt change point problem when the data follow a Poisson distribution. The Poisson distribution, which is discrete, is often used to model the number of events occurring within a given time interval. The methodology applied is likelihood-based for both detection and estimation. It can be applied to time series data from medicine, finance, queueing theory, climatology, networks and many other fields. The change point problem for the Poisson distribution has been considered previously in the literature (see, e.g., West & Ogden, 1997; Ghorbanzadeh et al., 2017; Nyambura et al., 2016; Worsley, 1986). In this article, we first provide computable expressions for an approximation of the asymptotic distribution of the change point mle. We then perform large-scale simulations to investigate how close the derived asymptotic distribution is to the empirical distribution of the change point mle when the parameters are known and when they are estimated. The simulations are performed for reasonable finite sample sizes to ascertain the closeness of the theoretical and empirical distributions. Our simulation results show almost complete agreement, indicating that the methodology works very well for reasonable sample sizes. We also carry out simulations to compare the performance of the change point mle with the Bayesian methodology of West and Ogden (1997). The simulations show that the confidence intervals constructed from the change point mle performed as well as or better than the prediction intervals computed by West and Ogden (1997). From the applications perspective, the methodology is applied to identify changes in monthly homicides in the state of California, yearly deaths caused by stomach cancer in females, and British coal mining disasters.

The paper is organized as follows. Section 2 describes a methodology to detect changes in the characteristics of Poisson data. Section 3 provides computable expressions for an approximation of the asymptotic distribution of the change point mle for Poisson distributed random variables when a change occurs in the rate parameter. Section 4 presents the results of large-scale simulations. Section 5 contains the change point analysis of the number of homicides in California, the yearly deaths caused by stomach cancer, and British coal mining disasters. Finally, Section 6 concludes the paper with a discussion.

2. Detection of changes in the Poisson parameter

Change point methods involve two fundamental inferential problems: detection and estimation. Under the likelihood-based approach, detection is addressed through likelihood ratio statistics and their asymptotic sampling distributions. Estimation then starts from the point estimate of the change point obtained in the detection step.

In this section, we present the asymptotic double exponential distribution of the log likelihood ratio statistic for detecting an unknown change in a sequence of Poisson random variables. Yao (1987) showed that, under the null hypothesis of no change point in a sequence of independent normal random variates, the likelihood ratio statistic converges in distribution to the double exponential extreme value distribution. Haccou et al. (1988) proved that the asymptotic distribution of the likelihood ratio statistic for the change point problem with exponentially distributed random variables is a double exponential type extreme value distribution. Gombay and Horváth (1990) extended this asymptotic result to testing for a change point in the mean of a sequence of independent random variables having a general distribution. Gombay and Horváth (1996) showed that the double exponential type extreme value distribution also applies to a sequence of gamma random variables.

Let $Y_{1},Y_{2},\ldots,Y_{n}$ be independent Poisson random variables, with the probability mass function of $Y_{i}$ given by

$\displaystyle f(y_{i};\lambda_{i})=\frac{\lambda_{i}^{y_{i}}e^{-\lambda_{i}}}{y_{i}!},\quad\lambda_{i}>0,\; y_{i}=0,1,2,\ldots,\; i=1,2,\ldots,n.$ (1)

The null hypothesis for no change in the Poisson parameter is formulated as

$\displaystyle H_{0}:\lambda_{1}=\lambda_{2}=\ldots=\lambda_{n}$

versus the alternative hypothesis that there is an unknown change point

$\displaystyle H_{A}:\lambda_{1}=\ldots=\lambda_{\tau}\neq\lambda_{\tau+1}=\ldots=\lambda_{n},$

where $\tau$ is the unknown change point that needs to be estimated.

When the change point is known, say $\tau=t$, we reject the null hypothesis for large values of the generalized likelihood ratio

$\displaystyle\Lambda_{t}=\frac{\prod_{i=1}^{t}\frac{\hat{\lambda}_{1}^{y_{i}}e^{-\hat{\lambda}_{1}}}{y_{i}!}\prod_{i=t+1}^{n}\frac{\hat{\lambda}_{2}^{y_{i}}e^{-\hat{\lambda}_{2}}}{y_{i}!}}{\prod_{i=1}^{n}\frac{\hat{\lambda}^{y_{i}}e^{-\hat{\lambda}}}{y_{i}!}},$ (2)

where $\hat{\lambda}_{1}$ is the mle based on the sample before the change, $\hat{\lambda}_{2}$ is the mle based on the sample after the change and $\hat{\lambda}$ is the mle based on the entire sample; for the Poisson distribution these are the sample means of $y_{1},\ldots,y_{t}$, of $y_{t+1},\ldots,y_{n}$ and of $y_{1},\ldots,y_{n}$, respectively. When the change point $\tau$ is unknown, one rejects the null hypothesis for large values of $Q_{n}=\max_{1\leqslant t\leqslant n-1}(2\log\Lambda_{t})$. By Theorem 1.3.1 in Csörgö and Horváth (1997), the asymptotic distribution of $Q_{n}$ is given by

$\displaystyle\lim_{n\to\infty}P\left(\alpha(\log n)\,Q_{n}^{1/2}\leqslant x+b(\log n)\right)=\exp\left(-2e^{-x}\right),\quad x\in\mathbb{R},$ (3)

where $\alpha(s)=({2\log s})^{1/2}$ , $b(s)=2\log s+\frac{d}{2}\log\log s-\log\Gamma\left({\frac{d}{2}}\right)$ and $d$ denotes the number of parameters that change under the alternative hypothesis. In the case of Poisson distributed data, $d=2$ .

If the $p$-value obtained from Eq. (3) is significant, the point at which $Q_{n}$ attains its maximum is taken as the estimate of the unknown change point $\tau$. Jarušková (1997) showed that the double exponential asymptotic distribution in Eq. (3) is conservative because of its slow rate of convergence. One can avoid this by excluding a small portion of the data at both ends of the sequence.
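The detection step can be sketched in Python as follows, assuming NumPy and SciPy are available; the function names `glr_curve`, `detect_change` and `gumbel_pvalue` are ours, not the paper's. `glr_curve` evaluates $2\log\Lambda_{t}$ of Eq. (2) from cumulative sums, using the fact that, with $\hat{\lambda}$ equal to a segment mean, the maximized log likelihood of a segment with total $s$ and length $m$ is $s\log(s/m)-s$ apart from the factorial terms; those terms cancel in the ratio, as do the $-s$ terms. `gumbel_pvalue` inverts the limit in Eq. (3) with $d=2$ as stated above; it needs $n\geqslant 3$ and, given the slow convergence noted by Jarušková (1997), its output should be read as approximate.

```python
import math

import numpy as np
from scipy.special import xlogy  # xlogy(a, b) = a*log(b), taken as 0 when a == 0


def glr_curve(y):
    """Return 2*log(Lambda_t) of Eq. (2) for t = 1, ..., n-1."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(1, n)                  # candidate change points
    s1 = np.cumsum(y)[:-1]               # y_1 + ... + y_t
    total = y.sum()
    s2 = total - s1                      # y_{t+1} + ... + y_n
    split = xlogy(s1, s1 / t) + xlogy(s2, s2 / (n - t))
    null = xlogy(total, total / n)
    return 2.0 * (split - null)


def detect_change(y):
    """Return (Q_n, t_hat): the maximum of 2*log(Lambda_t) and its 1-based location."""
    curve = glr_curve(y)
    k = int(np.argmax(curve))            # first maximizer in case of ties
    return float(curve[k]), k + 1


def gumbel_pvalue(q_n, n, d=2):
    """Approximate p-value of Q_n from the limit in Eq. (3); requires n >= 3."""
    if n < 3:
        raise ValueError("need n >= 3 so that log(log(log n)) is defined")
    s = math.log(n)
    alpha = math.sqrt(2.0 * math.log(s))
    b = 2.0 * math.log(s) + 0.5 * d * math.log(math.log(s)) - math.lgamma(0.5 * d)
    x = alpha * math.sqrt(max(q_n, 0.0)) - b
    # 1 - exp(-2 e^{-x}), computed with expm1 to keep precision for tiny p-values
    return -math.expm1(-2.0 * math.exp(-x))


rng = np.random.default_rng(2024)
y = np.concatenate([rng.poisson(5, 30), rng.poisson(10, 70)])
q_n, t_hat = detect_change(y)
print(f"Q_n = {q_n:.2f}, t_hat = {t_hat}, p-value ~ {gumbel_pvalue(q_n, y.size):.3g}")
```

Because the denominator of Eq. (2) does not depend on $t$, the maximizer returned by `detect_change` also maximizes the profile likelihood used for the change point mle with estimated parameters.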

3. Maximum likelihood estimation of the change point

In the previous section, we only showed how to obtain a point estimate of the change point. However, a confidence interval for the change point is an essential part of the inferential procedure, and it requires the distribution theory of the change point mle. In this section, we lay out the procedure for obtaining the asymptotic distribution of the maximum likelihood estimate by applying the general method of Jandhyala and Fotopoulos (1999) to the Poisson distribution.

Let $Y_{1},Y_{2},\ldots,Y_{n}$ be a time-ordered sequence of independent random variables having a Poisson distribution, with the probability function of $Y_{i}$ given by Eq. (1). Under the classical abrupt change point model with a change in the parameter $\lambda$, we denote the parameter before the unknown change point $\tau$ by $\lambda_{1}$ and the parameter after the change by $\lambda_{2}$. The change point $\tau$ is unknown and must be estimated. Furthermore, the parameters $\lambda_{1}$ and $\lambda_{2}$ are also unknown and must be estimated from the data. The profile likelihood function is

$\displaystyle\tilde{L}(\tau)=\max_{\lambda_{1}}\prod_{i=1}^{\tau}f(y_{i};\lambda_{1})\,\max_{\lambda_{2}}\prod_{i=\tau+1}^{n}f(y_{i};\lambda_{2})$ (4)

and the mle of $\tau$ is obtained as $\tilde{\tau}=\arg\max_{1\leqslant j\leqslant n-1}\tilde{L}(j)$.

Hinkley (1972) showed that the asymptotic distributions of the change point mle are identical whether the parameters are known or unknown. In practice, this means that when the parameters $\lambda_{1},\lambda_{2}$ are unknown, one can work with the simpler case of known parameters, simply replacing them by their mles $\hat{\lambda}_{1}$ and $\hat{\lambda}_{2}$. Hence, we may treat the parameters as known. The likelihood function in this case is

$\displaystyle L(\tau)=\prod_{i=1}^{\tau}f(y_{i};\lambda_{1})\prod_{i=\tau+1}^{n}f(y_{i};\lambda_{2})$ (5)

and the change point mle is $\hat{\tau}=\arg\max_{1\leqslant j\leqslant n-1}\sum_{i=1}^{j}W_{i}$, where $W_{i}=\log\frac{f(Y_{i};\lambda_{1})}{f(Y_{i};\lambda_{2})}$ are the log likelihood ratio increments. This follows because $\log L(j)=\sum_{i=1}^{j}W_{i}+\sum_{i=1}^{n}\log f(Y_{i};\lambda_{2})$, and the last sum does not depend on $j$.
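For known rates, the argmax above reduces to a cumulative sum. A minimal Python sketch, assuming NumPy; the function name `tau_hat_known` is hypothetical. For the Poisson case $W_{i}=(\lambda_{2}-\lambda_{1})+Y_{i}\log(\lambda_{1}/\lambda_{2})$, since the $\log Y_{i}!$ terms cancel.

```python
import numpy as np


def tau_hat_known(y, lam1, lam2):
    """Change point mle with known rates: argmax over 1 <= j <= n-1 of W_1 + ... + W_j."""
    y = np.asarray(y, dtype=float)
    # W_i = log f(y_i; lam1) - log f(y_i; lam2) = (lam2 - lam1) + y_i * log(lam1 / lam2)
    w = (lam2 - lam1) + y * np.log(lam1 / lam2)
    partial_sums = np.cumsum(w)[:-1]          # j = 1, ..., n-1
    return int(np.argmax(partial_sums)) + 1   # 1-based; first maximizer on ties


rng = np.random.default_rng(7)
y = np.concatenate([rng.poisson(5, 40), rng.poisson(10, 60)])
print(tau_hat_known(y, 5.0, 10.0))
```

Ties are broken in favour of the earliest index by `np.argmax`; this convention is our choice and matters only for discrete data with exactly equal partial sums.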

To establish the distribution of the change point mle, it is more convenient to work with the centered mle $\hat{\tau}-\tau$ instead of $\hat{\tau}$, which can be written as

$\displaystyle\xi_{n}=\hat{\tau}-\tau=\arg\max_{-\tau+1\leqslant j\leqslant n-\tau-1}\sum_{i=1}^{\tau+j}W_{i}.$ (6)

The differences $\mathop{\sum}\limits_{i=1}^{\tau+j}W_{i}-\mathop{\sum}\limits_{i=1}^{\tau}W_{i}$ define the following two-sided random walk $\Gamma(\cdot)$ :

$\displaystyle\Gamma_{n}(j;\tau)=\left\{\begin{array}{ll}\sum_{i=1}^{j}W_{\tau+i}=\sum_{i=1}^{j}X_{i}^{\ast}=S_{j}^{\ast},&j=1,\ldots,n-\tau-1,\\ 0,&j=0,\\ -\sum_{i=1}^{-j}W_{\tau-i+1}=\sum_{i=1}^{-j}X_{i}=S_{-j},&j=-1,\ldots,-\tau+1,\end{array}\right.$ (7)

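where the random walk $S^{\ast}$ is formed from the log likelihood ratio increments after $\tau$, and the random walk $S$ from those before $\tau$, taken backwards from $\tau$. Because $\Gamma_{n}(j;\tau)$ differs from $\sum_{i=1}^{\tau+j}W_{i}$ only by a term that does not depend on $j$, $\xi_{n}$ is the location of the maximum of this two-sided random walk, obtained by comparing the maximum of $S^{\ast}$ to the right of the origin with the maximum of $S$ to the left.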

Establishing the explicit functional relationship of $X$ and $Y$ along with that of $X^{\ast}$ and $Y^{\ast}$ enables one to identify the distributions of both $X$ and $X^{\ast}$ . This is a basic step in the algorithmic procedure in Jandhyala and Fotopoulos (1999).

Let $Y\sim\textit{Pois}(\lambda_{1})$ and $Y^{\ast}\sim\textit{Pois}(\lambda_{2})$. Without loss of generality, we assume $\lambda_{2}>\lambda_{1}$. Then the log likelihood ratio increments can be expressed as

$\displaystyle X=-\log\frac{f(Y;\lambda_{1})}{f(Y;\lambda_{2})}=-(\lambda_{2}-\lambda_{1})+Y\log\frac{\lambda_{2}}{\lambda_{1}},\qquad X^{\ast}=\log\frac{f(Y^{\ast};\lambda_{1})}{f(Y^{\ast};\lambda_{2})}=(\lambda_{2}-\lambda_{1})-Y^{\ast}\log\frac{\lambda_{2}}{\lambda_{1}}.$
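A quick worked check of these expressions: taking expectations gives $E[X]=-(\lambda_{2}-\lambda_{1})+\lambda_{1}\log(\lambda_{2}/\lambda_{1})$ and $E[X^{\ast}]=(\lambda_{2}-\lambda_{1})-\lambda_{2}\log(\lambda_{2}/\lambda_{1})$, which are minus the Kullback-Leibler divergences between the two Poisson laws. Both are therefore negative, so both random walks drift to $-\infty$ and the maximizer of the two-sided walk is finite. The Python sketch below (function name `increment_means` is ours) evaluates these means for the parameter pairs used later in the simulations.

```python
import math


def increment_means(lam1, lam2):
    """Return (E[X], E[X*]) for Y ~ Pois(lam1) and Y* ~ Pois(lam2), lam2 > lam1."""
    r = math.log(lam2 / lam1)
    mean_x = -(lam2 - lam1) + lam1 * r       # = -KL(Pois(lam1) || Pois(lam2))
    mean_x_star = (lam2 - lam1) - lam2 * r   # = -KL(Pois(lam2) || Pois(lam1))
    return mean_x, mean_x_star


for lam1, lam2 in [(5, 10), (30, 50), (12, 20)]:
    print(lam1, lam2, increment_means(lam1, lam2))
```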

In deriving the asymptotic theory, both $\tau$ and $n-\tau$ tend to infinity, which ensures that enough information exists on both sides of the change. In practice, this means that the true change point should lie far enough from either end of the ordered sequence. Under this assumption, Fotopoulos and Jandhyala (2001) showed that $\xi_{n}\to\xi_{\infty}$ almost surely. Denote by $\hat{\tau}_{\infty}$ the maximum likelihood estimate of $\tau$ based on the sample $Y_{1},Y_{2},\ldots,Y_{n}$ as $n\to\infty$. Jandhyala and Fotopoulos (1999) showed that, under their second approximation, the asymptotic distribution of the change point mle can be computed from

$\displaystyle P(\xi_{\infty}=j)\cong\left\{\begin{array}{ll}e^{-B}\left[q_{|j|}-(1-e^{-B^{\ast}})\tilde{u}_{|j|}\right],&j<0,\\ e^{-B-B^{\ast}},&j=0,\\ e^{-B^{\ast}}\left[q_{j}^{\ast}-(1-e^{-B})\tilde{u}_{j}^{\ast}\right],&j>0.\end{array}\right.$ (8)

Since both approximations proposed in Jandhyala and Fotopoulos (1999) provide a similar level of accuracy, we implement only the second. Evaluating Eq. (8) requires the following quantities, which are determined by the distributions of $X$ and $X^{\ast}$:

$b_{n}=P(S_{n}>0)=1-P\left(\textit{Pois}(n\lambda_{1})\leqslant\frac{n(\lambda_{2}-\lambda_{1})}{\log(\lambda_{2}/\lambda_{1})}\right)$ and $b_{n}^{\ast}=P(S_{n}^{\ast}>0)=P\left(\textit{Pois}(n\lambda_{2})\leqslant\frac{n(\lambda_{2}-\lambda_{1})}{\log(\lambda_{2}/\lambda_{1})}\right)$, together with $B=\sum_{n=1}^{\infty}b_{n}/n$ and $B^{\ast}=\sum_{n=1}^{\infty}b_{n}^{\ast}/n$.

Additionally,

$\displaystyle\tilde{b}_{k}=E\{e^{-S_{k}}I(S_{k}>0)\}=e^{-k\lambda_{1}}\sum_{u=0}^{\left\lfloor\frac{k(\lambda_{2}-\lambda_{1})}{\log(\lambda_{2}/\lambda_{1})}\right\rfloor+1}\frac{(k\lambda_{1})^{u}e^{-u\log(\lambda_{2}/\lambda_{1})}}{u!},\qquad\tilde{b}_{k}^{\ast}=E\{e^{-S_{k}^{\ast}}I(S_{k}^{\ast}>0)\}=e^{-k\lambda_{2}}\sum_{u=0}^{\left\lfloor\frac{k(\lambda_{2}-\lambda_{1})}{\log(\lambda_{2}/\lambda_{1})}\right\rfloor+1}\frac{(k\lambda_{2})^{u}e^{-u\log(\lambda_{2}/\lambda_{1})}}{u!},$

where $\lfloor\cdot\rfloor$ denotes the integer part of a number.
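The probabilities $b_{n}$ and $b_{n}^{\ast}$ are plain Poisson tail probabilities, so they can be evaluated directly. The Python sketch below assumes SciPy's `poisson` distribution and truncates the series at a hypothetical `n_max`, which is harmless because both walks have negative drift and $b_{n}$, $b_{n}^{\ast}$ decay geometrically. It takes $B=\sum_{n\geqslant 1}b_{n}/n$ and $B^{\ast}=\sum_{n\geqslant 1}b_{n}^{\ast}/n$; by Spitzer's identity $e^{-B}$ is then the probability that $S$ never becomes positive, which is consistent with the $j=0$ branch of Eq. (8). The function names are ours.

```python
import numpy as np
from scipy.stats import poisson


def ladder_probabilities(lam1, lam2, n_max=2000):
    """b_n = P(S_n > 0) and b*_n = P(S*_n > 0) for n = 1, ..., n_max (lam2 > lam1)."""
    n = np.arange(1, n_max + 1)
    c = n * (lam2 - lam1) / np.log(lam2 / lam1)   # threshold n(lam2 - lam1)/log(lam2/lam1)
    k = np.floor(c)
    b = poisson.sf(k, n * lam1)          # P(Pois(n lam1) > c)
    b_star = poisson.cdf(k, n * lam2)    # P(Pois(n lam2) <= c)
    return b, b_star


def prob_xi_zero(lam1, lam2, n_max=2000):
    """Return (exp(-B - B*), B, B*) with B = sum b_n/n and B* = sum b*_n/n, truncated at n_max."""
    b, b_star = ladder_probabilities(lam1, lam2, n_max)
    n = np.arange(1, n_max + 1)
    big_b = float(np.sum(b / n))
    big_b_star = float(np.sum(b_star / n))
    return float(np.exp(-big_b - big_b_star)), big_b, big_b_star


for lam2 in (12, 16, 20, 24, 28, 32, 36, 40):
    print(lam2, round(prob_xi_zero(10, lam2)[0], 4))
```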

Finally, we compute the sequences $\{q_{j}\}$, $\{\tilde{u}_{j}\}$ and $\{q_{j}^{\ast}\}$, $\{\tilde{u}_{j}^{\ast}\}$ through the recursions $q_{0}=1$, $nq_{n}=\sum_{j=0}^{n-1}b_{n-j}q_{j}$ and $\tilde{u}_{0}=1$, $n\tilde{u}_{n}=\sum_{j=0}^{n-1}\tilde{b}_{n-j}\tilde{u}_{j}$, and similarly for $\{q_{j}^{\ast}\}$ and $\{\tilde{u}_{j}^{\ast}\}$, with $b_{n}^{\ast}$ and $\tilde{b}_{n}^{\ast}$ in place of $b_{n}$ and $\tilde{b}_{n}$.
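All four sequences share the same recursion, so one helper suffices. The Python sketch below (NumPy assumed; the names `series_from_recursion` and `xi_pmf` are hypothetical) implements $c_{0}=1$, $nc_{n}=\sum_{j=0}^{n-1}b_{n-j}c_{j}$ and then assembles Eq. (8), reading the $j<0$ branch with indices $|j|$. The inputs $b_{n}$, $\tilde{b}_{n}$, $B$ and their starred versions are assumed to have been computed beforehand from the expressions in this section.

```python
import numpy as np


def series_from_recursion(b, m):
    """c_0..c_m with c_0 = 1 and n*c_n = sum_{j=0}^{n-1} b_{n-j} c_j.

    b[k - 1] holds b_k (k >= 1), so at least m terms are needed. Feeding {b_n}
    gives {q_n}; feeding {b~_n} gives {u~_n}; likewise for the starred sequences.
    """
    if len(b) < m:
        raise ValueError("need at least m terms of b")
    b = np.asarray(b, dtype=float)
    c = np.zeros(m + 1)
    c[0] = 1.0
    for n in range(1, m + 1):
        # b[n-1::-1] lists b_n, b_{n-1}, ..., b_1, paired with c_0, ..., c_{n-1}
        c[n] = np.dot(b[n - 1::-1], c[:n]) / n
    return c


def xi_pmf(q, u, big_b, q_star, u_star, big_b_star, max_lag):
    """Approximate P(xi_inf = j) for |j| <= max_lag following Eq. (8)."""
    pmf = {0: np.exp(-big_b - big_b_star)}
    for j in range(1, max_lag + 1):
        # left side (j < 0) uses the pre-change walk S with index |j|
        pmf[-j] = np.exp(-big_b) * (q[j] - (1.0 - np.exp(-big_b_star)) * u[j])
        # right side (j > 0) uses the post-change walk S*
        pmf[j] = np.exp(-big_b_star) * (q_star[j] - (1.0 - np.exp(-big_b)) * u_star[j])
    return pmf
```

As a sanity check of the recursion, $b_{n}=1$ for all $n$ gives $c_{n}=1$ for all $n$, since the generating function of $\{c_{n}\}$ is $\exp(\sum_{k}b_{k}s^{k}/k)$.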

4. Simulations

In this section, we investigate how close the derived asymptotic distribution is to the empirical distribution of the change point mle when the parameters are known and when they are estimated. An additional goal is to show empirically that the asymptotic distribution developed in Section 3 is applicable to finite samples.

The closeness of two distributions is evaluated with the total variation distance in Eq. (9) below.

$\displaystyle d_{TV}(X,Y)=\frac{1}{2}\sum_{i\in\mathbb{Z}}\left|P(X=i)-P(Y=i)\right|$ (9)
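For integer-valued quantities such as $\xi_{n}$, Eq. (9) is a finite sum over the union of the two supports. A minimal Python sketch using only the standard library; the names `empirical_pmf` and `tv_distance` and the small pmf in the usage lines are illustrative, not taken from the paper.

```python
from collections import Counter


def empirical_pmf(samples):
    """Relative frequencies of the integer values in `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}


def tv_distance(p, q):
    """Eq. (9): half the L1 distance between two pmfs stored as {integer: probability}."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in support)


theoretical = {-1: 0.1, 0: 0.8, 1: 0.1}   # hypothetical pmf of xi, for illustration only
draws = [0, 0, 0, 1, 0, -1, 0, 0, 1, 0]
print(tv_distance(empirical_pmf(draws), theoretical))
```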

Hinkley (1972) demonstrated that the asymptotic distribution of the change point mle remains the same even for unknown parameters. Nevertheless, this result should be supported empirically. For this reason, we run simulations for varying sample sizes and true change point locations. Specifically, we consider the following cases: $n=$ 100, $\tau=$ 30; $n=$ 100, $\tau=$ 50; $n=$ 100, $\tau=$ 70; $n=$ 80, $\tau=$ 30; $n=$ 80, $\tau=$ 40; $n=$ 80, $\tau=$ 50. We also consider different sets of parameter values for the Poisson distributions before and after the change, namely $\lambda_{1}=$ 5, $\lambda_{2}=$ 10; $\lambda_{1}=$ 30, $\lambda_{2}=$ 50; and $\lambda_{1}=$ 12, $\lambda_{2}=$ 20. Table 1 reports the total variation distances based on 500,000 simulations. The results in Table 1 show almost complete agreement between the theoretical asymptotic distribution (known parameters) and the empirical finite-sample distribution when the parameters are known, with distances between 0.002 and 0.022. There is also very good agreement between the theoretical distribution and the empirical distribution when the parameters are estimated, with distances between 0.007 and 0.037. Furthermore, the difference between using the true and the estimated parameters is small for all combinations of sample sizes and change point locations.

Table 1

Total variation distances between the asymptotic distribution (known parameters) and the empirical distributions under known and estimated parameters (based on 500,000 simulations)

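$n$	$\tau$	Estimated ($\lambda_{1}=$ 30, $\lambda_{2}=$ 50)	Known ($\lambda_{1}=$ 30, $\lambda_{2}=$ 50)	Estimated ($\lambda_{1}=$ 5, $\lambda_{2}=$ 10)	Known ($\lambda_{1}=$ 5, $\lambda_{2}=$ 10)	Estimated ($\lambda_{1}=$ 12, $\lambda_{2}=$ 20)	Known ($\lambda_{1}=$ 12, $\lambda_{2}=$ 20)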
100	30	0.007	0.002	0.035	0.022	0.025	0.020
100	50	0.008	0.002	0.036	0.020	0.025	0.019
100	70	0.009	0.002	0.037	0.020	0.025	0.019
80	30	0.007	0.002	0.036	0.022	0.026	0.019
80	40	0.008	0.002	0.034	0.022	0.026	0.020
80	50	0.009	0.002	0.034	0.022	0.026	0.020
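The empirical distributions behind Table 1 can be reproduced in outline with a small Monte Carlo sketch. The Python code below (NumPy and SciPy assumed; `simulate_xi` is a hypothetical name) draws samples from the abrupt change model and records $\xi_{n}=\hat{\tau}-\tau$, either with known rates (maximizing $\sum_{i\leqslant j}W_{i}$) or with estimated rates (maximizing the profile likelihood of Eq. (4), i.e. plugging in segment means). The replication counts in the usage lines are far smaller than the 500,000 used for Table 1.

```python
import numpy as np
from scipy.special import xlogy


def simulate_xi(n, tau, lam1, lam2, reps, estimate=False, seed=0):
    """Draw `reps` values of xi_n = tau_hat - tau under the abrupt change model.

    estimate=False: known rates, tau_hat maximizes W_1 + ... + W_j.
    estimate=True: unknown rates, tau_hat maximizes the profile likelihood of Eq. (4).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n)
    xi = np.empty(reps, dtype=int)
    for r in range(reps):
        y = np.concatenate([rng.poisson(lam1, tau), rng.poisson(lam2, n - tau)]).astype(float)
        if estimate:
            s1 = np.cumsum(y)[:-1]
            s2 = y.sum() - s1
            # segment means plugged in; terms constant in t are dropped
            score = xlogy(s1, s1 / t) + xlogy(s2, s2 / (n - t))
        else:
            score = np.cumsum((lam2 - lam1) + y * np.log(lam1 / lam2))[:-1]
        xi[r] = int(np.argmax(score)) + 1 - tau
    return xi


known = simulate_xi(100, 30, 5.0, 10.0, reps=5000, estimate=False, seed=1)
estimated = simulate_xi(100, 30, 5.0, 10.0, reps=5000, estimate=True, seed=1)
print(np.mean(known == 0), np.mean(estimated == 0))
```

The resulting samples can be turned into empirical pmfs and compared with the approximation of Eq. (8) through the total variation distance of Eq. (9).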

It is also clear that the sample size and the true change point location have little impact, since the total variation distances differ only slightly across their combinations, whether the parameters are known or estimated. Notably, the moderate sample sizes $n=$ 80 and $n=$ 100 considered in our analysis already give results very close to the approximate asymptotic distribution. We therefore conclude that our methodology works well for finite samples.

To further examine the closeness between the empirical and theoretical distributions, we constructed histograms; to save space, we omit the plots and only describe them. The histograms of the change point mle for the empirical distributions with known and estimated parameters were quite close to the approximate asymptotic distribution for the different values of the Poisson parameters before and after the change point. Further, the location of the change point does not appear to affect the distribution of the mle: for a fixed sample size, the histograms for the various change point locations overlap with the theoretical distribution, whether the parameters are estimated or assumed to be known. Additionally, we do not notice any differences in the distributions when both the sample size and the change point location change.

Next, we compare our approach with methods in the literature. Specifically, we consider four articles: West and Ogden (1997), Ghorbanzadeh et al. (2017), Nyambura et al. (2016), and Worsley (1986).

West and Ogden (1997) considered the problem of estimating change points in a sequence of Poisson random variables by allowing the change point to range over a continuous time interval. The authors computed interval estimates using Bayesian confidence intervals and carried out simulations to calculate the coverage probabilities and mean widths of 95% confidence intervals to show the efficacy of their methodology. To compare our results, we used the same parameter values before and after the change and all the scenarios the authors considered. In our simulations, only discrete locations of the change point were considered. Specifically, we generated 1,000 samples for each scenario, and for each sample we applied the detection methodology and the theoretical distribution described in Sections 2 and 3 to calculate the 95% confidence interval. The coverage and the mean width were then calculated from these confidence intervals.
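The text does not spell out how a 95% set is read off the distribution of $\xi$, so the following Python sketch uses one plausible construction as an explicit assumption: add lags in decreasing order of probability until the mass reaches the nominal level, take the smallest and largest selected lag, and invert $\xi=\hat{\tau}-\tau$ to obtain an interval for $\tau$. The function names and the pmf in the usage lines are hypothetical; coverage is then the fraction of replications whose interval contains the true $\tau$.

```python
def lag_range(pmf, level=0.95):
    """Smallest and largest lag of the highest-probability set with mass >= level.

    Lags are added in decreasing order of P(xi = j), ties broken by |j|; this is an
    assumed construction, not necessarily the one used for Table 2.
    """
    chosen, mass = [], 0.0
    for j, p in sorted(pmf.items(), key=lambda kv: (-kv[1], abs(kv[0]))):
        chosen.append(j)
        mass += p
        if mass >= level - 1e-12:   # small tolerance for floating-point sums
            break
    return min(chosen), max(chosen)


def change_point_interval(tau_hat, pmf, level=0.95):
    """Interval for tau implied by xi = tau_hat - tau, i.e. tau = tau_hat - xi."""
    lo, hi = lag_range(pmf, level)
    return tau_hat - hi, tau_hat - lo


def coverage(intervals, tau):
    """Fraction of intervals (a, b) with a <= tau <= b."""
    return sum(a <= tau <= b for a, b in intervals) / len(intervals)


pmf = {-2: 0.01, -1: 0.03, 0: 0.90, 1: 0.05, 2: 0.01}   # hypothetical pmf of xi
print(change_point_interval(25, pmf))
```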

Table 2

Coverage probabilities and mean widths of 95% confidence intervals for the change point mle for different combinations of sample sizes and change point locations. The top part presents the results based on the derived asymptotic distribution; the bottom part presents the results from West and Ogden (1997)

$n$	Coverage ($\tau=n/4$)	Mean width ($\tau=n/4$)	Coverage ($\tau=n/2$)	Mean width ($\tau=n/2$)	Coverage ($\tau=3n/4$)	Mean width ($\tau=3n/4$)
10	0.993	3.495	0.994	3.434	0.994	3.631
20	0.992	3.302	0.988	3.353	0.986	3.364
30	0.972	3.324	0.977	3.251	0.981	3.353
40	0.989	3.310	0.968	3.256	0.968	3.290
50	0.972	3.265	0.967	3.233	0.978	3.320
60	0.966	3.269	0.965	3.215	0.968	3.236
70	0.975	3.245	0.969	3.178	0.971	3.255
80	0.971	3.196	0.964	3.166	0.968	3.223
90	0.964	3.183	0.972	3.149	0.968	3.183
100	0.964	3.188	0.964	3.126	0.964	3.202
$n$	Coverage ($\tau=n/4+0.25$)	Mean width ($\tau=n/4+0.25$)	Coverage ($\tau=n/2+0.5$)	Mean width ($\tau=n/2+0.5$)	Coverage ($\tau=3n/4-0.25$)	Mean width ($\tau=3n/4-0.25$)
10	0.957	3.619	0.958	3.324	0.955	3.650
20	0.961	3.504	0.957	3.084	0.953	3.328
30	0.958	3.152	0.958	2.762	0.951	3.010
40	0.961	2.807	0.940	2.693	0.947	2.759
50	0.945	2.740	0.952	2.660	0.957	2.621
60	0.940	2.639	0.942	2.592	0.945	2.588
70	0.952	2.534	0.938	2.577	0.950	2.538
80	0.947	2.499	0.948	2.526	0.950	2.535
90	0.948	2.493	0.936	2.249	0.960	2.423
100	0.949	2.494	0.932	2.530	0.948	2.487

In the bottom part of Table 2 (West and Ogden, 1997), the coverage probabilities range from 93.2% to 96.1%. In the top part of the same table (our method), the coverage probabilities exceed the nominal 95% in all cases, ranging from 96.4% to 99.4%. Further, the mean width of our intervals ranges from 3.126 to 3.631 and is around 3.2 for most scenarios.

West and Ogden (1997) also considered a sample size of 50, with the rate parameter equal to 10 before the change and taking the values 12, 16, 20, 24, 28, 32, 36 and 40 after the change. The authors showed that the coverage probability is close to 95% in all scenarios and that the mean interval width decreases as the change between the rate parameters increases. We repeated the simulations for these combinations; Table 3 presents, for each case, the coverage probability, mean width, empirical standard error, theoretical standard error, and $P(\hat{\tau}-\tau=0)$.

Table 3

Coverage and mean interval width when $n=$ 50, $\lambda_{1}=$ 10, $\tau=$ 25. The table presents the results based on the derived asymptotic distribution and the results from West and Ogden (1997)

$\lambda_{2}$	Coverage	Mean width	St. error empirical	St. error theoretical	$P({\hat{\tau}-\tau=0})$	West & Ogden (1997) coverage	West & Ogden (1997) mean width
12	0.847	16.171	6.31	7.78	0.1551	0.984	34.24
16	0.938	5.609	1.53	1.87	0.5525	0.948	6.74
20	0.969	3.233	0.64	0.70	0.8008	0.952	2.66
24	0.989	2.102	0.34	0.36	0.9182	0.944	1.68
28	0.972	1.070	0.19	0.20	0.9674	0.931	1.29
32	0.987	1.000	0.11	0.12	0.9883	0.914	1.04
36	0.996	1.000	0.06	0.07	0.9961	0.939	0.88
40	0.998	1.000	0.03	0.04	0.9988	0.952	0.76

Table 3 shows that our methodology provides higher coverage probabilities than West and Ogden (1997) when the rate parameter after the change is greater than 16. Also, the mean width obtained using our methodology is smaller when $\lambda_{2}=$ 12, 16, 28 or 32. When the difference between the rate parameters is small, the detection methodology does not capture the change point accurately, and since our confidence interval is narrower than that of West and Ogden (1997), this leads to lower coverage probabilities. However, when the difference in rate parameters is large, the coverage probabilities are well above 0.95, exceeding 0.99 for $\lambda_{2}=$ 36 and 40. These high coverage probabilities can be explained by the high values of $P(\hat{\tau}-\tau=0)$ in these cases: the change point estimator $\hat{\tau}$ coincides with the true change point $\tau=$ 25 with very high probability, which leads to a very high coverage probability. In comparison, the continuous-time approach of West and Ogden (1997) yields coverage probabilities between 0.914 and 0.952 for $\lambda_{2}\geqslant$ 28.

Furthermore, Ghorbanzadeh et al. (2017) proposed a methodology to estimate change points for Poisson observations. To compare our method with theirs, we used the same values $\lambda_{1}=$ 5, $\lambda_{2}=$ 10, $n=$ 500 and $\tau=$ 150, performed 500,000 simulations, and calculated a 95% confidence interval with both methods. The confidence interval derived from our method contains the points $\{-2,-1,0,1,2\}$, while that of Ghorbanzadeh et al. (2017) contains one additional point, $\{-2,-1,0,1,2,3\}$. Thus, our approach provides a narrower confidence interval.

In Nyambura et al. (2016), the authors present a table with different locations of the change point and conclude that their methodology is powerful when $\tau=n/2$. In contrast, our simulations in this section indicate that neither the sample size nor the location of the change materially affects the results of our approach.

5. Applications to real data

5.1 Evaluation of the California Public Safety Realignment

Do prisons prevent violent crimes? There are many reasons to believe they do. First, people who are in detention cannot engage in unlawful activity in the community, an effect criminologists call “incapacitation”. Additionally, the fear of imprisonment can deter people from committing crimes. Furthermore, time spent in incarceration can rehabilitate inmates and prepare them to be productive members of society.

For many decades, confidence in the effectiveness of incarceration led every jurisdiction in the United States to increase sentence lengths and to punish a broader range of crimes with imprisonment (Simon, 2014; Clear, 1994; Travis et al., 2014). California experienced this trend perhaps more than any other state: its prison population increased by 572 percent between 1980 and 2010 (Lawrence, 2012).

In 2009, a federal three-judge panel ordered the state of California to reduce its prison population from close to 180 percent to 137.5 percent of design capacity, and on 23 May 2011 the US Supreme Court upheld the ruling. Prior to the initiative, prisoners had become unable to receive routine medical or mental health care. In response, the governor and the state legislature passed two bills, including California Assembly Bill 109 (2011), which became law and went into effect on October 1, 2011.

As of 2011, California’s state prisons were designed to house approximately 85,000 inmates. At that time, the prison system housed nearly twice that many, approximately 156,000 inmates.

Expanding prison capacity was not financially feasible. To comply with the federal court’s order and reduce the prison population, lower-level offenders were sentenced to county jails rather than state prison, and the counties were given most of the responsibility for parolees. Lower-level offenders are those whose offenses are non-violent and non-serious and who are not required to register as sex offenders. Furthermore, parolees who violated the terms of their release but had not been convicted of a new felony were no longer sent to prison; instead, they served short terms in county jails or were sanctioned locally. Note that inmates already in state prison at the time were not moved to county jails.

Consequently, counties sent fewer offenders to prison for probation failures and non-revocable parole. The reform reduced the total number of people incarcerated in California: although the county jail population rose, it did not rise by as much as the prison population fell.

Sundt et al. (2016) raised the question “Can prison populations be reduced without endangering the public?”. To answer it, the authors studied the effects of California’s dramatic efforts to reduce prison overcrowding. California’s Public Safety Realignment initiative, motivated by a federal mandate, represents an attempt by the state to reduce its prison population.

This reform raised public safety concerns and questions about whether crime rates would rise, since thousands of offenders who would otherwise have been incarcerated would be on the streets. It was this fear that led Justice Alito to worry that ordering California to reduce its prison population would create “a grim roster of victims” (Brown, 2011). From this perspective, one would predict that California’s Realignment Act would increase crime.

Lofstrom and Raphael (2015) employed the synthetic control method, which constructs a convex combination of untreated states that matches the pre-intervention characteristics of the treated state. The authors showed that realignment did not lead to an increase in violent crime, although property crime increased slightly. The same study also found that returns to prison, a major cause of overcrowding, decreased. Before realignment, more than 40 percent of released offenders were back in prison within a year, the highest rate in the nation. In realignment’s first year, the return rate dropped by 33 percent and fell below the national average.

In our analysis, we use change point methodology to evaluate the implementation of the California Public Safety Realignment Act. Our goal is to test whether the number of homicides changed significantly between 2002 and 2020 and to identify the consequences that followed the implementation of Realignment. First, we apply the detection methodology described in Section 2. If an abrupt change in the number of homicides is detected, we then determine the asymptotic distribution of the change point mle using the results of Section 3. This distribution yields a confidence interval for the unknown change point at any desired confidence level.

We study the data provided by the Department of Justice (DOJ) and the Criminal Justice Statistics Center (CJSC) about homicides reported in California for the Federal Uniform Crime Reporting (UCR) program. We use monthly data from January 2002 through December 2020. Figure 1 displays the time series data.

First, we conducted a chi-square goodness-of-fit test ($p$-value $=0.76$) to check that the monthly number of homicides follows a Poisson distribution. We then applied the detection methodology described in Section 2 to test the null hypothesis of no change in the mean parameter of the Poisson distribution against the alternative hypothesis that there is a change point.
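The goodness-of-fit check can be sketched as follows. This is a minimal illustration, not the authors' code: the binning is an assumption (bin edges are supplied by the user and the two end bins absorb the tails), the Poisson mean is estimated by the sample mean, and the names `poisson_pmf` and `poisson_chi_square` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), evaluated on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_chi_square(counts: Sequence[int], edges: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic for a Poisson fit with estimated mean.

    Hypothetical bin edges b_0 < b_1 < ... < b_m define bins [b_j, b_{j+1});
    the first bin is extended down to 0 and the last bin up to infinity so
    that the bin probabilities sum to one. Returns (statistic, df), where
    df = (number of bins) - 1 - 1 because lambda is estimated from the data.
    """
    n = len(counts)
    lam = sum(counts) / n  # Poisson mle: the sample mean
    m = len(edges) - 1     # number of bins
    observed = [0] * m
    for x in counts:
        # find the bin of x, clamping to the first and last bins
        j = 0
        while j < m - 1 and x >= edges[j + 1]:
            j += 1
        observed[j] += 1
    probs: List[float] = []
    for j in range(m - 1):
        lo = 0 if j == 0 else edges[j]
        probs.append(sum(poisson_pmf(k, lam) for k in range(lo, edges[j + 1])))
    probs.append(1.0 - sum(probs))  # the upper tail goes into the last bin
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    return stat, m - 2
```

If SciPy is available, the $p$-value can then be obtained as `scipy.stats.chi2.sf(stat, df)`. In practice the edges should be chosen so that every expected count is reasonably large (a common rule of thumb is at least 5).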

We find $Q_{n}=255$ with a $p$-value near 0, indicating strong evidence of a change in the Poisson mean. The statistic attains its maximum in September 2008 (the 81st month), so the change point mle is $\hat{\tau}=81$. Before determining the asymptotic distribution of the mle, we assessed the assumption of independence for the data before and after $\hat{\tau}$ using ACF and PACF plots of the corresponding residuals; the plots strongly support independence. Further testing did not detect any additional change points either before or after $\hat{\tau}=81$.
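The change point mle for a single shift in a Poisson mean can be sketched by maximizing the profile log-likelihood over all candidate split points. The sketch below assumes that $\tau$ denotes the last observation before the change and returns twice the log-likelihood ratio against the no-change model; whether this coincides with the statistic $Q_{n}$ of Section 2 is an assumption, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def poisson_change_point_mle(x: List[int]) -> Tuple[int, float, float, float]:
    """Single change in a Poisson mean: maximum likelihood estimate of tau.

    A candidate tau splits x into x[:tau] (rate lam1) and x[tau:] (rate lam2).
    Up to terms that do not depend on tau, the profile log-likelihood is
        S1*log(S1/tau) + S2*log(S2/(n - tau)),
    where S1 and S2 are the segment sums, so tau_hat maximises it over 1..n-1.
    Returns (tau_hat, lam1_hat, lam2_hat, lr), where lr is twice the
    log-likelihood ratio against the no-change model.
    """
    n = len(x)
    total = sum(x)

    def xlogm(s: float, m: float) -> float:
        # s * log(s / m) with the convention 0 * log 0 = 0
        return s * math.log(s / m) if s > 0 else 0.0

    null = xlogm(total, n)
    best_tau, best_val = 1, -math.inf
    s1 = 0
    for tau in range(1, n):
        s1 += x[tau - 1]  # running sum of the first tau observations
        val = xlogm(s1, tau) + xlogm(total - s1, n - tau)
        if val > best_val:
            best_tau, best_val = tau, val
    s1 = sum(x[:best_tau])
    lam1 = s1 / best_tau
    lam2 = (total - s1) / (n - best_tau)
    return best_tau, lam1, lam2, 2.0 * (best_val - null)
```

For example, on a series of 30 counts equal to 10 followed by 30 counts equal to 2, the function returns $\hat{\tau}=30$ with segment means 10 and 2. The segment means returned alongside $\hat{\tau}$ are the parameter estimates before and after the change.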

The parameter estimates before and after the change point mle are $\hat{\lambda}=199$ and $\hat{\lambda}=154$, respectively. Treating these estimates as if they were the known parameters and applying the methodology described in Section 3, we obtain the asymptotic distribution of the change point mle presented in Table 4.
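Section 3 obtains the asymptotic distribution of $\hat{\tau}-\tau$ exactly, through ladder epochs and the Spitzer-Baxter identities. As a rough cross-check only, the same limiting law can be approximated by Monte Carlo: with known rates $\lambda_1$ (before) and $\lambda_2$ (after), $\hat{\tau}-\tau$ behaves asymptotically like the location of the maximum of a two-sided random walk started at 0, whose right-hand steps score post-change counts with the pre-change rate and whose left-hand steps score pre-change counts with the post-change rate. The simulation approach, the truncation `horizon`, the sign convention (a positive lag means the estimate falls after the true change), and all function names are assumptions of this sketch, not the method of Section 3.

```python
import bisect
import math
import random
from collections import Counter
from typing import Dict, List


def _poisson_cdf_table(lam: float) -> List[float]:
    """Cumulative Poisson(lam) probabilities out to far in the right tail."""
    k_max = int(lam + 12.0 * math.sqrt(lam) + 20)
    cdf: List[float] = []
    acc = 0.0
    for k in range(k_max + 1):
        acc += math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        cdf.append(acc)
    return cdf


def _draw(cdf: List[float], rng: random.Random) -> int:
    """Inverse-CDF Poisson draw; the negligible mass beyond the table maps to its end."""
    return min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)


def argmax_distribution(lam1: float, lam2: float, horizon: int = 100,
                        reps: int = 10000, seed: int = 0) -> Dict[int, float]:
    """Monte Carlo law of the argmax of a two-sided random walk with W_0 = 0.

    Right steps (k > 0): X*log(lam1/lam2) - (lam1 - lam2), X ~ Poisson(lam2).
    Left steps  (k < 0): X*log(lam2/lam1) - (lam2 - lam1), X ~ Poisson(lam1).
    Both walks drift to -infinity, so truncating at `horizon` steps is
    harmless when the two rates are well separated.
    """
    rng = random.Random(seed)
    cdf1, cdf2 = _poisson_cdf_table(lam1), _poisson_cdf_table(lam2)
    a = math.log(lam1 / lam2)
    counts: Counter = Counter()
    for _ in range(reps):
        best_k, best_w = 0, 0.0
        w = 0.0
        for k in range(1, horizon + 1):  # post-change data scored with lam1
            w += _draw(cdf2, rng) * a - (lam1 - lam2)
            if w > best_w:
                best_k, best_w = k, w
        w = 0.0
        for k in range(1, horizon + 1):  # pre-change data scored with lam2
            w += -_draw(cdf1, rng) * a + (lam1 - lam2)
            if w > best_w:
                best_k, best_w = -k, w
        counts[best_k] += 1
    return {k: c / reps for k, c in sorted(counts.items())}
```

With the California estimates, `argmax_distribution(199, 154)` should place roughly 90 percent of its mass at lag 0, in line with the 0.906 reported in Table 4; the Monte Carlo error shrinks as `reps` grows.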

Table 4
Asymptotic probability distribution of the deviation (lag) of the change point mle from the true change point, for the applications in Sections 5.1 and 5.2

Lag	California public safety realignment	Stomach cancer in females: 1st change	Stomach cancer in females: 2nd change
-10	0	0.0001	0
-5	0	0.0031	0
-4	0	0.0072	0
-3	0.001	0.0153	0.0002
-2	0.005	0.0423	0.0019
-1	0.042	0.1173	0.0284
0	0.906	0.647	0.9424
1	0.041	0.1332	0.0238
2	0.004	0.0393	0.002
3	0.001	0.0168	0.0002
4	0	0.0069	0
5	0	0.0034	0
10	0	0.0001	0

Figure 1. Monthly time series of the number of homicides in California.

Since Table 4 gives the probabilities of deviations of the mle from the true change point, the three central lags ($-1$, 0 and 1), which carry total probability $0.042+0.906+0.041=0.989$, give an approximate 99% confidence interval containing the months {August, September, October} of 2008. The main finding of this study is that the Public Safety Realignment did not change the downward trend in the number of homicides that started around September 2008.
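The interval can be checked with a few lines of arithmetic on Table 4. The sketch assumes that month 1 is January 2002 and that the lag is defined as $\hat{\tau}-\tau$, so that lag $k$ corresponds to the candidate change point $\hat{\tau}-k$; for the symmetric set of lags used here the sign convention does not matter. Lags not listed in Table 4 are treated as having probability 0, and all names are hypothetical.

```python
import calendar
from typing import Dict, Iterable

# Asymptotic lag probabilities for the California series, taken from Table 4
# (lags whose entry is 0 in the table are omitted and treated as probability 0).
CALIFORNIA: Dict[int, float] = {
    -3: 0.001, -2: 0.005, -1: 0.042, 0: 0.906, 1: 0.041, 2: 0.004, 3: 0.001,
}


def month_label(index: int, start_year: int = 2002) -> str:
    """Map a 1-based month index (1 = January of start_year) to 'Month YYYY'."""
    years, month0 = divmod(index - 1, 12)
    return f"{calendar.month_name[month0 + 1]} {start_year + years}"


def coverage(probs: Dict[int, float], lags: Iterable[int]) -> float:
    """Total asymptotic probability attached to a set of lags."""
    return sum(probs.get(k, 0.0) for k in lags)


# Assumed convention: lag k = tau_hat - tau, so lag k points to month tau_hat - k.
tau_hat = 81
interval = [month_label(tau_hat - k) for k in (1, 0, -1)]
```

Here `month_label(81)` gives "September 2008", `interval` lists August, September and October 2008, and `coverage(CALIFORNIA, (-1, 0, 1))` equals 0.989 up to floating-point rounding.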

5.2 Stomach cancer in females

As a second example, we consider the annual number of deaths from stomach cancer among females in the United States, using data published by the Centers for Disease Control and Prevention (CDC) for 1930 to 2011. The series is shown in Fig. 2.

Figure 2. Time series of stomach cancer deaths in females in the USA.

In contrast to the time series plot presented in Section 5.1, Fig. 2 shows a downward trend. However, the plot does not make clear when the characteristics of the data changed significantly, so we apply the methodology of the previous example to this dataset as well.

After confirming that the Poisson assumption was reasonable ($p$-value $=0.99$), we applied the change detection methodology to test for the existence of a change point, which yielded a $p$-value of 0. The corresponding change point mle is $\hat{\tau}=30$, which corresponds to 1959.

The ACF and PACF plots for the data before and after the change point estimate did not reveal any significant dependence. Applying the change detection methodology again to the data before the mle yielded a $p$-value of 0.19, whereas applying it to the data after the mle yielded a $p$-value of 0, revealing a second change point in the later data. The corresponding change point mle is $\hat{\tau}=51$, which corresponds to 1980.
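Re-applying the detection test to the data before and after each estimate is a form of binary segmentation. A minimal sketch follows. It replaces the $p$-value of the Section 2 test with a fixed, user-chosen threshold on the likelihood ratio statistic (an assumption, since the null distribution of $Q_{n}$ is not reproduced here), and `min_len` is a hypothetical minimum segment length below which no further split is attempted.

```python
import math
from typing import List, Tuple


def _xlogm(s: float, m: float) -> float:
    # s * log(s / m) with the convention 0 * log 0 = 0
    return s * math.log(s / m) if s > 0 else 0.0


def best_split(x: List[int]) -> Tuple[int, float]:
    """Best single split of x: (tau, lr), with tau the length of the left part."""
    n, total = len(x), sum(x)
    null = _xlogm(total, n)
    best_tau, best_lr, s1 = 0, -math.inf, 0
    for tau in range(1, n):
        s1 += x[tau - 1]
        lr = 2.0 * (_xlogm(s1, tau) + _xlogm(total - s1, n - tau) - null)
        if lr > best_lr:
            best_tau, best_lr = tau, lr
    return best_tau, best_lr


def binary_segmentation(x: List[int], threshold: float, offset: int = 0,
                        min_len: int = 5) -> List[int]:
    """Split recursively while the likelihood ratio statistic exceeds `threshold`.

    Returned change points are 1-based positions (in the full series) of the
    last observation before each change.
    """
    if len(x) < 2 * min_len:
        return []
    tau, lr = best_split(x)
    if lr <= threshold:
        return []
    left = binary_segmentation(x[:tau], threshold, offset, min_len)
    right = binary_segmentation(x[tau:], threshold, offset + tau, min_len)
    return left + [offset + tau] + right
```

On a synthetic three-level series of 30 values of 20, then 21 values of 12, then 31 values of 4, the sketch recovers change points at positions 30 and 51, and a constant series returns no change points.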

Building on these results, we next constructed confidence intervals for the two change points. The Poisson parameter estimates are $\hat{\lambda}=23.15$ before the first change point, $\hat{\lambda}=14.46$ between the two change points, and $\hat{\lambda}=3.61$ after the second change point. The asymptotic distributions of the change point mles, computed from these estimated parameters, are presented in Table 4. Expressing the lags as calendar years, an approximate 90% confidence interval for the first change (total probability 0.8975) is {1958, 1959, 1960}, centered on the estimated change point 1959. For the second change, the 95% confidence interval contains the years {1979, 1980}.
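Given the two change point estimates, the Poisson parameter estimates are simply the sample means of the three segments, and the lags in Table 4 translate into calendar years. The sketch assumes that index 1 corresponds to 1930 and that lag $k$ maps to the candidate change point $\hat{\tau}-k$; all names are hypothetical.

```python
from typing import Iterable, List


def segment_means(x: List[int], change_points: List[int]) -> List[float]:
    """Poisson mle for each segment: the sample mean between consecutive change points.

    change_points are 1-based positions of the last observation before each change.
    """
    bounds = [0] + sorted(change_points) + [len(x)]
    return [sum(x[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def index_to_year(index: int, first_year: int = 1930) -> int:
    """Map a 1-based annual index to its calendar year (index 1 = first_year)."""
    return first_year + index - 1


def confidence_years(tau_hat: int, lags: Iterable[int], first_year: int = 1930) -> List[int]:
    """Years covered by a set of lags, assuming lag k = tau_hat - tau."""
    return sorted(index_to_year(tau_hat - k, first_year) for k in lags)
```

For instance, `index_to_year(30)` and `index_to_year(51)` give 1959 and 1980, and `confidence_years(30, (-1, 0, 1))` returns [1958, 1959, 1960], the interval reported for the first change.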

A plausible explanation for the first change, around 1959, is the spread of refrigeration. Refrigeration, which began in the early 1900s, was in widespread use by the 1950s. As a result, the US diet came to include more fresh fruits and vegetables (high in anti-carcinogenic antioxidants) and fewer preserved meats (high in nitrites and carcinogenic nitrosamines). To understand the factors that might have significantly reduced the number of stomach cancer deaths among females around 1980, we consulted several sources, including Siegel et al. (2017), Murphy et al. (2018), and the American Cancer Society. These indicate that deaths from stomach cancer have dropped since the 1980s mainly because more people are being screened and are modifying lifestyle-related risk factors. Screening makes it possible to find colorectal polyps and remove them before they develop into cancer, and to detect cancers earlier, when they are easier to treat.

5.3 British coal mining disasters

Finally, we consider a much-analyzed dataset of intervals between British coal mining disasters from 1851 to 1962 in which more than 10 men were killed. The dataset was first gathered by Maguire et al. (1952) and was subsequently extended and corrected by Jarrett (1979); a change point investigation appears in Worsley (1986). We converted the data table in Jarrett (1979) into observed annual counts and applied our proposed method. The estimated rate parameter is 3 disasters per year before the change point and 1 per year after it. Our approach places the change point in 1890, with 95% confidence interval {1888, 1889, 1890, 1891, 1892}. Worsley (1986) also estimated the change point to be 1890, but his confidence interval was much wider, containing the years {1884, …, 1895}.
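Converting a table of inter-disaster intervals into annual counts is straightforward bookkeeping. The layout of the table in Jarrett (1979) is not reproduced here; the sketch assumes the input is the date of the first disaster followed by the gaps, in days, between successive disasters, and it reports a count of zero for years without a disaster. The function name and input format are hypothetical.

```python
import datetime
from collections import Counter
from typing import Dict, List


def annual_counts(first_event: datetime.date, gaps_in_days: List[int],
                  start_year: int, end_year: int) -> Dict[int, int]:
    """Turn a first event date plus inter-event gaps (in days) into yearly counts."""
    dates = [first_event]
    for gap in gaps_in_days:
        # each event date is the previous one shifted by the recorded gap
        dates.append(dates[-1] + datetime.timedelta(days=gap))
    per_year = Counter(d.year for d in dates)
    # include years with no events as zeros; events outside the range are dropped
    return {y: per_year.get(y, 0) for y in range(start_year, end_year + 1)}
```

Events falling outside the range from `start_year` to `end_year` are ignored, so the range should match the study period.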

6. Conclusions

In this article, we have developed a methodology to detect and estimate an unknown change point and to compute the asymptotic distribution of the change point mle when chronologically ordered data follow a Poisson distribution. The essential idea for determining the asymptotic distribution of the change point estimate goes back to Jandhyala and Fotopoulos (1999) and Fotopoulos et al. (2010), who combined ladder epochs and ladder heights and then applied the Spitzer-Baxter identities. The resulting distribution supports inference about the change point and can help identify important events that may not have been recognized earlier. The analysis was carried out under the assumption that the parameters before and after the change are known. However, our simulations showed that the error incurred by replacing these parameters with their estimates at the estimated change point is negligible. We measured this accuracy with the total variation distance. Although many other distances could be used, the total variation distance between the empirical distribution of the change point mle and its asymptotic counterpart proved effective and informative.
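The total variation distance between two distributions on the integers is half the sum of the absolute differences in probability over the union of their supports. A minimal sketch with hypothetical function names; in the simulation setting, the empirical distribution would be built from the lags $\hat{\tau}-\tau$ of simulated series and compared with the asymptotic probabilities.

```python
from collections import Counter
from typing import Dict, Iterable


def empirical_distribution(lags: Iterable[int]) -> Dict[int, float]:
    """Relative frequency of each observed lag."""
    counts = Counter(lags)
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}


def total_variation(p: Dict[int, float], q: Dict[int, float]) -> float:
    """TV distance: 0.5 * sum over the joint support of |p(k) - q(k)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)
```

The distance lies between 0 (identical distributions) and 1 (disjoint supports), which makes it easy to read as the largest possible discrepancy in probability assigned to any set of lags.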

The examples analyzed were chosen carefully. In all three cases, the Poisson distribution was a reasonable assumption for the data, and the detection methodology indicated the existence of a change point, for which the asymptotic distribution of the mle was then constructed. The resulting inferences indicate that the method works very well.


Acknowledgments

The authors are thankful to the referee(s) for their comments and suggestions that made the paper better both in its contents and readability.

References

1. Basseville, & Nikiforov, I.V. (1998). Detection of Abrupt Changes: Theory and Application. Prentice-Hall.

2. Brown v. Plata, 563 U.S. (2011).

3. California Assembly Bill 109. (2011). Criminal justice alignment.

4. Clear, R.T. (1994). Harm in American penology: Offenders, victims, and their communities. Albany, NY: SUNY Press.

5. Csörgő, & Horváth (1997). Limit theorems in change-point analysis. New York: Wiley.

6. Fotopoulos, S.B., & Jandhyala, V.K. (2001). Maximum likelihood estimation of a change point for exponentially distributed random variables. Stat. Prob. Letters, 51, 423-429.

7. Fotopoulos, S.B., Jandhyala, V.K., & Khapalova (2010). Exact asymptotic distribution of change-point MLE for change in the mean of Gaussian sequences. Ann. Appl. Stat., 4, 1081-1104.

8. Ghorbanzadeh, Durand, & Jaupi (2017). An Application of the Change-Point Estimation for the Poisson Distribution. IAENG Transactions on Engineering Sciences: Special Issue for the International Association of Engineers Conferences 2016.

9. Gombay, & Horváth (1990). Asymptotic distributions of maximum likelihood tests for change in mean. Biometrika, 77, 411-414.

10. Gombay, & Horváth (1996). On the rate of approximations for maximum likelihood tests in change-point models. J. Mult. Anal., 56, 120-152.

11. Haccou, Meelis, & Geer (1988). The likelihood ratio test for the change point problem for exponentially distributed random variables. Stochastic Processes and Their Applications, 27, 121-139.

12. Hinkley, D.V. (1970). Inference about the change-point in a sequence of random variables. Biometrika, 57, 1-17.

13. Hinkley, D.V. (1972). Time ordered classification. Biometrika, 59, 509-523.

14. Jandhyala, V.K., & Fotopoulos, S.B. (1999). Capturing the distributional behavior of the maximum likelihood estimator of a change-point. Biometrika, 86, 129-140.

15. Jarrett, R.G. (1979). A note on the intervals between coal mining disasters. Biometrika, 66, 191-193.

16. Jarušková (1997). Some problems with applications of change-point detection methods to environmental data. Environmetrics, 8(5), 469-483.

17. Lawrence (2012). California in Context: How Does California’s Criminal Justice System Compare to Other States? Berkeley, CA: The Chief Justice Earl Warren Institute on Law and Social Policy.

18. Lofstrom, & Raphael (2015). Realignment, Incarceration, and Crime Trends in California. Public Policy Institute of California.

19. Maguire, B.A., Pearson, E.S., & Wynn, A.H.A. (1952). The time intervals between industrial accidents. Biometrika, 38, 168-180.

20. Murphy, C.C., Sandler, R.S., Sanoff, H.K., Yang, Y.C., Lund, J.L., & Bron, J.A. (2018). Decrease in incidence of colorectal cancer among individuals 50 years or older following recommendations for population based screening. Clinical Gastroenterology and Hepatology: The Official Clinical Practice Journal of the American Gastroenterological Association, 15(6), 903-909.

21. Nyambura, Mundai, & Waititu (2016). Estimation of change point in Poisson random variables using the maximum likelihood method. American Journal of Theoretical and Applied Statistics, 5(4), 219.

22. Page, E.S. (1955). A test for a change in a parameter occurring at an unknown point. Biometrika, 42, 523-526.

23. Siegel, R.L., Fedewa, S.A., Anderson, W.F., Miller, K.D., Rosenberg, P.S., & Jemal (2017). Colorectal Cancer Incidence Patterns in the United States, 1974–2013. J. Natl Cancer Inst., 109(8).

24. Simon (2014). Mass incarceration on trial: A remarkable court decision and the future of prisons in America. New York: The New Press.

25. Sundt, Salisbury, & Harmon (2016). Criminology and Public Policy, 15(2).

26. Tartakovsky, Nikiforov, & Basseville (2020). Sequential Analysis: Hypothesis Testing and Changepoint Detection. Chapman and Hall/CRC.

27. Travis, Western, & Redburn, F.S. (2014). The Growth of Incarceration in the United States: Exploring Causes and Consequences. National Academies Press.

28. West, W.R., & Ogden, T.R. (1997). Continuous-time estimation of a change-point in a Poisson process. Journal of Statistical Computation and Simulation, 56(4), 293-302.

29. Worsley, K.J. (1986). Confidence regions and tests for a change-point in a sequence of exponential family random variables. Biometrika, 73, 91-104.

30. Yao, Y.C. (1987). Approximating the distribution of the maximum likelihood estimate of the change-point in a sequence of independent random variables. Ann. Stat., 15, 1321-1328.
