Conway–Maxwell–Poisson (CMP) distributions are flexible generalizations of the Poisson distribution for modelling overdispersed or underdispersed counts. The main hindrance to their wider use in practice seems to be the inability to directly model the mean of counts, making them not compatible with nor comparable to competing count regression models, such as the log-linear Poisson, negative-binomial or generalized Poisson regression models. This note illustrates how CMP distributions can be parametrized via the mean, so that simpler and more easily interpretable mean-models can be used, such as a log-linear model. Other link functions are also available, of course. In addition to establishing attractive theoretical and asymptotic properties of the proposed model, its good finite-sample performance is exhibited through various examples and a simulation study based on real datasets. Moreover, the MATLAB routine to fit the model to data is demonstrated to be up to an order of magnitude faster than the current software to fit standard CMP models, and over two orders of magnitude faster than the recently proposed hyper-Poisson model.
Conway–Maxwell–Poisson (CMP) distributions have seen a recent resurgence in popularity for the analysis of dispersed counts (e.g., Shmueli et al. (2005); Lord et al. (2008); Lord et al. (2010); Sellers and Shmueli (2010)). Key features of CMP distributions include the ability to handle both overdispersion and underdispersion, containing the classical Poisson distribution as a special case, and being a continuous bridge between other classical distributions such as the geometric and Bernoulli distributions. CMP distributions are also full probability models, making them particularly useful for predictions and estimation of event probabilities.
Originally developed as a model for queueing systems with dependent service times (Conway and Maxwell (1962)), CMP distributions have subsequently found novel applications in the modelling of clothing sales, number of syllables of words in the Hungarian dictionary (Shmueli et al. (2005)), and in the regression modelling of airfreight breakages (Sellers and Shmueli (2010)), overdispersed counts of motor vehicle crashes (Lord et al. (2008)) and underdispersed counts of motor vehicle crashes (Lord et al. (2010)). See (Shmueli et al. (2005)) for a more detailed overview of the history, features and applications of CMP distributions.
One of the major limitations of CMP distributions is that it is not directly parametrized via the mean. Instead, the two model parameters, namely, the rate and the dispersion , are interpreted through ratios of successive probabilities via
The moments of CMP distributions satisfy recursive formulas,
but cannot be solved in closed form. For the first moment, an approximation can be obtained as
which is particularly accurate for or (Shmueli et al. (2005)). However, this approximation can be inaccurate outside those ranges. Interestingly, the recent resurgence of CMP distributions in practice has been for analyzing underdispersed counts, but this corresponds to which is precisely the region where the approximation fails, unless the rate is also very large.
The inability to model the mean directly has arguably limited the use of CMP distributions in practice, especially in regression scenarios where an applied scientist might sacrifice any perceived gains in model flexibility for a simpler, more easily interpretable approach such as a log-linear Poisson, Negative-binomial (Neg-Bin) or generalized Poisson (GP, Consul and Famoye (1992); Famoye (1993)) regression model. Other alternatives, such as the log-linear hyper-Poisson (hP) regression model (Sáez-Castillo and Conde-Sánchez (2013)), which is a special case of the exponentially weighted Poisson model (Ridout and Besbeas (2004)), and the Gamma-count (GC) model (Zeviani et al. (2014)), have also been proposed recently. Being able to parametrize CMP distributions via the mean will therefore be invaluable as it allows CMP models to be directly compatible and comparable to a host of competing count regression models.
Of the above-mentioned models, the only two that can handle both underdispersion and overdispersion and have achieved some popularity in practice seem to be the CMP and GP models. (Sellers and Shmueli (2010)) argue that GP models are less attractive than CMP models because GP models achieve underdispersion via a somewhat unnatural truncation of the support of the distribution. In fact, the support and the parameter space are functionally dependent. This is particularly undesirable in regression settings, as the level of underdispersion places an artificial restriction on the allowable range of means, and vice versa. In contrast, the support of CMP distributions is always the non-negative integers for any finite value of the dispersion.
In this article, we show how the means of CMP distributions can be modelled directly. We argue that this will promote their wider use in practice, putting them on the same level of interpretability and parsimony as competing, but less flexible, count regression models whilst retaining all the desirable features of CMP models. The simulation and data analysis examples in this article reinforce that the proposed framework is a simple yet flexible and robust way to model counts. Moreover, the MATLAB routine for fitting the proposed models is up to an order of magnitude faster than the state-of-the-art cmp function from the COMPoissonReg R-package (Sellers and Lotze (2015)) for fitting standard CMP models, and over two orders of magnitude faster than the hp.fit function for fitting hP models in R1
See http://www4.ujaen.es/ ajsaez/hp.fit.r
that accompanies the article by (Sáez-Castillo and Conde-Sánchez (2013)). The MATLAB routine can be downloaded from the Online Supplement.
Parametrizing CMP distributions via the mean
The CMP distribution with rate parameter and dispersion parameter has probability mass function (pmf) given by
where is a normalizing constant. A detailed summary of the key properties of CMP distributions can be found in (Shmueli et al. (2005))—here, we focus on a novel reparametrization via the mean.
Note that for a single sample of data, any reparametrization of the CMP distribution is equivalent (although some parametrizations can be easier to estimate than others). However, for regression settings in which the count responses may change depending on a set of covariates, it is more convenient and interpretable to model the mean of the distribution directly. To this end, let be the mean of the distribution. The CMP distribution with mean and dispersion can then be characterized by the pmf,
where the rate is a function of and , given by the solution to
We call the distribution defined by Equations (2.1)–(2.2) the CMP distribution to distinguish it from the standard CMP distribution.
The probability mass functions of CMP distributions with various levels of dispersion and means are plotted in Figure 1. We see that a large dispersion condenses the distribution around the mean while a small dispersion stretches the distribution away from the mean, in a somewhat analogous manner to the precision parameter in a normal family. In fact, the mean and dispersion turn out to be orthogonal (see Result 2), just like the mean and scale parameters in normal families.
Probability mass functions for underdispersed (white, ), Poisson (grey, ) and overdispersed (black, ) CMP distributions, with means and .
All the key features of CMP distributions hold for CMP distributions also. For example, CMP distributions generalize not only the Poisson distribution but also other well-known discrete distributions. More precisely, CMP distributions form a continuous bridge between the geometric, Poisson and Bernoulli distributions:
Result 2.1 As increases from 0 to , CMP models pass through the following special cases:
ν = 0 ⇒ Geometric{p = 1/(μ+1)}
ν = 1 ⇒ Poisson{μ}
ν → ∞ ⇒ Bernoulli{p = μ}, provided μ≤ 1.
Result 1 can be established via direct calculations (see Appendix A7.1 for details).
In general, implies overdispersion and implies underdispersion relative to a Poisson distribution with the same mean. Note that is in the interior of the parameter space, so we can test whether a Poisson model is adequate using a standard likelihood ratio test (see Result 4 below). In contrast, Neg-Bin models (and other Poisson scale mixtures) cover the Poisson distribution only as a limiting case. While the case degenerates to a Bernoulli distribution, note that any finite dispersion corresponds to the support of being the non-negative integers . This is in contrast to the GP distribution (Famoye (1993)) which achieves underdispersion by a somewhat unnatural truncation of the support—see Sellers and Shmueli (2010) for more discussion on this.
CMP distributions can be shown to form two-parameter exponential families (see Appendix A7.2). This is analogous to the result from Shmueli et al. (2005) for standard CMP distributions. For fixed , CMP distributions also form one-parameter exponential families (see Appendix A7.2), and it is this key property that makes them immediately adaptable to regression modelling via the generalized linear model (McCullagh and Nelder (1989)) framework.
In addition to retaining all the key features of standard CMP distributions, CMP distributions also possess an attractive orthogonality property. This is particularly useful in regression settings, ensuring that the maximum likelihood estimator (MLE) of the regression parameters will be asymptotically efficient and asymptotically independent of the estimated dispersion parameter (cf. Results 4 and 5b). Result 2 below follows from Corollary 1 of (Huang and Rathouz (2017)).
Result 2 (Orthogonality) The mean and dispersion parameter in CMP distri- butions are orthogonal.
Finally, the current approach is also distinct from the reparametrization due to (Guikema and Coffelt (2008)) which treats as the location parameter. There is still no closed form for the mean under this latter parametrization, making it incompatible with simple, easily interpretable mean-models, such as the log-linear model.
CMP regression models
Given a set of covariates , a CMP GLM for count responses can be specified via
where is some mean-model and is a vector of regression coefficients. The examples in this article focus mainly on the log-linear mean-model,
although any other link function can be used, as is the case with any GLM. Note that model (3.1) is a genuine GLM, so the score equations for the regression parameters have the usual ‘weighted least-squares’ form (cf. Fahrmeir and Kaufman (1985)) and can be solved using standard Newton–Raphson or Fisher scoring algorithms (see Result 4). Moreover, all the familiar key features of GLMs (e.g., McCullagh and Nelder (1989) Chapter 2) are retained.
The mean-dispersion specification in Equation (3.1) makes it directly comparable and compatible with a host of widely used log-linear regression models for counts. In particular, the mean is functionally independent of the dispersion parameter , making it similar in structure to the familiar Neg-Bin regression model for overdispersed counts. The dispersion can in turn be modelled via its own regression model using a set of covariates , with the natural link being to ensure non-negativity. The set of covariates can coincide with , share common components with , or be distinct altogether.
A referee pointed out that the mean specification in Equation (3.1) also makes it easy to incorporate offsets into the model. For example, suppose the count responses are collected over spatial regions of possibly different areas or time periods of possibly different durations. The size of each unit of observation is generically called the ‘exposure’, and we can take into account different exposures by simply including offsets in the model,
much in the same way as for classical GLMs (e.g., McCullagh and Nelder (1989) p. 206). In contrast, handling offsets in standard CMP regression models is not as easy, precisely because standard CMP distributions are not parametrized via the mean.
Estimation and inference
The log-likelihood for a generic observation CMP is
from which we can derive the score function for as
where
is the variance of ; details of this derivation are given in Appendix A7.3. Thus, for independent and identically distributed (iid) data, , the MLE for satisfies
so that the MLE is given simply by the sample mean, In contrast, estimation of the rate parameter λ in the standard CMP model is far more complicated, with (Shmueli et al. (2005)) giving three alternative estimation schemes for for the iid case alone. Relatively simple estimation procedures for mean parameter(s) is a key feature of the CMP parametrization.
Result 3 (iid case)For iid data, the MLE for is , the sample mean.
For regression models where for some mean-model , covariate vector and parameter vector , the score functions for the regression coefficients take on the usual weighted least-squares form (cf. Fahrmeir and Kaufman (1985)),
We have the following result.
Result 4 (Regression case)For independent data pairs, the MLE of can be characterized as the solution to the usual GLM score equations,
As is the case with Neg-Bin regression, the default is to assume a common dispersion across all data pairs. However, regression models for the dispersion are also possible (see also Smyth (1989)). Corresponding score functions for the dispersion parameters are given in Appendix A7.4.
The orthogonality property from Result 2 implies that the MLE for the mean parameters are asymptotically efficient and asymptotically independent of the MLE for the dispersion parameters. As CMP regression models are bona fide GLMs, we also have consistency and asymptotic normality of the MLE (Fahrmeir and Kaufman (1985)). We summarize in Result 5 below.
Result 5 (Asymptotic normality, constant dispersion)As,
where the asymptotic variance of is
and the asymptotic variance of is given in the Appendix.
In Result 5, denotes expectation under the design measure for the covariates . This can be either a prescribed design sequence or random sampling. Note that the asymptotic variance of is the same as if the true dispersion were known to begin with, confirming to be asymptotically efficient. Note also that and are asymptotically independent. A generalization of Result 5 for when the dispersion is itself modelled via a regression is given in the Appendix.
MATLAB software for fitting CMP regression Models given in Equation (3.1) to data can be downloaded from the Online Supplement. The usual GLM form of the score equations (Result 4) means that standard Newton–Raphson algorithms can be used. This leads to substantial gains in computational efficiency, with the MATLAB routine being up to an order of magnitude faster than the state-of-the-art cmp function in the COMPoissonReg R-package for fitting standard CMP models, and over two orders of magnitude faster than the hp.fit function for fitting hP models in R (see computer run times in Section 5).
The software also outputs, if requested, estimated standard errors for , obtained from the plug-in estimator of variance,
The log-likelihood value achieved at the maximum can also be outputted.
The estimated variance in Equation (4.2) can be used to carry out inferences based on Wald-tests. An alternative is to use likelihood ratio tests (LRTs), which do not require estimating the variance, yet enjoy asymptotic optimality properties. LRTs are also particularly useful for testing categorical predictors, where it is sensible to either include all levels of a factor or to exclude the factor altogether. These can be carried out using the MATLAB routine by comparing the maximum log-likelihood values achieved with and without model constraints. More precisely, we have the following result, which is analogous to the analysis-of-deviance.
Result 6 (LRT for composite hypotheses) Under , where is a matrix of full rank and is an vector,
A finite-sample adjustment can be made by comparing to a distribution.
We can also use LRTs to examine whether a simpler Poisson model () is adequate.
Result 7 (LRT for testing Poisson) Under ,
Results 1–7 suggest that CMP models offer the best of both worlds—they inherit all the desirable features of CMP models, including the flexibility to handle both over and underdispersion, with the parsimony, interpretability and familiarity of classical GLMs.
For post-fitting model diagnostics, residual plots based on Pearson or deviance residuals can be visually unappealing as they exhibit artificial banding due to the discrete nature of count responses. A probability inverse transformation (PIT) for discrete distributions was developed by (Smith (1985)) as a better diagnostic tool for count models. If the underlying distributional model is correct, then the PIT should resemble a random sample from a standard uniform distribution. Goodness-of-fit can then be assessed by plotting a histogram of the PIT, or via a quantile plot of the PIT against the uniform distribution.
PIT-uniform quantile plots for fitted CMP regression models for the (a) days absent, (b) takeover bids and (c) cotton bolls count datasets from Sections 5.1, 5.3 and 5.4, respectively.
As a quick demonstration, three examples of PIT-uniform quantile plots are given in Figure 2. These come from three data analysis examples examined later in Section 5. In short, all three plots show reasonable closeness to uniformity, indicating that the fitted CMP regression models are appropriate in each case. Detailed discussions of these three examples are given in Sections 5.1, 5.3 and 5.4, respectively.
Examples and simulations
We examine the finite-sample performance of the proposed method using various examples and a simulation study based on real datasets. We also compare to a range of competing models for dispersed counts, including the Neg-Bin, GP, hP, GC and standard CMP models. All models were fit using a Windows desktop with an i7-4770 CPU running at 3.40 GHz and 16.0GB RAM. Neg-Bin models were fit using the glm.nb function in the MASS package in R. GP models were fit using the R code downloaded from Famoye's webpage.2
See http://people.cst.cmich.edu/famoy1kf/appliedstat/.
See http://www.leg.ufpr.br/ walmes/papercompanions/gammacount2014/papercomp.html
accompanying the paper by (Zeviani et al. (2014)). Standard CMP models were fit using the cmp function in the COMPoissonReg package in R. Finally, the CMP model was fit using the MATLAB code provided in the Online Supplement. While some of these implementations are written on different platforms and perhaps to different levels of computational sophistication, they nevertheless represent the state-of-the-art for their corresponding methods. A comparison of these implementations is therefore of relevance and value to someone who is looking to use these methods today.
Overdispersed class attendance data: Analysis
See http://www.ats.ucla.edu/stat/stata/dae/nb_data.dta
examines the relationship between the number of days absent from high school and the gender, maths score (standardized score out of 100) and academic programme (‘General’, ‘Academic’ and ‘Vocational’) of 314 students sampled from two urban high schools. An insightful plot of the data, along with some summary statistics, can be found on the website.6
See http://www.ats.ucla.edu/stat/r/dae/nbreg.htm
These indicate that the number of days absent exhibits strong overdispersion. We first compare fits to the original data using log-linear Neg-Bin, GP, CMP, hP, GC and standard CMP regression models. We then look at a simulation study based on the dataset.
The estimated coefficients, standard errors, dispersions, maximized log-likelihoods and computer run times from five of the six competing models (the GC model did not converge) are given in Table 1. We see that the Neg-Bin, CMP, hP and standard CMP models all give similar maximized log-likelihood values, indicating comparable goodness-of-fits overall, with the GP model being somewhat inferior for these data. Moreover, the PIT quantile plot in Figure 2a shows close agreement with the uniform distribution, indicating that the CMP model indeed fits the data well.
Attendance dataset—Estimated coefficients, standard errors (se), dispersion, AIC and computer run times for competing models. Sample size .
,
Neg-Bin
GP
CMPμ
CMP
hP
GC
Coefficient
Estimate (se)
Estimate (se)
Estimate (se)
Estimate (se)
Estimate (se)
Estimate (se)
intercept
2.707 (0.204)
2.673 (0.238)
2.715 (0.190)
0.018 (0.057)
2.727 (0.180)
gender (Male)
0.211 (0.122)
0.194 (0.129)
0.215 (0.117)
0.035 (0.018)
0.217 (0.116)
did
programme (Academic)
0.425 (0.182)
0.421 (0.216)
0.425 (0.170)
0.050 (0.020)
0.431 (0.161)
not
programme (Vocational)
1.253 (0.200)
1.248 (0.228)
1.254 (0.190)
0.232 (0.041)
1.261 (0.185)
converge
math
0.006 (0.002)
0.006 (0.003)
0.006 (0.002)
0.001 (0.000)
0.006 (0.002)
dispersion
1.047
0.292
0.020
0.021
467.024
log-likelihood
864.2
870.9
864.5
864.0
863.9
run time (seconds)
0.05
0.07
6.7
41.4
3 416
The Neg-Bin, GP, CMP and hP models are log-linear models, so they are easy to interpret. For example, the fitted CMP model estimates that students in the General programme are expected to miss times more days of school compared to students in the Vocational programme, with female students expected to miss times more days of school compared to male students. A 10-point increase in maths scores is associated with a times reduction in the expected days of absence from school. Finally, the estimated dispersion parameter of is much smaller than 1, reflecting the strong overdispersion exhibited by the data. Interpretations of the other log-linear models are analogous. While the hP model achieved the largest maximized log-likelihood here (and only slightly), it took over 3 416 seconds, or 57 minutes, to fit.
For the standard CMP model, the interpretation of coefficients is more opaque. The estimated dispersion parameter is , so we can use the approximation
where the rate is given by
and is the indicator function. We can ascertain from the fitted model the direction and statistical significance of the effect of each covariate—for example, students with higher math scores tend to be absent significantly less often—but we cannot give an interpretation of the coefficients in the form of mean contrasts.
The estimated dispersion under the CMP and standard CMP regression models are essentially identical, indicating that both models are picking up similar levels of overdispersion despite modelling the counts on slightly different scales. This is consistent with Result 2, which states that the dispersion parameter is orthogonal to the mean-model.
Overdispersed class attendance data: Simulations
We now assess the finite-sample accuracy of the competing methods using data simulated from the fitted Neg-Bin model. More precisely, we first sample sets of covariates from the original dataset, then conditional on these covariates we generate count responses via
where the mean is given by the fitted model
The sample sizes and correspond to small, moderate and the original sample sizes, respectively.
For each synthetic dataset, we fit expanded models including all second-order interactions. The expanded model can be written in the model notation of (McCullagh and Nelder (1989)) as
We then examine Type I errors for simultaneously dropping all second-order interactions using the likelihood ratio test of Result 5, calibrated against the distribution. We compare these to Type I errors obtained from the Neg-Bin, GP and standard CMP regression models, also calibrated against the distribution. We did not compare to the GC model as it failed to converge for the majority ( 70%) of our simulated datasets. We also did not compare to the hP model as it took over 10 minutes to fit each dataset of sample size and over 50 minutes for each dataset of sample size . For the Neg-Bin, GP and CMP models, a total of 5 000 simulated datasets were used for each setting, but only 1 000 simulations were used for the standard CMP model due to the slow computational speeds (see average run times in Table 2). Finally, note that the CMP, GP and standard CMP models are all misspecified for Neg-Bin data.
Type I errors (%) and average computer run times (seconds) for dropping all second-order interactions using four competing methods on Neg-Bin data. Sample sizes of and . simulations per setting for Neg-Bin, GP and CMP models. simulations per setting for CMP model.
,Nominal level
average run time (s)
sample size
50
100
314
50
100
314
50
100
314
50
100
314
Neg-Bin
1.5
1.4
1.0
8.2
6.5
5.1
16.4
12.0
10.5
0.02
0.03
0.04
GP
2.4
1.3
0.6
9.2
6.2
3.1
17.1
11.8
7.2
0.03
0.05
0.13
CMP
1.5
1.5
1.4
7.9
6.7
6.1
15.5
12.2
12.3
1.6
2.8
7.9
CMP
3.0
1.8
2.8
14.3
7.8
8.8
20.1
15.3
16.6
43.2
80.5
229.8
The Type I errors at nominal 1%, 5% and 10% levels for each method are displayed in Table 2. We first see that while the LRT based on the correctly specified Neg-Bin model is asymptotically exact, it can still be rather biased for small sample sizes. Of the misspecified models, the proposed CMP model is the most robust, showing closest agreement with the Neg-Bin Type I errors. For smaller sample sizes, the GP model does reasonably well but its Type I errors become rather conservative for large samples. The standard CMP model shows worst performance overall, reflecting the limitation of having to model the mean on a different scale.
Also displayed in Table 2 are the average computer run times for each simulation under each model. We see that the two CMP-based models are the most computationally expensive, but this is not unexpected given the flexibility these models offer. More interesting, perhaps, is that the proposed CMP model is an order of magnitude faster to fit than the standard CMP model, which, when coupled with its increased parsimony and interpretability, makes it arguably a more appealing model to use in practice. This reinforces the earlier claim that CMP models combine the attractive features of standard CMP models with the familiarity and simplicity of a GLM. The Neg-Bin and GP models were the fastest to fit, as these do not involve approximating a normalizing constant at each iteration.
Underdispersed takeover bids
A dataset from (Cameron and Johansson (1997)) gives the number of bids received by 126 US firms that were successful targets of tender offers during the period 1978–1985, along with the following set of explanatory variables (descriptions taken from Sáez-Castillo and Conde-Sánchez (2013)):
Defensive actions taken by management of target firm: indicator variable for legal defence by lawsuit (leglrest), proposed changes in asset structure (rearest), proposed change in ownership structure (finrest) and management invitation for friendly third-party bid (whtknght).
Firm-specific characteristics: bid price divided by price 14 working days before bid (bidprem), percentage of stock held by institutions (insthold), total book value of assets in billions of dollars (size) and book value squared (size).
Intervention by federal regulators: an indicator variable for Department of Justice intervention (regulatn).
Some summary statistics of the data are given in Table A1 in the Appendix. A key feature of the dataset is that it exhibits strong underdispersion after accounting for the explanatory variables. (Sáez-Castillo and Conde-Sánchez (2013)) used these data to show that their proposed hP regression model fits better than competing Poisson and order-1 Poisson polynomial models. Here, we compare fits to the data using the log-linear hP, GP, CMP and standard CMP regression models. Interestingly, the GC model did not converge for these data either. The data are available from the Edcat R-package (Croissant (2011)).
The estimated coefficients, absolute -statistics, dispersion parameters, log-likelihood values and computer run times from the four competing models are given in Table 3. We see that the hP model gives the best fit to these data in terms of the maximized log-likelihood, but this is not surprising as the example was chosen by (Sáez-Castillo and Conde-Sánchez (2013)) to highlight the hP method. While the CMP model offers a slightly inferior fit to these data, it is better than the GP and standard CMP models. The PIT quantile plot in Figure 2b also shows good agreement with the uniform distribution, indicating that the CMP model is indeed a good fit.
Takeover bids dataset—Estimated coefficients, absolute -statistics (), dispersion, maximized log-likelihood and computer run times for competing models. Sample size .
,
hP
GP
CMP
CMP
GC
Coefficient
Estimate ()
Estimate ()
Estimate ()
Estimate ()
Estimate ()
intercept
1.033 (2.61)
0.939 (1.83)
0.990 (2.27)
1.836 (2.55)
leglrest
0.246 (2.18)
0.265 (1.82)
0.268 (2.18)
0.389 (2.04)
rearest
0.276 (1.87)
0.178 (0.97)
0.173 (1.12)
0.297 (1.24)
did
finrest
0.107 (0.65)
0.067 (0.33)
0.068 (0.39)
0.112 (0.42)
not
whtknght
0.492 (4.41)
0.483 (3.14)
0.481 (3.66)
0.707 (3.38)
converge
bidprem
0.704 (2.51)
0.646 (1.79)
0.685 (2.23)
1.015 (2.13)
insthold
0.390 (1.17)
0.375 (0.92)
0.368 (1.06)
0.532 (1.02)
size
0.179 (4.03)
0.182 (3.27)
0.179 (3.77)
0.275 (3.30)
size
0.008 (3.25)
0.008 (2.75)
0.008 (3.05)
0.012 (2.79)
regulatn
0.010 (0.07)
0.030 (0.20)
0.038 (0.29)
0.039 (0.20)
dispersion
2.624
0.021
1.754
1.727
log-likelihood
170.2
184.6
180.1
180.4
run time (seconds)
1 481
0.08
0.8
25.7
The hP model took 1 481 seconds, or 24.7 minutes, to fit whereas the CMP model took only 0.8 seconds. The standard CMP model took over 25 seconds to converge and offers an almost identical fit to the CMP model. Of all these methods, the GP model was the fastest to fit, but its maximized log-likelihood was quite a bit lower. Note that the estimated dispersions from the CMP and standard CMP models are again similar, reflecting the orthogonality property from Result 2.
Underdispersed cotton boll counts
(Zeviani et al. (2014)) describe a dataset from a greenhouse experiment using cotton plants to examine the effect of five defoliation (def) levels (0%, 25%, 50%, 75% and 100%) on the observed number of cotton bolls produced on by the plants at five growth stages: vegetative, flower-bud, blossom, fig and cotton boll. The data exhibit strong underdispersion, which can be identified from the within-group means and variances as the experimental design was replicated five times (see Figure 3). Another plot of the data showing this underdispersion is given in Figure 2 of (Zeviani et al. (2014)). This motivated (Zeviani et al. (2014)) to formulate a GC process, which generalizes the Poisson process by replacing the exponential waiting times between successive counts with Gamma-distributed waiting times.
Here, we compare goodness-of-fits of the GC model to the log-linear GP, hP, CMP and standard CMP regression models over a set of five nested linear predictors from (Zeviani et al. (2014)). In the following, the index indicates the growth stage.
The link function is the for the GC, GP, hP and CMP models, but it has no closed form for the standard CMP model. Note that the parameter in the GC model characterizes the mean waiting time between successive counts rather than the mean of the count process.
Cotton bolls dataset—AIC and computer run times for five sets of predictors and five competing methods. Sample size .
,
AIC
Run time (seconds)
Predictor
np
GC
GP
hP
CMP
CMP
GC
GP
hP
CMP
CMP
I
2
548.79
545.22
558.12
548.96
0.02
0.02
148
1.4
II
3
520.70
514.92
541.84
520.96
520.92
0.03
0.02
213
1.6
5.5
III
4
519.96
514.28
542.32
520.20
520.18
0.05
0.02
327
1.5
6.9
IV
8
456.29
460.32
520.28
456.48
456.40
0.11
0.02
2 488
6.7
18.5
V
12
440.77
453.75
522.42
440.82
440.50
0.29
0.03
2 317
8.6
31.5
The AIC and computer run times for the five predictor models and five competing methods are given in Table 4. We see that the GC, CMP and standard CMP models all give rise to essentially the same AICs for all 5 predictor models, indicating similar goodness-of-fits overall, with the GP and hP models being noticeably inferior. The CMP model is a log-linear model for the mean count, and so it is easier to interpret than the standard CMP model or the GC model, with the latter being a log-linear model for the mean time between counts and not the mean count directly. Moreover, the computation speed were three times faster for the CMP model than the standard CMP model, and over 100 times faster than the hP model. This again reinforces the claim that the CMP parametrization combines the flexibility of standard CMP models with the familiarity and computational efficiency of GLMs. Interestingly, the cmp software for standard CMP regression could not fit intercept-only models, which is why the entries for predictor I are empty.
Cotton bolls dataset—Observed values, fitted mean curves and 95% confidence intervals for the CMP model.
A plot of the predicted mean curves from the fitted CMP model, along with corresponding 95% confidence intervals, is given in Figure 2. These are essentially identical to those from the GC model (c.f. Figure 3 in Zeviani et al. (2014)), indicating that the CMP model matches the ‘tailor-made’ GC method for these data, in terms of both overall goodness-of-fit and subsequent inferences. This again demonstrates the flexibility and robustness of CMP models for real data analysis problems.
Cotton bolls dataset—Estimated coefficients, absolute -statistics (), dispersion parameters, maximized log-likelihood and computer run times for full model (predictor V) using five competing methods. Sample size .
,
GC
GP
CMP
CMP
hP
Coefficient
Estimate ()
Estimate ()
Estimate ()
Estimate ()
Estimate ()
intercept
2.234 (79.7)
2.200 (74.2)
2.190 (74.6)
10.895 (7.76)
2.196 (36.9)
0.412 (1.81)
0.413 (1.66)
0.436 (1.82)
2.019 (1.77)
0.617 (1.28)
0.763 (2.95)
0.796 (2.64)
0.805 (2.96)
3.724 (2.78)
1.020 (1.88)
0.274 (1.22)
0.240 (1.01)
0.288 (1.22)
1.343 (1.21)
0.090 (0.17)
0.464 (1.85)
0.443 (1.61)
0.486 (1.85)
2.265 (1.81)
0.268 (0.50)
1.182 (4.43)
1.366 (3.66)
1.249 (4.41)
5.748 (3.88)
1.202 (2.18)
0.645 (2.15)
0.796 (1.87)
0.680 (2.13)
3.133 (2.08)
0.621 (1.00)
0.320 (1.28)
0.078 (0.26)
0.351 (1.33)
1.596 (1.30)
0.370 (0.71)
1.199 (4.04)
0.938 (2.46)
1.290 (4.08)
5.894 (3.66)
1.347 (2.19)
0.007 (0.03)
0.023 (0.10)
0.009 (0.04)
0.038 (0.03)
0.245 (0.51)
0.018 (0.08)
0.002 (0.01)
0.020 (0.08)
0.090 (0.08)
0.257 (0.49)
dispersion
5.112
0.06
4.86
4.88
0.029
AIC
440.77
453.75
440.82
440.50
522.43
run time (seconds)
0.29
0.33
8.5
31.5
2 317
For completeness, Table 5 displays the estimated coefficients, absolute -statistics (), dispersion, maximized log-likelihood and computer run times for the full model (predictor V) using the five competing methods. We see that the -statistics from the CMP model are essentially identical to those from the GC model, indicating that the CMP model is making as efficient use of the data as the ‘tailor-made’ GC approach. Finally, the PIT quantile plot for the full model in Figure 2c confirms that the CMP model is indeed a good fit to the data.
Conclusion
CMP distributions have already proven to be flexible and robust models for dispersed counts. By parametrizing CMP distributions via the mean, we have placed CMP models on the same level of interpretability and parsimony as other popular, competing count models, while retaining all the key features of CMP distributions that have made them increasingly attractive for the analysis of dispersed count data. We anticipate that the proposed CMP parametrization will facilitate even wider adoption of CMP-based models in applied count regression problems, allowing them to be directly comparable to a host of competing log-linear models for counts. Moreover, the simple GLM structure of the proposed method makes it easy to understand on a conceptual level, and relatively easy to fit on a computational level, with computational times being significantly faster than for the similarly flexible hP and standard CMP models. While no one model outperformed every other model in the data analysis examples and simulation study considered in this article, it should be noted that the CMP model was always most comparable to the ‘tailor-made’ method for each of the examples. At the very least, CMP count regression models should be a useful addition to any applied statisticians toolkit.
Acknowledgments
The author thanks the editor, associate editor and two anonymous reviewers for comments and suggestions that improved the article. The author also thanks Dr Thomas Fung for his help on probability inverse transforms for discrete distributions.
Appendix
A Technical details and miscellaneous results
1.1 Result 1
For the case , solving Equation (2.2) gives and the normalizing constant is . Thus, is the pmf of a Poisson distribution with mean .
For the case , solving Equation (2.2) gives and the normalizing constant is . Thus, is the pmf of a geometric distribution with probability parameter .
As , the term in the denominator tends to for , so that only takes value or with probability proportional to and , respectively. Thus, the CMP distribution approaches a Bernoulli distribution with probability parameter , from Equation (2.2).
we see that CMP distributions form two-parameter exponential families with sufficient statistics and and corresponding natural parameters and , respectively.
For fixed dispersion , we see that CMP distributions also form one-parameter exponential families, with sufficient statistic and natural parameter . Note that for a fixed or given dispersion , the natural parameter and normalizing constant are viewed as functions of alone.
1.3 Score function for μ
First, recall that , so that
by the definition of as the mean of the distribution. The derivative of the log-likelihood given in Equation (4.1) with respect to therefore simplifies to the expression
Next, recall that is the solution to . Implicit differentiation with respect to gives
The last equality is true because by the definition of as the mean of the distribution. We therefore have , where
is the variance of .
The score function for is therefore given by
as claimed in Section 4.
1.4 Score function for ν
The score function for is
which is immediately seen to be unbiased, that is, . For notational convenience, define operators and by and , respectively, so that can be expressed more succinctly as
1.5 MLE for ν for iid data
For iid data CMP, the MLE can be characterized as the solution to , where is the empirical expectation operator. This can be simplified to a moment-matching equation,
1.6 MLE for ν in the regression case, and asymptotic variance of ν̂ from Result 4
For regression problems with independent data pairs , the MLE is characterized as the solution to the score equation
and its asymptotic variance is given by
where and denotes expectation over the design measure of .
1.7 Generalization of Result 4 for
If itself is modelled via a regression for some function , covariates and corresponding parameter vector , then the score function for is given by
We then have the following generalization of Result 5.
Result 5b (Asymptotic normality, non-constant dispersion). As ,
where the asymptotic variance of is given by
the asymptotic variance of is given by
and denotes the expectation with respect to the design measure of .
1.8 Summary statistics for the takeover bids dataset
Takeover bids dataset—Summary statistics
Numericals variables
Binary variables
Variable
Min
Max
Mean
sd
Variable
Percentage (%)
numbids
0
10
1.74
1.43
leglrest
42.9
bidprem
0.94
2.07
1.35
0.19
rearest
18.3
insthold
0
0.90
0.25
0.19
finrest
10.3
size
0.02
22.17
1.22
3.10
regulatn
27.0
whtknght
59.5
References
1.
CameronACJohanssonP (1997) Count data regression using series expansions: With applications. Journal of Applied Econometrics, 12, 203–23.
2.
ConsulPCFamoyeF (1992) Generalized poisson regression model. Communications in Statistics—Theory and Methods, 21, 89–109.
3.
ConwayRWMaxwellWL (1962) A queuing model with state dependent service rates. Journal of Industrial Engineering, 12, 132–36.
4.
CroissantY (2011) Ecdat: Datasets for econometrics, R Package, version 0.1–6.1.
5.
FahrmeirLKaufmanH (1985) Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Annals of Statistics, 13, 342–68.
6.
FamoyeF (1993) Restricted generalized poisson regression model. Communications in Statistics—Theory and Methods, 22, 1335–54.
7.
GuikemaSDCoffeltJP (2008) A flexible count data regression model for risk analysis. Risk Analysis, 28, 213–23.
8.
HuangARathouzPJ (2017) Orthogonality of the mean and error distribution in generalized linear models. Communications in Statistics—Theory and Methods, 46, 3290–96.
9.
LordDGuikemaSDGeedipallySR (2008) Application of the Conway–Maxwell–Poisson generalized linear model for analyzing motor vehicle crashes. Accident Analysis and Prevention, 40, 1123–34.
10.
LordDGuikemaSDGeedipallySR (2010) Extension of the application of Conway–Maxwell–Poisson models: Analyzing traffic crash data exhibiting underdispersion. Risk Analysis, 30, 1268–76.
11.
McCullaghPNelderJA (1989) Generalized linear models. Boca Raton: Chapman and Hall.
12.
Sáez-CastilloAJConde-SánchezA (2013) A hyper-Poisson regression model for overdispersed and underdispersed count data. Computational Statistics & Data Analysis, 61, 148–57.
13.
SellersKFLotzeT (2015) COM Poisson Reg: Conway–Maxwell–Poisson regression models, R Package, version 0.3.5.
14.
SellersKFShmueliG (2010) A flexible regression model for count data. Annals of Applied Statistics, 4, 943–61.
15.
SellersKFBorleSShmueliG (2012) The COM-Poisson model for count data: A survey of methods and applications. Applied Stochastic Models in Business, 28, 104–16.
16.
ShmueliGMinkaTPKadaneJBBorleSBoatwrightP (2005) A useful distribution for fitting discrete data: Revival of the Conway–Maxwell–Poisson distribution. Journal of the Royal Statistical Society, Series C, (Applied Statistics), 54, 127–42.
17.
SmithJQ (1985) Diagnostic checks of non- standard time series models. Journal of Forecasting, 4, 283–91.
18.
SmythGK (1989) Generalized linear models with varying dispersion. Journal of the Royal Statistical Society. Series B, 51, 47–60.
19.
RidoutMSBesbeasP (2004) An empirical model for underdispersed count data. Statistical Modelling, 4, 77–89.
20.
ZevianiWMRiberioPJBonatWHShimakuraSEMunizJA (2014) The Gamma-count distribution in the analysis of experimental underdispersed data. Journal of Applied Statistics, 41, 2616–26.