In this article, we introduce a count data model obtained by compounding the Poisson distribution with the Xgamma distribution. Important mathematical and statistical properties of the distribution are derived and discussed. The parameter is estimated by the method of maximum likelihood, and a Monte Carlo simulation is used to investigate the behavior of the ML estimators. Finally, two real-life data sets are analyzed to investigate the suitability of the proposed distribution for modeling count data.
Researchers have developed a plethora of probability models to analyze many types of data from various fields, such as medicine, transport, engineering and agriculture. Many well-known techniques are employed to construct new probability distributions. Discretization, the T-X family and compounding, for example, provide powerful ways to extend common parametric families of distributions so that they fit data sets that classical distributions do not fit adequately. Work on compound probability distributions dates back to 1920. Greenwood and Yule (1920) established the relationship between the Poisson and negative binomial distributions through compounding, by treating the rate parameter of the Poisson distribution as a gamma variate. Skellam (1948) derived a probability distribution from the binomial distribution by regarding the probability of success as a beta variable between sets of trials. Lindley (1958) suggested a one-parameter distribution to illustrate the difference between fiducial and posterior distributions. Dubey (1970) derived a compound gamma distribution by compounding a gamma distribution with another gamma distribution, and reduced it by suitable transformations to the beta distributions of the first and second kinds and to the F distribution. Gerstenkorn (1993, 1996) proposed several compound distributions. These include a compound of the gamma distribution with the exponential distribution, obtained by treating the parameter of the gamma distribution as an exponential variate, and a compound of the Pólya distribution with the beta distribution. Mahmoudi and Zakerzadeh (2010) generalized the Poisson-Lindley distribution of Sankaran (1970) and showed that their generalized distribution is more flexible for analyzing count data.
Zamani and Ismail (2010) constructed a new compound distribution by compounding the negative binomial distribution with the one-parameter Lindley distribution; it provides a good fit for count data in which the probability at zero is large. Gupta and Ong (2004) proposed a new generalized negative binomial distribution that arises from the Poisson distribution when the rate parameter follows a generalized gamma distribution. They applied it to various data sets and showed that it can serve as a better alternative to the negative binomial distribution. Adil et al. (2016) proposed a competitive count data model, with applications in the biological sciences, by compounding the negative binomial distribution with the Kumaraswamy distribution. Para and Jan (2018) introduced two count data models for the medical sciences by applying the compounding technique to discrete versions of the Weibull and inverse Weibull distributions, treating the probability parameter as a random variable following a beta distribution.
In this paper, we propose a new discrete count data model by compounding the Poisson distribution with the Xgamma distribution (Sen et al., 2016), motivated by the need for more flexible models for analyzing count data. The merit of the proposed distribution lies in its ability to describe over-dispersed count data, as shall be shown later. Also, its probability mass function displays unimodal and right-skewed shapes. Moreover, the application section compares the distribution with some existing distributions by fitting real integer-valued data sets, including data with an excess of zeros.
Definition of proposed model (Poisson Xgamma distribution)
If $X \mid \lambda \sim \mathcal{P}(\lambda)$, where $\lambda$ is itself a random variable following the Xgamma distribution (Sen et al., 2016) with parameter $\theta$, then the distribution obtained by marginalizing over $\lambda$ is known as the compound of the Poisson distribution with the Xgamma distribution. It is denoted by $\mathcal{P}$-Xgamma$(\theta)$.
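Theorem 1. The probability mass function of the Poisson Xgamma distribution, i.e., of $\mathcal{P}$-Xgamma$(\theta)$, is given by
$$P(X=x) = \frac{\theta^2\left[2(1+\theta)^2 + \theta(x+1)(x+2)\right]}{2(1+\theta)^{x+4}}, \qquad x = 0, 1, 2, \ldots,\ \theta > 0. \qquad (1)$$
Proof. The pmf of $\mathcal{P}$-Xgamma$(\theta)$ can be obtained as
$$P(X=x) = \int_0^\infty \frac{e^{-\lambda}\lambda^x}{x!}\, f(\lambda;\theta)\, d\lambda,$$
when its parameter $\lambda$ follows the Xgamma distribution (XGD) with pdf
$$f(\lambda;\theta) = \frac{\theta^2}{1+\theta}\left(1 + \frac{\theta}{2}\lambda^2\right)e^{-\theta\lambda}, \qquad \lambda > 0,\ \theta > 0.$$
Using $\int_0^\infty \lambda^{m}e^{-(1+\theta)\lambda}\,d\lambda = m!/(1+\theta)^{m+1}$, we have
$$P(X=x) = \frac{\theta^2}{(1+\theta)\,x!}\int_0^\infty \left(\lambda^x + \frac{\theta}{2}\lambda^{x+2}\right)e^{-(1+\theta)\lambda}\,d\lambda = \frac{\theta^2}{(1+\theta)^{x+2}} + \frac{\theta^3(x+1)(x+2)}{2(1+\theta)^{x+4}} = \frac{\theta^2\left[2(1+\theta)^2 + \theta(x+1)(x+2)\right]}{2(1+\theta)^{x+4}},$$
which is the pmf of $\mathcal{P}$-Xgamma$(\theta)$.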
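As a numerical sanity check of Theorem 1, the sketch below evaluates the compounding integral from the proof with composite Simpson's rule and compares it with the closed-form pmf in Eq. (1). The function names, the truncation of the integral at $\lambda = 40$ and the grid size are our own illustrative choices, not part of the paper.

```python
import math


def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def xgamma_pdf(lam, theta):
    """Xgamma density of the Poisson rate lambda (Sen et al., 2016)."""
    return theta**2 / (1 + theta) * (1 + theta / 2 * lam**2) * math.exp(-theta * lam)


def compound_pmf(x, theta, upper=40.0, steps=4000):
    """Integrate Poisson(x | lam) * Xgamma(lam; theta) over (0, upper) by composite Simpson's rule."""
    h = upper / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        lam = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        poisson = math.exp(-lam) * lam**x / math.factorial(x)
        total += weight * poisson * xgamma_pdf(lam, theta)
    return total * h / 3


for x in range(5):
    print(x, round(pxg_pmf(x, 1.5), 6), round(compound_pmf(x, 1.5), 6))
```

The two columns should agree to about six decimals. Writing the power as `(1 + theta) ** -(x + 4)` lets very small probabilities underflow to zero instead of overflowing for large `x`.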
Figure 1 exhibits the pmf plot of the proposed model for different values of the parameter $\theta$.
Figure 1: pmf plot of the Poisson Xgamma distribution.
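The corresponding cdf of the Poisson Xgamma distribution, obtained by summing Eq. (1), is
$$F(x) = P(X \le x) = 1 - \frac{2\theta^3 + (x^2+5x+10)\theta^2 + 2(x+4)\theta + 2}{2(1+\theta)^{x+4}}, \qquad x = 0, 1, 2, \ldots \qquad (2)$$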
Figure 2 exhibits the cdf plot of the proposed model for different values of the parameter $\theta$.
Figure 2: cdf plot of the discrete Poisson Xgamma distribution.
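The closed-form cdf can be checked against a running sum of the pmf. The sketch below codes $F(x) = 1 - [2\theta^3 + (x^2+5x+10)\theta^2 + 2(x+4)\theta + 2]/(2(1+\theta)^{x+4})$, obtained by summing Eq. (1), and compares it with the cumulative sum of Eq. (1) for a few values of $\theta$. The function names are hypothetical.

```python
def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def pxg_cdf(x, theta):
    """Closed-form cdf: 1 - [2θ³ + (x²+5x+10)θ² + 2(x+4)θ + 2] / (2(1+θ)^(x+4)); zero for x < 0."""
    if x < 0:
        return 0.0
    num = 2 * theta**3 + (x * x + 5 * x + 10) * theta**2 + 2 * (x + 4) * theta + 2
    return 1.0 - 0.5 * num * (1 + theta) ** -(x + 4)


# Compare the closed form with the running sum of the pmf
for theta in (0.5, 1.0, 3.0):
    running = 0.0
    for x in range(5):
        running += pxg_pmf(x, theta)
        print(theta, x, round(pxg_cdf(x, theta), 8), round(running, 8))
```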
Random data generation from Poisson Xgamma distribution
To simulate data from the Poisson Xgamma distribution, we employ the inverse cdf method. A sequence of random numbers from the Poisson Xgamma distribution with pmf $p(x)$ and cdf $F(x)$, $x = 0, 1, \ldots, k$, where k may be finite or infinite, can be generated by the following steps.
Step 1: Generate a random number $u$ from the uniform distribution U(0,1).
Step 2: Set $X = x$, where x is the smallest integer such that $F(x) \ge u$, i.e., $F(x-1) < u \le F(x)$ with $F(-1) = 0$.
To generate n random numbers from $\mathcal{P}$-Xgamma$(\theta)$, repeat Steps 1 and 2 n times. We used R software to run the simulation study of the proposed model.
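The two steps above can be sketched in Python as a sequential search up the cdf. This is an illustrative translation, not the authors' R code; the names `pxg_cdf` and `pxg_rvs` and the seed are our own. Using the closed-form cdf, rather than an accumulated sum of pmf values, guarantees that the search stops, because $F(x)$ reaches 1.0 in floating point while $u < 1$.

```python
import random


def pxg_cdf(x, theta):
    """Closed-form cdf F(x) of the Poisson Xgamma distribution, with F(-1) = 0."""
    if x < 0:
        return 0.0
    num = 2 * theta**3 + (x * x + 5 * x + 10) * theta**2 + 2 * (x + 4) * theta + 2
    return 1.0 - 0.5 * num * (1 + theta) ** -(x + 4)


def pxg_rvs(n, theta, rng):
    """Draw n values by the inverse-cdf method (Steps 1 and 2)."""
    sample = []
    for _ in range(n):
        u = rng.random()              # Step 1: u ~ U(0, 1)
        x = 0
        while pxg_cdf(x, theta) < u:  # Step 2: smallest x with F(x) >= u
            x += 1
        sample.append(x)
    return sample


rng = random.Random(2024)
draws = pxg_rvs(10_000, 2.0, rng)
print(sum(draws) / len(draws))  # compare with (θ + 3) / (θ(1 + θ)) = 5/6 at θ = 2
```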
Statistical properties
In this section, we derive structural properties of the Poisson Xgamma model, including its moments, index of dispersion, probability generating function and moment generating function.
Moments
Factorial moments
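Using the compound representation in the proof of Eq. (1), the rth factorial moment about origin of $\mathcal{P}$-Xgamma$(\theta)$, $\mu'_{(r)} = E[X(X-1)\cdots(X-r+1)]$, can be obtained as
$$\mu'_{(r)} = \int_0^\infty \left[\sum_{x=0}^\infty x(x-1)\cdots(x-r+1)\,\frac{e^{-\lambda}\lambda^x}{x!}\right] f(\lambda;\theta)\,d\lambda.$$
Taking $x = j + r$, the inner sum equals $\lambda^r\sum_{j=0}^\infty e^{-\lambda}\lambda^j/j! = \lambda^r$, and we get
$$\mu'_{(r)} = \int_0^\infty \lambda^r f(\lambda;\theta)\,d\lambda = \frac{r!\left[2\theta + (r+1)(r+2)\right]}{2\theta^r(1+\theta)}, \qquad r = 1, 2, \ldots \qquad (3)$$
Taking $r = 1, 2, 3, 4$ in Eq. (3), the first four factorial moments about origin of the Poisson Xgamma distribution are
$$\mu'_{(1)} = \frac{\theta+3}{\theta(1+\theta)}, \quad \mu'_{(2)} = \frac{2(\theta+6)}{\theta^2(1+\theta)}, \quad \mu'_{(3)} = \frac{6(\theta+10)}{\theta^3(1+\theta)}, \quad \mu'_{(4)} = \frac{24(\theta+15)}{\theta^4(1+\theta)}.$$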
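A short sketch to confirm the factorial-moment formula $\mu'_{(r)} = r![2\theta + (r+1)(r+2)]/(2\theta^r(1+\theta))$ against a direct, truncated summation over the pmf. The truncation point (400 terms) and the function names are illustrative assumptions; `math.perm(x, r)` gives the falling factorial $x(x-1)\cdots(x-r+1)$.

```python
import math


def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def factorial_moment(r, theta):
    """Eq. (3): r! [2θ + (r+1)(r+2)] / (2 θ^r (1+θ))."""
    return math.factorial(r) * (2 * theta + (r + 1) * (r + 2)) / (2 * theta**r * (1 + theta))


def factorial_moment_series(r, theta, terms=400):
    """E[X(X-1)...(X-r+1)] summed directly from the pmf, truncated after `terms` values of x."""
    return math.fsum(math.perm(x, r) * pxg_pmf(x, theta) for x in range(r, terms))


for r in range(1, 5):
    print(r, factorial_moment(r, 0.8), factorial_moment_series(r, 0.8))
```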
Figure 3: Index of dispersion (IOD) plot of the Poisson Xgamma distribution.
Moments about origin (raw moments)
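The first four moments about origin of $\mathcal{P}$-Xgamma$(\theta)$ follow from the relationships $\mu'_1 = \mu'_{(1)}$, $\mu'_2 = \mu'_{(2)} + \mu'_{(1)}$, $\mu'_3 = \mu'_{(3)} + 3\mu'_{(2)} + \mu'_{(1)}$ and $\mu'_4 = \mu'_{(4)} + 6\mu'_{(3)} + 7\mu'_{(2)} + \mu'_{(1)}$ between factorial moments and moments about origin. They are
$$\mu'_1 = \frac{\theta+3}{\theta(1+\theta)}, \quad \mu'_2 = \frac{\theta^2+5\theta+12}{\theta^2(1+\theta)}, \quad \mu'_3 = \frac{\theta^3+9\theta^2+42\theta+60}{\theta^3(1+\theta)}, \quad \mu'_4 = \frac{\theta^4+17\theta^3+120\theta^2+384\theta+360}{\theta^4(1+\theta)}.$$
The mean, variance and index of dispersion of the proposed model are given by
$$E(X) = \frac{\theta+3}{\theta(1+\theta)}, \qquad \mathrm{Var}(X) = \frac{\theta^3+5\theta^2+11\theta+3}{\theta^2(1+\theta)^2}, \qquad \mathrm{IOD} = \frac{\mathrm{Var}(X)}{E(X)} = \frac{\theta^3+5\theta^2+11\theta+3}{\theta(1+\theta)(\theta+3)}.$$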
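Figure 3 displays the index of dispersion (IOD) plot of the Poisson Xgamma distribution. Since $\mathrm{IOD} - 1 = \frac{\theta^2+8\theta+3}{\theta(1+\theta)(\theta+3)} > 0$ for every $\theta > 0$, the distribution is over-dispersed, and its IOD decreases towards 1 (near equi-dispersion) as $\theta$ increases. This agrees with the general fact that a mixed Poisson variable satisfies $\mathrm{Var}(X) = E(\lambda) + \mathrm{Var}(\lambda) \ge E(X)$. The model is therefore suited to over-dispersed and nearly equi-dispersed data rather than under-dispersed data.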
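A small script that tabulates the mean, variance and IOD from their closed forms and checks them against sums over the pmf. Because $X$ is a Poisson mixture, $\mathrm{Var}(X) = E(\lambda) + \mathrm{Var}(\lambda)$, so the IOD should exceed 1 for every $\theta$ and approach 1 only as $\theta$ grows. The grid of $\theta$ values and the names are illustrative.

```python
import math


def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def mean(theta):
    return (theta + 3) / (theta * (1 + theta))


def variance(theta):
    return (theta**3 + 5 * theta**2 + 11 * theta + 3) / (theta**2 * (1 + theta) ** 2)


def iod(theta):
    """Index of dispersion Var(X) / E(X)."""
    return variance(theta) / mean(theta)


def series_mean_var(theta, terms=400):
    """Mean and variance computed directly from the pmf (series truncated after `terms`)."""
    m1 = math.fsum(x * pxg_pmf(x, theta) for x in range(terms))
    m2 = math.fsum(x * x * pxg_pmf(x, theta) for x in range(terms))
    return m1, m2 - m1**2


for theta in (0.1, 0.5, 1.0, 2.0, 10.0, 100.0):
    print(theta, round(iod(theta), 4))
```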
Probability generating function of the Poisson Xgamma distribution
In this section, we derive the probability generating function and the moment generating function of the PXGD.
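Theorem 2. If $X$ has the $\mathcal{P}$-Xgamma$(\theta)$ distribution, then its probability generating function has the form
$$G_X(s) = \frac{\theta^2\left[(1+\theta-s)^2 + \theta\right]}{(1+\theta)(1+\theta-s)^3}, \qquad |s| < 1+\theta.$$
Proof. We begin with the well-known definition of the probability generating function, which gives
$$G_X(s) = E(s^X) = \int_0^\infty \sum_{x=0}^\infty s^x\frac{e^{-\lambda}\lambda^x}{x!}\, f(\lambda;\theta)\,d\lambda = \int_0^\infty e^{-\lambda(1-s)} f(\lambda;\theta)\,d\lambda = \frac{\theta^2}{1+\theta}\left[\frac{1}{1+\theta-s} + \frac{\theta}{(1+\theta-s)^3}\right],$$
which simplifies to the stated form.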
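Theorem 3. If $X$ has the $\mathcal{P}$-Xgamma$(\theta)$ distribution, then its moment generating function has the form
$$M_X(t) = \frac{\theta^2\left[(1+\theta-e^t)^2 + \theta\right]}{(1+\theta)(1+\theta-e^t)^3}, \qquad t < \log(1+\theta).$$
Proof. By the definition of the moment generating function, $M_X(t) = E(e^{tX}) = G_X(e^t)$, so the result follows from Theorem 2 on replacing $s$ by $e^t$.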
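The generating functions can be checked numerically. The sketch below codes the closed form $G_X(s) = \theta^2[(1+\theta-s)^2+\theta]/((1+\theta)(1+\theta-s)^3)$ and compares it with the truncated series $\sum_x s^x P(X=x)$. It also recovers $E(X)$ as a central-difference derivative of $M_X(t) = G_X(e^t)$ at $t = 0$. The step size, truncation and names are illustrative assumptions.

```python
import math


def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def pgf(s, theta):
    """Probability generating function in closed form; valid for |s| < 1 + θ."""
    d = 1 + theta - s
    return theta**2 * (d**2 + theta) / ((1 + theta) * d**3)


def mgf(t, theta):
    """Moment generating function M(t) = G(e^t); valid for t < log(1 + θ)."""
    return pgf(math.exp(t), theta)


def pgf_series(s, theta, terms=600):
    """Truncated series sum_x s^x P(X = x)."""
    return math.fsum(s**x * pxg_pmf(x, theta) for x in range(terms))


print(pgf(0.5, 1.0), pgf_series(0.5, 1.0))
h = 1e-5
print((mgf(h, 1.0) - mgf(-h, 1.0)) / (2 * h))  # ≈ E(X) = (θ + 3)/(θ(1 + θ)) = 2 at θ = 1
```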
Reliability analysis
In this section, we obtain the reliability function, hazard rate function and reverse hazard rate function of the proposed Poisson Xgamma model.
Reliability function R(x)
The reliability function is defined as the probability that a system survives beyond a specified time. It is also referred to as the survival or survivor function of the distribution and can be computed as the complement of the cumulative distribution function of the model. The reliability (survival) function of the Poisson Xgamma distribution is
$$R(x) = P(X > x) = 1 - F(x) = \frac{2\theta^3 + (x^2+5x+10)\theta^2 + 2(x+4)\theta + 2}{2(1+\theta)^{x+4}}, \qquad x = 0, 1, 2, \ldots$$
Hazard function
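The hazard function, also known as the hazard rate, instantaneous failure rate or force of mortality, is given for a discrete distribution by
$$h(x) = \frac{P(X = x)}{P(X \ge x)} = \frac{P(X=x)}{R(x-1)} = \frac{\theta^2\left[2(1+\theta)^2 + \theta(x+1)(x+2)\right]}{(1+\theta)\left[2\theta^3 + (x^2+3x+6)\theta^2 + 2(x+3)\theta + 2\right]}, \qquad x = 0, 1, 2, \ldots$$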
Reverse hazard rate function
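The reverse hazard rate function of the Poisson Xgamma distribution is given by
$$r(x) = \frac{P(X = x)}{P(X \le x)} = \frac{P(X=x)}{F(x)} = \frac{\theta^2\left[2(1+\theta)^2 + \theta(x+1)(x+2)\right]}{2(1+\theta)^{x+4} - \left[2\theta^3 + (x^2+5x+10)\theta^2 + 2(x+4)\theta + 2\right]}, \qquad x = 0, 1, 2, \ldots$$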
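A compact sketch of the three reliability quantities. It uses the discrete hazard convention $h(x) = P(X=x)/P(X \ge x)$ (a common choice; treat it as an assumption if another convention is preferred) and $r(x) = P(X=x)/P(X \le x)$. The survival function is coded from the closed form $R(x) = [2\theta^3 + (x^2+5x+10)\theta^2 + 2(x+4)\theta + 2]/(2(1+\theta)^{x+4})$. The function names are hypothetical.

```python
def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def reliability(x, theta):
    """R(x) = P(X > x) = 1 - F(x); equals 1 for x < 0."""
    if x < 0:
        return 1.0
    num = 2 * theta**3 + (x * x + 5 * x + 10) * theta**2 + 2 * (x + 4) * theta + 2
    return 0.5 * num * (1 + theta) ** -(x + 4)


def hazard(x, theta):
    """Discrete hazard h(x) = P(X = x) / P(X >= x) = p(x) / R(x - 1)."""
    return pxg_pmf(x, theta) / reliability(x - 1, theta)


def reverse_hazard(x, theta):
    """Reverse hazard r(x) = P(X = x) / P(X <= x) = p(x) / F(x)."""
    return pxg_pmf(x, theta) / (1.0 - reliability(x, theta))


for x in range(6):
    print(x, round(reliability(x, 1.2), 5), round(hazard(x, 1.2), 5), round(reverse_hazard(x, 1.2), 5))
```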
Order statistics
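Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics of a random sample $X_1, \ldots, X_n$ drawn from a discrete distribution with cumulative distribution function $F(x)$ and probability mass function $p(x)$. Then the probability mass function of the rth order statistic is given by
$$f_{r:n}(x) = \frac{n!}{(r-1)!\,(n-r)!}\int_{F(x-1)}^{F(x)} u^{r-1}(1-u)^{n-r}\,du, \qquad x = 0, 1, 2, \ldots$$
Using Eqs. (1) and (2), and evaluating the integral as a binomial sum, the probability mass function of the rth order statistic of the Poisson Xgamma distribution is
$$f_{r:n}(x) = \sum_{j=r}^{n}\binom{n}{j}\left\{[F(x)]^j[1-F(x)]^{n-j} - [F(x-1)]^j[1-F(x-1)]^{n-j}\right\},$$
where $1 - F(x) = \frac{2\theta^3 + (x^2+5x+10)\theta^2 + 2(x+4)\theta + 2}{2(1+\theta)^{x+4}}$ for $x \ge 0$ and $F(-1) = 0$.
Then, the pmf of the first order statistic (the sample minimum) of the Poisson Xgamma distribution is
$$f_{1:n}(x) = [1 - F(x-1)]^n - [1 - F(x)]^n,$$
and the pmf of the nth order statistic (the sample maximum) is
$$f_{n:n}(x) = [F(x)]^n - [F(x-1)]^n.$$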
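The binomial-sum form of the order-statistic pmf translates directly into code. The sketch uses $P(X_{(r)} \le x) = P(\text{at least } r \text{ of the } n \text{ draws are} \le x)$ and differences it at $x$ and $x-1$. It prints the pmfs of the sample minimum and maximum for an illustrative $\theta = 1$ and $n = 5$. The names are hypothetical.

```python
from math import comb


def pxg_cdf(x, theta):
    """Closed-form cdf F(x) of the Poisson Xgamma distribution, with F(-1) = 0."""
    if x < 0:
        return 0.0
    num = 2 * theta**3 + (x * x + 5 * x + 10) * theta**2 + 2 * (x + 4) * theta + 2
    return 1.0 - 0.5 * num * (1 + theta) ** -(x + 4)


def order_stat_pmf(x, r, n, theta):
    """pmf of the r-th order statistic X_(r) in a sample of size n."""
    def at_most(F):
        # P(X_(r) <= x) = P(at least r of the n draws are <= x), with F = F(x)
        return sum(comb(n, j) * F**j * (1 - F) ** (n - j) for j in range(r, n + 1))
    return at_most(pxg_cdf(x, theta)) - at_most(pxg_cdf(x - 1, theta))


theta, n = 1.0, 5
print([round(order_stat_pmf(x, 1, n, theta), 5) for x in range(5)])  # sample minimum
print([round(order_stat_pmf(x, n, n, theta), 5) for x in range(5)])  # sample maximum
```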
Estimation of parameters
In this section, we discuss estimation of the parameter of the Poisson Xgamma distribution by the method of maximum likelihood and by the method of moments.
Method of Maximum Likelihood Estimation
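This is one of the most widely used methods for estimating the parameters of a distribution. Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ drawn from the Poisson Xgamma distribution (PXGD). Then the likelihood function of the PXGD is
$$L(\theta) = \prod_{i=1}^{n} P(X = x_i) = \frac{\theta^{2n}}{2^n(1+\theta)^{\sum_{i=1}^n x_i + 4n}}\prod_{i=1}^{n}\left[2(1+\theta)^2 + \theta(x_i+1)(x_i+2)\right],$$
the log-likelihood function is
$$\log L(\theta) = 2n\log\theta - n\log 2 - \left(\sum_{i=1}^n x_i + 4n\right)\log(1+\theta) + \sum_{i=1}^{n}\log\left[2(1+\theta)^2 + \theta(x_i+1)(x_i+2)\right],$$
and the likelihood equation is
$$\frac{d\log L(\theta)}{d\theta} = \frac{2n}{\theta} - \frac{\sum_{i=1}^n x_i + 4n}{1+\theta} + \sum_{i=1}^{n}\frac{4(1+\theta) + (x_i+1)(x_i+2)}{2(1+\theta)^2 + \theta(x_i+1)(x_i+2)} = 0. \qquad (4)$$
Eq. (4) cannot be solved in closed form, so the MLE $\hat\theta$ of $\theta$ is obtained by solving it numerically.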
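A minimal Python sketch of the numerical solution of Eq. (4) for grouped counts. It follows the Newton-Raphson approach mentioned in the applications section. The starting value is the positive root of the moment equation, which is our own choice, and the standard error comes from the observed information. The function names are hypothetical and do not come from the paper's R code. The example uses the accident counts of Table 6.

```python
import math

# Table 6: number of accidents -> observed count (647 women, 5 weeks)
COUNTS = {0: 447, 1: 132, 2: 42, 3: 21, 4: 3, 5: 2}


def log_likelihood(theta, counts):
    """Log-likelihood of the PXGD for grouped data {x: frequency}."""
    n = sum(counts.values())
    sx = sum(x * f for x, f in counts.items())
    ll = 2 * n * math.log(theta) - n * math.log(2) - (sx + 4 * n) * math.log(1 + theta)
    for x, f in counts.items():
        ll += f * math.log(2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2))
    return ll


def score_and_slope(theta, counts):
    """First derivative (left side of Eq. (4)) and second derivative of the log-likelihood."""
    n = sum(counts.values())
    sx = sum(x * f for x, f in counts.items())
    s = 2 * n / theta - (sx + 4 * n) / (1 + theta)
    d = -2 * n / theta**2 + (sx + 4 * n) / (1 + theta) ** 2
    for x, f in counts.items():
        c = (x + 1) * (x + 2)
        a = 2 * (1 + theta) ** 2 + theta * c  # bracket term of the pmf
        b = 4 * (1 + theta) + c               # its derivative with respect to theta
        s += f * b / a
        d += f * (4 / a - (b / a) ** 2)
    return s, d


def fit_mle(counts, tol=1e-10, max_iter=100):
    """Newton-Raphson for Eq. (4), started at the moment estimate (needs a positive mean)."""
    xbar = sum(x * f for x, f in counts.items()) / sum(counts.values())
    theta = (1 - xbar + math.sqrt((xbar - 1) ** 2 + 12 * xbar)) / (2 * xbar)
    for _ in range(max_iter):
        s, d = score_and_slope(theta, counts)
        new = max(theta - s / d, theta / 2)  # keep theta positive
        if abs(new - theta) < tol:
            return new
        theta = new
    raise RuntimeError("Newton-Raphson did not converge")


theta_hat = fit_mle(COUNTS)
std_err = 1 / math.sqrt(-score_and_slope(theta_hat, COUNTS)[1])  # observed information
print(round(theta_hat, 4), round(std_err, 4), round(-log_likelihood(theta_hat, COUNTS), 3))
```

On these counts the fit gives $\hat\theta \approx 3.18$ and a negative log-likelihood of about 592.22, in line with the Poisson Xgamma entry in Table 9.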
Regularity conditions of MLE in Poisson Xgamma distribution
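Lemma 1. Suppose that the data are generated from a Poisson Xgamma distribution with unknown true parameter $\theta_0$, and let $\hat\theta$ be the MLE. Define $M(\theta) = E_{\theta_0}[\log p(X;\theta)]$, where $p(x;\theta)$ is the pmf in Eq. (1). Then for any $\theta > 0$, $M(\theta) \le M(\theta_0)$. Moreover, the inequality is strict unless $p(x;\theta) = p(x;\theta_0)$ for all $x$, which means that $\theta = \theta_0$. Different values of $\theta$ give different distributions because the mean $(\theta+3)/(\theta(1+\theta))$ is strictly decreasing in $\theta$.
Proof. Let us consider the difference
$$M(\theta) - M(\theta_0) = E_{\theta_0}\left[\log\frac{p(X;\theta)}{p(X;\theta_0)}\right].$$
Since $\log t \le t - 1$ for $t > 0$, with equality only at $t = 1$, we can write
$$M(\theta) - M(\theta_0) \le E_{\theta_0}\left[\frac{p(X;\theta)}{p(X;\theta_0)} - 1\right] = \sum_{x=0}^\infty p(x;\theta) - \sum_{x=0}^\infty p(x;\theta_0) = 1 - 1 = 0.$$
This proves that $M(\theta) \le M(\theta_0)$.
Theorem 4. Under some regularity conditions on the family of distributions, the MLE of the Poisson Xgamma distribution is consistent, i.e., $\hat\theta \to \theta_0$ in probability as $n \to \infty$.
Proof. We have the following facts:
$\hat\theta$ is the maximizer of $M_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log p(X_i;\theta)$ (by definition of ML estimation).
$\theta_0$ is the maximizer of $M(\theta)$ (by Lemma 1).
For every $\theta$, $M_n(\theta) \to M(\theta)$ in probability (by the law of large numbers).
Therefore, since the two functions $M_n$ and $M$ get closer (uniformly in $\theta$ under the regularity conditions), their points of maximum also get closer, which means exactly that $\hat\theta \to \theta_0$.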
Asymptotic normality of ML estimates of PXGD
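We want to show the asymptotic normality of the MLE of the Poisson Xgamma distribution, i.e., that $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \sigma^2_{\mathrm{MLE}})$ for some variance $\sigma^2_{\mathrm{MLE}}$, and to identify this variance. The asymptotic variance measures, in some sense, the quality of the MLE; we show that $\sigma^2_{\mathrm{MLE}} = 1/I(\theta_0)$, where $I(\theta_0)$ is the Fisher information regarding $\theta$.
Proof. Let $\ell(x;\theta) = \log p(x;\theta)$ and $M_n(\theta) = \frac{1}{n}\sum_{i=1}^n \ell(X_i;\theta)$. Since the MLE $\hat\theta$ is the maximizer of $M_n(\theta)$, we have $M_n'(\hat\theta) = 0$.
Let us use the mean value theorem, $\frac{f(a) - f(b)}{a - b} = f'(c)$ for some $c \in [a, b]$, with $f = M_n'$, $a = \hat\theta$ and $b = \theta_0$;
then we can write $0 = M_n'(\hat\theta) = M_n'(\theta_0) + M_n''(\theta_1)(\hat\theta - \theta_0)$ for some $\theta_1 \in [\hat\theta, \theta_0]$, from which we get
$$\sqrt{n}(\hat\theta - \theta_0) = -\frac{\sqrt{n}\,M_n'(\theta_0)}{M_n''(\theta_1)}. \qquad (5)$$
Since, by Lemma 1, $\theta_0$ is the maximizer of $M(\theta) = E_{\theta_0}[\ell(X;\theta)]$, we have $M'(\theta_0) = E_{\theta_0}[\ell'(X;\theta_0)] = 0$.
Therefore, the numerator in Eq. (5),
$$\sqrt{n}\,M_n'(\theta_0) = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n \ell'(X_i;\theta_0) - 0\right) \xrightarrow{d} N\left(0, \mathrm{Var}_{\theta_0}[\ell'(X;\theta_0)]\right),$$
converges in distribution by the central limit theorem.
Next, let us consider the denominator in Eq. (5). First, $M_n''(\theta) \to E_{\theta_0}[\ell''(X;\theta)]$ for all $\theta$ by the law of large numbers.
Also, since $\theta_1 \in [\hat\theta, \theta_0]$ and $\hat\theta \to \theta_0$ by the consistency result in Theorem 4, we have $\theta_1 \to \theta_0$, and hence $M_n''(\theta_1) \to E_{\theta_0}[\ell''(X;\theta_0)] = -I(\theta_0)$.
Finally, $\mathrm{Var}_{\theta_0}[\ell'(X;\theta_0)] = E_{\theta_0}[\ell'(X;\theta_0)^2] = I(\theta_0)$. Hence
$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\left(0, \frac{I(\theta_0)}{I(\theta_0)^2}\right) = N\left(0, \frac{1}{I(\theta_0)}\right),$$
where in the last equality we used the definition of Fisher information.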
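The last step of the proof uses two identities: the score has mean zero, and its second moment equals minus the expected second derivative, which is the Fisher information $I(\theta)$. The sketch below checks both numerically for one PXGD observation by summing over the pmf. It also shows a hypothetical use: the asymptotic standard error $1/\sqrt{n I(\hat\theta)}$ for $n = 647$ and $\hat\theta = 3.18$. The truncation and names are our own.

```python
import math


def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def fisher_information(theta, terms=400):
    """Return E[score], E[score^2] and -E[second derivative] for a single PXGD observation."""
    e_u = e_u2 = e_h = 0.0
    for x in range(terms):
        p = pxg_pmf(x, theta)
        c = (x + 1) * (x + 2)
        a = 2 * (1 + theta) ** 2 + theta * c
        b = 4 * (1 + theta) + c
        u = 2 / theta - (x + 4) / (1 + theta) + b / a                          # d/dθ log p(x; θ)
        h = -2 / theta**2 + (x + 4) / (1 + theta) ** 2 + 4 / a - (b / a) ** 2  # d²/dθ² log p(x; θ)
        e_u += p * u
        e_u2 += p * u * u
        e_h += p * h
    return e_u, e_u2, -e_h


mean_score, info_from_score, info_from_hessian = fisher_information(3.18)
print(mean_score, info_from_score, info_from_hessian)
print(1 / math.sqrt(647 * info_from_hessian))  # asymptotic s.e. of the MLE for n = 647
```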
Simulation study
In this section, we investigate the behavior of the ML estimators for finite sample sizes n by a simulation study based on samples from the $\mathcal{P}$-Xgamma$(\theta)$ distribution. The random observations are generated by the inverse cdf method presented in Section 2.1. The simulation was carried out for six parameter values of $\theta$, and the process was repeated 1000 times for each of the sample sizes n = 10, 25, 75, 100, 300 and 600. The simulated results are given in Table 1. Bias, variance and MSE generally decrease as the sample size increases, and the coverage probabilities of the 95% intervals stay close to the nominal level (between 0.929 and 0.981). Agreement between theory and practice therefore improves with n. The decreasing MSE and variance of the estimators are consistent with the consistency of the ML estimators, and the maximum likelihood method performs well in estimating the parameter of the proposed distribution.
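A compact Python sketch of one cell of such a simulation study. It combines the inverse-cdf sampler with Newton-Raphson fitting and reports bias, variance, MSE and the coverage of a 95% interval. The interval is a Wald interval $\hat\theta \pm 1.96\,\mathrm{se}$ based on the observed information. This is our assumption, since the text does not state how coverage was computed. The value $\theta = 1$, the 200 replications and the seed are illustrative and are not the paper's settings.

```python
import math
import random
from collections import Counter


def pxg_cdf(x, theta):
    """Closed-form cdf of the Poisson Xgamma distribution, with F(-1) = 0."""
    if x < 0:
        return 0.0
    num = 2 * theta**3 + (x * x + 5 * x + 10) * theta**2 + 2 * (x + 4) * theta + 2
    return 1.0 - 0.5 * num * (1 + theta) ** -(x + 4)


def draw(theta, rng):
    """One inverse-cdf draw: smallest x with F(x) >= u."""
    u, x = rng.random(), 0
    while pxg_cdf(x, theta) < u:
        x += 1
    return x


def score_and_slope(theta, counts):
    """First and second derivatives of the PXGD log-likelihood for counts {x: freq}."""
    n = sum(counts.values())
    sx = sum(x * f for x, f in counts.items())
    s = 2 * n / theta - (sx + 4 * n) / (1 + theta)
    d = -2 * n / theta**2 + (sx + 4 * n) / (1 + theta) ** 2
    for x, f in counts.items():
        c = (x + 1) * (x + 2)
        a = 2 * (1 + theta) ** 2 + theta * c
        b = 4 * (1 + theta) + c
        s += f * b / a
        d += f * (4 / a - (b / a) ** 2)
    return s, d


def fit_mle(counts, tol=1e-10, max_iter=100):
    """Newton-Raphson on the score, started at the moment estimate (needs a positive mean)."""
    xbar = sum(x * f for x, f in counts.items()) / sum(counts.values())
    theta = (1 - xbar + math.sqrt((xbar - 1) ** 2 + 12 * xbar)) / (2 * xbar)
    for _ in range(max_iter):
        s, d = score_and_slope(theta, counts)
        new = max(theta - s / d, theta / 2)
        if abs(new - theta) < tol:
            return new
        theta = new
    raise RuntimeError("Newton-Raphson did not converge")


def simulate(theta, n, reps, seed=1):
    """Bias, variance, MSE and 95% Wald coverage of the MLE over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates, covered = [], 0
    for _ in range(reps):
        counts = Counter(draw(theta, rng) for _ in range(n))
        est = fit_mle(counts)
        se = 1 / math.sqrt(-score_and_slope(est, counts)[1])
        covered += abs(est - theta) <= 1.96 * se
        estimates.append(est)
    mean_est = sum(estimates) / reps
    bias = mean_est - theta
    variance = sum((e - mean_est) ** 2 for e in estimates) / reps
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return bias, variance, mse, covered / reps


for n in (25, 100, 300):
    print(n, [round(v, 5) for v in simulate(theta=1.0, n=n, reps=200)])
```

Very small samples consisting only of zeros have no positive mean and would need separate handling. They are practically impossible for the sample sizes used here.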
Table 1: Simulation study of ML estimators of the Poisson Xgamma distribution

| Sample size (n) | Bias | Variance | MSE | Coverage probability (95%) | Bias | Variance | MSE | Coverage probability (95%) |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.01369 | 0.01107 | 0.01126 | 0.95200 | 0.03174 | 0.01486 | 0.01587 | 0.94900 |
| 25 | 0.01227 | 0.00413 | 0.00428 | 0.95800 | 0.02012 | 0.00314 | 0.00355 | 0.95800 |
| 75 | 0.00400 | 0.00108 | 0.00109 | 0.95800 | 0.01005 | 0.00166 | 0.00176 | 0.96700 |
| 100 | 0.00232 | 0.00072 | 0.00072 | 0.96700 | 0.00141 | 0.00064 | 0.00064 | 0.97100 |
| 300 | 0.00122 | 0.00031 | 0.00031 | 0.95300 | 0.00003 | 0.00021 | 0.00021 | 0.96900 |
| 600 | 0.00091 | 0.00015 | 0.00015 | 0.96200 | 0.00002 | 0.00006 | 0.00006 | 0.98100 |
| Sample size (n) | Bias | Variance | MSE | Coverage probability (95%) | Bias | Variance | MSE | Coverage probability (95%) |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.02381 | 0.05321 | 0.05378 | 0.94900 | 0.16088 | 0.01476 | 0.04065 | 0.94900 |
| 25 | 0.01676 | 0.04645 | 0.04673 | 0.95800 | 0.09607 | 0.00304 | 0.01227 | 0.95800 |
| 75 | 0.01103 | 0.03919 | 0.03931 | 0.96700 | 0.01179 | 0.00156 | 0.00170 | 0.96700 |
| 100 | 0.09031 | 0.02115 | 0.02931 | 0.97100 | 0.02803 | 0.00054 | 0.00132 | 0.97100 |
| 300 | 0.00759 | 0.00930 | 0.00936 | 0.96900 | 0.00931 | 0.00011 | 0.00019 | 0.96900 |
| 600 | 0.00128 | 0.00011 | 0.00011 | 0.98100 | 0.00293 | 0.00004 | 0.00005 | 0.98100 |
| Sample size (n) | Bias | Variance | MSE | Coverage probability (95%) | Bias | Variance | MSE | Coverage probability (95%) |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.23858 | 0.53261 | 0.58953 | 0.94900 | 1.07940 | 9.67985 | 10.84496 | 0.92900 |
| 25 | 0.16810 | 0.46495 | 0.49321 | 0.95800 | 0.59866 | 2.92271 | 3.28110 | 0.94800 |
| 75 | 0.11076 | 0.39242 | 0.40469 | 0.96700 | 0.05246 | 0.37736 | 0.38011 | 0.95100 |
| 100 | 0.09081 | 0.21201 | 0.22025 | 0.97100 | 0.06917 | 0.28800 | 0.29278 | 0.96800 |
| 300 | 0.00809 | 0.09350 | 0.09356 | 0.96900 | 0.06518 | 0.28009 | 0.28433 | 0.96600 |
| 600 | 0.00278 | 0.00159 | 0.00159 | 0.98100 | 0.01618 | 0.09562 | 0.09588 | 0.97100 |
Method of moments estimation
To estimate the unknown parameter $\theta$ of the discrete Poisson Xgamma model by the method of moments, we equate the first sample moment $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ with the first population moment $\mu'_1$. Equating the first moment about origin of $\mathcal{P}$-Xgamma$(\theta)$ with the first sample moment, we have
$$\frac{\theta+3}{\theta(1+\theta)} = \bar{x} \quad\Longleftrightarrow\quad \bar{x}\theta^2 + (\bar{x}-1)\theta - 3 = 0, \qquad \tilde{\theta} = \frac{1 - \bar{x} + \sqrt{(\bar{x}-1)^2 + 12\bar{x}}}{2\bar{x}},$$
which is the moment estimate of the parameter $\theta$. The product of the two roots of the quadratic, $-3/\bar{x}$, is negative, so there is exactly one positive root when $\bar{x} > 0$. No estimate exists when $\bar{x} = 0$. The moment estimator is subject to this restriction and uses the sample only through $\bar{x}$. Hence, the method of maximum likelihood is preferred for parameter estimation of $\mathcal{P}$-Xgamma$(\theta)$.
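Equating the mean $(\theta+3)/(\theta(1+\theta))$ to $\bar{x}$ gives the quadratic $\bar{x}\theta^2 + (\bar{x}-1)\theta - 3 = 0$. The sketch below returns its positive root and applies it, for illustration, to the accident counts of Table 6. The function name is hypothetical.

```python
import math


def moment_estimate(data):
    """Positive root of xbar*θ² + (xbar - 1)*θ - 3 = 0 (method of moments for the PXGD)."""
    xbar = sum(data) / len(data)
    if xbar <= 0:
        raise ValueError("the moment estimate needs a positive sample mean")
    return (1 - xbar + math.sqrt((xbar - 1) ** 2 + 12 * xbar)) / (2 * xbar)


# Table 6 accident counts, expanded to individual observations
accidents = [0] * 447 + [1] * 132 + [2] * 42 + [3] * 21 + [4] * 3 + [5] * 2
theta_mm = moment_estimate(accidents)
print(round(theta_mm, 4))
```

For these data the moment estimate (about 3.178) is close to the value at which the log-likelihood is maximized. This makes it a convenient starting point for a numerical solution of Eq. (4).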
Applications of Poisson Xgamma distribution
In this section, we fit the proposed distribution to two practical data sets. The first, representing epileptic seizure counts (Chakraborty, 2010), illustrates that the proposed model fits well compared with other competing models. These data have a long right tail that approaches zero slowly. The data set is given in Table 2. The second data set is described later in this section.
For each distribution, the parameters are estimated by the method of maximum likelihood, and the data are analyzed using R software. Parameter estimates, with standard errors in brackets, and the model functions of the fitted distributions are given in Table 3. Computationally, the maximum likelihood estimates were obtained by the Newton-Raphson method.
Table 3: Estimated parameters by the ML method for the fitted distributions, dataset representing epileptic seizure counts
Distribution
Parameter estimates (standard error)
Model function
Poisson Xgamma distribution
Poisson distribution
Zero inflated poisson
Geometric distribution
Negative binomial distribution
Discrete weibull
Discrete lindley
Poisson quasi-lindley
We compute the expected frequencies for the fitted Poisson Xgamma, Poisson, zero-inflated Poisson, geometric, negative binomial, discrete Weibull (Nakagawa & Osaki, 1975), discrete Lindley (Bakouch et al., 2012) and Poisson quasi-Lindley (Altun, 2019) distributions using RStudio (R version 3.5.3; R Core Team, 2019). We then apply Pearson's chi-square test to check the goodness of fit of the models. The expected counts and chi-square p-values for each fitted model are given in Table 4. The Poisson Xgamma distribution has the largest p-value (0.431), so it provides a better fit for the epileptic seizure counts than the other distributions considered.
Table 4: Fitted proposed distribution and other competing models to a dataset representing epileptic seizure counts

| Epileptic seizures (X) | Observed counts | Poisson Xgamma | Poisson | Zero-inflated Poisson | Geometric | Negative binomial | Discrete Weibull | Discrete Lindley | Discrete Poisson quasi-Lindley |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 126 | 132.981 | 74.935 | 126.000 | 137.963 | 120.201 | 120.120 | 121.867 | 121.868 |
| 1 | 80 | 83.371 | 115.712 | 65.080 | 83.736 | 93.009 | 92.875 | 90.942 | 90.941 |
| 2 | 59 | 53.367 | 89.339 | 68.974 | 50.823 | 59.184 | 59.036 | 58.745 | 58.744 |
| 3 | 42 | 33.497 | 45.985 | 48.733 | 30.847 | 34.949 | 35.133 | 35.216 | 35.216 |
| 4 | 24 | 20.391 | 17.752 | 25.824 | 18.722 | 19.837 | 20.071 | 20.167 | 20.167 |
| 5 | 8 | 12.037 | 5.482 | 10.948 | 11.363 | 10.987 | 11.131 | 11.197 | 11.197 |
| 6 | 5 | 6.914 | 1.411 | 3.868 | 6.897 | 5.984 | 6.029 | 6.079 | 6.079 |
| 7 | 4 | 3.878 | 0.311 | 1.171 | 4.186 | 3.221 | 3.202 | 3.245 | 3.246 |
| 8 | 3 | 2.132 | 0.072 | 0.402 | 6.464 | 3.627 | 1.673 | 1.710 | 1.710 |
| Degrees of freedom | | 6 | 4 | 4 | 6 | 5 | 5 | 5 | 5 |
| p-value | | 0.431 | 0.001 | 0.012 | 0.144 | 0.259 | 0.261 | 0.331 | 0.332 |
Furthermore, to compare the proposed distribution with the competing models above, we consider the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC), defined by
$$\mathrm{AIC} = -2\log L + 2k, \qquad \mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad \mathrm{BIC} = -2\log L + k\log n.$$
The better distribution corresponds to smaller AIC, AICc and BIC values.
Here k is the number of parameters in the statistical model, n is the sample size and $\log L$ is the maximized value of the log-likelihood function under the considered model. From Table 5, the Poisson Xgamma distribution has the smallest AIC and BIC values among the competing models. Hence we conclude that the Poisson Xgamma distribution fits the data set given in Table 2 better than the other competing models.
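The criteria can be computed directly from a reported negative log-likelihood. The sketch below uses the standard formulas and, as a usage example, two entries of Table 5: the Poisson Xgamma model ($k = 1$) and the negative binomial model ($k = 2$). The sample size $n = 351$ is the total of the observed seizure counts in Table 4. The function name is hypothetical.

```python
import math


def information_criteria(neg_loglik, k, n):
    """AIC, AICc and BIC from the minimized negative log-likelihood, k parameters, n observations."""
    aic = 2 * neg_loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = 2 * neg_loglik + k * math.log(n)
    return aic, aicc, bic


for name, nll, k in [("Poisson Xgamma", 595.343, 1), ("Negative binomial", 594.942, 2)]:
    print(name, [round(v, 3) for v in information_criteria(nll, k, 351)])
```

The printed AIC and BIC values reproduce the corresponding Table 5 entries to within rounding. With n = 351, AICc exceeds AIC by only $2k(k+1)/(n-k-1)$, a few hundredths at most, so it would not change the ranking here.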
Table 5: Model comparison criteria for fitted models to a dataset representing epileptic seizure counts

| Criterion | Poisson Xgamma | Poisson | Zero-inflated Poisson | Geometric | Negative binomial | Discrete Weibull | Discrete Lindley | Discrete Poisson quasi-Lindley |
|---|---|---|---|---|---|---|---|---|
| -log L | 595.343 | 636.045 | 599.637 | 598.396 | 594.942 | 594.749 | 594.482 | 594.482 |
| AIC | 1192.687 | 1274.091 | 1203.274 | 1198.791 | 1193.884 | 1193.499 | 1192.964 | 1192.964 |
| BIC | 1196.547 | 1277.952 | 1210.996 | 1202.652 | 1201.605 | 1201.220 | 1200.685 | 1200.685 |
The second data set concerns the distribution of accidents to 647 women working on high explosive shells in 5 weeks, studied by Ghitany and Al-Mutairi (2009). The data set is given in Table 6.
Table 6: Accidents data of 647 women working on high explosive shells in 5 weeks

| Number of accidents | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Observed count | 447 | 132 | 42 | 21 | 3 | 2 |
We also analyze the data in Table 6 using R software (R version 3.5.3, 2019). Parameter estimates along with standard errors in brackets and model function of the fitted distributions are given in Table 7. Computationally, the maximum likelihood estimates for the parameters of interest were obtained by Newton Raphson method.
Table 7: Estimated parameters by the ML method for the fitted distributions, dataset in Table 6
Distribution
Parameter Estimates (standard error)
Model function
Poisson Xgamma distribution
Poisson distribution
Zero inflated poisson
Geometric distribution
Negative binomial distribution
Discrete weibull
Discrete lindley
Poisson quasi-lindley
We compute the expected frequencies for the Poisson Xgamma distribution and the competing distributions mentioned above, fitted to the data set in Table 6, using R software (R version 3.5.3, 2019). Pearson's chi-square test is applied to check the goodness of fit of the models. The expected counts and chi-square p-values for each fitted model are given in Table 8. The Poisson Xgamma distribution has the largest p-value (0.302). It therefore fits the distribution of accidents to 647 women working on high explosive shells in 5 weeks, studied by Ghitany and Al-Mutairi (2009), better than the competing models.
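A self-contained sketch of a Pearson chi-square test for the Poisson Xgamma fit to the Table 6 data. Two choices here are swapped in for illustration. The log-likelihood is maximized by golden-section search rather than Newton-Raphson. Counts of 4 or more accidents are pooled into one cell whose expected count uses the full tail $P(X \ge 4)$; this pooling is our assumption, although it yields the 5 − 1 − 1 = 3 degrees of freedom reported for the Poisson Xgamma model in Table 8. The upper-tail probability uses the standard recursion in the degrees of freedom. All names are hypothetical.

```python
import math

COUNTS = {0: 447, 1: 132, 2: 42, 3: 21, 4: 3, 5: 2}  # Table 6


def pxg_pmf(x, theta):
    """Closed-form pmf of the Poisson Xgamma distribution, Eq. (1)."""
    return 0.5 * theta**2 * (2 * (1 + theta) ** 2 + theta * (x + 1) * (x + 2)) * (1 + theta) ** -(x + 4)


def log_likelihood(theta):
    return sum(f * math.log(pxg_pmf(x, theta)) for x, f in COUNTS.items())


def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def chi2_sf(stat, df):
    """Upper-tail chi-square probability for integer df via Q(k + 2) = Q(k) + (x/2)^(k/2) e^(-x/2) / Γ(k/2 + 1)."""
    half = stat / 2
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


theta_hat = golden_max(log_likelihood, 0.1, 20.0)
n = sum(COUNTS.values())
probs = [pxg_pmf(x, theta_hat) for x in range(4)]
expected = [n * p for p in probs] + [n * (1 - sum(probs))]  # last cell: X >= 4
observed = [COUNTS[x] for x in range(4)] + [COUNTS[4] + COUNTS[5]]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1  # cells - 1 - number of estimated parameters
p_value = chi2_sf(stat, df)
print([round(e, 2) for e in expected], round(stat, 3), df, round(p_value, 3))
```

The expected frequencies agree closely with the Poisson Xgamma column of Table 8. With this pooling the p-value is about 0.28. It need not equal the 0.302 in Table 8, because the pooling and tail handling used there are not stated.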
Table 8: Fitted probability models with expected frequencies and chi-square p-values for the dataset in Table 6

| Number of accidents | Observed count | Poisson Xgamma | Poisson | Zero-inflated Poisson | Geometric | Negative binomial | Poisson Lindley | Discrete Weibull | Discrete Lindley | Discrete Poisson quasi-Lindley |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 447 | 442.56 | 406.31 | 447.00 | 441.57 | 445.89 | 439.45 | 445.54 | 441.57 | 441.56 |
| 1 | 132 | 138.52 | 189.03 | 124.60 | 140.20 | 134.90 | 142.76 | 135.36 | 140.20 | 140.21 |
| 2 | 42 | 44.85 | 43.97 | 54.95 | 44.52 | 43.99 | 44.97 | 44.00 | 44.52 | 44.52 |
| 3 | 21 | 14.47 | 6.82 | 16.15 | 14.13 | 14.69 | 13.85 | 14.62 | 14.13 | 14.14 |
| 4 | 3 | 4.58 | 0.79 | 3.56 | 4.49 | 4.96 | 4.19 | 4.92 | 4.49 | 4.49 |
| 5 | 2 | 1.42 | 0.07 | 0.63 | 1.42 | 1.69 | 1.25 | 1.68 | 1.42 | 1.43 |
| Chi-square df | | 3 | 2 | 3 | 3 | 2 | 3 | 2 | 2 | 2 |
| p-value | | 0.302 | 0.001 | 0.024 | 0.191 | 0.157 | 0.087 | 0.194 | 0.093 | 0.091 |
Using the AIC and BIC criteria for model comparison, we observe that the Poisson Xgamma distribution has smaller AIC and BIC values than the other competing models (see Table 9). Hence we conclude that the Poisson Xgamma distribution fits the data set given in Table 6 better than the other competing models.
Table 9: Negative log-likelihood, AIC and BIC values for fitted models to the data set in Table 6

| Criterion | Poisson Xgamma | Poisson | Zero-inflated Poisson | Geometric | Negative binomial | Poisson Lindley | Discrete Weibull | Discrete Lindley | Discrete Poisson quasi-Lindley |
|---|---|---|---|---|---|---|---|---|---|
| -log L | 592.222 | 617.184 | 593.272 | 592.480 | 592.267 | 592.708 | 592.2961 | 592.4798 | 592.4797 |
| AIC | 1186.445 | 1236.369 | 1190.544 | 1186.960 | 1188.534 | 1187.416 | 1188.592 | 1188.96 | 1188.95 |
| BIC | 1190.917 | 1240.841 | 1199.489 | 1191.432 | 1197.479 | 1191.888 | 1197.537 | 1197.904 | 1197.902 |
Conclusion
A new discrete probability model is introduced using the compounding technique. Some important probabilistic properties and the problem of estimating its parameter are studied. In addition, the discrete Poisson Xgamma distribution is appropriate for modeling over-dispersed count data. Its variance always exceeds its mean, and its index of dispersion approaches 1 as $\theta$ grows, so nearly equi-dispersed data can also be accommodated. Applications to count data show the suitability of the proposed discrete probability model.
Acknowledgments
The authors sincerely thank two anonymous referees and a member of the editorial board for their valuable comments and suggestions that led to improvement of this article.
References
1. Adil, R., Zahoor, A., & Jan, T. R. (2016). A new count data model with application in genetics and ecology. Electronic Journal of Applied Statistical Analysis, 9(1), 213-226.
2. Altun, E. (2019). A new model for over-dispersed count data: Poisson quasi-Lindley regression model. Mathematical Sciences, 1-7.
3. Bakouch, H. S., Jazi, M. A., & Nadarajah, S. (2012). A new discrete distribution. Statistics: A Journal of Theoretical and Applied Statistics, 48(1), 200-240.
4. Chakraborty, S. (2010). On some distributional properties of the family of weighted generalized Poisson distribution. Communications in Statistics - Theory and Methods, 39, 2767-2788.
5. Dubey, D. S. (1970). Compound gamma, beta and F distributions. Metrika, 16(1), 27-31.
6. Gerstenkorn, T. (1993). A compound of the generalized gamma distribution with the exponential one. Recherches sur les Déformations, 16(1), 5-10.
7. Gerstenkorn, T. (1996). A compound of the Pólya distribution with the beta one. Random Operators and Stochastic Equations, 4(2), 103-110.
8. Ghitany, M. E., & Al-Mutairi, D. K. (2009). Estimation methods for the discrete Poisson-Lindley distribution. Journal of Statistical Computation and Simulation, 79(1), 1-9. doi: 10.1080/00949650701550259
9. Greenwood, M., & Yule, G. U. (1920). An inquiry into the nature of frequency distribution representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society, 83, 255-279.
10. Gupta, R. C., & Ong, S. H. (2004). A new generalization of the negative binomial distribution. Computational Statistics & Data Analysis, 45, 287-300.
11. Lindley, D. V. (1958). Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, Series B, 20(1), 102-107.
12. Mahmoudi, E., & Zakerzadeh, H. (2010). Generalized Poisson-Lindley distribution. Communications in Statistics - Theory and Methods, 39(10), 1785-1798.
13. Nakagawa, T., & Osaki, S. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, 24(5), 300-301.
14. Para, B. A., & Jan, T. R. (2018). An advanced discrete model with applications in medical science. Journal of Multiscale Modelling, 9(1). doi: 10.1142/S1756973718500014. ISSN: 1756-9737 (print).
15. Para, B. A., & Jan, T. R. (2018). Discrete inverse Weibull beta model: Properties and applications in health science. Pakistan Journal of Statistics, 34(3), 229-349.
16. Plackett, R. L. (1953). The truncated Poisson distribution. Biometrics, 9(4), 485-488.
17. R Core Team (2019). R: A language and environment for statistical computing (R version 3.5.3). R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
18. Sen, S., Maiti, S. S., & Chandra, N. (2016). The xgamma distribution: Statistical properties and application. Journal of Modern Applied Statistical Methods, 15(1), 774-788.
19. Skellam, J. G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society, Series B, 10, 257-261.
20. Zamani, H., & Ismail, N. (2010). Negative binomial-Lindley distribution and its application. Journal of Mathematics and Statistics, 1, 4-9.