We present the motivation for modeling heaped count data and introduce new commands for doing so. Heaped data arise when subjects report counts that are rounded or that favor multiples of a particular value (digit preference), such as the reported number of cigarettes smoked. The new commands fit heaped count regression models (Poisson, generalized Poisson, and negative binomial). We illustrate them with real-world examples that compare the heaped regression model with the usual regression model and the heaped zero-inflated model with the usual zero-inflated model.