Counting models for overdispersed data: A review with application to tuberculosis data
Abstract
This work reviews four distributions for count data (Poisson, Negative Binomial, COM-Poisson, and Generalized Poisson) and their regression models. Parameter estimation and model selection criteria are presented. As an application, we use the regression models of these distributions to explain the relationship between tuberculosis notifications and the Human Development Index (HDI) of the 102 cities in the state of Alagoas. The relationship between tuberculosis notifications and HDI is significant, and the data show overdispersion, at the α = 5% significance level. The COM-Poisson regression model gave the best fit to the data according to the Akaike (AIC) and Bayesian (BIC) information criteria.
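The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard COM-Poisson parametrization P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), with Z(λ, ν) = Σ_j λ^j / (j!)^ν truncated at a finite number of terms, and the usual definitions AIC = 2k − 2 log L and BIC = k log n − 2 log L. The function names, the truncation length, and the parameter values are illustrative assumptions; this is not the authors' code and it does not use the tuberculosis data.

```python
import math


def com_poisson_log_z(lam, nu, max_terms=1000):
    """Log of the truncated normalizing constant Z(lam, nu) = sum_j lam^j / (j!)^nu."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    peak = max(log_terms)
    # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def com_poisson_pmf(y, lam, nu, max_terms=1000):
    """COM-Poisson probability P(Y = y) under the truncated normalizing constant."""
    log_p = y * math.log(lam) - nu * math.lgamma(y + 1) - com_poisson_log_z(lam, nu, max_terms)
    return math.exp(log_p)


def com_poisson_mean_var(lam, nu, max_terms=1000):
    """Mean and variance obtained by summing over the truncated support."""
    log_z = com_poisson_log_z(lam, nu, max_terms)
    probs = [math.exp(j * math.log(lam) - nu * math.lgamma(j + 1) - log_z)
             for j in range(max_terms)]
    mean = sum(j * p for j, p in enumerate(probs))
    second_moment = sum(j * j * p for j, p in enumerate(probs))
    return mean, second_moment - mean ** 2


def aic(loglik, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion: smaller is better."""
    return n_params * math.log(n_obs) - 2 * loglik


if __name__ == "__main__":
    # nu = 1 recovers the Poisson distribution (mean equals variance)
    print(com_poisson_mean_var(3.0, 1.0))
    # nu < 1 yields overdispersion (variance greater than mean)
    mean_od, var_od = com_poisson_mean_var(1.5, 0.5)
    print(var_od / mean_od)
```

With ν = 1 the sketch reproduces the Poisson case, while ν < 1 gives a variance larger than the mean, which is the overdispersed situation the abstract describes. When comparing fitted models, the one with the smaller AIC or BIC is preferred.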
This work is licensed under a Creative Commons Attribution 4.0 International License.