Ordinal data and residual analysis: Review and application
Abstract
Experiments with an ordinal polytomous response are common in the agricultural sciences, and cumulative logit models are frequently used to analyze such responses. Polytomous variables are inherently multivariate, so the ordinary residual associated with the classical models is a vector for each subject. Consequently, these residuals are not easily interpreted, and their distribution is unknown. Residual analysis is an essential step in validating any statistical model; skipping it can let a poorly fitting model go undetected, leading to erroneous conclusions and inferences. In this context, this work reviews the residuals for ordinal data available in the literature, with emphasis on the so-called surrogate residuals, which have a continuous distribution. As a practical application, an experiment carried out with Tambaqui fish of different genotypes is presented. The response variable is the severity of the lesions found in the livers of the fish. The parameters were estimated by maximum likelihood. The model selected by the likelihood ratio test included proportional odds and a fish genotype effect. According to this model, fish with genotype 122 had a higher probability of a liver lesion classified as irreversible (71.26%), while fish with genotype 130 had a higher probability of a moderate lesion (46.75%). For model diagnostics, the half-normal plot and the Kolmogorov-Smirnov test were used to examine the performance of the surrogate residuals. The results provided evidence that the selected model is adequate, since the diagnostic tools revealed no patterns or influential points in the residuals.
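The surrogate-residual idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a proportional odds model written as P(Y <= j | x) = F(alpha_j - eta), where F is the standard logistic CDF and eta is the linear predictor. The cutpoints, the group effects, and the simulated data below are hypothetical and are not the Tambaqui data; the parameters are treated as known rather than estimated, and the helper names are illustrative only. Under a correctly specified model the residuals should follow a standard logistic distribution, which is checked here with a hand-computed Kolmogorov-Smirnov statistic.

```python
import math
import random


def logistic_cdf(x):
    """CDF of the standard logistic distribution (numerically stable form)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p):
    """Quantile function of the standard logistic distribution."""
    return math.log(p / (1.0 - p))


def clamp(u, eps=1e-12):
    """Keep u strictly inside (0, 1) so that logit(u) stays finite."""
    return min(max(u, eps), 1.0 - eps)


def simulate_response(eta, cutpoints, rng):
    """Draw a category in 1..J from P(Y <= j) = F(alpha_j - eta)."""
    z = eta + logit(clamp(rng.random()))  # latent variable with logistic noise
    return 1 + sum(z > a for a in cutpoints)


def surrogate_residual(y, eta, cutpoints, rng):
    """Sample R = S - eta, with S the latent variable truncated to category y."""
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    lo = logistic_cdf(bounds[y - 1] - eta)  # F at the lower edge of category y
    hi = logistic_cdf(bounds[y] - eta)      # F at the upper edge of category y
    u = clamp(rng.uniform(lo, hi))          # inverse-CDF sampling within the interval
    return logit(u)


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs)
    )


rng = random.Random(2024)
cutpoints = [-1.0, 0.5, 2.0]        # hypothetical alpha_1 < alpha_2 < alpha_3
effects = {"g1": 0.0, "g2": 0.8}    # hypothetical group effects on eta
n = 2000

etas = [effects[rng.choice(sorted(effects))] for _ in range(n)]
ys = [simulate_response(eta, cutpoints, rng) for eta in etas]
residuals = [surrogate_residual(y, eta, cutpoints, rng) for y, eta in zip(ys, etas)]

D = ks_statistic(residuals, logistic_cdf)
print(f"KS distance to the standard logistic distribution: {D:.4f}")
```

Because the surrogate is drawn at random, the residuals change from run to run unless the seed is fixed; in practice the diagnostic is often repeated over several draws. With estimated rather than known parameters, the reference distribution holds only approximately.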
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.