Ordinal data and residual analysis: Review and application


Patrícia Peres Araripe
https://orcid.org/0000-0003-0966-4463
Idemauro Antonio Rodrigues de Lara
https://orcid.org/0000-0002-1172-9855

Abstract

Experiments with an ordinal polytomous response are common in the agricultural sciences, and cumulative logit models are frequently used to analyze such responses. Because a polytomous response is multivariate, the ordinary residual from the classical models is a vector for each subject. Consequently, these residuals are not easily interpreted, and their distribution is unknown. Residual analysis is an essential step in validating any statistical model. Omitting it can leave a poorly fitting model undetected, which leads to erroneous conclusions and inferences. In this context, this work reviews the residuals for ordinal data available in the literature, emphasizing the so-called surrogate residuals, which have a continuous distribution. As a practical application, we present an experiment carried out with Tambaqui fish of different genotypes. The response variable is the severity of the lesions found in the Tambaquis' livers. Parameters were estimated by maximum likelihood. The model selected by the likelihood ratio test assumed proportional odds and included the effect of fish genotype. According to this model, fish with genotype 122 had a higher probability of liver lesions classified as irreversible (71.26%), while Tambaquis with genotype 130 had a higher probability of moderate lesions (46.75%). For model diagnostics, the half-normal plot and the Kolmogorov-Smirnov test were used to examine the behavior of the surrogate residuals. The results provided evidence that the selected model is adequate, since the diagnostic tools revealed no patterns or influential points in the residuals.
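The workflow summarized in the abstract (fit a proportional odds cumulative logit model by maximum likelihood, then check it with surrogate residuals and a Kolmogorov-Smirnov test) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a logit link with the parameterization logit P(Y ≤ j | x) = α_j − x'β, and it runs on simulated data with hypothetical cut-points and group effects, not on the Tambaqui data. The helper names (`unpack`, `category_bounds`, `fit_proportional_odds`, `surrogate_residuals`, `simulate`) are illustrative. For an observation in category j, the surrogate residual is a draw of the latent logistic error truncated to (α_{j−1} − x'β, α_j − x'β]. Under a correctly specified model these residuals should behave approximately like a sample from the standard logistic distribution, and the KS test checks exactly that.

```python
import numpy as np
from scipy import optimize, stats
from scipy.special import expit


def unpack(theta, n_cat):
    """Split theta into increasing cut-points alpha (length J-1) and slopes beta.

    alpha_1 is free; each later cut-point adds exp(increment), so the ordering
    alpha_1 < ... < alpha_{J-1} holds for any theta.
    """
    k = n_cat - 1
    alpha = np.cumsum(np.concatenate(([theta[0]], np.exp(theta[1:k]))))
    return alpha, theta[k:]


def category_bounds(alpha, eta, y):
    """Return F(alpha_{y-1} - eta) and F(alpha_y - eta), F = logistic CDF."""
    cuts = np.concatenate(([-np.inf], alpha, [np.inf]))
    return expit(cuts[y] - eta), expit(cuts[y + 1] - eta)


def neg_loglik(theta, X, y, n_cat):
    """Negative log-likelihood of logit P(Y <= j | x) = alpha_j - x'beta."""
    alpha, beta = unpack(theta, n_cat)
    lower, upper = category_bounds(alpha, X @ beta, y)
    return -np.sum(np.log(np.clip(upper - lower, 1e-300, None)))


def fit_proportional_odds(X, y, n_cat):
    """Maximum likelihood fit; X must NOT contain an intercept column."""
    k = n_cat - 1
    theta0 = np.concatenate(([-1.0], np.zeros(k - 1), np.zeros(X.shape[1])))
    res = optimize.minimize(neg_loglik, theta0, args=(X, y, n_cat), method="BFGS")
    return unpack(res.x, n_cat)


def surrogate_residuals(alpha, beta, X, y, rng):
    """Draw the latent logistic error from its distribution truncated to the
    interval that produced the observed category; returns R = S - x'beta."""
    lower, upper = category_bounds(alpha, X @ beta, y)
    u = rng.uniform(lower, upper)
    return stats.logistic.ppf(u)


def simulate(n_per_group, alpha, group_effects, rng):
    """Hypothetical data: a reference group plus one dummy per other group."""
    n_groups = len(group_effects) + 1
    groups = np.repeat(np.arange(n_groups), n_per_group)
    X = np.zeros((groups.size, n_groups - 1))
    for g in range(1, n_groups):
        X[groups == g, g - 1] = 1.0
    latent = X @ np.asarray(group_effects) + rng.logistic(size=groups.size)
    y = np.searchsorted(alpha, latent)  # observed category index 0..J-1
    return X, y


rng = np.random.default_rng(2023)
true_alpha = np.array([-1.0, 0.5, 2.0])  # hypothetical cut-points, J = 4
X, y = simulate(500, true_alpha, [1.0, -0.5], rng)  # hypothetical group effects
alpha_hat, beta_hat = fit_proportional_odds(X, y, n_cat=4)
r = surrogate_residuals(alpha_hat, beta_hat, X, y, rng)
ks = stats.kstest(r, "logistic")  # H0: residuals ~ standard logistic
print("alpha:", alpha_hat.round(2), "beta:", beta_hat.round(2))
print("KS statistic: %.3f, p-value: %.3f" % (ks.statistic, ks.pvalue))
```

Because surrogate residuals are random draws, the KS p-value varies with the seed, and because α and β are estimated rather than known, the p-value is only approximate. A half-normal plot with a simulated envelope, as used in the article, is a natural graphical complement to this test.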

Article Details

How to Cite
Peres Araripe, P., & Lara, I. A. R. de. (2023). Ordinal data and residual analysis: Review and application. Brazilian Journal of Biometrics, 41(3), 287–310. https://doi.org/10.28951/bjb.v41i3.637

