Using the Box-Cox family of distributions to model censored data: a distributional regression approach
Abstract
The study of the expected time until an event of interest is a recurring topic in fields such as medicine, economics and engineering. The Kaplan-Meier method and the Cox proportional hazards model are the most widely used methodologies for this kind of data. Nevertheless, in recent years, generalised additive models for location, scale and shape (GAMLSS), which can be seen as distributional regression and/or beyond-the-mean regression models, have stood out because of their high flexibility and ability to fit complex data. GAMLSS are a class of semi-parametric regression models, in the sense that they assume a distribution for the response variable, and any or all of its parameters can be modelled as linear and/or non-linear functions of a set of explanatory variables. In this paper, we present the Box-Cox family of distributions under the distributional regression framework as a solid alternative for modelling censored data.
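To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of a right-censored log-likelihood for one member of the Box-Cox family, the Box-Cox Cole and Green (BCCG) distribution. It assumes the usual parameterisation in which z = ((y/mu)^nu - 1)/(nu*sigma) follows a truncated standard normal distribution, a log link for mu with a single covariate, and constant sigma and nu. All function names and the toy data in the usage note are hypothetical.

```python
import math


def norm_cdf(z):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def bccg_z(y, mu, sigma, nu):
    # Box-Cox transformed and standardised response (assumed parameterisation).
    if nu == 0.0:
        return math.log(y / mu) / sigma
    return ((y / mu) ** nu - 1.0) / (nu * sigma)


def bccg_logpdf(y, mu, sigma, nu):
    # Log-density of the BCCG distribution for y > 0.
    z = bccg_z(y, mu, sigma, nu)
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Normalising constant from truncating z (not needed when nu == 0).
    log_trunc = 0.0 if nu == 0.0 else math.log(norm_cdf(1.0 / (sigma * abs(nu))))
    return ((nu - 1.0) * math.log(y) - nu * math.log(mu)
            - math.log(sigma) + log_phi - log_trunc)


def bccg_sf(y, mu, sigma, nu):
    # Survival function S(y) = 1 - F(y) of the BCCG distribution.
    z = bccg_z(y, mu, sigma, nu)
    if nu == 0.0:
        return norm_cdf(-z)
    trunc = norm_cdf(1.0 / (sigma * abs(nu)))
    lower = norm_cdf(-1.0 / (sigma * abs(nu))) if nu > 0 else 0.0
    return 1.0 - (norm_cdf(z) - lower) / trunc


def censored_loglik(times, events, x, beta0, beta1, sigma, nu):
    # Right-censored log-likelihood: observed events contribute log f(t),
    # censored observations contribute log S(t).
    total = 0.0
    for t, d, xi in zip(times, events, x):
        mu = math.exp(beta0 + beta1 * xi)  # log link for mu (an assumption)
        if d == 1:
            total += bccg_logpdf(t, mu, sigma, nu)
        else:
            total += math.log(bccg_sf(t, mu, sigma, nu))
    return total
```

With toy data, a call such as `censored_loglik([1.0, 2.5, 3.0], [1, 0, 1], [0.0, 1.0, 0.5], 0.5, 0.2, 0.4, 0.3)` returns the log-likelihood for three subjects, the second of which is censored. In a full distributional regression, sigma and nu could also depend on covariates through their own link functions, and the log-likelihood would be maximised numerically.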
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.