EXPONENTIATED DISCRETE WEIBULL DISTRIBUTION FOR CENSORED DATA

This paper further develops the statistical inference procedure of the exponentiated discrete Weibull distribution (EDW) for data with the presence of censoring. This generalization of the discrete Weibull distribution has the advantage of being suitable to model non-monotone failure rates, such as those with bathtub and unimodal distributions. Inferences about EDW distribution are presented using both frequentist and bayesian approaches. In addition, the classical Likelihood Ratio Test and a Full Bayesian Significance Test (FBST) were performed to test the parameters of EDW distribution. The method presented is applied to simulated data and illustrated with a real dataset regarding patients diagnosed with head and neck cancer.


Introduction
The Weibull distribution is widely used to model survival data. This popularity is due to its versatility and relative simplicity (RINNE, 2008). The Weibull distribution and its generalizations have been applied over the years in survival analysis when the data are continuous. However, in some cases the data are discrete. This can arise, for example, when the survival time is measured in months, cycles or counts. It is not always acceptable to use a continuous model to analyze discrete data. To overcome this problem, Nakagawa and Osaki (1975) introduced the discrete Weibull distribution (DW), which models discrete data with monotone (increasing or decreasing) failure rates (more details on the discretization of Weibull distribution can be seen in Vila et al. (2019)). Nevertheless, the Nakagawa and Osaki's DW distribution does not have a reasonable parametric fit for phenomena that present non-monotone failure rates, such as those with bathtub or unimodal distribution.
In recent years, new families of distributions for continuous data have been proposed based on modifications of the Weibull distribution to deal with bathtubshaped or unimodal failure rates. An example is the exponentiated Weibull distribution introduced by Mudholkar and Srivastava (1993), which is part of the family of exponentiated distributions introduced by Gupta and Kundu (2001). An analog for discrete data of the exponentiated Weibull distribution was introduced by Nekoukhou and Bidram (2015). However, inferences about the parameters of the exponentiated discrete Weibull distribution (EDW) have only been presented for uncensored data.
This paper formulates the EDW model in a context of survival analysis, allowing the presence of censored data in the inference process. The inferences are made according to the frequentist and bayesian approaches. The methods developed are illustrated with simulated data and a real dataset on survival time of patients suffering from head and neck cancer. All the simulations and estimates were carried out using the free R software (R CORE TEAM, 2019).

Exponentiated discrete Weibull model
Let X be a continuous random variable. The corresponding discrete variable is denoted by T = [X], where [X] represents "the integer part of X".
By considering that, X ∼ W eibull(η, β) the cumulative distribution function of the discrete Weibull distribution (DW) according to Nakagawa and Osaki (1975) is given by: where q = exp −1 η β , 0 < q < 1 and β > 0 is the shape parameter. The cumulative distribution function of the exponentiated discrete Weibull distribution (EDW) can be expressed by: Here, γ > 0 is the shape parameter, which can also be called the resilience parameter (NEKOUKHOU; BIDRAM, 2015). Consequently, the EDW has probability distribution given by: where γ > 0, β > 0 and 0 < q < 1.
For illustration of the special cases of EDW distribuition, Figure 1 shows the probability distribution for different values of q, β and γ.
The survival function of the EDW distribution is expressed by: Through equations (2) and (3), the failure rate function of the EDW distribution is defined by: Figure 2 presents the failure rate graph of the EDW distribution for different values of q, β and γ.
One of the advantages of working with this generalization of the DW distribution is the fact it provides a parametric fit that allows working with various types of failure rates described in the literature. Figure 2 shows that depending on the combination of parameters, the failure rate function assumes increasing, decreasing, bathtub or unimodal behavior. Thus, the EDW distribution is more flexible to fit data than the DW distribution, which accommodates only monotone and constant failure rates.
An alternative way to find the EDW is first to exponentiated the continuous Weibull distribution, obtaining the exponentiated Weibull distribution proposed by Mudholkar and Srivastava (1993), and then to discretize the exponentiated continuous Weibull to obtain the exponentiated discrete Weibull distribution.

Maximum likelihood estimators
Let t = (t 1 , t 2 , ..., t n ), be a random sample observed from a population with the probability distribution and survival function of the EDW defined in equations (2) and (3). The vector of parameters θ is defined by θ = (q, β, γ) and δ = (δ 1 , δ 2 , . . . , δ n ), where δ i denotes the censoring indicator variable, which is equal to 1 if time t i is failure and 0 for right censoring according to Colosimo and Giolo (2006). The likelihood function of the EDW is given by: where γ > 0, β > 0 and 0 < q < 1. The log likelihood function is given by: where c is a constant that does not depend on θ. Deriving (6) in relation to q, β and γ we obtain the likelihood equations (Appendix A): The values ofq,β andγ that satisfy equations (7) are the maximum likelikood estimates (MLE) of the EDW distribution. Note that the solutions of equation (7) cannot be obtained analytically. However, the MLE can be obtained numerically through Newton-Raphson algorithm.
Since the parameters of the EDW distribution are limited in the parametric spaces, i.e., γ > 0, β > 0 and 0 < q < 1 , it is necessary to perform a transformation to make then unconstrained. Thus, for the parameter q we considered the loglog transformation and for the parameters β and γ we considered the logarithmic transformation. This causes the confidence intervals of q, β and γ to be given, respectively, by: wherev = logβ, and whereŵ = logγ and Z 1−α/2 is the quantile (1 − α/2) of a standard normal distribution.
The estimates of the variances, V ar(û), V ar(v) and V ar(ŵ), can be obtained numerically by the delta method (CASELLA; BERGER, 2002).

Bayesian inference
In considering the likelihood function (5) and θ = (q, β, γ), since β > 0, γ > 0 and 0 < q < 1, we considered the following prior distributions for the parameters: where a 1 , a 2 , a 3 , b 1 , b 2 and b 3 are known positive hyperparameters. Assuming independence of the prior parameters, the joint posterior distribution of θ = (q, β, γ) is proportional to: and the posterior conditional distributions are given by: The posterior distribution (11) cannot be obtained analytically, but samples from it can be obtained numerically through MCMC (Markov Chain Monte Carlo) methods by applying steps of the Metropolis-Hastings algorithm (METROPOLIS et al., 1953;HASTINGS, 1970).
The point and interval estimates of the parameters were obtained from the posterior mean and its respective highest posterior density (HPD) interval. The predictive values of the survival function of the EDW distribution are given by: where θ = (q, β, γ) and π(θ|t, δ) is the posterior distribution given by (11). Note that the expectancy in (12) can be easily obtained from the values generated by the posterior distribution (11).
We performed hypothesis testing of the parameters of the EDW model by the full bayesian significance test (FBST), which is based on the e-value (evidence of the null hypothesis). To describe the FBST, the interest is to test a null hypothesis H 0 : θ ∈ Θ 0 , where Θ 0 ⊂ Θ. Let Θ be the parametric space and θ = (q, β, γ) the vector of parameters of the EDW distribution. The set tangent to H 0 is: where θ * H0 is the vector of parameters that maximizes the posterior density, π(θ|t, δ), under the null hypothesis H 0 . Hence, the proposed evidence measure (the e-value) is defined by: The e-value is the volume (probability) of the posterior density in the set of the parametric spaces that consists of the points with posterior density smaller than the maximum point of the density under the hypothesis (De BRAGANÇ A PEREIRA and STERN, 1999). More details about the FBST can be found in De Bragança Pereira and Stern (1999) and Madruga et al. (2001).

Simulation study
This section presents the computational simulations and their results, which were obtained with the R software. The objective of the simulations is to generate survival times of the EDW distribution by the inversion method, considering a random right censoring mechanism. In the same way as Brunello and Nakano (2015), the censoring was incorporated in the samples independently of the survival time by the censoring indicator variable generated by a Bernoulli distribution, with the censoring percentages specified below.
We considered different scenarios for the simulation, considering the behavior of the failure rate in the distribution under analysis, as depicted in Figure 2. Table  1 describes the scenarios chosen for the simulations.   The results reported in Table 2 show the accuracy of the proposed model, since the estimates of bias and MSE are small, even for relatively small samples (n = 30), and the bias estimates tend to values near zero when the sample size increases. Table 3 presents the estimated bias and MSE values for each parameter in Scenario 2 for different sample sizes and censoring percentages It can be seen in Table 3 that the bias and MSE estimates are larger when the sample is small and gradually decline as the sample increases, attesting to the accuracy of the EDW model. Table 4 refers to the estimated bias and MSE values for each parameter in Scenario 3 for different sample sizes and censoring percentages.
Scenario 3 considers a bathtub-shaped failure rate. The estimated bias and MSE values presented in Table 4 are very near zero irrespective of the sample size, indicating the accuracy of the EDW model for this scenario.  Table 5 contains the estimated bias and MSE values for each parameter in Scenario 4 for different sample sizes and censoring percentages.
The estimated bias and MSE values presented in Table 5 are relatively small and tend to decrease as the sample size increases, again confirming the accuracy of the model.  In general, the estimated bias and MSE values decline with increasing sample size in the situation without censoring, since the likelihood function only relies on the probability distribution. With censoring, bias will always exist, because the "true" values of the parameters do not consider censoring, so it is expected for bias to appear. It is natural for the estimates of bias and MSE to increase when the censoring percentage is higher since these estimates are naturally biased by the fact the fact that the likelihood function in the presence of censoring counts on the contribution of the survival function. The accuracy of the EDW was demonstrated in all the scenarios. Therefore, the model can be used when the failure rate is increasing (Scenario 1) and decreasing (Scenario 2). Besides this, another advantage is that the model can be used when the risk function has a bathtub (Scenario 3) or unimodal (Scenario 4) distribution.
We also performed a sensitivity study of the choice of hyperparameters of the prior distribution of γ, where γ ∼ Gamma(a 2 , b 2 ). We chose γ because the main inferences in this work are focused on this parameter. We kept the values of the hyperparameters of the prior of β (a 1 and b 1 ) constant and equal to 0.001 and the values of the hyperparameters of the prior of q (a 3 and b 3 ) constant and equal to 1. In other words, we considered non-informative (diffuse) priors for the parameters β and q. This sensitivity study considered Scenario 1 (q = 0.8, β = 1.5, γ= 2.5). The estimates were based on a chain with size 10,000, considering a burn-in of 1,000. The convergence of marginal posteriors was verified by Geweke Diagnostic. Table  6 presents the results of a sensitivity study of the choice of hyperparameters of the prior distribution of γ with the point estimates of the (posterior mean) of the EDW model, the respective 95% HPD credible intervals and the e-value of the FBST for hypothesis H 01 : γ = 1 of Scenario 1, considering n = 200 and 0% censoring. Here we decided to use a sample without censoring to allow better control of the true value of the parameters in the samples.  Table 6 shows that the inference is robust when the prior variance is not small. Note that the estimates of the parameters and their HPD intervals do not vary greatly when the prior variance of γ is larger than 1, even when varying the values of the prior mean. Recall that if γ ∼ Gamma(a 2 , b 2 ), then the mean and variance of γ are, respectively, a 2 /b 2 and a 2 /b 2 2 (according to the parameterization adopted here). However, in cases when the prior variance is small (informative priors), and hence the prior mean is smaller than 1, the values of the posterior mean of γ are smaller. In these cases it is possible to note the influence of the priors on the posterior estimates of the parameters.
We performed the FBST for each pair of hyperparameters in Table 6, considering the limit of 0.05 for the e-value. Note that in Scenario 1, the true value of the parameter adopted in the simulations (γ=2.5) is different than 1. Therefore, for diffuse priors (priors with large variance), the inference is dominated by the sample, and consequently hypothesis H 01 : γ = 1 is rejected (e-value smaller than 0.05). Besides this, as a 2 increases (the prior mean of γ increases), the posterior mean of γ tends to rise. Consequently, the e-value gradually declines, providing further evidence to reject H 01 . In counterpart, when the prior distribution is informative (small variance) and is centered in values smaller than 1, the estimate of γ tends to decrease (more closely approaches H 01 ), strengthening the evidence that hypothesis H 01 can be true.
Source: data from Efron (1988) Table 8 presents the point and interval estimates of the parameters of the EDW model for the data in Table 7 according to the classic and bayesian approaches. In the bayesian analysis, we adopted diffuse priors for the parameters, i.e., q ∼ Beta(1, 1), γ ∼ Gamma(0.001, 0.001) and β ∼ Gamma(0.001, 0.001). The bayesian estimates were obtained through the empirical posterior distribution obtained via MCMC, considering a chain size of 10,000 with burn-in of 1,000. The point estimates (posterior means) of the parameters according to the bayesian approach are near the MLE. However, the interval estimates obtained by the bayesian approach (HPD interval) for the parameters β and γ have smaller amplitude than the classic confidence intervals.
We calculated the estimates of the survival function according to the EDW, DW, exponentiated geometric (EG) and geometric (G) distributions. These functions were estimated by two approaches, frequentist and bayesian. Figure 3 shows the bayesian fit of these distributions to the data in Table 7. The results according to the frequentist approach were similar.  It can be seen that the EDW distribution produces estimates nearest to the Kaplan-Meier (K-M) estimates, indicating this model best fits the dataset. The estimates obtained by the DW, EG and G distributions present larger deviations in relation to the K-M estimate, suggesting they do not adequately fit this dataset (Figure 3).
To check whether a simpler distribution (with fewer parameters) than the EDW can be used, we tested the following hypotheses: • H 01 : γ = 1: The DW distribution is adequate; • H 02 : β = 1: The EG distribution is adequate; and • H 03 : γ = 1 and β = 1: The G distribution is adequate.
We tested these hypotheses by the likelihood ratio test (LRT), considering 5% significance, and by the FBST, considering a cutoff point of 5% for the e-value. The results are reported in Table 9.  Table 9 shows that hypotheses H 01 , H 02 e H 03 are rejected in the two approaches (frequentist and bayesian). This indicates that the simpler distributions, such as the DW and EG, which accommodate monotone failures, and the G distribution, which accommodates constant failures, are not adequate for this dataset. Figure 4 presents the K-M estimate of the survival function and the point and interval estimates of the survival function of the EDW distribution for the data in Table 7. The classical confidence interval of the EDW distribution's survival function was obtained by the delta method and the bayesian credible interval was obtained by the predictive distribution (12). Figure 5 presents the estimate of the failure rate according to the EDW model for the data on patients with head and neck cancer. Note that the failure rate has unimodal behavior, which is in accordance with rejection of the adjustment of the GE, DW and G distributions, which only accommodate monotone failure rates.   Table 7, according to the frequentist and bayesian approaches.   Table 7, according to the frequentist and bayesian approaches.

Conclusions
The EDW distribution is very versatile, since it accommodates constant, monotone (increasing and decreasing) and non-monotone (bathtub and unimodal) failure rates.
In this paper, the EDW distribution was formulated from exponentialization (GUPTA; KUNDU, 2001) of the DW distribution of Nakagawa and Osaki (1975). However, this distribution can also be characterized by discretization of the exponentiated continuous Weibull distribution, presented by Mudholkar and Srivastava (1993). Both procedures result in the same probability distribution.
The EDW distribution reduces to the exponentiated geometric, discrete Weibull, geometric, exponentiated discrete Rayleigh and discrete Rayleigh distributions for determined fixed shape parameter values. This fact permitted applying hypothesis testing of the shape parameters of the EDW distribution as a model selection procedure. We considered the likelihood ratio test (LRT) and the full bayesian significance test (FBST). Both these tests were useful to decide on the simplest model to be used to fit the data.
The application of the EDW distribution was illustrated with a real dataset on survival of patients with head and neck cancer. The distribution proved to be a good option in this case, where the risk function presented non-monotone behavior.