One of the main characteristics of survival data is that the random variable of interest is not always observed, so that some observations are censored. The usual methods assume that these observations carry no information about the distribution of the response variable (non-informative censoring). In other words, an observation is considered censored simply because the event of interest (failure or death) did not occur during the study period. However, in many situations the survival time is clearly affected by the censoring mechanism, so its effect must be included in the analysis. The problem is that, when informative censoring is wrongly treated as non-informative, the results of the analysis can hide biases, thus weakening the model's predictive power. Therefore, we consider the informative censoring mechanism in the log-odd log-logistic Weibull regression model, based on the method described by Huang and Wolfe (2002), and analyze the resulting changes in the estimates of the model parameters. We obtain maximum likelihood estimates of the parameters from censored data and evaluate local influence on the estimates under different perturbation schemes. In addition, we define martingale and deviance residuals to detect outliers and evaluate the model assumptions. We show that the proposed regression model is useful for the analysis of real data and may give more realistic fits than other special regression models.
In general, survival data differ from the data found in other statistical problems in that the variable of interest is not always observed. One of the main characteristics of data of this nature is the presence of censoring. Censoring can be defined as a partial observation of the response variable, and it can have many causes, such as the end of the study before the occurrence of the event, the death of an individual in the sample from a cause other than the one under study, or the abandonment of treatment by the patient. Survival models generally assume that the failure times and the censoring times are independent (non-informative censoring). However, there are many other causes and types of censoring that determine the end of an observation period (right censoring). In clinical trials and other medical and biological studies, for example, various reasons can make censoring informative. According to Lagakos (1979), these are:
When individuals withdraw from a clinical trial for reasons that can be related to the therapy being analyzed;
When individuals are removed from a clinical trial by design after having experienced a specific critical event, so that their survival times are no longer followed up;
When individuals in a study experience a failure caused by an event of secondary interest, which censors the failure time associated with the event of primary interest.
In all these cases, it is necessary to verify whether these partial observations carry useful information about the distribution of the failure times. As reported by Lagakos (1979), a situation in which an individual withdraws from a clinical trial because of failure to adapt to the treatment of interest can be considered a case of informative censoring, since the reason for censoring is related to the subsequent survival time of that individual. The problem is that, once non-informative censoring has been mistakenly assumed for a data set, the results of the analysis can be biased, and the usual methods will overestimate (or underestimate) the survival function (Staplin et al., 2012). Moreover, the biases of the estimates tend to increase with the number of censored observations. In response to this problem, several authors have proposed ways to model survival data under informative censoring. Lagakos and Williams (1978) proposed a model with a scalar parameter that governs the hidden censoring mechanism: when this parameter equals 1, the likelihood function depends only on the survival distribution, whereas for values between 0 and 1 the censoring effects become increasingly important for estimating the model parameters. Wang et al. (2001) modeled the occurrence of recurrent events using a non-stationary Poisson process and a latent variable, treating the parameters of the (informative) censoring distribution and of the latent variable as nuisance parameters. They then considered a multiplicative intensity model for nonparametric estimation of the hazard function. Rotnitzky et al. (2007) obtained estimators of the survival function in the presence of right-censoring mechanisms. In the same study, they also performed a sensitivity analysis of the estimator with respect to potentially informative censoring, assuming that, after adjustment for all the prognostic factors, the failure and censoring times are independent.
Along the same line of reasoning, Scharfstein and Robins (2002) presented a method that, besides fitting models under informative censoring, simultaneously quantifies the sensitivity of the inference to residual dependence between failure and censoring due to uncontrolled factors. However, this method does not allow more than one censoring mechanism, meaning that all the censoring in the data must be treated as informative. The model of Rotnitzky et al. (2007) extends this method to multiple causes of censoring.
In all these works, the greatest difficulty lies in identifying the dependent censoring mechanism from the data (Tsiatis, 1975). For this reason, several authors have proposed sensitivity analysis methods to assess the effects of dependent censoring on the estimates of the parameters of the time-to-failure distribution (Siannis, 2004; Siannis et al., 2005; Zhang & Heitjan, 2006; Huang & Zhang, 2008). Siannis (2004) and Siannis et al. (2005) used a proportional hazards structure together with a linear predictor to allow the estimation of individual changes in the estimates. A more comprehensive sensitivity analysis for models with informative censoring was proposed by Siannis (2011) using the Cox proportional hazards model, which is more flexible than standard parametric survival models. In the context of competing risks, Lu and Tsiatis (2011) adopted auxiliary covariates to obtain estimators that account for informative censoring and are more efficient than those that disregard these covariates. Freitas and Rodrigues (2013) considered the standard exponential cure rate model under informative censoring and investigated, through a simulation study, the impact of informative censoring on the coverage probabilities and lengths of the asymptotic confidence intervals of the parameters of interest. The objectives of this work are to fit a new parametric model to real survival data assuming that the failure and censoring times are conditionally independent given a frailty (informative censoring mechanism), and to analyze the changes in the parameter estimates relative to the usual methods that assume non-informative censoring.
For the assessment of model adequacy, we develop diagnostic studies to detect possible influential or extreme observations that can distort the results of the analysis. Further, we compare two types of residuals to assess departures from the error assumptions and to detect outlying observations in the log-odd log-logistic Weibull (LOLLW) regression model with informative censoring.
The paper is organized as follows. In Section 2, we define the LOLLW regression model. In Section 3, we study the informative censoring mechanism in the location-scale regression model. In Section 4, we adopt several diagnostic measures under three perturbation schemes in the proposed regression model with informative censoring and we define two kinds of residuals from the fitted model to assess departures from the error distribution assumption and to detect outlying observations. In Section 5, we analyze a real data set to show the flexibility, practical relevance and applicability of our regression model. We offer some concluding remarks in Section 6.
The LOLLW regression model
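Many generalized Weibull distributions have been proposed in the reliability literature to provide better fits to certain data sets than the traditional two- or three-parameter Weibull models. See, for example, the distributions discussed in Tables 1 and 2 of Pham and Lai (2007). Recently, da Cruz et al. (2015) introduced a three-parameter odd log-logistic Weibull (OLLW) distribution with probability density function (pdf)
$$f(t)=\frac{\alpha\,c\,t^{c-1}\exp\left[-\alpha\left(\frac{t}{\lambda}\right)^{c}\right]\left\{1-\exp\left[-\left(\frac{t}{\lambda}\right)^{c}\right]\right\}^{\alpha-1}}{\lambda^{c}\left(\left\{1-\exp\left[-\left(\frac{t}{\lambda}\right)^{c}\right]\right\}^{\alpha}+\exp\left[-\alpha\left(\frac{t}{\lambda}\right)^{c}\right]\right)^{2}},\qquad t>0, \tag{1}$$
where $\lambda>0$ is a scale parameter and $c>0$ and $\alpha>0$ are shape parameters. Henceforth, we denote by $T$ a random variable with pdf Eq. (1). The survival function of $T$ is
$$S(t)=\frac{\exp\left[-\alpha\left(t/\lambda\right)^{c}\right]}{\left\{1-\exp\left[-\left(t/\lambda\right)^{c}\right]\right\}^{\alpha}+\exp\left[-\alpha\left(t/\lambda\right)^{c}\right]}.$$
Then, the hazard rate function (hrf) of $T$ is $h(t)=f(t)/S(t)$. The great flexibility of this model for fitting lifetime data is due to the different forms of its hrf: (i) if $\alpha=1$, it is the Weibull hazard function; (ii) for some values of the shape parameters, it can be bathtub-shaped; (iii) for other special combinations of the parameters, it is unimodal.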
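The OLLW quantities are easy to evaluate numerically. The Python sketch below assumes the odd log-logistic construction $F(t)=G(t)^{\alpha}/\{G(t)^{\alpha}+[1-G(t)]^{\alpha}\}$ with Weibull baseline $G(t)=1-\exp[-(t/\lambda)^{c}]$ (scale $\lambda$, shapes $c$ and $\alpha$); the function names are ours, not from the original work. It is meant only as a check that $\alpha=1$ recovers the Weibull survival and hazard functions and that the pdf integrates to about one.

```python
import math

# Sketch under the assumed parameterization: scale lam, shapes c and alpha.


def ollw_survival(t, c, lam, alpha):
    """OLLW survival S(t) = (1-G)^alpha / [G^alpha + (1-G)^alpha]."""
    u = (t / lam) ** c
    G = -math.expm1(-u)          # Weibull cdf 1 - exp(-u), computed stably
    Gbar = math.exp(-u)          # Weibull survival exp(-u)
    return Gbar ** alpha / (G ** alpha + Gbar ** alpha)


def ollw_pdf(t, c, lam, alpha):
    """OLLW pdf: alpha * g * [G (1-G)]^(alpha-1) / [G^alpha + (1-G)^alpha]^2."""
    u = (t / lam) ** c
    G, Gbar = -math.expm1(-u), math.exp(-u)
    g = (c / t) * u * Gbar       # Weibull pdf (c/lam)(t/lam)^(c-1) exp(-u)
    return alpha * g * (G * Gbar) ** (alpha - 1.0) / (G ** alpha + Gbar ** alpha) ** 2


def ollw_hazard(t, c, lam, alpha):
    """Hazard rate function h(t) = f(t) / S(t)."""
    return ollw_pdf(t, c, lam, alpha) / ollw_survival(t, c, lam, alpha)


# With alpha = 1 the OLLW survival coincides with the Weibull survival.
print(ollw_survival(1.3, 2.0, 1.5, 1.0), math.exp(-(1.3 / 1.5) ** 2))
```

Computing $1-G$ directly as $\exp(-u)$ rather than as `1 - G` avoids an exact zero in the upper tail, where $[1-G(t)]^{\alpha-1}$ would otherwise overflow for $\alpha<1$.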
Plots of the OLLW density for some parameter values. (a) For different values of , and . (b) For different values of and with 2.45.
Some plots of the density of $T$ for selected parameter values, including well-known special cases, are displayed in Fig. 1a and b. A characteristic of the OLLW distribution is that its pdf can be monotone (increasing or decreasing), unimodal, bimodal or increasing-decreasing-increasing, among other shapes, depending on the parameter values.
Other works have also used the odd log-logistic family of distributions. For example, Mendoza et al. (2016) presented the exponentiated log-logistic geometric distribution with dual activation, and Cordeiro et al. (2016) considered the odd log-logistic generalized half-normal lifetime distribution. More recently, da Silva et al. (2017) introduced the odd log-logistic Student t distribution, da Cruz et al. (2017) proposed the bivariate odd-log-logistic-Weibull regression model for oral health-related quality of life, Prataviera et al. (2018a) presented a generalized odd log-logistic flexible Weibull regression model with applications to repairable systems, and Prataviera et al. (2018b) defined the heteroscedastic odd log-logistic generalized gamma regression model for censored data.
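Let $Y=\log T$ be a random variable having the LOLLW distribution. da Cruz et al. (2015) proposed the LOLLW regression model given by
$$y_i=\mathbf{x}_i^{T}\boldsymbol{\beta}+\sigma z_i,\qquad i=1,\ldots,n, \tag{2}$$
where $\boldsymbol{\beta}$ is a vector of unknown regression coefficients, $\sigma>0$ and $\alpha>0$ are unknown parameters, $\mathbf{x}_i$ is the explanatory variable vector modeling the linear predictor $\mu_i=\mathbf{x}_i^{T}\boldsymbol{\beta}$, and $z_i$ is a random error with the standardized LOLLW distribution. Hence, the linear predictor vector of the LOLLW regression model is simply $\boldsymbol{\mu}=\mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}=(\mathbf{x}_1,\ldots,\mathbf{x}_n)^{T}$ is a known model matrix. Equation (2) is also referred to as a log-location-scale or accelerated failure time model.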
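We define the standardized random variable $Z=(Y-\mu)/\sigma$, where $\mu=\log\lambda$ and $\sigma=1/c$. The density function of the response $Y$ can be expressed as
$$f(y)=\frac{\alpha}{\sigma}\,\frac{\exp\left(z-\alpha e^{z}\right)\left[1-\exp\left(-e^{z}\right)\right]^{\alpha-1}}{\left\{\left[1-\exp\left(-e^{z}\right)\right]^{\alpha}+\exp\left(-\alpha e^{z}\right)\right\}^{2}}, \tag{3}$$
where $z=(y-\mu)/\sigma$, $-\infty<y<\infty$, $-\infty<\mu<\infty$, $\sigma>0$ and $\alpha>0$.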
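The survival function, hrf and cumulative hrf of $Y$, with $z=(y-\mu)/\sigma$, are given by
$$S(y)=\frac{\exp\left(-\alpha e^{z}\right)}{\left[1-\exp\left(-e^{z}\right)\right]^{\alpha}+\exp\left(-\alpha e^{z}\right)},\qquad h(y)=\frac{\alpha\,e^{z}\left[1-\exp\left(-e^{z}\right)\right]^{\alpha-1}}{\sigma\left\{\left[1-\exp\left(-e^{z}\right)\right]^{\alpha}+\exp\left(-\alpha e^{z}\right)\right\}}$$
and
$$H(y)=\alpha e^{z}+\log\left\{\left[1-\exp\left(-e^{z}\right)\right]^{\alpha}+\exp\left(-\alpha e^{z}\right)\right\},$$
respectively.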
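A similar sketch works on the log scale. It assumes $\mu=\log\lambda$ and $\sigma=1/c$ (the usual correspondence between a Weibull variable and its logarithm), returns $f$, $S$, $h$ and $H$ at $y$, and lets one confirm numerically that $h=f/S$, $H=-\log S$, $f=-dS/dy$ and that $S(y)$ equals the OLLW survival function at $t=e^{y}$. The function names are hypothetical.

```python
import math

# Sketch: assumes z = (y - mu)/sigma with mu = log(lambda) and sigma = 1/c.


def lollw_functions(y, mu, sigma, alpha):
    """Return (f, S, h, H) of the LOLLW distribution at y."""
    z = (y - mu) / sigma
    ez = math.exp(z)
    A = -math.expm1(-ez)             # 1 - exp(-e^z)
    B = math.exp(-alpha * ez)        # exp(-alpha e^z)
    D = A ** alpha + B
    f = (alpha / sigma) * math.exp(z - alpha * ez) * A ** (alpha - 1.0) / D ** 2
    S = B / D
    h = alpha * ez * A ** (alpha - 1.0) / (sigma * D)
    H = alpha * ez + math.log(D)
    return f, S, h, H


def ollw_survival_at(t, c, lam, alpha):
    """OLLW survival on the original time scale, for comparison."""
    u = (t / lam) ** c
    G, Gbar = -math.expm1(-u), math.exp(-u)
    return Gbar ** alpha / (G ** alpha + Gbar ** alpha)


y, mu, sigma, alpha = 0.4, 0.2, 0.7, 2.5
f, S, h, H = lollw_functions(y, mu, sigma, alpha)
# S_Y(y) should equal S_T(exp(y)) with lambda = exp(mu) and c = 1/sigma.
print(S, ollw_survival_at(math.exp(y), 1.0 / sigma, math.exp(mu), alpha))
```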
Informative censoring in the regression model
In this section, we construct the marginal likelihood function under informative censoring, that is, assuming that the censoring times carry information about the failure times (informative or dependent censoring). We follow the method described by Huang and Wolfe (2002), using the gamma distribution as the frailty distribution and assuming that the frailty acts in multiplicative form (Santos Jr., 2012). Under a right-censoring mechanism, let $T_i$ be a random variable representing the failure time of the $i$th observation (time until the occurrence of the event of interest), and let $C_i$ be another random variable, representing the censoring time associated with this observation. For $i=1,\ldots,n$, the observed data consist of the pairs $(y_i,\delta_i)$, where $y_i=\log[\min(T_i,C_i)]$ is the logarithm of the time observed for the $i$th observation and $\delta_i$ is the failure indicator, i.e., $\delta_i=1$ if $T_i\le C_i$ and $\delta_i=0$ if $T_i>C_i$. Further, the density and survival functions of $\log T_i$ and $\log C_i$ are denoted by $f(y;\boldsymbol\theta_1)$, $S(y;\boldsymbol\theta_1)$, $f_c(y;\boldsymbol\theta_2)$ and $S_c(y;\boldsymbol\theta_2)$, respectively, where $\boldsymbol\theta_1$ is the parameter vector associated with the failure time distribution and $\boldsymbol\theta_2$ is the parameter vector associated with the censoring time distribution.
If the failure and censoring times are associated, the standard methods for analyzing censored data may not be robust, so a structure able to incorporate this dependence is needed. For this purpose, following Huang and Wolfe (2002), we assume that $T$ and $C$ are independent conditionally on a frailty (random effect) $W$. Thus, as in Santos Jr. (2012), we assume that $W$ follows a gamma distribution. Although other distributions can be used, such as the uniform, Weibull and log-normal (Vaupel & Yashin, 1983), the gamma frailty is the most widely used in semi-parametric models, mainly because of its algebraic convenience. Fixing the mean of the frailty distribution at 1, with its variance as a free parameter, guarantees identifiability of the model.
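To see why a shared frailty makes censoring informative, assume (for this sketch only) that $W$ is gamma with mean 1 and variance $\phi$, i.e. shape = rate = $1/\phi$, and that $T$ and $C$ are conditionally independent given $W$ with cumulative hazards $wH_0$ and $wH_{c0}$. Averaging over $W$ gives $P(T>t,C>c)=[1+\phi\{H_0(t)+H_{c0}(c)\}]^{-1/\phi}$, which differs from the product of the marginals $[1+\phi H_0(t)]^{-1/\phi}[1+\phi H_{c0}(c)]^{-1/\phi}$ unless $\phi\to0$. The cumulative hazard values and the function names below are arbitrary illustrations.

```python
import math

# Assumed frailty: W ~ Gamma(shape=1/phi, rate=1/phi), so E[W] = 1, Var[W] = phi.


def gamma_frailty_laplace(s, phi):
    """E[exp(-s W)] = (1 + phi*s)^(-1/phi); phi = 0 means W = 1 (no frailty)."""
    if phi == 0.0:
        return math.exp(-s)
    return (1.0 + phi * s) ** (-1.0 / phi)


def joint_survival(H0_t, Hc0_c, phi):
    """P(T > t, C > c) after integrating out the shared frailty W."""
    return gamma_frailty_laplace(H0_t + Hc0_c, phi)


def product_of_marginals(H0_t, Hc0_c, phi):
    """P(T > t) * P(C > c), which is what independence would give."""
    return gamma_frailty_laplace(H0_t, phi) * gamma_frailty_laplace(Hc0_c, phi)


H0_t, Hc0_c = 0.8, 1.2               # arbitrary cumulative hazards
for phi in (0.0, 0.5, 2.0):
    print(phi, joint_survival(H0_t, Hc0_c, phi), product_of_marginals(H0_t, Hc0_c, phi))
```

For $\phi>0$ the joint survival exceeds the product of the marginals (positive dependence), and the gap grows with $\phi$. This association between failure and censoring times is exactly what an analysis assuming independent censoring ignores.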
Considering the model in which the frailty acts in multiplicative form, the conditional hazard function for the logarithm of the failure times is
$$h(y\mid w)=w\,h_0(y), \tag{4}$$
where $h_0(y)$ is the baseline hazard function for the logarithm of the failure times. The conditional survival function is
$$S(y\mid w)=\exp\{-w\,H_0(y)\}, \tag{5}$$
where $H_0(y)$ is the baseline cumulative hazard function for $\log T$. Likewise, the hazard and survival functions can be defined for the censoring times. Hence, the conditional hazard function for the censoring times can be expressed as
$$h_c(y\mid w)=w\,h_{c0}(y), \tag{6}$$
where $h_{c0}(y)$ is the baseline hazard function for the logarithm of the censoring times. The conditional survival function has the form
$$S_c(y\mid w)=\exp\{-w\,H_{c0}(y)\}, \tag{7}$$
where $H_{c0}(y)$ is the baseline cumulative hazard function for $\log C$.
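Also, the marginal survival function is obtained by integrating out the frailty:
$$S(y)=\int_{0}^{\infty}S(y\mid w)\,g(w)\,dw=\int_{0}^{\infty}\exp\{-w\,H_0(y)\}\,g(w)\,dw,$$
where $g(w)$ denotes the frailty density.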
Then, considering that and , we have
So, when considering the logarithms of the failure or censoring times, the marginal survival function changes only through the cumulative hazard function, which is associated with the distribution assumed for the corresponding times. The maximum likelihood estimators (MLEs) under the assumption that $\log T$ and $\log C$ are conditionally independent given a frailty $W$ are found by maximizing the marginal likelihood function. By including the frailty in the joint distribution of $\log T$ and $\log C$, the marginal likelihood function has the form
$$L(\boldsymbol\theta)=\prod_{i=1}^{n}\int_{0}^{\infty}f(y_i,\delta_i\mid w)\,g(w)\,dw, \tag{9}$$
where $g(w)$ is the density of the frailty $W$.
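By assuming that $\log T$ and $\log C$ are conditionally independent given the frailty $W$, the marginal likelihood function Eq. (9) can be rewritten as
$$L(\boldsymbol\theta)=\prod_{i=1}^{n}\int_{0}^{\infty}\left[f(y_i\mid w)\,S_c(y_i\mid w)\right]^{\delta_i}\left[f_c(y_i\mid w)\,S(y_i\mid w)\right]^{1-\delta_i}g(w)\,dw. \tag{10}$$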
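By substituting Eqs (4)–(7) into Eq. (10) and using the relations $f(y\mid w)=h(y\mid w)\,S(y\mid w)$ and $f_c(y\mid w)=h_c(y\mid w)\,S_c(y\mid w)$, the likelihood function can be rewritten as
$$L(\boldsymbol\theta)=\prod_{i=1}^{n}h_0(y_i)^{\delta_i}\,h_{c0}(y_i)^{1-\delta_i}\int_{0}^{\infty}w\,\exp\{-w[H_0(y_i)+H_{c0}(y_i)]\}\,g(w)\,dw.$$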
By substituting the density function from the frailty , we obtain
The kernel of a gamma distribution appears in the integral, i.e.
So, the marginal likelihood function reduces to
The marginal likelihood function Eq. (11) has closed form whenever the baseline hazard and cumulative hazard functions have closed forms. Hence, the log-likelihood function is
Consider a sample $(y_1,\delta_1),\ldots,(y_n,\delta_n)$ of $n$ independent observations, where the random response is defined by $y_i=\min\{\log T_i,\log C_i\}$. We assume informative censoring, the LOLLW distribution for the log-lifetime and the log-Weibull distribution for the log-censoring time, as in Section 2. The log-likelihood function for the full parameter vector $\boldsymbol\theta$ is obtained from the models Eqs (4) and (6) as
where is the number of uncensored observations (failures),
We consider the regression structure only for the log-lifetime, which is of main interest. Future research could add regression structures to both the log-lifetime and the log-censoring time simultaneously.
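A compact Python sketch of this kind of log-likelihood follows. It assumes a gamma frailty with mean 1 and variance $\phi$ (shape = rate = $1/\phi$), for which $\int_0^\infty w\,e^{-ws}g(w)\,dw=(1+\phi s)^{-(1+1/\phi)}$. Each observation then contributes $\delta_i\log h_0(y_i)+(1-\delta_i)\log h_{c0}(y_i)-(1+1/\phi)\log\{1+\phi[H_0(y_i)+H_{c0}(y_i)]\}$, where $h_0,H_0$ are the LOLLW baseline hazard and cumulative hazard with $\mu_i=\mathbf x_i^{T}\boldsymbol\beta$, and $h_{c0},H_{c0}$ are those of a log-Weibull distribution with location $\mu_c$ and scale $\sigma_c$. The frailty parameterization, the parameter ordering and all names are assumptions, so this is not a transcription of Eq. (12). A numerical check of the frailty integral is included.

```python
import math

# Assumptions: W ~ Gamma(1/phi, rate 1/phi); LOLLW baseline for log T;
# log-Weibull baseline (location mu_c, scale sigma_c) for log C.


def frailty_integral(s, phi):
    """Closed form of int_0^inf w exp(-w s) g(w) dw under the assumed gamma frailty."""
    return (1.0 + phi * s) ** (-(1.0 + 1.0 / phi))


def frailty_integral_numeric(s, phi, upper=40.0, n=100_000):
    """Midpoint-rule approximation of the same integral, used only as a check."""
    k = 1.0 / phi
    log_norm = k * math.log(k) - math.lgamma(k)          # log of k^k / Gamma(k)
    step = upper / n
    total = 0.0
    for j in range(n):
        w = (j + 0.5) * step
        log_g = log_norm + (k - 1.0) * math.log(w) - k * w  # gamma log-density
        total += w * math.exp(log_g - w * s)
    return total * step


def lollw_log_h_H(y, mu, sigma, alpha):
    """Log hazard and cumulative hazard of the LOLLW distribution at y."""
    z = (y - mu) / sigma
    ez = math.exp(z)
    A = -math.expm1(-ez)                                 # 1 - exp(-e^z)
    D = A ** alpha + math.exp(-alpha * ez)
    log_h = math.log(alpha / sigma) + z + (alpha - 1.0) * math.log(A) - math.log(D)
    return log_h, alpha * ez + math.log(D)


def logweibull_log_h_H(y, mu_c, sigma_c):
    """Log hazard and cumulative hazard of the log-Weibull distribution at y."""
    z = (y - mu_c) / sigma_c
    return z - math.log(sigma_c), math.exp(z)


def marginal_loglik(params, y, delta, X):
    """Marginal log-likelihood; params = (alpha, sigma, mu_c, sigma_c, phi, beta)."""
    alpha, sigma, mu_c, sigma_c, phi, beta = params
    total = 0.0
    for yi, di, xi in zip(y, delta, X):
        mu_i = sum(b * x for b, x in zip(beta, xi))      # linear predictor x_i^T beta
        log_h0, H0 = lollw_log_h_H(yi, mu_i, sigma, alpha)
        log_hc0, Hc0 = logweibull_log_h_H(yi, mu_c, sigma_c)
        total += di * log_h0 + (1 - di) * log_hc0
        total -= (1.0 + 1.0 / phi) * math.log1p(phi * (H0 + Hc0))
    return total
```

As $\phi\to0$ the last term tends to $-[H_0(y_i)+H_{c0}(y_i)]$, so the function reduces to the usual independent-censoring log-likelihood. With $\alpha=1$ the LOLLW hazard reduces to the log-Weibull one, which matches the starting-value strategy described next. Any general-purpose optimizer can then be applied to the negative of `marginal_loglik`.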
The log-likelihood can be maximized either directly, using SAS (NLMIXED procedure), R (optim) or the MaxBFGS routine in the matrix programming language Ox (Doornik, 2007), or by solving the nonlinear likelihood equations obtained by differentiating Eq. (12). Initial values for the parameters can be taken from the fit of the log-Weibull regression model with informative censoring, i.e., the special case $\alpha=1$.
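Let $\hat{\boldsymbol\theta}$ be the MLE of $\boldsymbol\theta$. Approximate confidence intervals and hypothesis tests on the model parameters require the total observed information matrix $\mathbf{J}(\boldsymbol\theta)=-\partial^{2}\ell(\boldsymbol\theta)/\partial\boldsymbol\theta\,\partial\boldsymbol\theta^{T}$. Under general regularity conditions, $\hat{\boldsymbol\theta}$ is asymptotically multivariate normal with mean $\boldsymbol\theta$ and covariance matrix $\mathbf{K}(\boldsymbol\theta)^{-1}$, where $\mathbf{K}(\boldsymbol\theta)$ is the expected information matrix. In practice, we can replace $\mathbf{K}(\boldsymbol\theta)$ by $\mathbf{J}(\hat{\boldsymbol\theta})$, i.e., the observed information matrix evaluated at $\hat{\boldsymbol\theta}$. The observed information matrix can be obtained from the authors upon request.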
We can construct approximate confidence intervals for the parameters based on the multivariate normal distribution. Further, likelihood ratio (LR) statistics can be used to compare the LOLLW regression model with informative censoring with some of its sub-models, by computing the maximum values of the unrestricted and restricted log-likelihoods. For example, the test of $H_0:\alpha=1$ versus $H_1$: $H_0$ is not true is equivalent to comparing the LOLLW and log-Weibull regression models with informative censoring. In this case, the LR statistic is $w=2\{\ell(\hat{\boldsymbol\theta})-\ell(\tilde{\boldsymbol\theta})\}$, where $\hat{\boldsymbol\theta}$ and $\tilde{\boldsymbol\theta}$ are the MLEs under $H_1$ and $H_0$, respectively. For large samples, $w$ has approximately a chi-square distribution with one degree of freedom, since $H_0$ imposes a single restriction.
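The LR test for $\alpha=1$ needs only the two maximized log-likelihoods. The sketch below (hypothetical log-likelihood values, function name ours) uses the identity $P(\chi^2_1>w)=P(|Z|>\sqrt w)=\operatorname{erfc}(\sqrt{w/2})$, so only the Python standard library is required.

```python
import math


def lr_test_one_restriction(loglik_full, loglik_restricted):
    """LR statistic w = 2[l(theta_hat) - l(theta_tilde)] and its chi-square(1) p-value."""
    w = 2.0 * (loglik_full - loglik_restricted)
    # For 1 degree of freedom, P(chi2_1 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0)) if w > 0.0 else 1.0
    return w, p_value


# Hypothetical maximized log-likelihoods: LOLLW (full) vs. log-Weibull (alpha = 1).
w, p = lr_test_one_restriction(-100.0, -103.5)
print(w, p)
```

Here $w=7.0$, which exceeds the 5% critical value 3.841 of the $\chi^2_1$ distribution, so $H_0:\alpha=1$ would be rejected for these made-up values.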
Model checking: Diagnostic and residual analysis
An important step in the analysis of a fitted model is to check for possible deviations from the model assumptions. In this context, it is important to detect the presence of outliers in the data and to evaluate their impact on the inferential results. Therefore, an analysis of the residuals can help to validate the stability and robustness of the inferential results.
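Influence diagnostics are important in the analysis of real data, since they can reveal an inadequate model fit or influential observations. Since regression models are sensitive to the underlying model assumptions, performing a sensitivity analysis is strongly advisable. Cook (1986) used this idea to motivate the assessment of local influence: more confidence can be put in a model that is relatively stable under small modifications. Another approach, also suggested by Cook (1986), is to weight observations instead of removing them. Previous works on local influence curvatures in regression models for censored data include Escobar and Meeker (1992), Ortega et al. (2003, 2009, 2011), Silva et al. (2008), Silva et al. (2009) and Hashimoto et al. (2010). Local influence can be calculated for model Eq. (2) with informative censoring. If the likelihood displacement $LD(\boldsymbol\omega)=2\{\ell(\hat{\boldsymbol\theta})-\ell(\hat{\boldsymbol\theta}_{\omega})\}$ is used, where $\hat{\boldsymbol\theta}_{\omega}$ denotes the MLE under the perturbed model, then the normal curvature for $\boldsymbol\theta$ in the direction $\mathbf d$, $\|\mathbf d\|=1$, is $C_{\mathbf d}(\boldsymbol\theta)=2|\mathbf d^{T}\boldsymbol\Delta^{T}\ddot{\mathbf L}^{-1}\boldsymbol\Delta\mathbf d|$, where $\ddot{\mathbf L}$ is the Hessian matrix of the log-likelihood and $\boldsymbol\Delta$ is a $p\times n$ matrix that depends on the perturbation scheme. The elements of this matrix are given by $\Delta_{vi}=\partial^{2}\ell(\boldsymbol\theta\mid\boldsymbol\omega)/\partial\theta_v\,\partial\omega_i$, $i=1,\ldots,n$, $v=1,\ldots,p$, evaluated at $\hat{\boldsymbol\theta}$ and $\boldsymbol\omega_0$, where $\boldsymbol\omega_0$ is the no-perturbation vector. For the LOLLW regression model with informative censoring, these elements can be obtained from the authors upon request. We can also calculate normal curvatures for subsets of the parameters and build various index plots, such as the index plot of $\mathbf d_{\max}$, the eigenvector corresponding to $C_{\mathbf d_{\max}}$, the largest eigenvalue of the matrix $\mathbf B=-\boldsymbol\Delta^{T}\ddot{\mathbf L}^{-1}\boldsymbol\Delta$, and the index plots of $C_{\mathbf d_i}$, which together are called the total local influence (Lesaffre & Verbeke, 1998), where $\mathbf d_i$ denotes an $n\times1$ vector of zeros with one at the $i$th position. Thus, the curvature in the direction $\mathbf d_i$ has the form $C_i=2|\boldsymbol\Delta_i^{T}\ddot{\mathbf L}^{-1}\boldsymbol\Delta_i|$, where $\boldsymbol\Delta_i^{T}$ denotes the $i$th row of $\boldsymbol\Delta^{T}$. It is common practice to point out the cases for which $C_i\ge2\bar C$, where $\bar C=\frac1n\sum_{i=1}^{n}C_i$.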
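Once $\boldsymbol\Delta$ ($p\times n$) and the Hessian $\ddot{\mathbf L}$ ($p\times p$) of the log-likelihood at the MLE are available, the curvature quantities reduce to a few matrix operations. The NumPy sketch below computes the largest eigenvalue of $\mathbf B=-\boldsymbol\Delta^{T}\ddot{\mathbf L}^{-1}\boldsymbol\Delta$, the corresponding eigenvector $\mathbf d_{\max}$ (in absolute value, since its sign is arbitrary), the total local influence $C_i=2|\boldsymbol\Delta_i^{T}\ddot{\mathbf L}^{-1}\boldsymbol\Delta_i|$ and the cases with $C_i>2\bar C$. The toy matrices are made up. Conventions differ on whether a factor 2 is attached to the maximum curvature; here it is not.

```python
import numpy as np


def local_influence(Delta, L_ddot):
    """Local influence quantities for a p x n matrix Delta and the p x p
    (negative definite) Hessian L_ddot of the log-likelihood at the MLE."""
    # B = -Delta^T L_ddot^{-1} Delta is n x n and positive semi-definite.
    B = -Delta.T @ np.linalg.solve(L_ddot, Delta)
    eigvals, eigvecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    lam_max = eigvals[-1]                          # largest eigenvalue of B
    d_max = np.abs(eigvecs[:, -1])                 # |d_max| for the index plot
    C = 2.0 * np.abs(np.diag(B))                   # C_i = 2|Delta_i^T L^{-1} Delta_i|
    flagged = np.flatnonzero(C > 2.0 * C.mean())   # rule of thumb C_i > 2 * mean(C)
    return lam_max, d_max, C, flagged


Delta = np.array([[1.0, 0.1, 0.2, 3.0]])           # toy example: p = 1, n = 4
L_ddot = np.array([[-2.0]])
lam_max, d_max, C, flagged = local_influence(Delta, L_ddot)
print(lam_max, C, flagged)
```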
Next, we calculate the matrix $\boldsymbol\Delta$ for three perturbation schemes.
We consider the model Eq. (2) with informative censoring and its log-likelihood function Eq. (12). Let $\boldsymbol\omega=(\omega_1,\ldots,\omega_n)^{T}$ be the vector of weights.
Case-weight perturbation
In this case, the log-likelihood function has the form
where $0\le\omega_i\le1$ and the remaining terms are defined in Eq. (12). Here, $\boldsymbol\Delta$ can be calculated numerically.
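For the case-weight scheme the perturbed log-likelihood is linear in the weights, with no-perturbation vector $\boldsymbol\omega_0=(1,\ldots,1)^{T}$, so column $i$ of $\boldsymbol\Delta$ is simply the gradient of the $i$th log-likelihood contribution at $\hat{\boldsymbol\theta}$. A minimal finite-difference sketch follows, assuming a user-supplied function `loglik_i(i, theta)` (a hypothetical interface) that returns the $i$th contribution. The normal toy likelihood is only a check, with exact answer $y_i-\theta_0$.

```python
def case_weight_delta(loglik_i, theta_hat, n, eps=1e-6):
    """p x n matrix Delta for case-weight perturbation, by central differences.

    With l(theta | omega) = sum_i omega_i * l_i(theta), the mixed derivative
    d^2 l / (d theta_v d omega_i) equals d l_i / d theta_v.
    """
    p = len(theta_hat)
    Delta = [[0.0] * n for _ in range(p)]
    for v in range(p):
        up = list(theta_hat)
        down = list(theta_hat)
        up[v] += eps
        down[v] -= eps
        for i in range(n):
            Delta[v][i] = (loglik_i(i, up) - loglik_i(i, down)) / (2.0 * eps)
    return Delta


# Toy check with a normal log-likelihood l_i(theta) = -(y_i - theta_0)^2 / 2.
y = [1.0, 2.0, 4.0]
Delta = case_weight_delta(lambda i, th: -0.5 * (y[i] - th[0]) ** 2, [0.5], len(y))
print(Delta)  # approximately [[0.5, 1.5, 3.5]]
```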
Response perturbation
Next, we consider that each $y_i$ is perturbed as $y_{iw}=y_i+\omega_iS_y$, where $S_y$ is a scale factor that may be estimated by the standard deviation of the observed response and $\omega_i\in\mathbb{R}$. The perturbed log-likelihood function can be expressed as
where
and . The matrix is found numerically.
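For the response and explanatory-variable schemes the perturbation enters non-linearly, with $\boldsymbol\omega_0=\mathbf 0$, so a genuine mixed second derivative is needed. The sketch below approximates $\Delta_{vi}=\partial^{2}\ell(\boldsymbol\theta\mid\boldsymbol\omega)/\partial\theta_v\,\partial\omega_i$ with a four-point central difference. `loglik_perturbed(theta, omega)` is a hypothetical user-supplied function returning the perturbed log-likelihood. The normal toy model is only a check, since its exact answer is $S_y$ for every entry. The same routine covers the explanatory-variable scheme if the chosen covariate, rather than the response, is perturbed inside `loglik_perturbed`.

```python
def mixed_delta(loglik_perturbed, theta_hat, n, eps=1e-4):
    """p x n matrix Delta[v][i] = d^2 l(theta | omega) / (d theta_v d omega_i),
    approximated by four-point central differences at (theta_hat, omega = 0)."""
    p = len(theta_hat)
    Delta = [[0.0] * n for _ in range(p)]
    corners = ((1, 1, 1.0), (1, -1, -1.0), (-1, 1, -1.0), (-1, -1, 1.0))
    for v in range(p):
        for i in range(n):
            total = 0.0
            for s_theta, s_omega, sign in corners:
                theta = list(theta_hat)
                theta[v] += s_theta * eps
                omega = [0.0] * n               # omega_0 = 0 means no perturbation
                omega[i] = s_omega * eps
                total += sign * loglik_perturbed(theta, omega)
            Delta[v][i] = total / (4.0 * eps ** 2)
    return Delta


# Toy check: normal log-likelihood with perturbed responses y_i + omega_i * S_y.
y, S_y = [1.0, 2.0, 4.0], 1.5


def perturbed(theta, omega):
    return -0.5 * sum((yi + wi * S_y - theta[0]) ** 2 for yi, wi in zip(y, omega))


print(mixed_delta(perturbed, [2.0], len(y)))   # each entry close to S_y = 1.5
```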
Explanatory variable perturbation
Consider now an additive perturbation on a particular continuous explanatory variable, say $x_t$, by setting $x_{itw}=x_{it}+\omega_iS_x$, where $S_x$ is a scale factor and $\omega_i\in\mathbb{R}$. The perturbed log-likelihood function is
where
and . The matrix is determined numerically.
The assessment of the fitted model is an important part of data analysis, particularly in regression models, and residual analysis is a helpful tool to validate the fitted model. Examination of residuals can be used, for instance, to detect the presence of outlying observations, the absence of components in the systematic part of the model and departures from the error and variance assumptions. However, finding appropriate residuals in non-normal regression models has been an important topic of research, particularly under censoring. For more details, see Ortega et al. (2009), Hashimoto et al. (2010) and Silva et al. (2011).
Martingale residual
In parametric lifetime models with informative censoring, the martingale residual can be expressed as
where $\delta_i=1$ indicates that the observation is uncensored and $\delta_i=0$ that it is censored, and $S$ and $S_c$ are the survival functions for the failure and censoring times calculated from Eqs (5) and (7), respectively.
Thus, the martingale residual for the LOLLW regression model with informative censoring takes the form
where and are defined in Section 3.
Modified deviance residual
Another possibility is to use a transformation of the martingale residual based on the deviance component residual for the Cox proportional hazards model with no time-dependent explanatory variables, as introduced by Therneau et al. (1990), defined by
$$r_{D_i}=\operatorname{sign}(r_{M_i})\left\{-2\left[r_{M_i}+\delta_i\log\left(\delta_i-r_{M_i}\right)\right]\right\}^{1/2},$$
where $r_{M_i}$ is the martingale residual presented in Eq. (13). Thus, the modified deviance residual for the LOLLW regression model with informative censoring takes the form
We use this transformation to obtain a residual that is more nearly symmetrically distributed around zero.
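The transformation can be applied elementwise to any vector of martingale residuals. The sketch below assumes $\delta_i\in\{0,1\}$ and takes the term $\delta_i\log(\delta_i-r_{M_i})$ to be zero for censored observations. The residual values in the usage example are hypothetical.

```python
import math


def modified_deviance_residual(r_m, delta):
    """r_D = sign(r_M) * sqrt(-2 [r_M + delta * log(delta - r_M)]), delta in {0, 1}.

    For censored observations (delta = 0) the term delta * log(delta - r_M) is 0.
    """
    inner = r_m + (math.log(1.0 - r_m) if delta == 1 else 0.0)
    magnitude = math.sqrt(max(0.0, -2.0 * inner))   # guard against tiny negative rounding
    return math.copysign(magnitude, r_m) if r_m != 0.0 else 0.0


# Hypothetical martingale residuals and failure indicators.
r_m = [0.5, -0.5, -1.2, 0.9]
delta = [1, 0, 0, 1]
print([round(modified_deviance_residual(r, d), 4) for r, d in zip(r_m, delta)])
```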
Application
In this section, we analyze the informative censoring mechanism by means of a real data set from the book by Collett (2003). During treatment for leukemia, patients often undergo bone marrow transplantation to help bring their blood cell counts back to normal levels. However, this can trigger a potentially fatal side effect, graft-versus-host disease, in which the transplanted cells attack the host cells. The data come from 37 patients who were in complete remission from acute myeloid leukemia (AML) or acute lymphocytic leukemia (ALL), or in the chronic phase of chronic myeloid leukemia (CML), and who received a non-depleted allogeneic bone marrow transplant. In this application, we consider the following variables:
$t_i$: survival time in days of patients who were in complete remission from AML or ALL, or in the chronic phase of CML, and received a non-depleted allogeneic bone marrow transplant;
$\delta_i$: censoring indicator (1 = failure, 0 = censored);
$x_{i1}$: pregnancy of the donor (0 = no, 1 = yes);
$x_{i2}$: graft-versus-host disease (0 = no, 1 = yes).
Descriptive analysis
For these 37 patients, the event of interest occurred in 17, i.e., 46% of the observations are failures and 54% are censored. Table 1 gives some descriptive measures for the bone marrow transplant data, considering failure and censoring times separately.
Descriptive statistics for the bone marrow transplant data

| Times | Mean | Median | SD | Skewness | Kurtosis | Min. | Max. |
|---|---|---|---|---|---|---|---|
| Failure times | 271.82 | 142 | 306.91 | 1.76 | 2.18 | 41 | 1181 |
| Censoring times | 1008.05 | 1006 | 319.16 | 0.12 | 1.52 | 572 | 1504 |
Figure 2 displays the histograms of all observed times, of the failure times only and of the censoring times only. In all three cases it is difficult to choose a distribution, since the histograms do not have familiar shapes.
Table 2 gives the MLEs, their standard errors (SEs) in parentheses and the values of the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Consistent Akaike Information Criterion (CAIC). For the censoring times, the AIC, CAIC and BIC values are lower for the Weibull distribution, thus indicating a better fit than the OLLW model. However, for the failure times, the OLLW model provides the lowest values of these statistics.
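For reference, the three criteria can be computed from the maximized log-likelihood $\ell$, the number of parameters $p$ and the sample size $n$. The sketch below reads "CAIC" as the small-sample corrected AIC, $\mathrm{AIC}+2p(p+1)/(n-p-1)$; this reading is an assumption. With a hypothetical log-likelihood of $-112.23$, $p=2$ and $n=17$ (the number of failures), the formulas give 228.5, 229.3 and 230.1 after rounding. These are the values listed in Table 2 for the Weibull fit to the failure times, which suggests, but does not prove, that the table was computed this way.

```python
import math


def information_criteria(loglik, p, n):
    """Return (AIC, CAIC, BIC) for maximized log-likelihood loglik,
    p estimated parameters and n observations.

    CAIC is computed here as AIC + 2p(p + 1)/(n - p - 1) (an assumption).
    """
    aic = -2.0 * loglik + 2.0 * p
    caic = aic + 2.0 * p * (p + 1) / (n - p - 1)
    bic = -2.0 * loglik + p * math.log(n)
    return aic, caic, bic


# Hypothetical log-likelihood for a two-parameter fit to 17 observations.
print([round(v, 1) for v in information_criteria(-112.23, 2, 17)])  # [228.5, 229.3, 230.1]
```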
Figure 3a and b display the histograms with the fitted Weibull and OLLW densities for the censoring and failure times, respectively. Figure 3a shows that the Weibull distribution fits the censoring times better than the OLLW model, whereas Fig. 3b reveals that the OLLW distribution gives a better fit to the failure times. Based on the plots in Fig. 3, it is reasonable to assume the OLLW distribution for the failure times and the Weibull distribution for the censoring times.
MLEs, SEs (in parentheses) and the AIC, CAIC and BIC statistics for the Weibull and OLLW models under censoring and failure times fitted to the bone marrow transplant data
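| Times | Model | $c$ | $\lambda$ | $\alpha$ | AIC | CAIC | BIC |
|---|---|---|---|---|---|---|---|
| Censoring times | Weibull | 3.6468 (0.6518) | 1121.06 (72.6025) | 1 (–) | 289.8 | 290.1 | 293.0 |
| Censoring times | OLLW | 7.5219 (4.1264) | 1152.68 (84.0535) | 0.4348 (0.2704) | 290.4 | 291.1 | 295.5 |
| Failure times | Weibull | 1.0581 (0.1857) | 278.0 (67.6267) | 1 (–) | 228.5 | 229.3 | 230.1 |
| Failure times | OLLW | 0.4829 (0.5308) | 279.95 (37.1882) | 3.9955 (4.4379) | 219.6 | 221.5 | 222.1 |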
Histograms of (a) the observed times (full sample), (b) the failure times and (c) the censoring times.
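Next, we fit the LOLLW regression model with informative censoring
$$y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\sigma z_i,\qquad i=1,\ldots,37,$$
where $y_i=\log t_i$, $x_{i1}$ and $x_{i2}$ are the donor-pregnancy and graft-versus-host disease indicators, and $z_i$ has the LOLLW distribution given by Eq. (3) with $\mu=0$ and $\sigma=1$.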
Table 3 lists the MLEs for the LOLLW regression model under non-informative and informative censoring. The covariable $x_{i2}$ (graft-versus-host disease) becomes significant at the 5% level when informative censoring is considered. Also, the SEs of the MLEs are much smaller under informative censoring, suggesting a better fit in this case.
MLEs and their SEs for the LOLLW regression model fitted to the bone marrow transplant data under non-informative and informative censoring
Non-informative
Informative
MLE
SE
p-value
MLE
SE
p-value
0.2084
0.2820
–
0.0973
0.0188
–
0.2357
0.3153
–
0.1304
0.0098
–
–
–
–
7.0042
0.0713
–
–
–
–
0.2457
0.0434
–
–
–
–
5.2310
3.9063
–
8.0774
0.5790
0.001
9.2274
0.0032
0.001
1.0546
0.4662
0.0304
0.9774
0.3430
0.001
1.1286
0.6205
0.0780
2.2651
0.3216
0.001
The fitted Weibull and OLLW densities. (a) Censoring times and (b) Failure times.
Comparing non-nested models
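Note that the LOLLW regression models with informative and with non-informative censoring are non-nested. A generalized LR statistic for discriminating between non-nested models is discussed by Cameron and Trivedi (1998, p. 184). Consider two non-nested models: model $F_f$ with density function $f(y_i\mid\mathbf x_i,\boldsymbol\theta)$ and model $G_g$ with density function $g(y_i\mid\mathbf x_i,\boldsymbol\gamma)$. The distance between the two models, measured in terms of the Kullback-Leibler information criterion, is assessed through the statistic
$$T_{LR,NN}=\frac{1}{\sqrt n\,\hat\omega}\sum_{i=1}^{n}\log\left[\frac{f(y_i\mid\mathbf x_i,\hat{\boldsymbol\theta})}{g(y_i\mid\mathbf x_i,\hat{\boldsymbol\gamma})}\right],\qquad \hat\omega^{2}=\frac1n\sum_{i=1}^{n}\left(\log\frac{f(y_i\mid\mathbf x_i,\hat{\boldsymbol\theta})}{g(y_i\mid\mathbf x_i,\hat{\boldsymbol\gamma})}\right)^{2}-\left[\frac1n\sum_{i=1}^{n}\log\frac{f(y_i\mid\mathbf x_i,\hat{\boldsymbol\theta})}{g(y_i\mid\mathbf x_i,\hat{\boldsymbol\gamma})}\right]^{2}.$$
For strictly non-nested models, $T_{LR,NN}$ converges in distribution to a standard normal distribution under the null hypothesis of equivalence of the models. Thus, the null hypothesis is not rejected if $|T_{LR,NN}|\le z_{p/2}$. On the other hand, we reject (at the $100p\%$ significance level) the null hypothesis of equivalence of the models in favor of model $F_f$ being better (or worse) than model $G_g$ if $T_{LR,NN}>z_p$ (or $T_{LR,NN}<-z_p$).
We use Eq. (3) to represent the pdf under informative censoring (model $F_f$) and under non-informative censoring (model $G_g$).
The generalized LR statistic is $T_{LR,NN}=22.6911$. Since $T_{LR,NN}>1.96$, we reject (at the 5% significance level) the null hypothesis of equivalence of the LOLLW models with informative and non-informative censoring. Further, the large positive value of this statistic indicates that the model with informative censoring is the better model for the current data.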
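Given the per-observation log-likelihood contributions of two fitted models with densities $f$ and $g$, the statistic takes only a few lines. The sketch assumes the form $T=\sum_i m_i/(\sqrt n\,\hat\omega)$ with $m_i=\log[f(y_i)/g(y_i)]$ and $\hat\omega^2$ the sample variance of the $m_i$ (Vuong's statistic, as presented by Cameron and Trivedi, 1998). The inputs below are toy numbers, not the bone marrow data.

```python
import math


def vuong_statistic(logf, logg):
    """Generalized LR statistic for two non-nested fitted models.

    logf[i], logg[i]: log-likelihood contribution of observation i under each
    model, evaluated at the respective MLEs.
    """
    n = len(logf)
    m = [a - b for a, b in zip(logf, logg)]            # log[f(y_i)/g(y_i)]
    mean = sum(m) / n
    omega2 = sum(x * x for x in m) / n - mean ** 2     # sample variance of the log-ratios
    return sum(m) / (math.sqrt(n) * math.sqrt(omega2))


T = vuong_statistic([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0])   # toy values
z = 1.96                                            # two-sided 5% critical value
print(T, "reject equivalence" if abs(T) > z else "do not reject")
```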
Local and total influence analysis
In this section, we analyze local influences with respect to the bone marrow transplant data using the LOLLW regression model with informative censoring.
Case-weight perturbation
We apply the local influence framework developed in Section 4 with the case-weight perturbation. The maximum curvature is $C_{\mathbf d_{\max}}=1.9204$. Figure 4a plots the eigenvector $\mathbf d_{\max}$ corresponding to $C_{\mathbf d_{\max}}$, and Fig. 4b displays the total local influence. Observations 1, 8, 25 and 28 stand out from the others.
(a) Index plot of for (case-weight perturbation) and (b) total local influence for (case-weight perturbation) based on the current fitted model to the bone marrow transplant data.
Response variable perturbation
Here, we analyze the influence of perturbations of the observed survival times. The maximum curvature is $C_{\mathbf d_{\max}}=97.708$. Figure 5a plots $\mathbf d_{\max}$ against the observation index, revealing that observation 9 stands out from the others. Figure 5b displays the total local influence ($C_i$), in which observations 9 and 25 stand out.
Impact of the detected influential observations
The diagnostic analysis detected cases 25 and 28 as potentially influential observations. Observation 25 corresponds to the lowest censoring time ($t=572$) and observation 28 to the highest failure time ($t=1181$).
To reveal the impact of these two observations on the parameter estimates, we refit the model under several scenarios. First, we eliminate each of them individually. Next, we remove both potentially influential observations from the set A (the original data set).
Table 4 gives the relative change (in percentage) of each estimate, defined by $RC_{\theta_j}=|(\hat\theta_j-\hat\theta_{j(I)})/\hat\theta_j|\times100$, and the corresponding p-value, where $\hat\theta_{j(I)}$ is the MLE of $\theta_j$ after the set $I$ of observations is removed. Table 4 considers the following sets: $I_1=\{\#25\}$, $I_2=\{\#28\}$ and $I_3=\{\#25,\#28\}$. It indicates that the estimates of the LOLLW regression model with informative censoring are not highly sensitive to the deletion of the outstanding observations. In general, the significance of the estimates does not change (at the 5% significance level) after removing the set $I$. Hence, there are no inferential changes after removing the observations flagged in the diagnostic plots.
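The relative changes are simple to compute. Assuming the absolute value is intended (all tabulated changes are positive, including those for estimates that increased after deletion), a sketch with hypothetical estimates is:

```python
def relative_change(theta_full, theta_reduced):
    """RC (in %) = |(theta_hat_j - theta_hat_j(I)) / theta_hat_j| * 100 for each j."""
    return [abs((a - b) / a) * 100.0 for a, b in zip(theta_full, theta_reduced)]


# Hypothetical estimates before and after deleting a set I of observations.
print(relative_change([2.0, -4.0], [1.5, -5.0]))   # [25.0, 25.0]
```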
Relative changes [RC in %], estimates and their p-values (in parentheses) for some sets
Set(A)
–
–
–
–
–
–
–
–
0.0973
0.1304
7.0042
0.2457
5.2310
9.2274
0.9774
2.2651
(–)
(–)
(–)
(–)
(–)
( 0.0010)
(0.0075)
( 0.0010)
[1.0637]
[0.1992]
[0.2986]
[1.3553]
[2.4480]
[0.8348]
[12.5455]
[8.2005]
0.0983
0.1307
7.0251
0.2424
5.3590
9.3044
1.1001
2.4508
(–)
(–)
(–)
(–)
(–)
( 0.0010)
(0.0075)
( 0.0010)
[3.2922]
[1.3677]
[0.2182]
[20.3612]
[0.3638]
[0.3187]
[11.6610]
[1.5614]
0.0943
0.1322
6.9889
0.2957
5.2119
9.2568
0.8635
2.3004
(–)
(–)
(–)
(–)
(–)
( 0.0010)
(0.0273)
( 0.0010)
[4.8551]
[4.6130]
[0.5171]
[17.4850]
[22.2952]
[1.5168]
[67.8432]
[10.8243]
0.0925
0.1244
7.0405
0.2027
4.0647
9.0874
1.6406
2.0199
(–)
(–)
(–)
(–)
(–)
( 0.0010)
( 0.0010)
( 0.0010)
(a) Index plot of for (simultaneous response perturbation ) and (b) total local influence for (simultaneous response perturbation) based on the model fitted to the bone marrow transplant data.
Residual analysis
To detect possible outlying observations in the fits of the LOLLW regression model under non-informative and under informative censoring, Fig. 6 provides the index plots of the modified deviance residuals $r_{D_i}$. Figure 6a indicates that, for the model with non-informative censoring, the residuals are not randomly scattered around zero and form two groups. Figure 6b shows a much more random scatter of the residuals around zero for the LOLLW regression model with informative censoring. It also indicates that this regression model is more appropriate for the data, since it does not present outliers.
Index plot of the deviance component residuals for the bone marrow transplant data. (a) LOLLW regression model under non-informative censoring. (b) LOLLW regression model under informative censoring.
Conclusions
We consider the log-odd log-logistic Weibull (LOLLW) distribution and the corresponding LOLLW regression model to investigate the informative censoring mechanism in a location-scale regression model. We use maximum likelihood to estimate the model parameters and adopt several diagnostic measures under three perturbation schemes for the new regression model with informative censoring. We define two kinds of residuals from the fitted model to assess departures from the error distribution assumption and to detect outlying observations. The flexibility, practical relevance and applicability of the proposed regression model are illustrated by means of a real data set. For these data, the fitted LOLLW regression model with informative censoring is more effective than the fitted model with non-informative censoring: its predictive power is better, as indicated by the smaller standard errors of the maximum likelihood estimates, and it yields improved residuals.
Acknowledgments
We are very grateful to a referee and an associate editor for helpful comments that considerably improved the paper. We gratefully acknowledge financial support from CAPES and CNPq.
References
1. Cameron, A.C., & Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge University Press, New York.
2. Collett, D. (2003). Modelling Survival Data in Medical Research. Chapman and Hall, London.
3. Cook, R.D. (1986). Assessment of local influence (with discussion). Journal of the Royal Statistical Society B, 48, 133-169.
4. Cordeiro, G.M., Alizadeh, M., Pescim, R.R., & Ortega, E.M.M. (2016). The odd log-logistic generalized half-normal lifetime distribution: Properties and applications. Communications in Statistics – Theory and Methods, 46, 4195-4214.
5. da Cruz, J.N., Ortega, E.M.M., & Cordeiro, G.M. (2015). The log-odd log-logistic Weibull regression model: Modeling, estimation, influence diagnostics and residual analysis. Journal of Statistical Computation and Simulation, 86, 1516-1538.
6. da Cruz, J.N., Ortega, E.M.M., Cordeiro, G.M., Suzuki, A.K., & Mialhe, F.L. (2017). Bivariate odd-log-logistic-Weibull regression model for oral health-related quality of life. Communications for Statistical Applications and Methods, 24, 271-290.
7. da Silva, A.B., Cordeiro, G.M., Ortega, E.M.M., & Silva, G.O. (2017). The odd log-logistic Student t distribution: Theory and applications. Journal of Agricultural, Biological and Environmental Statistics, 22, 615-639.
8. Doornik, J.A. (2007). An Object-Oriented Matrix Language Ox 5. Timberlake Consultants Press, London.
9. Escobar, L.A., & Meeker, W.Q. (1992). Assessing influence in regression analysis with censored data. Biometrics, 48, 507-528.
10. Freitas, L.A., & Rodrigues, J. (2013). Standard exponential cure rate model with informative censoring. Communications in Statistics – Simulation and Computation, 42, 8-23.
11. Hashimoto, E.M., Ortega, E.M.M., Cancho, V.G., & Cordeiro, G.M. (2010). The log-exponentiated Weibull regression model for interval-censored data. Computational Statistics and Data Analysis, 54, 1017-1035.
12. Huang, X., & Wolfe, R.A. (2002). A frailty model for informative censoring. Biometrics, 58, 510-520.
13. Huang, X., & Zhang, N. (2008). Regression survival analysis with an assumed copula for dependent censoring: A sensitivity analysis approach. Biometrics, 64, 1090-1099.
14. Lagakos, S.W. (1979). General right censoring and its impact on the analysis of survival data. Biometrics, 35, 139-156.
15. Lagakos, S.W., & Williams, J.S. (1978). A cone class of variable-sum models. Biometrika, 65, 181-189.
16. Lesaffre, E., & Verbeke, G. (1998). Local influence in linear mixed models. Biometrics, 54, 570-582.
17. Lu, X., & Tsiatis, A.A. (2011). Semiparametric estimation of treatment effect with time-lagged response in the presence of informative censoring. Lifetime Data Analysis, 17, 566-593.
18. Mendoza, N.V.R., Ortega, E.M., & Cordeiro, G.M. (2016). The exponentiated log-logistic geometric distribution: Dual activation. Communications in Statistics – Theory and Methods, 13, 3838-3859.
19. Ortega, E.M.M., Bolfarine, H., & Paula, G.A. (2003). Influence diagnostics in generalized log-gamma regression models. Computational Statistics and Data Analysis, 42, 165-186.
20. Ortega, E.M.M., Cancho, V.G., & Paula, G.A. (2009). Generalized log-gamma regression models with cure fraction. Lifetime Data Analysis, 15, 79-106.
21. Ortega, E.M.M., Cordeiro, G.M., & Hashimoto, E.M. (2011). A log-linear regression model for the Beta-Weibull distribution. Communications in Statistics – Simulation and Computation, 40, 1206-1235.
22. Pham, H., & Lai, C.D. (2007). On recent generalizations of the Weibull distribution. IEEE Transactions on Reliability, 56, 454-458.
23. Prataviera, F., Ortega, E.M.M., Cordeiro, G.M., Pescim, R.R., & Verssani, B.A.W. (2018a). A new generalized odd log-logistic flexible Weibull regression model with applications in repairable systems. Reliability Engineering and System Safety, 176, 13-26.
24. Prataviera, F., Ortega, E.M.M., Cordeiro, G.M., & da Silva, A.B. (2018b). The heteroscedastic odd log-logistic generalized gamma regression model for censored data. Communications in Statistics – Simulation and Computation, 48, 1-25.
25. Rotnitzky, A., Farall, A., Bergesio, A., & Scharfstein, D. (2007). Analysis of failure time data under competing censoring mechanisms. Journal of the Royal Statistical Society B, 69, 307-327.
26. Santos, P.C., Jr. (2012). Análise de sobrevivência na presença de censura informativa [Survival analysis in the presence of informative censoring]. Master's dissertation in Statistics, Universidade Federal de Minas Gerais, Belo Horizonte, MG.
27. Scharfstein, D.O., & Robins, J.M. (2002). Estimation of the failure time distribution in the presence of informative censoring. Biometrika, 89, 617-634.
28. Siannis, F. (2004). Applications of a parametric model for informative censoring. Biometrics, 60, 704-714.
29. Siannis, F. (2011). Sensitivity analysis for multiple right censoring processes: Investigating mortality in psoriatic arthritis. Statistics in Medicine, 6, 77-91.
30. Siannis, F., Copas, J., & Lu, G. (2005). Sensitivity analysis for informative censoring in parametric survival models. Biostatistics, 6, 77-91.
31. Silva, G.O., Ortega, E.M.M., & Cordeiro, G.M. (2009). A log-extended Weibull regression model. Computational Statistics and Data Analysis, 53, 4482-4489.
32. Silva, G.O., Ortega, E.M.M., Garibay, V.C., & Barreto, M.L. (2008). Log-Burr XII regression models with censored data. Computational Statistics and Data Analysis, 52, 3820-3842.
33. Silva, G.O., Ortega, E.M.M., & Paula, G.A. (2011). Residuals for log-Burr XII regression models in survival analysis. Journal of Applied Statistics, 38, 1435-1445.
34. Staplin, N.D., Kimber, A.C., Collett, D., & Roderick, P.J. (2014). Dependent censoring in piecewise exponential survival models. Statistical Methods in Medical Research, 24, 325-341.
Tsiatis, A. (1975). A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences, 72, 20-22.
37. Vaupel, J.W., & Yashin, A.I. (1983). The Deviant Dynamics of Death in Heterogeneous Populations. International Institute for Applied Systems Analysis Research Report, Laxenburg, Austria.
38. Wang, M.C., Qin, J., & Chiang, C.T. (2001). Analyzing recurrent event data with informative censoring. Journal of the American Statistical Association, 96, 1057-1065.
39. Zhang, J., & Heitjan, D.F. (2006). A simple local sensitivity analysis tool for nonignorable coarsening: Application to dependent censoring. Biometrics, 62, 1260-1268.