Abstract
We propose two regression models based on the beta modified Weibull distribution. The first is a long-term mixture lifetime regression for survival data in which some individuals may never experience the event of interest because long-term survivors may be present. This regression aims to estimate the effects of covariates on the surviving fraction. The second is a regression model based on the log-beta modified Weibull distribution, proposed as an alternative to the log-modified Weibull regression. This model aims to estimate the effects of covariates on the survival times. These new models generalize some existing regressions in the literature. In both cases, the model parameters are estimated by the method of maximum likelihood for censored data. We derive the appropriate matrices for assessing the local influence on parameter estimates under different perturbation schemes and present a global sensitivity analysis. A model check based on the quantile residuals is performed to select the appropriate regressions. We reanalyze two data sets available in the literature, one for each regression.
Introduction
In the last decade, new classes of distributions have been proposed for modeling lifetime data by extending the Weibull distribution. One important extension is the beta modified Weibull (BMW) distribution (Silva et al., 2010). It includes as special cases some important distributions such as the generalized modified Weibull (GMW), beta Weibull (BW), exponentiated Weibull (EW), beta exponential (BE), modified Weibull (MW) and Weibull distributions, among several others. The BMW distribution has attracted several proposals in the area of reliability. The main motivation for using the BMW model is its flexibility in accommodating four types of hazard rate function (hrf) (i.e. increasing, decreasing, unimodal and bathtub) depending on its parameters.
In this paper, we define two new regressions based on this distribution as feasible alternatives for modeling survival times with or without a cure fraction. Cure rate models have been used for modeling time-to-event data for various types of cancers, including breast cancer, non-Hodgkin's lymphoma, leukemia, prostate cancer and melanoma.
Perhaps the most popular cure rate regressions are the mixture models (MMs) (Farewell, 1982). Li et al. (2001) investigated MMs in the presence of dependent censoring from the perspective of competing risks and modeled the dependence between the censoring and survival times using a class of Archimedean copula models. Zeng et al. (2006) studied a class of transformation models for survival data with a cure fraction. This class is motivated by biological aspects and includes as special cases both the proportional hazards and proportional odds cure models. Further, Rizzato et al. (2009) introduced the generalized log-gamma mixture regression with covariates, and Lanjoni et al. (2016) formulated a new cure rate survival model in which the time to the event has the Burr XII geometric distribution. More recently, Ortega et al. (2017) proposed two new cure rate models called the odd Birnbaum-Saunders mixture and odd Birnbaum-Saunders geometric models, and Ramires et al. (2018) proposed a flexible semi-parametric cure rate survival model called the sinh Cauchy cure rate distribution. In this paper, we propose a new regression called the beta modified Weibull mixture (BMWM) regression for survival data with cure rate and long-term survivors. The proposed regression includes as special cases the traditional cure rate models.
In the absence of long-term survivors, the location-scale regression model (Lawless, 2003) stands out because it is frequently used in clinical trials when lifetimes are affected by explanatory variables. We define a log-linear regression called the log-beta modified Weibull (LBMW) regression for modeling data with bathtub-shaped, monotonically increasing or decreasing, and upside-down bathtub hazard rates. This new regression can be a good alternative for the analysis of survival data. The main motivation for using the LBMW regression is that several models listed above, for example, the log-generalized modified Weibull and log-beta Weibull regressions, can be embedded in it. Hence, it can also be used to select an adequate model.
We consider a frequentist analysis for both regressions. The inferential part is carried out using the asymptotic distribution of the maximum likelihood estimators (MLEs). After modeling, it is important to check the assumptions in the regression and to conduct a robustness study to detect influential or extreme observations that can distort the results of the analysis. We discuss the diagnostic influence based on case-deletion introduced by Cook (1977). However, when case-deletion is used, all information from a single subject is deleted at once, and therefore it is hard to tell whether that subject has some influence on a specific aspect of the model. A solution to this problem can be found in the local influence approach, where one investigates how the results of an analysis change under small perturbations of the model, and where these perturbations can have specific interpretations. Cook (1986) introduced a general framework to detect the influence of observations, which indicates how sensitive the analysis is when small perturbations are provoked in the data or in the model. We also use this methodology to detect influential subjects in the BMWM and LBMW regressions. Additionally, quantile residuals (Dunn & Smyth, 1996) are adopted to check classical assumptions in the regressions.
In recent years, several authors have proposed various regression models with or without a cure fraction. See, for example, Korkmaz et al. (2019), Lanjoni et al. (2019), Pescim et al. (2019), Scudilio et al. (2019) and Yousof et al. (2019).
The rest of the paper is organized as follows. In Section 2, we review the BMW distribution and define the BMWM regression based on this distribution. In Section 3, we define the LBMW regression for censored data. We estimate the model parameters by the method of maximum likelihood and discuss some goodness-of-fit statistics in Section 4. Several diagnostic measures are addressed in Section 5 by considering case-deletion and normal curvatures of local influence under various perturbation schemes with censored observations. In Section 6, the quantile residuals (qrs) are used to assess departures from the underlying BMWM and LBMW regressions and to detect outliers. In Section 7, two real data sets are analyzed, which illustrate empirically the flexibility, practical relevance and applicability of the proposed regressions. Section 8 ends with some concluding remarks.
The BMWM regression
The BMW distribution
Let
where
The density function and hazard rate function (hrf) corresponding to Eq. (1) are
and (by using
Henceforth,
According to Silva et al. (2010), a characteristic of the BMW distribution is that its hrf can be bathtub shaped, monotonically increasing or decreasing and upside-down bathtub depending basically on the parameter values. Another important characteristic of this distribution is that it contains as special cases several well-known distributions. For example, it simplifies to the BW distribution when
The
where
In a sample of censored survival times, the proportion of individuals who are immune and not subject to death, failure or relapse may be indicated by a relatively high number of individuals with large censored survival times. In this section, the BMW model is modified to become a new mixture cure rate model.
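Before turning to the mixture formulation, the BMW building blocks reviewed above can be sketched numerically. The sketch below is only illustrative and rests on an assumption: it takes the usual beta-generated form of Silva et al. (2010), in which the modified Weibull baseline cdf is G(t) = 1 - exp(-alpha t^gamma e^(lambda t)) and the BMW cdf is the regularized incomplete beta function evaluated at G(t) with shape parameters a and b. The function and parameter names (`reg_inc_beta`, `alpha`, `gamma`, `lam`, `a`, `b`) are hypothetical labels chosen here, not notation taken from the paper.

```python
import math

def reg_inc_beta(x, a, b, max_iter=500, eps=1e-13):
    """Regularized incomplete beta I_x(a, b) by a continued fraction (Lentz's method)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a) keeps the fraction in its fast-converging region.
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - reg_inc_beta(1.0 - x, b, a, max_iter, eps)
    log_front = (a * math.log(x) + b * math.log1p(-x)
                 + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    front = math.exp(log_front) / a
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for num in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                    -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return front * h

def mw_cdf(t, alpha, gamma, lam):
    """Assumed modified Weibull baseline cdf G(t)."""
    return -math.expm1(-alpha * t ** gamma * math.exp(lam * t))

def bmw_cdf(t, alpha, gamma, lam, a, b):
    """Assumed BMW cdf: I_{G(t)}(a, b)."""
    return reg_inc_beta(mw_cdf(t, alpha, gamma, lam), a, b)

def bmw_pdf(t, alpha, gamma, lam, a, b):
    """BMW density g(t) G(t)^(a-1) [1 - G(t)]^(b-1) / B(a, b), for t > 0."""
    G = mw_cdf(t, alpha, gamma, lam)
    g = (alpha * t ** (gamma - 1.0) * (gamma + lam * t)
         * math.exp(lam * t - alpha * t ** gamma * math.exp(lam * t)))
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return g * G ** (a - 1.0) * (1.0 - G) ** (b - 1.0) / math.exp(log_beta)

def bmw_hazard(t, alpha, gamma, lam, a, b):
    """Hazard rate function: density divided by survival function."""
    return bmw_pdf(t, alpha, gamma, lam, a, b) / (1.0 - bmw_cdf(t, alpha, gamma, lam, a, b))
```

Under these assumptions, setting `a = b = 1` recovers the modified Weibull cdf, which gives a quick sanity check of the sub-model structure; evaluating `bmw_hazard` on a grid of `t` values for different parameter choices is one way to explore the hazard shapes mentioned above.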
To formulate the BMWM regression, we consider that the studied population is a mixture of susceptible (uncured) individuals, who may experience the event of interest, and non-susceptible (cured) individuals, who will never experience it (Maller & Zhou, 1996). This approach allows us to estimate simultaneously whether the event of interest will occur, which is called incidence, and when it will occur, given that it can occur, which is called latency. Let
where
where
The BMWM regression when
In the BMWM regression, the parameters of interest are
Then, the estimated mean cure fraction is
Consider data in the form
and the contribution of an individual that is at risk at
Combining the last two equations, the log-likelihood function for the parameter vector
Here,
and
where
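The censored-data log-likelihood of a mixture cure model, as discussed in this section, can be illustrated with a short sketch. It assumes the standard mixture form, in which the population survival function is p(x) + [1 - p(x)] S(t), with failures contributing [1 - p(x)] f(t) and right-censored observations contributing the population survival. The logistic link for the cured proportion p(x) is an assumption made here for illustration. To keep the block self-contained, a Weibull latency distribution stands in for the BMW distribution (this corresponds to a Weibull mixture sub-model); all names and data values are hypothetical.

```python
import math

def cure_probability(x, beta):
    """Cured proportion under an assumed logistic link: p = 1 / (1 + exp(-x'beta))."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))

def population_survival(t, p, sf):
    """Mixture survival: cured fraction p plus susceptible fraction times latency survival."""
    return p + (1.0 - p) * sf(t)

def mixture_loglik(times, events, X, beta, pdf, sf):
    """Log-likelihood for right-censored data (event = 1 for failure, 0 for censored)."""
    total = 0.0
    for t, d, x in zip(times, events, X):
        p = cure_probability(x, beta)
        if d == 1:
            # A failure can only come from the susceptible group.
            total += math.log((1.0 - p) * pdf(t))
        else:
            # A censored subject is either cured or susceptible and still at risk.
            total += math.log(population_survival(t, p, sf))
    return total

# Stand-in latency distribution (Weibull); a BMW density and survival could be passed instead.
def weibull_sf(t, alpha=0.5, gamma=1.5):
    return math.exp(-alpha * t ** gamma)

def weibull_pdf(t, alpha=0.5, gamma=1.5):
    return alpha * gamma * t ** (gamma - 1.0) * math.exp(-alpha * t ** gamma)

# Hypothetical toy data: intercept plus one binary covariate.
times = [0.8, 2.5, 6.0, 7.2]
events = [1, 1, 0, 0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
beta = [-0.5, 0.3]
loglik = mixture_loglik(times, events, X, beta, weibull_pdf, weibull_sf)
```

Passing the density and survival function as arguments keeps the incidence part (the link for p) separate from the latency part, so the same routine could be handed to a numerical optimizer with a different latency distribution.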
The log-regression model
We propose a log-regression model where the response variable
Considering the transformations
where
The survival function of
where
By expanding
where
The ordinary moments of
where the moments
There are many real situations where the response variable
Further, we consider that the location parameter
where
where
In this case, the survival function of
The LBMW regression Eq. (8) opens new possibilities for fitting several different types of data. It contains as special cases the following well-known regressions and some other new regressions: for
For the interpretation of the estimated coefficients, a possible proposal is based on the ratio of median times (see Hosmer & Lemeshow, 1999). Hence, when the covariable is binary (1 or 0), and considering the ratio of median times with
Consider the sample
where
The MLE of the parameter vector
For testing nested models, the likelihood ratio test (LR) can be used to discriminate such models. We can compute the maximum values of the unrestricted and restricted log-likelihoods to construct the LR statistics for testing some sub-models. We consider the partition
Further, we can select the best model based on the values of the Akaike information criterion (AIC), corrected Akaike information criterion (AICc) and global deviance (GD) statistics. Given a data set, a researcher has the onerous task of selecting the best model among some possible alternatives. The model with the smallest AIC, AICc and GD values can be selected as the best one. We can also use the minimum value of the negative log-likelihood function to choose the best fitted model.
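The model comparison statistics mentioned above can be computed from the maximized log-likelihoods with a few lines of code. The sketch below uses the usual definitions (AIC = -2l + 2k, AICc = AIC + 2k(k+1)/(n-k-1), GD = -2l, and LR = 2(l_full - l_restricted) referred to a chi-square distribution); the log-likelihood values, parameter counts and sample size in the example are hypothetical and are not results from this paper.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return -2.0 * loglik + 2.0 * k

def aicc(loglik, k, n):
    """Small-sample corrected AIC for sample size n (requires n > k + 1)."""
    return aic(loglik, k) + 2.0 * k * (k + 1.0) / (n - k - 1.0)

def global_deviance(loglik):
    """Global deviance, defined here as minus twice the maximized log-likelihood."""
    return -2.0 * loglik

def lr_statistic(loglik_full, loglik_restricted):
    """Likelihood ratio statistic for nested models."""
    return 2.0 * (loglik_full - loglik_restricted)

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with a positive integer df."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # df = 1
    else:
        q, k = math.exp(-half), 2              # df = 2
    # Recurrence Q(s + 1, y) = Q(s, y) + y^s e^{-y} / Gamma(s + 1), with s = k/2 and y = x/2.
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

# Hypothetical maximized log-likelihoods: full model (9 parameters) vs sub-model (7 parameters).
l_full, l_sub, n = -210.4, -214.9, 120
w = lr_statistic(l_full, l_sub)
p_value = chi2_sf(w, df=2)
summary = {
    "full": (aic(l_full, 9), aicc(l_full, 9, n), global_deviance(l_full)),
    "sub": (aic(l_sub, 7), aicc(l_sub, 7, n), global_deviance(l_sub)),
}
```

The degrees of freedom of the LR test equal the number of parameters fixed by the null hypothesis, which is why `df=2` is used in this hypothetical comparison.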
Sensitivity analysis
After fitting the model, it is important to check its assumptions and to conduct a robustness study to detect influential or extreme observations that can cause distortions to the results of the analysis. The first tool to perform sensitivity analysis is the global influence starting from case deletion (see, Cook, 1977). Case deletion is a common approach to study the effect of dropping the
Another approach is suggested by Cook (1986), where instead of removing observations, weights are given to them. This is a local influence approach. In survival analysis, several authors have investigated the assessment of local influence as, for instance, Pettit and Bin Daud (1989) and Weissfeld (1990), who investigated local influence in proportional hazard regressions. Escobar and Meeker (1992) adapted local influence methods to regression analysis with censoring, and Weissfeld and Schneider (1990) compared local influence and case deletion for the Weibull regression. Besides, there are various papers exploring new distributions, such as the following. Ortega et al. (2003) considered the problem of assessing local influence in generalized log-gamma regression with censored observations, Silva et al. (2008) addressed local influence in the log-Burr regression with censored data, Ortega et al. (2010) investigated local influence for the generalized log-gamma regression with cure fraction, Hashimoto et al. (2012) discussed local influence for the log-Burr XII regression for grouped survival data and Hashimoto et al. (2013) studied local influence in the new Neyman type A beta Weibull regression. Further, Fachini et al. (2014) worked with local influence for the bivariate regression with cure fraction and Ortega et al. (2014) derived the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes for the odd Weibull regression.
Global influence
A first tool to perform sensitivity analysis, as stated before, is by means of global influence starting from case-deletion (see Cook, 1977; Cook et al., 1988). Case-deletion is a common approach to study the effect of dropping the
In the following, a quantity with subscript “(i)” means the original quantity with the
where
Another popular measure of the difference between
For the BMWM and LBMW regressions, the log-likelihood functions are given in Sections 2.3 and 3.2, respectively. The index plots of GD and LD can be used to assess influence of the
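The case-deletion idea can be illustrated on a deliberately simple model. The sketch below assumes the usual definitions: the generalized Cook distance is the quadratic form of the change in the estimate weighted by the observed information, and the likelihood displacement is twice the drop in the full-data log-likelihood when it is evaluated at the case-deleted estimate. An exponential model with right censoring is used as a stand-in because its MLE has a closed form; for the BMWM and LBMW regressions each deleted-case fit would instead require a numerical maximization. The data values and function names are hypothetical.

```python
import math

def exp_mle(times, events):
    """Closed-form MLE of the exponential rate with right censoring: failures / total time."""
    return sum(events) / sum(times)

def exp_loglik(rate, times, events):
    """Censored exponential log-likelihood: d * log(rate) - rate * sum(t)."""
    return sum(events) * math.log(rate) - rate * sum(times)

def case_deletion(times, events):
    """Return a list of (generalized Cook distance, likelihood displacement) per case."""
    rate_full = exp_mle(times, events)
    l_full = exp_loglik(rate_full, times, events)
    obs_info = sum(events) / rate_full ** 2      # minus second derivative at the MLE
    measures = []
    for i in range(len(times)):
        t_del = times[:i] + times[i + 1:]
        d_del = events[:i] + events[i + 1:]
        rate_i = exp_mle(t_del, d_del)           # estimate without case i
        gd_i = (rate_i - rate_full) ** 2 * obs_info
        # Displacement uses the FULL-data log-likelihood evaluated at the deleted-case estimate.
        ld_i = 2.0 * (l_full - exp_loglik(rate_i, times, events))
        measures.append((gd_i, ld_i))
    return measures

# Hypothetical data: the last subject has an unusually long observed failure time.
times = [0.5, 1.2, 0.9, 2.0, 1.5, 14.0]
events = [1, 1, 1, 0, 1, 1]
measures = case_deletion(times, events)
most_influential = max(range(len(times)), key=lambda i: measures[i][1])
```

Plotting the two measures against the case index gives the kind of index plot described in this section; cases whose values stand well apart from the rest are candidates for closer inspection.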
Local influence
Another approach is suggested by Cook (1986), where instead of removing observations, weights are given to them. If likelihood displacement
Next, for three perturbation schemes, we obtain the matrix
by considering the log-likelihood Eqs (5) and (10) and the vector of weights
Case-weight perturbation
In this case, for the BMWM regression, the log-likelihood function has the form
and, for the LBMW regression, it becomes
where
Response perturbation
For the BMWM regression, each
Explanatory variable perturbation
Consider now an additive perturbation on a particular continuous explanatory variable, say
Residual analysis
The assessment of the fitted regression is an important part of data analysis, particularly with censored data, and especially plots of the residuals play a central role in the checking of statistical regressions. By examining the residuals, we can detect the presence of outlying observations, the absence of components in the systematic part of the regression and departures from the error and variance assumptions.
We use randomized qrs (Dunn & Smyth, 1996) given by
where
The BMWM regression with cure rate
where
The LBMW regression for censored data
where
For a right censored continuous response,
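Randomized quantile residuals for censored data can be sketched as follows. The block assumes the Dunn and Smyth (1996) construction: for an uncensored time the residual is the standard normal quantile of the fitted cdf value, while for a right-censored time the cdf value is replaced by a uniform draw between the fitted cdf value and one. The fitted cdf is passed in as a function; an exponential cdf is used here purely as a hypothetical placeholder for a fitted BMWM or LBMW model.

```python
import math
import random
from statistics import NormalDist

def quantile_residuals(times, events, cdf, rng):
    """Randomized quantile residuals (event = 1 for failure, 0 for right censored)."""
    std_normal = NormalDist()
    residuals = []
    for t, d in zip(times, events):
        u = cdf(t)
        if d == 0:
            # Right censoring: the true cdf value lies somewhere in (F(t), 1).
            u = rng.uniform(u, 1.0)
        # Guard against u being exactly 0 or 1, where the normal quantile is undefined.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        residuals.append(std_normal.inv_cdf(u))
    return residuals

# Hypothetical fitted model: exponential with unit rate.
def fitted_cdf(t):
    return -math.expm1(-t)

rng = random.Random(2024)                    # fixed seed so the randomization is reproducible
times = [math.log(2.0), 0.3, 1.7, 2.4]
events = [1, 1, 0, 0]
res = quantile_residuals(times, events, fitted_cdf, rng)
```

If the fitted model is adequate, these residuals should behave approximately like a standard normal sample, which is what normal probability plots and worm plots are designed to check. Because of the randomization, repeating the computation with a different seed is a simple way to confirm that conclusions do not depend on one particular draw.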
In this section, we illustrate the flexibility and applicability of the proposed regressions by means of two real data sets.
Application 1: The BMWM regression
In the first example, we fit the BMWM regression to a data set on cutaneous melanoma described and analyzed by Ibrahim et al. (2001) and Mizoi et al. (2007). The data come from the phase III melanoma clinical trial (E1690) conducted by the Eastern Cooperative Oncology Group. The objective of the study was to compare treatment with high-dose interferon against an observation (control) group. Relapse-free survival (years) was the outcome of interest, which was defined as the time from randomization to progression of tumor or death. There are
Figure 1 displays the estimated survival curves for the interferon group and the control group. An obvious plateau can be noted after about five years of follow-up, which offers empirical evidence of a possible cure fraction in E1690.
Kaplan-Meier curves of high-dose interferon and observation groups in E1690.
First, we consider the BMWM regression with long-term survivors given in Eq. (4) with all explanatory variables, i.e.
MLEs for the parameters of the BMWM regression with long-term survivors fitted to the cutaneous melanoma data
We determine the MLEs of the parameters using the NLMixed procedure in SAS. Table 1 lists these estimates for the BMWM regression. The figures in Table 1 reveal that the variable nodule is significant (at 5% level) for the cure fraction. A summary of the AIC, AICc and GD values to compare the BMWM, MWM and WM regressions with long-term survivors is reported in Table 2. The BMWM regression gives the best fit according to these statistics.
Some statistics for comparing regression models with long-term survivors
An analysis under the BMWM regression with long-term survivors provides a check on the appropriateness of the MWM and WM sub-models and then indicates the extent for which inferences depend upon the regression. For example, the LR statistic for testing the hypotheses
The worm plots (wp function in R) and qqnorm of the qrs in Fig. 2 reveal that the BMWM regression is appropriate to these data.
(a) worm plots and (b) qqnorm of the qrs for the cutaneous melanoma data.
In summary, we recommend the BMWM regression based on the analysis above. Thus, all subsequent analyses are based on this regression. In order to detect possible outlying observations, we use the Ox computational program to compute the case-deletion measures
MLEs for the BMWM regression fitted to the cutaneous melanoma data.
(a) index plot of 
Next, we perform an analysis of local influence for the data using the BMWM regression. Local influence figures are omitted.
The cases
Further, we turn to a simplified model retaining only nodule category as an explanatory variable. The estimates for the BMWM regression with long-term survivors fitted to the cutaneous melanoma data are listed in Table 3, where the only significant variable is
The estimated mean cure fraction is
In order to assess if the final fitted regression is appropriate, Fig. 4a displays the empirical survival function and the estimated marginal survival functions given by Eq. (3). Also, the estimated survival functions and the Kaplan-Meier estimate of the cure rate patients stratified by nodule category are displayed in Fig. 4b, from which we verify significant fractions of survivors for all nodule categories. It can be checked that the BMWM regression with long-term survivors provides a good fit to these data. Also, no observation appears as a possible outlier.
MLEs of the parameters for the LBMW and LMW regressions and the corresponding SEs for the complete lung cancer data
(a) Kaplan-Meier curves (solid lines), the estimated BMWM survival functions and the estimate of the cure fraction for the cutaneous melanoma data and (b) Kaplan-Meier curves (solid lines) and estimated BMWM survival functions stratified by nodule category (1–4, from top to bottom).
As a second example, we consider the data from the Veterans Administration lung cancer trial given in Prentice (1973) and reported in Kalbfleisch & Prentice (2002). These data considered males with advanced inoperable lung cancer that received chemotherapy. The survival time is the time from the start of the treatment. The main purpose of the study was to compare the effects of two chemotherapy treatments in prolonging survival time. The explanatory variables in the study are: performance status at diagnosis (
Kalbfleisch and Prentice (2002) fitted a generalized F regression to analyze these data and used only the most important covariates: tumor type and performance status (PS). Lawless (2003) and Silva et al. (2009) used only one of the subgroups of patients (no prior therapy or prior therapy) and some covariates. Following Lawless (2003), we center only the explanatory variables
where
We fit the LBMW and LMW regressions to the current data. Table 4 gives the estimates of the parameters and their standard errors (SEs) for the two regressions. The required numerical evaluations are implemented in a Quasi-Newton algorithm through the MaxBFGS sub-routine in Ox (see, Doornik, 2007). The MLEs of the
AIC, AICc and GD statistics for comparing the fitted LBMW and LMW regressions
The plots comparing the empirical survival function and the fitted survival functions of the BMW and MW models are displayed in Fig. 5. These plots indicate that both corresponding regressions provide satisfactory fits. However, the BMW regression presents a better fit to these data.
Empirical and estimated survival functions for the BMW and MW models for lung cancer data.
Further, we select the best fitted regression based on the AIC, AICc and GD values. A summary of these values to compare the LBMW and LMW regressions is given in Table 5. These results indicate that the LBMW regression is more appropriate for fitting these data. In addition, the LR statistic for testing the hypothesis
The worm plots (wp function in R) and qqnorm of the qrs in Fig. 6 for the fitted LBMW regression reveal that it is acceptable.
In summary, we recommend the LBMW regression based on the analysis above. Thus, all subsequent analyses are based on this regression. In order to detect possible outlying observations, we use Ox to compute the case-deletion measures
MLEs for the LBMW regression fitted to the cancer data
(a) worm plots and (b) qqnorm of the qrs on the complete lung cancer data.
(a) index plot of 
Further, we perform an analysis of local influence for the data using the LBMW regression. Local influence figures are omitted. The observations
So, the final regression has the form
The MLEs of the parameters in the final fitted regression are given in Table 6. In summary, we recommend the LBMW regression to fit these data on the basis of the above analysis. We interpret the estimated coefficients of the regression as follows: the expected survival time should increase approximately
In this paper, the log-beta modified Weibull and the beta modified Weibull mixture regressions for censored data are proposed as good alternatives for modeling lifetime data. We use a Quasi-Newton algorithm to calculate the maximum likelihood estimates. We adopt asymptotic likelihood ratio statistics for testing the model parameters. We discuss the local influence theory for the parameter estimates. Thus, we propose general and attractive regressions for modeling censored and uncensored lifetime data. We show empirically that both regressions fit well, supported by sensitivity analysis, in two applications to real data sets. We hope these generalizations may attract wider applications in survival analysis.
Acknowledgments
Special thanks to Fundação de Amparo à Pesquisa da Bahia- FAPESB and CNPq.
