Frailty models are used in the survival analysis to account for the unobserved heterogeneity in individual risks to disease and death. To analyze the bivariate data on related survival times (e.g. matched pairs experiments, twin or family data), the shared frailty models were suggested. Shared frailty models are used despite their limitations. To overcome their disadvantages correlated frailty models may be used. In this paper, we introduce the correlated compound geometric frailty models based on reversed hazard rate with three different baseline distributions namely, the generalized log-logistic type I, the generalized log-logistic type II and the modified inverse Weibull. We introduce the Bayesian estimation procedure using Markov Chain Monte Carlo (MCMC) technique to estimate the parameters involved in these models. We present a simulation study to compare the true values of the parameters with the estimated values. We also apply the proposed models to the Australian twin data set and a better model is suggested.
Frailty models (Vaupel et al., 1979) are extensively used in the survival analysis to account for the unobserved heterogeneity in individual risks to disease and death. The frailty model is a random effect model for time to event data which is an extension of the Cox’s proportional hazards model. To analyze the bivariate data on related survival times (e.g. matched pairs experiments, twin or family data), the shared frailty models were suggested. Bivariate survival data arises whenever each study subjects experience two events. Particular examples include failure times of paired human organs, (e.g. kidneys, eyes, lungs, breasts, etc.) and the first and the second occurrences of a given disease. In the medical literature, several authors considered paired organs (McGilchrist & Aisbett, 1991; Huster et al., 1989) of an individual as a two-component system, which work under interdependency circumstances. In industrial applications, these data may come from systems whose survival depend on the survival of two similar components.
Research on the bivariate survival models has grown rapidly several years in the past. Clayton’s (1978) random effect model of the bivariate survival was a key innovation. He introduced the notion of the shared relative risk. This model was further developed by Oakes (1982) to analyze the association between two non-negative random variables. Clayton and Cuzick (1985) added observed covariates to the bivariate survival model with the shared relative risk. Crowder (1985) and Hougaard (1986) proposed the random effect models of the bivariate Weibull distributions. A shared frailty model with a positive stable distribution of frailty was suggested by Hougaard (2000). He also discussed several other bivariate distributions with biomedical and reliability applications.
Hanagal and Bhambure (2014b) and Hanagal and Kamble (2014) developed positive stable frailty models to analyze kidney infection data. Hanagal and Bhambure (2017b) obtained positive stable frailty model to analyze Australian twin data. Hanagal (2011, 2017, 2019) gave extensive literature review on different shared frailty models.
Shared frailty explains correlations between subjects within clusters. However, it does have some limitations. Firstly, it forces the unobserved factors to be the same within the cluster, which may not always reflect reality. For example, at times it may be inappropriate to assume that all partners in a cluster share all their unobserved risk factors. Secondly, the dependence between survival times within the cluster is based on marginal distributions of survival times. However, when covariates are present in a proportional hazards model with gamma distributed frailty the dependence parameter and the population heterogeneity are confounded (Clayton and Cuzick, 1985). This implies that the joint distribution can be identified from the marginal distributions (Hougaard, 1986, 2000). Thirdly, in most cases, a one-dimensional frailty can only induce positive association within the cluster. However, there are some situations in which the survival times for subjects within the same cluster are negatively associated. For example, in the Stanford Heart Transplantation Study, generally the longer an individual must wait for an available heart, the shorter he or she is likely to survive after the transplantation. Therefore, the waiting time and the survival time afterwards may be negatively associated.
To avoid these limitations, correlated frailty models are being developed for the analysis of multivariate failure time data, in which associated random variables are used to characterize the frailty effect for each cluster. Correlated frailty models provide not only variance parameters of the frailties as in shared frailty models, but they also contain additional parameter for modeling the correlation between frailties in each group. Frequently one is interested in construction of a bivariate extension of some univariate family distributions (e.g., gamma). For example, for the purpose of genetic analysis of frailty one might be interested in estimation of correlation of frailty. It turns out that it is possible to carry out such extension for the class of infinitely-divisible distributions (Iachine 1995a, 1995b). In this case an additional parameter representing the correlation coefficient of the bivariate frailty distribution is introduced.
Correlated frailty based on reversed hazard
In many practical situations reversed hazard rate (RHR) is more appropriate to analyze the survival data. Reversed hazard rate was proposed as a dual to the hazard rate by Barlow et al. (1963). Shaked and Shantikumar (1994) and Block et al. (1998) provided a general definition of reversed hazard rate (RHR) as,
The reversed hazard rate specifies the instantaneous rate of death or failure at time , given that it failed before time . Thus in a small interval, is the approximate probability of failure in the interval, given failure by the end of the interval . Hanagal and Bhambure (2014a, 2017a, 2017b) analyzed Australian twin data using shared gamma and inverse Gaussian frailty models based on reversed hazard rates. Hanagal and Pandey (2014, 2015a, 2015b, 2015c, 2015d, 2015e, 2016a, 2016b, 2017a, 2017b) and Hanagal et al. (2017a) developed shared gamma and inverse Gaussian frailty models based on reversed hazard rates to analyze Australian twin data using different baseline distributions.
Aalen (1992) considered a compound Poisson distribution as a mixture distribution in survival analysis. Also Aalen and Tretli (1999), Moger and Aalen (2005), Hanagal (2010a, 2010b, 2010c, 2010d) have considered compound Poisson frailty models. Hanagal and Dabade (2012) and Hanagal and Kamble (2015) developed compound Poisson frailty models to analyze kidney infection data. Even though the compound Poisson distribution has many attractive properties in non-susceptible or zero susceptibility type of data a convenient frailty distribution. Some other examples of zero susceptibility type of data if then we can interpret as aggregate heterogeneity due to failures before we get first success can be given as, in case of marriage data, may represent heterogeneity due to difficulties in finding a marriageable partner before an individual meet first suitable partner. In case of fertility, may be heterogeneity due to miss-carriages observed or unable to conceive a child with a couple before they have their first child. Some mothers would like to deliver babies until she delivers a baby boy. Politicians go on contesting elections until they win. Individuals go on changing the jobs until he/she get a suitable job. In such situations compound geometric distribution is a suitable choice of distribution for variate N
Some individuals have zero susceptibility type of data if so that they will get success at the first attempt without any failures. For instance, some people get suitable life partner at the first attempt. In case fertility, no miss-carriages observed. Some mothers able deliver a baby boy as their first child. Politicians win the election at the very first attempt. in unemployment data, individuals get a suitable job at the very first attempt. Students pass the exam at very first attempt. In such type of data, compound distribution having some positive mass at zero value can be a suitable choice. Compound geometric distribution is conveniently used in the literature, since it has an explicit Laplace transform and it deals with the feature that some people may have zero susceptibility. Also Hanagal and Dabade (2013b) and Hanagal and Kamble (2016) developed compound geometric frailty models to analyze kidney infection data.
The use of the compound geometric distribution for is not only mathematically convenient, but might also be seen as natural in a more substantial sense. The distribution arises as a sum of a random number of independent gamma variables, where the number of terms in the sum is geometric distributed. This might be viewed random number of failures until an individual gets succuss.
A random variable is said to have the compound geometric distribution if is given by
where is geometric distributed with probability of sucess , while are independent and identically gamma distributed random variables with scale parameter and shape parameter .
The distribution of consists of two parts; a discrete part which corresponds to the probability of zero susceptibility, and a continuous part on the positive real line. The discrete part is,
where is the probability of success.
We know that, if are independent and identically distributed gamma variates with scale parameter and shape parameter then for given , is also gamma variate with same scale parameter and shape parameter having density function,
Also is a geometric variable with probability mass function,
Then the distribution of the continuous part of compound geometric distribution can be found by conditioning on as,
Thus density function of compound geometric variate is given by,
The parameter set for the compound geometric distribution is .
The Laplace transform of gamma distribution is given by,
and Laplace transform of geometric distribution is,
If is the Laplace transform of then,
where and are Laplace transform of gamma distribution and geometric distribution respectively. substituting the Laplace transforms of and in the above equation we get the Laplace transform of compound geometric distribution is given by,
The mean and variance of are given by
For identifiability condition, , from Eq. (4) we have restriction placed on the parameters is,
therefore under the above restriction the variance of is given by, and Laplace transform is,
The correlated frailty model is the important concept in the area of multivariate frailty models. It is a natural extension of the shared frailty approach on the one hand, and of the univariate frailty model on the other. The conditional distribution function in the bivariate case ( without observed covariates) is
where and are two correlated frailties and is cumulative reversed hazard rate. The distribution of the random vector needs to be specified and determines the association structure of the event times in the model.
We are assuming that the frailties are acting multiplicatively on the baseline hazard function (proportional hazards model) and that the observations in a pair are conditionally independent, given the frailties. Hence, the reversed hazard of the -th individual of the -th pair has the form
where denotes age or time, is a vector of observed covariates, is a vector of regression parameters describing the effect of the covariates , are baseline reversed hazard functions, and are frailties. Bivariate correlated frailty models are characterized by the joint distribution of a two-dimensional vector of frailties . If the two frailties are independent, the resulting lifetimes are independent, and no clustering is present in the model. If the two frailties are equal, the shared frailty model is obtained as a special case of the correlated frailty model with correlation one between the frailties (Wienke, 2011).
In order to derive a marginal likelihood function, the assumption of conditional independence of lifespans, given the frailty, is used. Let be a censoring indicator for individual in pair . Indicator is 1 if the individual has experienced the event of interest, and 0 otherwise. According to Eq. (13), the conditional distribution function of the th individual in the th pair is
with denoting the cumulative baseline hazard function. Here and in the following, is used as a generic symbol for a distribution function. The contribution of individual in pair to the conditional likelihood is given by
where stands for observation time of individual from pair . Assuming the conditional independence of life spans, given the frailty, and integrating out the frailty, we obtain the marginal likelihood function
where is the probability density function of the corresponding frailty distribution. All these formulas can be easily extended to the multivariate case, but need a specification of the correlation structure between individuals in a cluster in terms of the multivariate density function, which complicates analysis. Hanagal et al.(2017b) discussed correlated gamma frailty models for bivariate survival data to analyze kidney infection data and Hanagal and Pandey (2017a) proposed correlated gamma frailty models for bivariate survival data based on reversed hazard rate for Australian twin data. Hanagal and Pandey (2020) and Hanagal (2021) proposed correlated inverse Gaussian and Correlated positive stable frailty models to analyze kidney infection data. Hanagal (2022a, 2022b) proposed correlated inverse Gaussian and correlated positive stable frailty models for bivariate survival data based on reversed hazard rate to analyze Australian twin data. For more details see (Hanagal(2011, 2019) and Wienke(2011)).
Correlated compound poisson frailty model
Let be an infinitely divisible frailty variable with Laplace transformation and , then there exist random variables each with univariate Laplace transform such that the Laplace transform of is given by:
If has a variance the .
The respective bivariate survival model is identifiable under mild regularity conditions on provided that . The case is known as the shared frailty model.
The above Eq. (12) can be extended to multivariate case () as below.
The case leads to shared frailty. If , are mutually independent.
Let be the compound geometric distributed with parameters and , and the Laplace transform of univariate compound geometric frailty using Eq. (6) is
Using the bivariate Laplace transform given in Eq. (12), we obtain bivariate distribution function with correlated compound geometric frailty in terms of cumulative reversed hazard rates using Eq. (6) as follows.
where .
The correlated frailty model with compound geometric frailty distribution in the presence of covariates is characterized by the bivariate distribution function of the form:
where and are the cumulative baseline reversed hazard rate functions of the life time random variables and respectively and .
The bivariate distribution in the presence of covariates, when the frailty variable is degenerate is given by
According to different assumptions on the baseline distributions we get different correlated compound geometric frailty models.
Baseline distributions
We present the modified inverse Weibull distribution, generalized log-logistic type I and generalized log-logistic type II as baseline distribution with the interesting properties.
Modified inverse Weibull distribution
The modified inverse Weibull distribution is more convenient for computational point of view for left censored data. Kumar and Singh (2011) obtained recurrence relations for single and product moments of lower record values from modified inverse Weibull distribution. The cumulative distribution function, the reversed hazard rate and the cumulative reversed hazard rate of the modified inverse Weibull (MIW) are respectively as follows.
When , this distribution reduces to the inverse Weibull distribution. The reversed hazard rate of the modified inverse Weibull distribution is decreasing function of .
Generalized log-logistic type-I distribution
The cumulative distribution function of the generalized log-logistic type-I distribution (GLL-I) is
The corresponding reversed hazard rate and cumulative reversed hazard rate are respectively as follows.
Generalized log-logistic Type-II distribution
The cumulative distribution function of the generalized log-logistic type-II distribution (GLL-II) is
The corresponding reversed hazard rate and cumulative reversed hazard rate are respectively as follows.
When , this distribution reduces to log-logistic distribution. The reversed hazard rate of the generalized log-logistic distribution is decreasing function of .
Proposed models
Substituting cumulative reversed hazard function for the modified inverse Weibull baseline distribution, generalized log-logistic type I and generalized log-logistic type II in the Eqs (14) and (15), we get following six models.
Here onwards we call Eqs (25), (26), (27), (28), (29), and (30) as Model I, Model II, Model III, Model IV, Model V and Model VI respectively. Model I and Model II are the modified inverse Weibull (MIW) baseline distribution with and without correlated compound geometric (CCG) frailty, Model III and Model IV are the generalized log-logistic baseline distribution type I (GLL-I) with and without correlated compound geometric (CCG) and likewise Model V and Model VI are the baseline with generalized log-logistic baseline distribution type II (GLL-II) with and without correlated compound geometric (CCG) frailty.
Likelihood specification and bayesian estimation of parameters
Suppose there are individuals under study, whose first and second observed failure times are represented by (). Let and be the observed censoring times for the individual () for the first and the second recurrence times respectively. We use the left censoring scheme. We assume that the censoring time and the lifetimes of an individual are independently distributed.
The contribution of the bivariate lifetime random variable of the individual to likelihood function is given by,
and likelihood function is,
where , and are respectively the frailty parameter , the vector of baseline parameters and the vector of regression coefficients respectively. For without frailty model, likelihood function is .
For without frailty model, the parameters , and are absent and in the frailty model, these parameters are present. The counts and be the numbers of individuals for which first and second failure times () lie in the ranges ; ; and respectively and
where and . Substituting cumulative reversed hazard rate , , reversed hazard rate , and distribution function for six proposed models into the last relations we get the likelihood functions with and without frailty models.
Unfortunately computing the maximum likelihood estimators (MLEs) involves solving a eleven dimensional optimization problem for Model I, Model III and Model V and nine dimensional optimization problem for Model II, Model IV and Model VI. As the method of maximum likelihood fails to estimate the parameters due to convergence problem in the iterative procedure, so we use the Bayesian approach. The traditional maximum likelihood approach to estimation is commonly used in survival analysis, but it can encounter difficulties with frailty models. Moreover, standard maximum likelihood based inference methods may not be suitable for small sample sizes or situations in which there is heavy censoring (see Kheiri et al. (2007)). Thus, in our problem a Bayesian approach, which does not suffer from these difficulties, is a natural one, even though it is relatively computationally intensive.
To estimate the parameters of the model, the Bayesian approach is now popularly used, because computation of the Bayesian analysis become feasible due to advances in computing technology. Several authors have discussed Bayesian approach for the estimation of parameters of the frailty models. Some of them are, Ibrahim et al.(2001) and references their in, Santos and Achcar (2010). Santos and Achcar (2010) considered parametric models with Weibull and generalized gamma distribution as baseline distributions and gamma, log-normal as frailty distributions. Ibrahim et al. (2001) and references therein considered Weibull model and piecewise exponential model with gamma frailty. They also considered positive stable frailty models.
To estimate the parameters of the model, we used Metropolis-Hastings algorithm and Gibbs sampler. We monitored the convergence of a Markov chain to a stationary distribution by Geweke test (Geweke, 1992) and Gelman-Rubin Statistics (Gelman & Rubin, 1992). Trace plots, coupling from the past plots and sample autocorrelation plots are used to check the behaviour of the chain, to decide burn-in period and autocorrelation lag respectively.
Simulation study
To evaluate the performance of the Bayesian estimation procedure we carry out a simulation study. For the simulation purpose we have considered only one covariate which we assume to follow binomial distribution. The frailty variable and are assumed to have positive stable distribution with known variance and correlation . Lifetimes () for individual are conditionally independent for given frailty = and = . We assume that follows one of the baseline distribution modified inverse Weibull distribution, Generalized log-logistic distribution type I and Generalized log-logistic distribution type II. As the Bayesian methods are time consuming, we generate only twenty five pairs of lifetimes which is our sample size.
A widely used prior for frailty parameter is the gamma distribution . For correlation parameter and zero susceptibility parameter , we use uniform distribution . In addition, we assume that the regression coefficients are normal with mean zero and large variance say 1000. Similar types of prior distributions are used in Ibrahim et al. (2001), Sahu et al. (1997) and Santos and Achcar (2010). So in our study we also use same non informative prior for frailty parameter , and regression coefficients . Since we do not have any prior information about baseline parameters, and , prior distributions are assumed to be flat. We consider two different non-informative prior distributions for baseline parameters, one is and another is . All the hyper-parameters and are known. Here is the gamma distribution with the shape parameter and the scale parameter and represents uniform distribution over the interval . We use different value of baseline parameters for Model I, Model III and Model V, details are given in Tables 1–3. We assume the value of the hyper-parameters as and .
Modified Inverse Weibull Baseline Distribution (Model I) with Correlated Compound Geometric Frailty (Simulation for Model I: Sample size 50)
Parameter (value)
Estimate
Standard error
Lower credible limit
Upper credible limit
Bayesian coverage probability
Geweke values
values
Gelman & rubin values
Burn in period 2500; autocorrelation lag 250
(2.0)
1.956
0.114
1.749
2.235
0.945
0.007
0.498
0.998
(1.5)
1.535
0.097
1.465
1.546
0.944
0.008
0.497
0.998
(2.5)
2.546
0.125
2.264
2.764
0.952
0.007
0.501
1.002
(2.2)
2.224
0.047
2.134
2.265
0.955
0.003
0.497
1.002
(2.5)
2.534
0.017
2.458
2.554
0.944
0.010
0.493
0.987
(3.0)
3.067
0.136
2.774
3.275
0.945
0.008
0.498
0.998
(0.5)
0.486
0.067
0.477
0.525
0.948
0.007
0.501
1.006
(0.5)
0.475
0.062
0.476
0.532
0.945
0.007
0.501
1.007
(0.7)
0.732
0.033
0.657
0.751
0.947
0.006
0.494
0.995
(0.50)
0.536
0.035
0.426
0.571
0.946
0.008
0.498
0.998
Generalized Logistic Baseline Distribution-Type I (Model-III) with Correlated Compound Geometric Frailty (Simulation for Model III: Sample size 50)
Parameter (value)
Estimate
Standard error
Lower credible limit
Upper credible limit
Bayesian coverage probability
Geweke values
values
Gelman & rubin values
Burn in period 7000; autocorrelation lag 350
(2.0)
1.965
0.037
1.735
2.253
0.947
0.003
0.503
1.001
(1.5)
1.536
0.028
1.488
1.557
0.945
0.003
0.502
1.001
(2.5)
2.475
0.086
2.439
2.527
0.953
0.011
0.496
0.998
(2.2)
2.224
0.058
2.158
2.296
0.954
0.007
0.504
1.012
(2.5)
2.556
0.057
2.483
2.532
0.946
0.007
0.504
1.006
(2.5)
2.447
0.113
2.235
2.662
0.945
0.007
0.497
0.996
(0.5)
0.523
0.017
0.493
0.547
0.945
0.011
0.503
1.006
(0.5)
0.526
0.018
0.485
0.527
0.953
0.013
0.503
1.006
(0.7)
0.756
0.084
0.726
0.795
0.943
0.008
0.504
1.005
(0.5)
0.475
0.016
0.458
0.519
0.953
0.021
0.494
0.996
Generalized Logistic Type II Baseline Distribution (Model-V) with Correlated Compound Geometric Frailty (Simulation for Model V: Sample size 50)
Parameter (value)
Estimate
Standard error
Lower credible limit
Upper credible limit
Bayesian coverage probability
Geweke values
values
Gelman & rubin values
Burn in period 7000; autocorrelation lag 250
(2.0)
2.243
0.155
1.754
2.236
0.953
0.006
0.494
0.993
(1.5)
1.521
0.026
1.475
1.535
0.945
0.012
0.504
1.003
(3.5)
3.523
0.126
3.375
3.637
0.946
0.008
0.502
1.003
(2.2)
2.181
0.122
1.964
2.335
0.954
0.014
0.495
0.997
(2.5)
2.532
0.028
2.425
2.624
0.954
0.009
0.502
1.002
(3.5)
3.476
0.158
3.427
3.613
0.945
0.004
0.497
1.007
(0.5)
0.524
0.017
0.496
0.546
0.946
0.008
0.497
1.003
(0.5)
0.523
0.017
0.475
0.567
0.954
0.0067
0.497
1.002
(0.7)
0.726
0.112
0.621
0.766
0.946
0.006
0.497
1.007
(0.5)
0.475
0.048
0.443
0.524
0.953
0.023
0.497
1.002
We run two parallel chains for model one using two sets of prior distributions with the different starting points using Metropolis-Hastings algorithm and Gibbs sampler based on normal transition kernels. We iterate both the chains for 100000 times. There is no effect of prior distribution on posterior summaries because the estimates of parameters are nearly the same and the convergence rate of Gibbs sampler for both the prior sets is almost the same. Also for both the chains the results were somewhat similar. For all models, the trace plots, the coupling from the past plots, the running mean plots and the sample autocorrelation plots for the simulation study are not provided due to lack of space. Tables 1–3 presents the estimates, the credible intervals, Bayesian coverage probabilities for 95% confidence interval of all the parameters for the Model I, Model III and Model V based on the simulation study. These also contains the Gelman-Rubin (Gelman & Rubin, 1992) convergence statistic and the Geweke test (Geweke, 1992) for all the parameters of the Model I, Model III and Model V based on the simulation study. The Gelman-Rubin convergence statistic values are nearly equal to one and also the Geweke test values are quite small and the corresponding -values are large enough to say that the chain attains stationary distribution. Simulated values of the parameters have the autocorrelation of lag . So that every iteration is selected as a sample from the posterior distribution.
Analysis of australian twin data
Now we apply the all six models to the Australian twin data given in Duffy et a1. (1990). The data consists of six zygote categories. We consider the subset of the data with zygote category 4. The data consists of males gender only and consist of 350 pair of twins with 9 and 11 censored in twin 1 and twin 2 respectively. An individual having age at onset less than 11 are considered as left censored observations. The data has information on the age at appendectomy of twins. The genetic effect involved in the risk of appendectomy is the frailty variable. Here there is a common covariate age for both and and one covariate each for , i.e., presence or absence of appendectomy. To check goodness of fit of Australian twin data set, We obtain Kolmogorov-Smirnov(K-S) statistics and their -values for and . For Model I, Model III and Model V -values of observe that -values for Kolmogorov-Smirnov(K-S) statistics are provided in Table 4. Thus from p values of K-S test are quite high. We can say that there is no statistical evidence to the reject the hypothesis that data are from these three models.
-values of K-S statistics for goodness of fit test for Australian Twin Data
Recurrence time
Model
Distribution
First
Second
Model I
CCG-MIW
0.645
0.678
Model III
CCG-GLL-I
0.885
0.826
Model V
CCG-GLL-II
0.988
0.989
Posterior Summary for australian twin data: Model-I (Modified Inverse Weibull Baseline with Frailty)
Parameter
Estimate
Standard error
Lower credible limit
Upper credible limit
Geweke values
values
Gelman & rubin values
Burn in period 7500; autocorrelation lag 1300
35.754
2.547
32.624
39.563
0.017
0.503
1.004
0.654
0.056
0.603
0.702
0.016
0.497
0.993
0.432
0.024
0.405
0.467
0.025
0.497
0.996
32.646
2.628
29.437
35.523
0.085
0.503
1.005
0.553
0.026
0.514
0.598
0.026
0.503
1.006
0.354
0.013
0304
0.406
0.023
0.498
0.997
0.716
0.047
0.658
0.758
0.026
0.496
0.995
0.847
0.074
0.794
0.895
0.017
0.497
0.996
0.164
0.062
0.125
0.214
0.016
0.496
0.995
0.042
0.004
0.026
0.057
0.015
0.495
0.995
0.054
0.032
0.096
0.034
0.065
0.495
0.994
0.036
0.043
0.057
0.025
0.066
0.494
0.994
Posterior Summary for Australian Twin Data: Model-II (modified Inverse Weibull Baseline Without Frailty)
Parameter
Estimate
Standard error
Lower credible limit
Upper credible limit
Geweke values
values
Gelman & rubin values
Burn in period 12000; autocorrelation lag 180
11.449
0.555
10.470
12.344
0.003
0.498
1.007
0.069
0.011
0.049
0.087
0.017
0.506
1.004
0.102
0.003
0.094
0.109
0.001
0.499
1.006
10.439
0.527
9.447
11.341
0.004
0.498
1.000
0.071
0.010
0.051
0.088
0.007
0.497
1.003
0.099
0.003
0.091
0.106
0.000
0.500
1.000
0.005
0.002
0.001
0.009
0.003
0.501
1.000
0.016
0.071
0.040
0.029
0.007
0.497
1.000
0.063
0.123
0.176
0.292
0.004
0.501
1.000
Posterior Summary for Australian Twin Data: Model-III (Generalized Log-logistic Type-I Baseline with Frailty)
Parameter
Estimate
Standard error
Lower credible limit
Upper credible limit
Geweke values
values
Gelman & rubin values
Burn in period 7500; autocorrelation lag 1100
12.357
0.257
10.653
14.763
0.032
0.502
1.003
0.246
0.037
0.136
0.302
0.028
0.493
0.994
10.357
0.247
8.865
12.476
0.057
0.494
0.995
10.457
0.356
8.573
12.594
0.054
0.502
1.003
0.326
0.047
0.025
0.374
0.023
0.497
0.998
9.856
0.265
7.547
11.648
0.062
0.495
0.996
0.657
0.027
0.627
0.688
0.038
0.503
1.003
0.854
0.132
0.807
0.906
0.073
0.495
0.995
0.156
0.134
0.124
0.186
0.075
0.495
0.996
0.624
0.047
0.567
0.702
0.027
0.506
1.007
0.135
0.026
0.185
0.105
0.042
0.502
1.003
0.021
0.036
0.046
0.032
0.013
0.503
1.004
As in case of simulation, here also we assume the same set of prior distributions. We run two parallel chains for all models using two sets of prior distributions with the different starting points using the Metropolis-Hastings algorithm and the Gibbs sampler based on normal transition kernels. We iterate both the chains for 100000 times. As seen in simulation study here also we got nearly same estimates of parameters for both the set of priors, so estimates are not dependent on the different prior distributions. Convergence rates of Gibbs sampler for both the prior sets are almost the same. Also both the chains show somewhat similar results, so we present here the analysis for only one chain with as prior for baseline parameters and as the prior for the frailty parameter . Burn in period is decided by using coupling from the past plot. However, a sequence of draws after burn-in period may have autocorrelation. Because of autocorrelation consecutive draws may not be random, but values at widely separated time points are approximately independent. So, a pseudo random sample from the posterior distribution can be found by taking values from a single run of the Markov chain at widely spaced time points (autocorrelation lag) after burn-in period. The autocorrelation of parameters become almost negligible after the certain lag.
Posterior Summary for Australian Twin Data: Model-IV (Generalized Log-logistic Type-I Baseline without Frailty)
Parameter
Estimate
Standard error
Lower credible limit
Upper credible limit
Geweke values
values
Gelman & rubin values
Burn in period 6500; autocorrelation lag 200
1.480
0.217
1.082
1.879
0.007
0.497
1.009
0.047
0.003
0.041
0.053
0.008
0.496
1.000
1.302
0.151
1.194
1.439
0.006
0.502
1.001
0.045
0.002
0.013
0.056
0.009
0.496
1.009
3.626
0.195
3.540
3.751
0.009
0.496
1.008
3.613
0.168
3.258
3.992
0.010
0.504
1.001
0.005
0.002
0.000
0.011
0.001
0.499
1.000
0.002
0.002
0.004
0.004
0.000
0.500
1.005
0.059
0.128
0.201
0.291
0.000
0.500
1.013
Posterior Summary for Australian Twin Data: Model-V (Generalized Log-logistic Type-II Baseline with Frailty)
Parameter
Estimate
Standard error
Lower credible limit
Upper credible limit
Geweke values
values
Gelman & rubin values
Burn in period 6500; autocorrelation lag 300
0.524
0.063
0.467
0.586
0.024
0.501
1.002
0.254
0.043
0.204
0.308
0.028
0.502
1.002
37.367
1.342
34.463
40.474
0.085
0.502
1.003
0.342
0.048
0.295
0.393
0.059
0.502
1.003
0.265
0.032
0.226
0.307
0.014
0.503
1.004
39.743
1.658
36.643
42.673
0.043
0.503
1.004
0.723
0.086
0.684
0.764
0.058
0.496
0.997
0.868
0.055
0.835
0.905
0.026
0.502
1.002
0.142
0.053
0.104
0.186
0.027
0.501
1.002
0.264
0.054
0.228
0.306
0.025
0.501
1.002
0.086
0.024
0.108
0.057
0.007
0.502
1.003
0.068
0.028
0.097
0.034
0.053
0.497
0.998
Posterior Summary for Australian Twin Data: Model-VI (Generalized Log-logistic Type-II Baseline without Frailty)
Parameter
Estimate
Standard error
Lower credible limit
Upper credible limit
Geweke values
values
Gelman & rubin values
Burn in period 6500; autocorrelation lag 200
0.604
0.102
0.420
0.834
0.011
0.504
1.00
0.047
0.002
0.042
0.052
0.006
0.497
1.03
5.189
0.596
4.119
6.437
0.011
0.495
1.00
0.673
0.088
0.504
0.841
0.016
0.506
1.00
0.046
0.002
0.040
0.052
0.008
0.496
1.00
4.733
0.445
4.545
4.941
0.013
0.494
1.01
0.004
0.004
0.004
0.011
0.004
0.501
1.00
0.001
0.023
0.044
0.045
0.012
0.504
1.00
0.048
0.122
0.198
0.283
0.005
0.497
1.01
The Gelman-Rubin convergence statistic values are nearly equal to one and the Geweke test statistic values are quite small and corresponding -values are large enough to say the chains attains stationary distribution. The posterior mean and standard error with 95% credible intervals for baseline parameters, frailty parameter and regression coefficients are presented in Tables 5–10. The posterior summery of the Model I, Model II, Model III, Model IV, Model V and Model VI are given in Tables 5–10. Tables 5–10 presents estimates, credible intervals, Geweke test and Gelman-Rubin statistics for all the parameters of the Model I, Model II, Model III, Model IV, Model V and Model VI respectively, based on data.
Comparison of AIC, BIC and DIC in Proposed Models and Existing Models
Model
Corr Frailty
Baseline
AIC
BIC
DIC
Model-I
Compound geometric
MIW
4864.42
4858.43
4842.56
Model-II
No Frailty
MIW
5384.16
5426.84
5375.31
Model-III
Compound geometric
GLL-I
4838.54
4830.84
4825.58
Model-IV
No Frailty
GLL-I
5355.81
5396.76
5351.89
Model-V
Compound geometric
GLL-II
4808.64
4802.57
4785.47
Model-VI
No Frailty
GLL-II
5781.33
5901.93
5935.09
Gamma
MIW
5355.41
5873.61
5343.78
Gamma
GLL-I
5078.68
5102.25
5067.80
Gamma
GLL-II
5046.70
5070.91
5037.08
Inverse Gaussian
MIW
5155.71
5188.81
5113.98
Inverse Gaussian
GLL-I
5071.70
5082.21
5057.81
Inverse Gaussian
GLL-II
5016.70
5018.91
5003.09
Positive Stable
MIW
5114.63
5127.18
5110.41
Positive Stable
GLL-I
5031.12
5028.33
5019.78
Positive Stable
GLL-II
4933.72
4948.61
4921.54
Compound Poisson
MIW
4952.32
4963.43
4938.52
Compound Poisson
GLL-I
4935.24
4929.42
4925.66
Compound Poisson
GLL-II
4911.54
4915.23
4891.21
Bayes factor values and decision for models fitted to Australian twin data
Numerator model against denominator model
Bayes factor
Range
Evidence against model in denominator
Model I against Model II
51.48
Very strong
Model III against Model IV
312.46
Very strong
Model V against Model VI
2112.58
Very strong
Model V against Model I
375.75
Very strong
Model V against Model III
56.85
Very strong
Model III against Model I
324.56
Very strong
Where , is posterior likelihood of Model-i.
For Model I, Model III and Model V the estimates of the variance of the frailty variable, (for Model-I, , for Model-III, and for Model-V) from compound geometric frailty models show that there is a high degree of heterogeneity or frailty present between the pairs of twins and the models with frailty are proper choice for fitting the data. The correlation coefficient between and are , and for Model-I, Model-III and Model-V respectively which shows the two frailty variables are correlated. Bayes factor for Model I against Model II is , for Model III against Model IV is and Model V against Model VI is . This is also a Bayesian test based on Bayes factor for testing against and which supports alternative hypothesis, i.e., models with frailty fits better. The credible interval of regression coefficient does not contain zero for all models except, Model VI. The credible interval of regression coefficient contain zero for all models except, Model III and Model V. The credible interval of regression coefficient contain zero for all models. Hence age is the significant covariate for Model I, Model II, Model III, Model IV and Model V. The convergence rate of Gibbs sampling algorithm does not depend on these choices of prior distributions in our proposed model for Australian twin data. The Geweke test values are near to zero and corresponding -values are quite high and the Gelman-Rubin Statistics for all the parameters of all six models based on data are very close to one.
To compare six models we use three model selection criteria, Aikaike information criteria (AIC), Bayesian information criteria (BIC) and deviance information criteria (DIC) values which are given in Table 11 and Bayes factor in Table 12. The AIC, BIC and DIC values for Model V is least among all six models. On the basis of AIC, BIC and DIC values Model V is the best among all six models. Similarly the Bayes factor show that models with frailty (Model I, Model III and Model V) are better than the models without frailty and Model V, the correlated compound geometric frailty based on reversed hazard rate with generalised log-logistic type II baseline is the best model. We compare the AIC, BIC, and DIC values of the existing correlated gamma (CG) frailty models suggested by Hanagal and Pandey (2017b), correlated inverse Gaussian (CIG) frailty models by Hanagal (2022a), correlated positive stable (CPS) frailty models by Hanagal (2022b) and correlated compound Poisson (CCP) frailty models by Hanagal (2023) with the proposed correlated compound geometric (CCG) frailty models and observe that the correlated compound Poisson (CCP) frailty models perform better and more suitable for analyzing Australian twin data than the existing correlated frailty models.
Conclusions
The choice of the best model for Australian twin data is based on AIC, BIC, DIC and Bayes factor values. We found that Model V is a best Model on the basis of AIC, BIC, DIC and Bayes factor values. The age is the significant covariate for all models except Model IV. Correlated compound Poisson frailty models(Model I, Model III and Model V) are better than their baseline model. We also compare the proposed correlated compound geometric frailty models with correlated gamma frailty models suggested by Hanagal and Pandey (2017b), correlated inverse Gaussian frailty proposed by Hanagal (2019, 2022a), correlated positive stable frailty proposed by Hanagal (2022b), correlated compound Poisson frailty proposed by Hanagal (2023) and observe that correlated compound geometric frailty based on reversed hazard rate with generalized log-logistic type II baseline performs better and more suitable than the existing correlated frailty models proposed by Hanagal and Pandey (2017b) and Hanagal (2022a, 2022b, 2023) for analyzing Australian twin data set, with left censored observations. The methods discussed in this paper may be extended into other frailty models and correlated frailty models with different baseline distributions, using the Bayesian approach, provided the models fit to the data.
Footnotes
Acknowledgments
I thank the referee and the editorial board member for the valuable suggestions and comments which improved the earlier version of the manuscript.
References
1.
AalenO.O. (1992). Modelling heterogeneity in survival analysis by the compound Poisson distribution.Annals of Applied Probability, 2, 951-72.
2.
AalenO.O., & TretliS. (1999). Analyzing incidence of tests cancer by means of a frailty model.Cancer Causes Control, 10, 285-92.
3.
BarlowR.E.MarshalA.W., & ProschanF. (1963). Properties of the probability distribution with monotone hazard rate.Ann Math Statist, 34, 375-389.
4.
BlockH.W.SavitsT.H., & SinghH. (1998). On the reversed hazard rate function.Probability in the Engineering and Informational Sciences, 12, 69-90.
5.
ClaytonD.G. (1978). A model for association in bivariate life tables and its applications to epidemiological studies of familial tendency in chronic disease incidence.Biometrica, 65, 141-151.
6.
ClaytonD. G., & CuzickJ. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of Royal Statistical Society, Ser., A, 148, 82-117.
7.
CrowderM. (1985). A distributional model for repeated failure time measurements. Journal of Royal Statistical Society, Ser. B, 47, 447-452.
8.
DuffyD. L.MartinN. G., & MathewsJ. D. (1990). Appendectomy in Australian twins.Australian Journal of Human Genetics, 47(3), 590-92.
9.
GelmanA., & RubinD. B. (1992). A single series from the Gibbs sampler provides a false sense of security. In Bayesian Statistics 4 (BernardoJ. M.BergerJ.DawidA. P. and SmithA. F. M., eds.). Oxford. Oxford Univ. Press. pp. 625-632.
10.
GewekeJ. (1992). Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments. In Bayesian Statistics 4 (eds. BernardoJ.M.BergerJ.DawidA.P. and SmithA.F.M.), Oxford: Oxford University Press, pp. 169-193.
11.
HanagalD.D. (2010a). Correlated compound Poisoon frailty model for the bivariate survival data.International Journal of Statistics and Management Systems, 5, 127-40.
12.
HanagalD.D. (2010b). Modeling heterogeneity for bivariate survival data by compound Poisson distribution.Model Assisted Statistics and Applications, 5(1), 01-09.
13.
HanagalD.D. (2010c). Modeling heterogeneity for bivariate survival data by Weibull distribution.Statistical Papers, 51(4), 947-58.
14.
HanagalD.D. (2010d). Modeling heterogeneity for bivariate survival data by the compound Poisson distribution with random scale.Statistics and Probability Letters, 80, 1781-90.
15.
HanagalD. D. (2011). Modeling Survival Data Using Frailty Models. Chapman & Hall/CRC, Boca Rotan.
16.
HanagalD.D. (2017). Frailty Models in Public Health. Handbook of Statistics, 37(B), 209-247. Eds. Arni S. R. Srinivasa Rao, Saumyadipta Pyne, C. R. Rao. Elsevier Publishers; Amsterdam.
17.
HanagalD. D. (2019). Modeling Survival Data Using Frailty Models, Second Edition. Springer Nature, Singapore.
18.
HanagalD. D. (2021). Correlated positive stable frailty models. Communications in Statistics, Theory and Methods, 50(23), 5617-5633.
19.
HanagalD. D. (2022a). Correlated inverse Gaussian frailty models based on reversed hazard rate.Statistics and Applications, 20(1), 89-111.
20.
HanagalD. D. (2022b). Correlated positive stable frailty models based on reversed hazard rate.Statistics in Biosciences, 14, 42-65.
21.
HanagalD. D. (2023). Correlated compound Poisson frailty models based on reversed hazard rate. Communications in Statistics, Theory and Methods, doi: 10.1080/03610926.2022.2098336.
22.
HanagalD.D.BhambureS.M. (2014a). Shared inverse Gaussian frailty model based on reversed hazard rate for modeling Australian twin data.Journal of Indian Society for Probability and Statistics, 15, 9-37.
23.
HanagalD.D., & BhambureS.M. (2014b). Analysis of kidney infection data using positive stable frailty models.Advances in Reliability, 1, 21-39.
24.
HanagalD.D., & BhambureS.M. (2017a). Shared gamma frailty models based on reversed hazard rate for modeling Australian twin data. Communications in Statistics, Theory & Methods, 46(12), 5812-5826.
25.
HanagalD.D.BhambureS.M. (2017b). Modeling Australian twin data using shared positive stable frailty models based on reversed hazard rate. Communications in Statistics, Theory & Methods, 46(8), 3754-3771.
26.
HanagalD.D., & DabadeA.D. (2012). Modeling Hetrogeneity in bivariate survival data by compound Poisson distribution using Bayesian approach.International Journal of Statistics and Management Systems, 7(1-2), 36-84.
27.
HanagalD.D., & KambleA.T. (2014). Bayesian estimation in shared positive stable frailty models.Journal of Data Science, 13, 615-640.
28.
HanagalD.D., & PandeyA. (2014). Gamma shared frailty model based on reversed hazard rate for bivariate survival data.Statistics & Probability Letters, 88, 190-196.
29.
HanagalD.D., & KambleA.T. (2015). Bayesian estimation in shared compound Poisson frailty models.Journal of Reliability and Statistical Studies, 8(1), 159-180.
30.
HanagalD.D., & PandeyA. (2015a). Gamma frailty model based on reversed hazard rate. Communications in Statistics: Theory & Methods, 45(7), 2071-2088.
31.
HanagalD.D., & PandeyA. (2015b). Shared Frailty Models Based on Reversed Hazard Rate for Modeling Australian Twin Data. Indian Association for Product, Quality and Reliability Transactions, 40(1), 61-93.
32.
HanagalD.D., & PandeyA. (2015c). Modeling bivariate survival data based on reversed hazard rate.international Journal of Mathematical Modelling and Numerical Optimisation, 6(1), 72-199.
33.
HanagalD.D., & PandeyA. (2015d). Shared frailty models based on reversed hazard rate for modeling Australian twin data.Indian Association for Product, Quality and Reliability Transactions, 40(1), 61-93.
34.
Hanagal, D.D., & Pandey, A. (2015e). Inverse Gaussian shared frailty models with generalized exponential and generalized inverted exponential as baseline distributions. Journal of Data Science, 13(2), 569-602.
35.
HanagalD.D., & PandeyA. (2016a). Inverse Gaussian shared frailty models based on reversed hazard rate.Model Assisted Statistics and Applications, 11, 137-151.
36.
HanagalD.D., & PandeyA. (2016b). Gamma shared frailty model based on reversed hazard rate. Communications in Statistics: Theory & Methods, 45(7), 2071-2088.
37.
HanagalD.D., & PandeyA. (2017a). Shared frailty models based on reversed hazard rate for modified inverse Weibull distribution as a baseline distribution.Communications in Statistics, Theory and Methods, 46(1), 234-246.
38.
HanagalD. D., & , (2017b). Correlated Gamma Frailty Models for Bivariate Survival Data Based on Reversed Hazard Rate. International Journal of Data Science, 2(4), 301-324.
39.
HanagalD. D., & PandeyA. (2020). Correlated inverse Gaussian frailty models for bivariate survival data. Communications in Statistics.Theory and Methods, 49(4), 845-863.
40.
HanagalD.D.PandeyA., & SankaranP.G. (2017a). Shared frailty model based on reversed hazard rate for left censoring data.Communications in Statistics: Simulation and Computation, 46(1), 230-243.
41.
HanagalD. D.PandeyA., & GangulyA. (2017b). Correlated gamma frailty models for bivariate survival data. Communications in Statistics.Simulation and Computation, 46(5), 3627-3644.
42.
HougaardP. (1986). Survival models for heterogeneous populations derived from stable distributions.Biometrika, 73, 387-396.
43.
HougaardP. (2000). Analysis of Multivariate Survival Data. Springer: New York.
44.
HusterW.J.BrookmeyerR., & SelfS. G. (1989). Modelling paired survival data with covariates.Biometrics, 45, 145-156.
45.
IachineI.A. (1995a). Correlated frailty concept in the analysis of bivariate survival data. Bachelor project, Department of Mathematics and Computer Science, Odense University, Denmark.
46.
IachineI.A. (1995b). Parameter estimation in the bivariate correlated frailty model with observed covariates via the EM-algorithm. Working Paper Series: Population Studies of Aging 16, CHS, Odense University, Denmark.
47.
IbrahimJ.G., Ming-Hui C. and SinhaD. (2001). Bayesian Survival Analysis. Springer, Verlag.
48.
KheiriS.KimberA., & MeshkaniM. R. (2007). Bayesian analysis of an inverse Gaussian correlated frailty model.Computational Statistics and Data Analysis, 51, 5317-5326.
49.
KumarD., & SinghA. (2011). Recurrence relations for single and product moments of lower record values from modified inverse Weibull distribution.Gen Math Notes, 3(1), 26-31.
50.
McGilchristC.A., & AisbettC.W. (1991). Regression with frailty in survival analysis.Biometrics, 37, 461-466.
51.
OakesD. (1982). A model for association in bivariate survival data. Journal of Royal Statistical Society, B, 44, 414-422.
52.
SahuS.K.DeyD.K.AslanidouH., & SinhaD. (1997). A Weibull regression model with gamma frailties for multivariate survival data.Life Time Data Analysis, 3, 123-137.
53.
SantosC.A., & AchcarJ.A. (2010). A Bayesian analysis for multivariate survival data in the presence of covariates.Journal of Statistical Theory and Applications, 9, 233-253.
54.
ShakedM., & ShantikumarJ.G. (1994). Stochastic Orders and Their Applications. Academic Press, New York.
55.
SpiegelhalterD.J.BestN.G.CarlinB.P., & Van der LindeA. (2002). Bayesian measure of model complexity and fit (with discussion). Journal of the Royal Statistical Society, B., 64, 583-639.
56.
VaupelJ. W.MantonK.G., & StallaedE. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality.Demography, 16, 439-454.
57.
WienkeA. (2011). Frailty Models in Survival Analysis. Chapman & Hall/CRC.