Abstract
The marginal approach and the conditional approach are two different ways to model clustered dental failure time data. We compare the two approaches in the context of a Cox regression analysis, where the aim is to estimate the effect of a covariate (e.g., dental treatment) on the risk of failure. Specifically, we treat within-cluster correlation as if it was introduced by unobserved cluster level covariates, and study the small sample behaviour of the marginal and the conditional approach. We show that in a non-randomized setting where an unobserved cluster variable is correlated with the variable of interest, both the marginal and the conditional approaches can give misleading results. We argue that this is an important message, since most often it is assumed that the frailty term and the covariates of interest are independent.
Introduction
Dental studies typically have a clustered data structure. Multiple teeth, fillings, implants, veneers, etc., are studied in the same mouth; multiple patients are treated by the same dentist. Two main approaches exist to handle clustered responses, i.e., the marginal approach and the conditional approach. Glidden and Vittinghoff, (2004) contains a survey of the two approaches. In this article, we study clustered dental failure times and extend the work of Glidden and Vittinghoff (2004) by exploring the performance of both approaches when an often made statistical assumption is violated. The research questions are addressed by simulations. While no new statistical models are proposed in this article nor solutions given to the posed problems, we hope that our explorations are thought provoking and will encourage statisticians to be cautious when drawing scientific conclusions from multilevel studies. We also hope that our findings will trigger new statistical approaches that can address the issues raised here. Finally, in the context of a Cox regression model, we aim at deriving guidelines for a researcher who needs to decide for a modelling strategy and eventually has to interpret the results of the selected model.
We base all discussion on the hypothesis that the clustered failure times are generated by a conditional Cox model, where the within-cluster correlation can be explained by an unobserved cluster level covariate, the frailty. Under this hypothesis, the marginal Cox model is misspecified and estimates obtained from this model are biased. Assuming that the observed covariates are independent of the frailty, it is possible to prove that the absolute value of the true hazard ratio is systematically underestimated by the marginal approach. This well-studied phenomenon is called attenuation (Bretagnolle and Huber-Carol, 1988; Struthers and Kalbfleisch, 1986).
When the frailty is independent of the observed covariates, the conditional approach provides consistent estimates for the hazard ratio of the observed covariates for a range of frailty distributions (Parner, 1998; Zeng et al., 2008). Thus, one motivation for introducing a frailty term into a Cox model is to remove the attenuation bias of the corresponding marginal analysis.
We generate data to study what can happen if an unobserved confounder is correlated with observed covariates. It turns out that both the marginal and the conditional approaches may produce biased estimates of the conditional regression coefficients and that the bias can be in either direction. We believe that this message is important for the interpretation of non-randomized clinical studies in oral health research but also for survival studies of clustered data in general.
Randomization is a well-known tool to break down the dependence of one variable of interest (e.g., treatment) and the observed and unobserved confounders. We further investigate estimating the effect of a randomized covariate if non-randomized covariates, that are correlated with an unobserved confounder, are included into the model.
In a practical situation, it is usually not possible to know if and how the observed outcome and covariates depend on the unobserved cluster-specific confounders. However, by discussion with the subject-matter expert who collected the data it is often possible to come up with an educated guess. This could then be used to study the robustness of a given analysis and to infer the direction of the potential bias. To illustrate this idea, we perform a prototype sensitivity analysis for a dental implant study (Gerds and Vogeler, 2005).
The article is organized as follows. In Section 2, we discuss two real-life examples in which dental failure times depend on unobserved variables. In Section 3, we review the marginal and the conditional approaches in the context of the Cox regression model. In Section 4, we explain the modelling options for the case where one believes in a conditional Cox regression model, and Section 5 presents results from simulated data in various scenarios. In Section 6, we perform a prototype sensitivity analysis which aims at identifying different possible parameter constellations that are consistent with the results of the observed data. Further discussion is postponed to Section 7.
Background: Clustered dental research
It is important to control statistical analyses for confounding. However, it is usually not possible to observe all potential confounders. In a dental study with multiple study units (e.g., teeth) in one cluster (e.g., patient) all the patient-specific and unit-specific characteristics that affect the risk of failure are potential confounders. The consequences of not observing or omitting an important factor can be serious. We illustrate this by two real-life scenarios.
Our first example is an illustration of the impact of omitting patient-specific characteristics in the model. Consider a comparative study of dental implant types where between one and eight implants are placed in the same patient. Now suppose that the expected implant survival chances are highly dependent on whether or not the patient has a genetic predisposition to inflammation. It may be unlikely that genetic predisposition to inflammation leads to a preference of a specific implant type. However, often only one implant type can be placed in the same patient. Then, chance can make it happen—in a naive analysis—that one of the implant types is falsely accused to have worse survival chances just because there were few frail patients carrying many implants of this type.
As a second example, we show the impact of ignoring tooth-specific characteristics in the model. Consider an observational study on the longevity of dental filling materials. Suppose that the larger cavities were systematically more often treated with the strongest material and the small cavities systematically more often with the most aesthetic material. Suppose further that for both materials the expected survival time is higher if the cavity is smalller. Then, a marginal analysis of survival chances which does not include cavity size will likely suffer from confounding bias. And, if the cavity size is not recorded in the study protocol, then it is not possible to correct the analysis for the cavity size. Note that in this situation the observed covariate (treatment material) depends on the unobserved confounder (cavity size). In this article, we demonstrate that in this situation both a standard marginal Cox regression analysis and a standard frailty Cox regression analysis can be severely misleading.
Review of marginal and frailty models for survival regression analysis
How should the cluster effect be modelled? Marginal estimates reflect the average effect in the population, and conditional estimates describe the effects on subjects, e.g., teeth, implants or fillings (Glidden and Vittinghoff, 2004; Hougaard, 2000; Martinussen and Scheike, 2006).
Marginal models
The marginal approach models population averaged covariate effects and does not explicitly incorporate cluster effects. A model is assumed for the effect of covariates on the hazards of the subjects under a working independence assumption.
The most common modelling assumption is that the marginal hazard has a proportional hazards form. Under this model, the event time of the jth subject in the ith cluster,
where
Frailty models
In frailty models, covariate effects are specified conditionally on an unobserved random effect (the frailty) that accounts for the within-cluster correlation. Often one assumes a proportional hazards model given a cluster-shared frailty variable. That is, the conditional hazard function for
where
By writing the conditional hazard as
where
The regression parameters b should be interpreted conditionally on the frailty. For example, consider a treatment comparison with a single covariate coded as 1 for ‘treated’ and 0 for ‘untreated’. The hazard ratio of a treated tooth from patient i compared to an untreated tooth from patient
The hazard ratio is constant over time, but is different for different patient pair comparisons. The
The interpretation of the regression parameters (hazard ratios) depends on which of the two models that is believed to generate the failure times. If a conditional Cox proportional hazards model is generating the data, then the corresponding marginal model can be a complex function of the covariates and generally does not have proportional hazards. Also, the other way around, if the data is generated by a marginal Cox model, then the effect of the covariates in a conditional model will generally be a complex function of time and of the other parameters. It is only in the special case when the frailty follows a positive stable distribution, that the marginal and conditional formulations both yield proportional hazards. A reference for all this is Martinussen and Scheike, (2006, Example 9.2.2).
Attenuation bias
For the Cox proportional hazards model it is known that excluding an important factor which is independent of the observed covariates from the analysis leads to estimates of the covariate effects which are systematically too small in absolute value (Struthers and Kalbfleisch, 1986). That is, if a frailty is present but ignored then the covariate effects will be smaller than those in the underlying frailty model (attenuation bias). The last statement is only valid if the frailty is not correlated with the observed covariates (Gerds and Schumacher, 2001).
Modelling options
The scenario of dental survival analysis
For simplicity, we consider only one level of clustering, where, e.g., multiple teeth are studied within the same patient, and focus on the effects of the covariates on the risk of failure.
Suppose we have n patients and

Here
In real applications, the dimension of any unobserved covariates will usually be unknown. Also, it is generally not possible to deduce from the observed data if and in which way the unobserved data correlate with the observed variables. This yields many different scenarios (see Figure 1).
Formally the dependencies can be described by a structural equation model (Sánchez et al., 2005) by combining Equation (4.1) with models for the other variables. For example,
where expit is the inverse logit function. In this example,
The marginal Cox approach corresponds to specifying a linear predictor for a Cox regression model based on the observed covariates:
If
Modeller’s choice: The conditional approach
With the conditional approach a frailty term Wi is introduced which plays the role of the unobserved patient covariate(s):
If
Simulation studies
We wish to examine the impact of a correlation between the frailty term and the observed covariates on the estimated regression coefficients in terms of bias and variability. Two cases are of interest here, i.e., a patient-specific covariate and a tooth-specific covariate. These cases refer to the two examples in Section 2. This impact will be examined via simulations with Scenarios 1 and 2 pertaining to the first case and Scenario 3 to the second case.
We simulate teeth from 300 patients under a number of scenarios. Survival times are generated from a Cox-exponential model following variants of (2) with constant baseline hazard 0.1. Right censoring times are generated from an exponential distribution with hazard 0.1. In all settings, the random effect
For each patient, we first generate eight teeth and then introduce random missingness for each tooth with a probability of 50%. This yields in total about four times as many teeth as patients. We report the average bias for estimating the conditional effect and root-mean-squared error (RMSE) across 1000 runs.
Scenario 1: Estimating a mouth specific treatment effect
Estimating a treatment effect correlated with the frailty
We draw binary data
The first three rows of Table 1 show the average bias in estimating the effect of Xi on survival,
In the last three rows of the table where
Average bias (root-mean-square error) across 1000 simulations in estimating the effect βxt of a treatment which is the same for all teeth in the same patient. Columns 3 and 5 show results for the marginal Cox regression model and columns 4 and 6 for the log-normal frailty Cox regression model
Average bias (root-mean-square error) across 1000 simulations in estimating the effect βxt of a treatment which is the same for all teeth in the same patient. Columns 3 and 5 show results for the marginal Cox regression model and columns 4 and 6 for the log-normal frailty Cox regression model
The parameter
We introduce two new covariates,
Table 2 shows that the association between the two auxiliary covariates and the random effect does not lead to any considerable bias in the conditional estimate of
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt of a randomized treatment (βux = 0, βzx = 0, βvx = 0) which is the same for all teeth in the same patient. The unobserved confounder is correlated with survival and with two auxiliary covariates: βut = 0.7, βuz = 0.7, βuv = 0.7. Columns 3 and 5 show results for the marginal Cox regression model, and columns 4 and 6 for the log-normal frailty Cox regression model
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt of a randomized treatment (βux = 0, βzx = 0, βvx = 0) which is the same for all teeth in the same patient. The unobserved confounder is correlated with survival and with two auxiliary covariates: βut = 0.7, βuz = 0.7, βuv = 0.7. Columns 3 and 5 show results for the marginal Cox regression model, and columns 4 and 6 for the log-normal frailty Cox regression model
We now consider a similar setting as in the previous subsection but allow a positive association between
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt = 0.7 of a treatment which is the same for all teeth in the same patient (columns 3 and 4) and in estimating the effect βzt = 0.7 of an auxiliary tooth-specific covariate (columns 5 and 6). The unmeasured confounder is not correlated with the treatment (βux = 0) but with the tooth-specific covariate (βuz = 0.7). Columns 3 and 5 show results for the marginal Cox regression model, and columns 4 and 6 for the log-normal frailty Cox regression model
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt = 0.7 of a treatment which is the same for all teeth in the same patient (columns 3 and 4) and in estimating the effect βzt = 0.7 of an auxiliary tooth-specific covariate (columns 5 and 6). The unmeasured confounder is not correlated with the treatment (βux = 0) but with the tooth-specific covariate (βuz = 0.7). Columns 3 and 5 show results for the marginal Cox regression model, and columns 4 and 6 for the log-normal frailty Cox regression model
Now we repeat the same simulation setup as in the previous subsection, but without including Zij in the regression model when estimating the effect of Xi on survival, βxt. The results are shown in Table 4.
When there is a direct effect of
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt = 0.7 of a treatment which is the same for all teeth in the same patient when the variable Zij is not included in the regression analyses. Column 3 shows results for the marginal Cox regression model and column 4 results for the log-normal frailty Cox regression model
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt = 0.7 of a treatment which is the same for all teeth in the same patient when the variable Zij is not included in the regression analyses. Column 3 shows results for the marginal Cox regression model and column 4 results for the log-normal frailty Cox regression model
We conclude from the different simulation exercises under Scenario 1, that:
if a patient-specific covariate correlates directly with the unobserved covariate, both approaches can give biased estimates of the conditional regression coefficients. if the patient-specific covariate of interest is randomized, then there is no appreciable bias of the frailty model, even in the presence of mouth- and tooth-specific confounders that are related to the frailty term. if the patient-specific covariate of interest is not randomized and relates indirectly to the frailty term, then there could be an appreciable bias for both approaches whether or not one corrects for the confounders that relate to the frailty term. That bias can be even larger for the conditional approach, from which we simulated the survival times.
Scenario 2: Estimating a tooth-specific treatment effect when auxiliary covariates are correlated with the frailty
We perform similar simulations as in the previous sections with a tooth-specific variable of interest (treatment). These settings cover the popular split-mouth design were several treatments are applied within the same mouth. However, when we do not randomize the treatment within mouths, the results are very similar to those obtained for a patient-specific treatment and do not lead to new findings. Therefore, we do not report these results here. Instead, we consider a binary treatment variable and randomize it within each patient (hereafter termed randomized split-mouth design). For simplicity and to reflect what is often done in practice, we exclude patients who contribute only one tooth and randomly choose two teeth from patients who contribute with more than two teeth. In this setting, the treatment assignment is independent of the measured and unmeasured variables that affect survival. However, the observed variables can still be correlated with an unmeasured confounder.
We design a new simulation study with exactly two teeth per patient which are assigned to different treatments. Otherwise we use the same setup as in Section 5.1. We study the bias and root-mean-squared error for estimating a treatment effect of
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt = 0.7 of a treatment which is randomly assigned to two studied teeth within each patient. Columns 3 and 4 show results from adjusted models and columns 5 and 6 from unadjusted regression. Columns 3 and 5 show results for the marginal Cox regression model, and columns 4 and 6 results for the log-normal frailty Cox regression model
Average bias (root-mean-squared error) across 1000 simulations in estimating the effect βxt = 0.7 of a treatment which is randomly assigned to two studied teeth within each patient. Columns 3 and 4 show results from adjusted models and columns 5 and 6 from unadjusted regression. Columns 3 and 5 show results for the marginal Cox regression model, and columns 4 and 6 results for the log-normal frailty Cox regression model
Average bias (root-mean-squared error) across 1000 simulations in estimating the effects βvt = 0.7 of a patient-specific covariate and βzt = 0.7 of a tooth-specific covariate. Columns 3 and 5 show results for the marginal Cox regression model, and columns 4 and 6 results for the log-normal frailty Cox regression model
Table 5 shows that the conditional approach is clearly advantageous for estimating the treatment effect and also that adjusting for auxiliary variables can significantly improve the results. The latter leads to the following recommendation: it is useful to control for auxiliary variables in a randomized split-mouth study even if it is not possible to estimate the effects of the auxiliary variables themselves very well (see Table 6).
We conclude from the different simulation exercises under Scenario 2, that
In the mixed case with up to 8 teeth per mouth the expected results for a tooth-specific variable are similar to those seen for a patient-specific variable in Scenarios 1 and 2. In the randomized split-mouth design with two teeth studied in each mouth, the conditional approach may outperform the marginal approach if there is clustering among the two teeth. In the randomized split-mouth design with two teeth studied in each mouth it is important to adjust for auxiliary covariates even if their own effect on survival cannot be well estimated. In the balanced design with two teeth, the bias for a tooth-specific variable is relatively smaller than that for a patient-specific variable. This occurs in the confounded as well as in the un-confounded situations.
Dental implant study
We reconsider data from a dental study of 211 TPS-SteriOss® implants carried by 61 patients (Gerds and Vogeler, 2005; Gerds et al., 2009). Between 2 and 10 study implants were placed in each patient. The patients were followed for a median observation time of 4.6 years (CI95 = [3.8; 4.8] years). Overall only 13 implants failed during their individual study period. The Kaplan–Meier estimate of the cumulative incidence after 4.6 years was 5.6% (CI95: [2.1;9.0]%). The patients were between 27 and 80 years old (mean=49.2 years), 29 of them were female and 32 male. In total, these patients carried 211 implants of which 166 were placed in the lower jaw (mandible) and 45 in the upper jaw (maxilla).
Marginal Cox regression analysis
Dental implants placed in the lower jaw (mandible) have a significantly lower hazard of failure compared to dental implants placed in the upper jaw. Table 7 shows results of a marginal Cox regression model applied to the TPS-SteriOss® implant data adjusting the hazard ratio for patient’s age and sex. A robust variance estimator (sandwich estimator) is used for the confidence limits and p-values (Lin and Wei, 1989).
Results of a marginal Cox regression analysis of implant placement upper jaw versus lower jaw adjusted for patient’s age and sex
Results of a marginal Cox regression analysis of implant placement upper jaw versus lower jaw adjusted for patient’s age and sex
Using the same data as in the previous section and using the same linear predictor, we also fitted a shared gamma frailty model (see Table 8). The estimated hazard ratio for upper versus lower jaw from the marginal analysis (3.62) is considerably lower than that of the frailty analysis (5.72). As detailed in the previous sections, a natural explanation for this difference is the attenuation bias induced by unobserved patient-specific variables which are picked up by the frailty analysis but ignored in the marginal analysis.
Results of a frailty Cox regression analysis of implant placement upper jaw versus lower jaw adjusted for patient’s age and sex
Results of a frailty Cox regression analysis of implant placement upper jaw versus lower jaw adjusted for patient’s age and sex
To further analyze the differences between the marginal and the frailty Cox regression analysis, in the context of the simulation results of the previous section, we conduct a sensitivity study. The aim of the sensitivity analysis is to identify parameter constellations which are consistent with the marginal Cox regression analysis of the real implant dataset, in situations where it is misspecified because it ignores an important patient level confounder. In essence, we wish to show that several true configurations are compatible with the outcome of a marginal Cox regression analysis when there is a correlation between the covariate of interest and the frailty term.
Setup
We simulate data that are alike the dental implant data from the study (Gerds and Vogeler, 2005). We simulate covariate data based on the observed patient’s age, sex and implant jaw placement distributions, where we approximate the age distribution by a normal distribution and the sex and jaw distributions by binomial distributions. We assume independence between the age, sex (male/female) and jaw (lower=Mandible/upper=Maxilla) distributions. We fit a parametric Cox-Weibull regression model (Bender et al., 2005) using the same linear predictor as the marginal Cox regression model shown in Table 7. In our basic setting, we simulate eight implant failure times per patient based on the parameters from the Weibull regression model. We then introduce randomly missing values as in the previous section to achieve that different patients contribute with differently many (between 1 and 8) implants. In the basic setting, we do not consider unmeasured confounders, and, hence, here the marginal model fitted to the real data is the data-generating model. A second Weibull model is fit to the right censoring times but without considering covariate effects. The parameters are used to simulate right censoring times similar to those in the real data.
We then deviate from the basic setting by introducing a normally distributed patient-specific confounder. Simulations are repeated for varying effects (hazard ratios) of the confounder on the implant failure time, and for varying the effects of the confounder on the probability of the implant being placed in the upper jaw (odds ratios in a univariate logistic regression model).
We define a parameter constellation as consistent with the outcome of the marginal Cox regression analysis of the real data if the hazard ratio of a corresponding marginal Cox model is within a given tolerance interval (see below) around the estimated hazard ratio in the real data. The hazard ratio is estimated by simulating data based on the parameter constellation. We focus on the hazard ratio of one of the covariates (jaw) and disregard possible deviations of the other parameters. The search for parameter constellations that are consistent with the marginal hazard ratio for jaw (HR=3.62) in the dental implant data is done as follows. In each setting, we start by simulating 100 independent datasets using the hazard ratios estimated by the marginal approach for the variables jaw, age and sex (see Table 7). In each of these datasets, we fit a marginal Cox regression model and compute the median of the 100 estimated hazard ratios. We then compare the median hazard ratio with the tolerance interval [3.4; 3.9] which corresponds to ± 0.05 (on the log scale) of the estimated hazard ratio for jaw in the marginal Cox model. If the median value is inside the tolerance interval, we conclude that the parameter setting is consistent with the marginal Cox regression analysis of the real implant data. If the median value is above (or below) the tolerance interval we increase (or decrease) the data-generating hazard ratio for jaw in the simulation setting by 5% and repeat the previous step keeping all the other simulation parameters fixed. This procedure is repeated until the median is inside the tolerance interval. The final hazard ratio for the jaw together with the other parameters in the corresponding simulation setting is considered a consistent parameter constellation.
Results
Our first observation from the simulation study is that the low event rate combined with the relatively small sample size yields a too large sampling variability. To avoid chance findings, we increase the sample size from 61 patients (real data) to 1000 patients which result in about 4000 implants in each simulated sample. We also reduce the censoring hazard by a factor 1000 and thereby prolong the follow-up period from 4.6 years of the real data to about 19 years.
Figure 2 shows the parameter constellations that are consistent with the different simulation scenarios obtained by varying the data-generating hazard ratios of the unobserved confounder on the implant failure times, and by also varying the data-generating odds ratio of the unobserved confounder on the probability of placing an implant in the upper jaw.

The sensitivity analysis shows that multiple parameter constellations are consistent with the real-data marginal Cox regression results. The assumptions of the frailty model are met only when the unobserved confounder has no effect on the probability of placing an implant in the upper jaw, that is when the log-odds ratio is 0. In this case, the results are symmetric and constellations with data generating log-hazard ratios of both –1.5 and 1.5 are consistent with both the real data frailty and the real data marginal Cox regression analysis. However, if the unobserved confounder has an effect on the probability of placing an implant in the upper jaw, then even data-generating hazard ratios below the marginal estimate are consistent.
In this article, we have investigated modelling options for dental failure time analysis when multiple teeth are studied within the same mouth. Using simulated data with a typical dental structure, we compared the small sample properties of the marginal approach and the conditional approach. We showed, as expected, that the marginal approach underestimates the effect of the underlying conditional Cox model if the unobserved variable is independent of the variable of interest. This so-called attenuation bias is well known in univariate survival; see e.g., Henderson and Oman (1999). We also derived the direction of the bias in the non-randomized case and confirmed the theoretical results of Gerds and Schumacher (2001), who showed that the bias can be in both directions when the unobserved (or omitted) variable depends on the variable of interest.
In this article, we have shown that a severe bias may occur if the frailty is associated with the observed covariates. In fact, even if the data-generating model is a conditional model, it may occur that the deviation of a frailty Cox model from the data-generating model can be larger than that of a corresponding marginal Cox model when the unobserved confounder strongly correlates with the variable of interest. This result indicates that randomization is important for estimating dental treatment effects. However, even in a random treatment study, it may happen that the unobserved confounder is correlated with auxiliary variables. Our simulations indicate that by including the auxiliary variables into the model at least the estimates for the randomized variable are nearly unbiased.
In a dental setting, Chuang and Cai (2006) practically compared the results of fitting a marginal and a frailty model on a specific dataset and found that the confidence intervals for the survival predictions of implants were smaller in frailty models compared to that in the marginal approach.
In our simulation study we have only considered conditional data-generating models in which the underlying model is consistent with a frailty model. It would be interesting, but is beyond the scope of this article, to conduct a similar study where we instead generate survival times from a marginal model, and then investigate the bias in estimating the marginal parameters with other approaches.
In real applications suitable discussion with the subject matter people, i.e., the dentists who are conducting the study, may reveal expert opinions on the possible correlations of the frailty with the observed treatment variable and the observed auxiliary variables. We have demonstrated how a sensitivity analysis can be performed to show parameter constellations that are consistent with the expert opinion and the estimated Cox regression analyses.
