Abstract
Background.
Researchers widely use discrete choice experiments (DCEs) to assess health preferences across subgroups. However, variations in decision consistency, rather than true differences in preferences, can drive observed utility differences. Despite the growing use of DCEs to assess health preference heterogeneity, recent studies highlight a persistent lack of methodological transparency in accounting for unobserved heterogeneity, underscoring the need for technically robust approaches to support credible and actionable comparisons across groups. This study improves health preference research methods by directly addressing scale heterogeneity and reducing bias when comparing subgroups.
Methods.
A simulated DCE evaluated hypothetical cancer treatments across 2 imagined groups (patients, caregivers). Each task presented 3 alternatives (including a status quo), varying in months gained, survival rate, side-effect severity, and out-of-pocket cost. Mixed logit models were estimated. Scale heterogeneity was addressed using the Swait–Louviere 2-step procedure. Willingness to pay (WTP) was computed and compared across groups via the Poe et al. (2005) simulation-based test.
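The between-group WTP comparison described above can be illustrated with a minimal sketch of a complete combinatorial test in the spirit of Poe et al. (2005). This is not the authors' code. It assumes each group's WTP distribution is already available as a vector of simulated draws (for example, from a parametric bootstrap of the model coefficients); the function name `poe_test` and the example draws are hypothetical.

```python
import numpy as np


def poe_test(wtp_a, wtp_b):
    """Complete combinatorial comparison of two sets of simulated WTP draws.

    Returns (p_one_sided, p_two_sided), where p_one_sided is the share of
    all pairwise differences a_i - b_j that are less than or equal to zero.
    """
    a = np.asarray(wtp_a, dtype=float)
    b = np.asarray(wtp_b, dtype=float)
    # Every draw from group A minus every draw from group B (len(a) x len(b)).
    diffs = a[:, None] - b[None, :]
    p_one_sided = float(np.mean(diffs <= 0))
    # Two-sided version: double the smaller tail.
    p_two_sided = 2 * min(p_one_sided, 1 - p_one_sided)
    return p_one_sided, p_two_sided


# Hypothetical draws for illustration only; these are not study results.
rng = np.random.default_rng(0)
patients = rng.normal(loc=100.0, scale=15.0, size=1000)
caregivers = rng.normal(loc=102.0, scale=15.0, size=1000)
p_one, p_two = poe_test(patients, caregivers)
print(f"one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

A one-sided proportion near 0 or 1 indicates that one group's WTP is consistently above the other's; values near 0.5 suggest the two distributions overlap heavily.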
Results.
The Swait–Louviere test confirmed significant scale heterogeneity (P < 0.05) but no meaningful taste differences (P > 0.10). Once scale effects were accounted for, the analysis revealed a shared preference structure across patients and caregivers, with variability driven by inconsistent decision making rather than true preference divergence. Consistent with this, none of the between-group WTP differences were statistically significant, reinforcing the absence of meaningful subgroup contrasts and underscoring the importance of separating scale from taste to avoid biased inference.
Conclusions.
Adjusting for scale heterogeneity strengthens DCE validity by reducing bias from decision noise and enabling accurate subgroup comparisons. Using simulated data, this study applied the Swait–Louviere 2-step and scale-invariant WTP contrasts to separate taste from scale; both methods converged, showing that heterogeneity reflected scale rather than true preference differences, with negligible WTP gaps. Routine scale diagnostics, taste (preference) tests under equalized scale, and welfare space reporting are recommended to ensure valid inference. However, as this study used simulated data with no real respondents, its findings are illustrative only and not intended for real-world inference; generalizability and external drivers of scale heterogeneity were not assessed.
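The reason WTP contrasts are scale invariant can be shown in one line. The notation below is an assumption introduced for illustration (it does not appear in the source): \lambda_g is the scale of group g, \beta_k is the taste coefficient on attribute k, and \beta_{\text{cost}} is the cost coefficient. The sketch assumes scale multiplies all coefficients in a group equally.

```latex
% Utility of alternative j for respondent n in group g (assumed notation)
U_{njg} = \lambda_g \, \beta^{\top} x_{nj} + \varepsilon_{njg}

% Estimated coefficients are confounded with scale: \hat{\beta}_{k,g} \approx \lambda_g \beta_k.
% In the WTP ratio the group-specific scale cancels:
\mathrm{WTP}_{k,g}
  = -\frac{\lambda_g \beta_k}{\lambda_g \beta_{\text{cost}}}
  = -\frac{\beta_k}{\beta_{\text{cost}}}
```

Under this assumption, raw coefficients can differ across groups purely because \lambda_g differs, while WTP differs only if the underlying tastes differ.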
Key Highlights
The study enhances methodological rigor by explicitly addressing scale heterogeneity, an often-overlooked source of bias; accounting for it improves the validity and real-world relevance of preference-based insights.
Applying the Swait–Louviere test and willingness-to-pay comparisons, whenever possible, enables researchers to distinguish true preference differences from response inconsistency across choice datasets.
The findings advocate for the routine inclusion of scale diagnostics in stated-preference research to strengthen health decision making and modeling practice.
