Taste or Scale? Methodological Approach to Health Preferences Comparison across Groups

Abstract

Background.

Researchers widely use discrete choice experiments (DCEs) to assess health preferences across subgroups. However, variations in decision consistency, rather than true differences in preferences, can drive observed utility differences. Despite the growing use of DCEs to assess health preference heterogeneity, recent studies highlight a persistent lack of methodological transparency in accounting for unobserved heterogeneity, underscoring the need for technically robust approaches to support credible and actionable comparisons across groups. This study improves health preference research methods by directly addressing scale heterogeneity and reducing bias when comparing subgroups.

Methods.

A simulated DCE evaluated hypothetical cancer treatments across 2 imagined groups (patients, caregivers). Each task presented 3 alternatives (including a status quo), varying in months gained, survival rate, side-effect severity, and out-of-pocket cost. Mixed logit models were estimated. Scale heterogeneity was addressed using the Swait–Louviere 2-step procedure. Willingness to pay (WTP) was computed and compared across groups via the Poe et al. (2005) simulation-based test.

Results.

The Swait–Louviere test confirmed significant scale heterogeneity (P < 0.05) but no meaningful taste differences (P > 0.10). Once scale effects were accounted for, the analysis revealed a shared preference structure across patients and caregivers, with variability driven by inconsistent decision making rather than true preference divergence. Consistent with this, none of the between-group WTP differences were statistically significant, reinforcing the absence of meaningful subgroup contrasts and underscoring the importance of separating scale from taste to avoid biased inference.

Conclusions.

Adjusting for scale heterogeneity strengthens DCE validity by reducing bias from decision noise and enabling accurate subgroup comparisons. Using simulated data, this study applied the Swait–Louviere 2-step and scale-invariant WTP contrasts to separate taste from scale; both methods converged, showing that heterogeneity reflected scale rather than true preference differences, with negligible WTP gaps. Routine scale diagnostics, taste (preference) tests under equalized scale, and welfare space reporting are recommended to ensure valid inference. However, as this study used simulated data with no real respondents, its findings are illustrative only and not intended for real-world inference; generalizability and external drivers of scale heterogeneity were not assessed.

Key Highlights

The study enhances methodological rigor by explicitly addressing scale heterogeneity, an often-overlooked source of bias, thereby improving the validity and real-world relevance of preference-based insights.

Applying the Swait–Louviere test and, whenever a cost attribute is available, willingness-to-pay comparisons enables researchers to distinguish true preference differences from response inconsistency across choice datasets.

The findings advocate for the routine inclusion of scale diagnostics in stated-preference research to strengthen health decision making and modeling practice.

Keywords

scale heterogeneity; Swait and Louviere test; WTP; health preferences comparison; discrete choice experiments; Poe et al. test

Over the past 2 decades, discrete choice experiments (DCEs) have become ubiquitous in health economics because they are practical, efficient, and suited to questions faced by regulators, industry, health technology assessment bodies, and patient groups.¹ Growth has been propelled by patient-centered care, which values methods that elicit and quantify patient concerns and preferences.² Preference evidence already informs decisions: for example, the US Food and Drug Administration’s device center has used DCE findings, and in Canada, patient perspectives are considered within the deliberative framework of Canada’s Drug Agency alongside clinical and economic inputs.³ Collected via qualitative or mixed-methods workflows, preference data support judgments about acceptability, tradeoffs, and equity, helping align interventions with public values and needs.^4,5

DCEs provide a systematic, quantitative way to elicit stakeholder perspectives—patients, caregivers, clinicians, and decision makers—making them useful for judging intervention acceptability across settings.⁶ By revealing how people trade off benefits and risks, DCEs link clinical evidence to patient-centered care, supporting more tailored and effective solutions. Comparing preferences across groups is essential for understanding decision making and judging generalizability. Applications include subgroup comparisons in health state valuations, informing shared decision making, and evaluating alignment across elicitation methods.^7–10 Notably, DCE and threshold technique can yield divergent rankings and fail to predict screening choices, underscoring the need for transparent, method-aware comparisons.¹¹

Interpreting subgroup comparisons is challenging because choice model coefficients bundle taste (preferences) with scale (error variance).⁵ In DCEs, observed heterogeneity arises from differences in preferences (how individuals value attributes) and in scale (how consistently they make choices). Scale may reflect latent factors such as attentiveness or unobserved attributes and, if ignored, can bias structural parameters in unknown directions, which in turn biases conclusions and recommendations.^11–13 Despite its prevalence, few health care DCEs explicitly diagnose scale heterogeneity and apply formal identification strategies.^14–16 Several early studies compared raw coefficients from separately estimated models without accounting for scale, thereby conflating preference differences with choice consistency and assuming homogeneous tastes.^17–19 Only a few studies have formally addressed scale heterogeneity using the Swait–Louviere test and flexible model specifications, yielding more defensible subgroup comparisons.^20,21

Most prior work relied on observed covariates and overlooked unobserved scale effects, rarely separating scale from taste. To the best of the author’s knowledge, this is the first study to address this gap by combining the Swait–Louviere test for cross-group scale adjustment with a willingness-to-pay (WTP) space strategy to isolate genuine preference differences net of scale.

Objective of the Study

The objective of this study is to test whether observed differences in stated preferences between hypothetical patients and caregivers reflect true variations in values or are driven by differences in decision consistency (scale heterogeneity), using simulated data to support methodological investigation in health preference research.

Methods

Discrete Choice Experiments

DCEs rely on attributes and their levels to describe goods, services, or policies, generating profiles via experimental design theory.²² Choice experiments (CEs) are part of the family of stated preference methods that typically ask respondents in a series of choice tasks to choose between 2 or more alternative policies, which are described based on their relevant characteristics.^23,24 Preferences for existing or new products, technologies, or policy programs are elicited using a social survey format, such as in-person or online interviews.²² This allows estimation of public demand for these policies and preferences for their specific characteristics or attributes. Usually, the benefits associated with the policies and the costs involved, for example, the price individual respondents would have to pay to secure the described policy benefits, are described. The inclusion of a monetary attribute in the DCE allows estimation of public WTP for the intervention.

Econometric Choice Model

Preferences are modeled under the random utility model, where utility $U_{ijt}$ is decomposed into a deterministic component $V_{ijt}$ and a stochastic component $ε_{ijt}$.²⁵ CEs belong to the class of attribute-based methods in which the utility derived by individual i from choosing alternative j in task t is specified as

$$U_{ijt} = V_{ijt} + ε_{ijt} = K_{j-1} + β_{i} X_{ijt} + δ Z_{i} + ε_{ijt}$$

where $U_{ijt}$ denotes total utility, $β_{i}$ represents individual-specific preference weights for attributes $X_{ijt}$, δ represents coefficients associated with individual-level covariates $Z_{i}$, and $K_{j-1}$ is an alternative-specific constant for alternative j. The stochastic term $ε_{ijt}$ is assumed to be independently and identically distributed following an extreme value type I distribution, with a group-specific scale that reflects decision consistency.²⁶

A mixed logit model was used in this study; it accounts for both observed and unobserved preference heterogeneity but requires computationally intensive procedures to estimate choice probabilities.²⁴
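As a hedged illustration of how scale enters the choice probabilities, the standard logit and mixed logit formulas can be written with the symbols of the utility equation above. The group index g, the scale $λ_{g}$, the mixing density f, and its parameters θ are notation introduced here for illustration and are not taken from the study. Conditional on $β_{i}$, the probability that individual i in group g chooses alternative j in task t is

$$L_{ijt}(β_{i}) = \frac{\exp(λ_{g} V_{ijt})}{\sum_{m} \exp(λ_{g} V_{imt})}$$

and the mixed logit probability of the observed sequence of choices $j_{1}, \dots, j_{T}$ integrates over the taste distribution:

$$P_{i} = \int \prod_{t=1}^{T} L_{i j_{t} t}(β) \, f(β \mid θ) \, dβ$$

Because $λ_{g}$ multiplies every coefficient in $V_{ijt}$ and cannot be separated from them within a single group, a group with a smaller $λ_{g}$ (noisier choices) shows proportionally smaller estimated coefficients even when tastes are identical.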

The decision to use mixed logit rather than the generalized multinomial logit (GMNL) model was guided by methodological considerations. While the GMNL model provides a structured framework for disentangling heterogeneity components, there is ongoing debate about its ability to do so effectively. Some studies suggest that the GMNL model may not fully separate preference heterogeneity from scale heterogeneity, as the estimated scale parameter might also capture other forms of correlation among utility coefficients.²³

Preference and Scale Equality Test

A statistical test procedure was conducted to formally evaluate whether preferences differ across the 2 groups, examining both taste and scale heterogeneity.²⁷ This method distinguishes 2 primary sources of heterogeneity: scale heterogeneity, which reflects variations in the consistency of choices due to differences in the variance of the unobserved utility component, and taste heterogeneity, which captures variations in preference parameters for treatment attributes. To disentangle scale from taste, the procedure 1) conducts a grid search over a relative scale factor (λ) to find the value that maximizes the pooled log-likelihood when one group’s data are rescaled and 2) conducts likelihood ratio (LR) tests at the optimal λ to assess scale and taste equality using group-specific and pooled fits. Each value of λ is used to reestimate the pooled model, and the log-likelihood is recorded. The optimal scale factor ($\hat{λ}$) is defined as the value that maximizes the pooled log-likelihood when one group’s data are rescaled.
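The grid search over λ can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's estimation code: it swaps the mixed logit for a plain binary logit with two attributes, generates toy data inside the block, and every name (`simulate`, `fit_logit`, `grid_search`) and parameter value is hypothetical.

```python
import math
import random


def simulate(n, beta, scale, rng):
    """Toy binary choices; rows are (dx1, dx2, y) with attribute differences."""
    rows = []
    for _ in range(n):
        dx1, dx2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        v = scale * (beta[0] * dx1 + beta[1] * dx2)
        # The difference of two Gumbel errors is logistic.
        p = 1.0 / (1.0 + math.exp(-v))
        rows.append((dx1, dx2, 1 if rng.random() < p else 0))
    return rows


def fit_logit(rows, max_iter=50):
    """Newton-Raphson maximum likelihood; returns ((b1, b2), log-likelihood)."""
    b1 = b2 = 0.0
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for x1, x2, y in rows:
            p = 1.0 / (1.0 + math.exp(-(b1 * x1 + b2 * x2)))
            w = p * (1.0 - p)
            g1 += (y - p) * x1
            g2 += (y - p) * x2
            h11 += w * x1 * x1
            h12 += w * x1 * x2
            h22 += w * x2 * x2
        det = h11 * h22 - h12 * h12
        # Newton step: inverse of the 2x2 information matrix times the gradient.
        step1 = (h22 * g1 - h12 * g2) / det
        step2 = (h11 * g2 - h12 * g1) / det
        b1, b2 = b1 + step1, b2 + step2
        if abs(step1) + abs(step2) < 1e-10:
            break
    loglik = 0.0
    for x1, x2, y in rows:
        v = b1 * x1 + b2 * x2
        loglik -= math.log1p(math.exp(-v if y == 1 else v))
    return (b1, b2), loglik


def grid_search(rows_a, rows_b, grid):
    """Rescale group B's attributes by each lambda and keep the best pooled fit."""
    best_lam, best_ll = None, -math.inf
    for lam in grid:
        scaled_b = [(lam * x1, lam * x2, y) for x1, x2, y in rows_b]
        _, ll = fit_logit(rows_a + scaled_b)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam, best_ll


rng = random.Random(2024)
true_beta = (1.0, -0.5)                         # hypothetical common tastes
group_a = simulate(2000, true_beta, 1.0, rng)
group_b = simulate(2000, true_beta, 0.4, rng)   # noisier group: lower scale

grid = [k / 10 for k in range(1, 31)]           # lambda from 0.1 to 3.0
lam_hat, ll_rescaled = grid_search(group_a, group_b, grid)
_, ll_unscaled = fit_logit(group_a + group_b)
_, ll_a = fit_logit(group_a)
_, ll_b = fit_logit(group_b)
print(lam_hat, ll_rescaled, ll_unscaled, ll_a + ll_b)
```

The four printed quantities are the inputs an LR comparison needs: the rescaled pooled fit, the unscaled pooled fit, and the sum of the group-specific fits. A finer grid around the best value, or a one-dimensional optimizer, would refine $\hat{λ}$ in a real application.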

Two LR Tests Are Performed

Taste equality test: compares the rescaled pooled model with the sum of the group-specific log-likelihoods:

$$LR_{taste} = -2 \times [\text{LogLik(rescaled pooled)} - (\text{LogLik(patient)} + \text{LogLik(caregiver)})]$$

A nonsignificant result indicates no statistical difference in preferences between groups.

Scale equality test: compares the rescaled pooled model with the unscaled pooled model:

$$LR_{scale} = -2 \times [\text{LogLik(unscaled pooled)} - \text{LogLik(rescaled pooled)}]$$

A nonsignificant result indicates no statistical difference in scale (error variance). This 2-step procedure allows rigorous testing of whether observed differences in model estimates reflect true preference heterogeneity or merely differences in decision consistency.

Scale-Invariant WTP

Because coefficient-space comparisons confound taste with scale, the study also computed marginal rates of substitution, reported as WTP: $WTP_{k} = -β_{k} / β_{cost}$, where the quotient represents the marginal monetary value respondents place on a 1-unit change in attribute k. Because WTP is a ratio of coefficients, the common scale cancels, which removes the confounding inherent in raw coefficient comparisons and enables like-for-like cross-group contrasts. The study simulated 40,000 draws from the joint coefficient distribution to obtain WTP distributions by group and tested patient–caregiver differences with 2-sided P values and an adjustment for multiplicity.^28,29 Units are expressed as dollars per 1-unit change in the attribute (e.g., 1 month, 1 side-effect level, 10% survival). This design addresses the known confounding of taste and scale when groups prioritize the same attributes, enabling clearer inference about substantive preference differences versus differences in choice consistency.
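The cancellation of scale in the WTP ratio can be made explicit with a short derivation. The group index g and the notation $\tilde{β}$ for estimated (scale-confounded) coefficients are introduced here for illustration. If group g has scale $λ_{g}$, the estimated coefficients identify $\tilde{β}_{k,g} = λ_{g} β_{k,g}$, so

$$WTP_{k,g} = -\frac{\tilde{β}_{k,g}}{\tilde{β}_{cost,g}} = -\frac{λ_{g} β_{k,g}}{λ_{g} β_{cost,g}} = -\frac{β_{k,g}}{β_{cost,g}}$$

The ratio is therefore free of $λ_{g}$, provided the same scale applies to all coefficients within a group.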

Study Design and Data

This study used a fully simulated dataset in which both groups share the same attribute set and levels; no real respondents were involved. An unlabeled, full-profile DCE was simulated with 3 alternatives per task, where the third represented the status quo. Each simulated respondent completed 3 choice tasks. Two hypothetical populations (patients and caregivers) were simulated, with 500 respondents per group (1,000 total decision makers). This design enables the controlled evaluation of whether cross-group differences in estimated coefficients reflect genuine taste differences versus scale heterogeneity (differences in decision consistency) when participants prioritize the same attributes. Group labels were assigned to distinguish between patients and caregivers.

Attributes and choice rules

Attribute selection and level ranges were informed by the published literature.³⁰ A hypothetical cancer treatment was characterized by 4 attributes: 1) treatment effectiveness, expressed as additional months gained (3, 6, or 9 mo); 2) survival rate (40%, 60%, 80%, or 100%); 3) side-effect severity (1 = low, 2 = mild, 3 = severe); and 4) out-of-pocket cost (CAD 5,000, 10,000, or 20,000). Attribute levels were randomly drawn with replacement for each alternative within each task (Table 1).

Table 1

Attributes and Levels of the Hypothetical Cancer Treatment

Attribute	Definition	Levels
Effectiveness	Incremental clinical benefit (i.e., added months)	3, 6, 9
Survival rate	Increase in probability of being alive (%)	40, 60, 80, 100
Side-effect severity	Intensity of adverse events affecting daily functioning	1 (low), 2 (mild), 3 (severe)
Out-of-pocket cost	Patient-borne treatment expenses (CAD)	5,000; 10,000; 20,000

Simulation and utility specification

Utilities were generated using group-specific preference parameters for patients and caregivers representing the relative importance of effectiveness, survival, side effects, and cost, respectively. A random Gumbel error term was also added, which captured unobserved variation. The simulated respondents were assumed to select the alternative with the highest total utility, yielding 1 observed choice per task. To mimic realistic uptake behavior, the status quo option was assigned a level of zero across all 4 attributes, representing the choice to forgo treatment and incur no cost, and it was calibrated to be chosen in approximately 12% of tasks per group. The primary goal of the exercise was to demonstrate how to conduct statistical tests for scale and preference heterogeneity when comparing preferences between 2 groups.²⁷ This procedure generated a transparent and reproducible synthetic dataset that aligned with realistic cancer treatment tradeoffs and was designed to test preference and scale heterogeneity.
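The data-generating steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's protocol: the taste parameters, the treatment constant, the scale values, and the cost unit (thousands of CAD) are hypothetical, the status quo share is not calibrated to 12%, and the function names are invented for this example.

```python
import math
import random

LEVELS = {
    "months": [3, 6, 9],
    "survival": [40, 60, 80, 100],
    "side_effects": [1, 2, 3],
    "cost": [5, 10, 20],          # thousands of CAD (assumed unit)
}
# Hypothetical taste parameters shared by both groups; not the study's values.
BETA = {"months": 0.10, "survival": 0.04, "side_effects": -0.15, "cost": -0.03}
ASC_TREATMENT = -2.0              # hypothetical constant for treatment options


def gumbel(rng):
    """Standard Gumbel (extreme value type I) draw by inverse transform."""
    u = max(rng.random(), 1e-300)  # guard against log(0)
    return -math.log(-math.log(u))


def simulate_group(n_resp, n_tasks, scale, rng):
    """Return one dict per task: respondent id, profiles, chosen alternative."""
    data = []
    for resp in range(n_resp):
        for _ in range(n_tasks):
            # Two treatment profiles with levels drawn with replacement.
            profiles = [
                {k: rng.choice(v) for k, v in LEVELS.items()} for _ in range(2)
            ]
            profiles.append({k: 0 for k in LEVELS})   # status quo: all zeros
            utilities = []
            for j, prof in enumerate(profiles):
                v = sum(BETA[k] * prof[k] for k in LEVELS)
                if j < 2:
                    v += ASC_TREATMENT
                # Group-specific scale multiplies the deterministic part.
                utilities.append(scale * v + gumbel(rng))
            choice = max(range(3), key=lambda j: utilities[j])
            data.append({"resp": resp, "profiles": profiles, "choice": choice})
    return data


rng = random.Random(7)
patients = simulate_group(500, 3, scale=1.0, rng=rng)
caregivers = simulate_group(500, 3, scale=0.5, rng=rng)  # lower scale: noisier
share_sq = sum(d["choice"] == 2 for d in patients) / len(patients)
print(len(patients), len(caregivers), round(share_sq, 3))
```

Holding `BETA` fixed across groups and varying only `scale` reproduces the situation the tests are designed to detect: identical tastes with different choice consistency.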

Only the cost and side-effect attributes were modeled as random variables, whereas survival rate and effectiveness were held fixed to reduce computational demand. All parameters were estimated via simulated maximum likelihood, with computational efficiency ensured through the use of Halton sequences (10,000 replications) to approximate the likelihood integrals.³¹

This study is based on simulated data generated for methodological investigation and controlled method demonstration of group-specific taste and scale differences. It is not uncommon to use this type of data in health preference research.¹⁴ The use of simulated data is particularly valuable when access to real-world preference data is limited or restricted due to confidentiality concerns or when suitable datasets matching the study’s subgroup comparison requirements are not available. Given the absence of an accessible real-world dataset that precisely matches the study’s testing requirements, this study uses simulated data as a transparent and reproducible proof of concept. Simulation enables rigorous model estimation while maintaining transparency, reproducibility, and ethical integrity. The full data-generation protocol is available to authorized readers. An empirical replication using an existing DCE with subgroup comparisons is planned, contingent on data-use permissions.

Results

Three mixed logit models were estimated: one for each hypothetical group (patients and caregivers) and a third for the pooled sample. Given the study’s methodological focus and simulated data, parsimony was prioritized. Preliminary estimates indicated statistically significant heterogeneity only for the side-effect attribute in the patient group; accordingly, it was modeled as a random coefficient (normal), while months gained and survival rate were retained as fixed. Alternative specifications with additional random parameters were explored, and no improvement in model fit (log-likelihood) was observed. Because the design is demonstrative rather than inferential, attribute coefficients were not interpreted substantively, and cost was treated as normally distributed because of the theoretical possibility that some respondents may perceive a higher price (cost) as a quality signal. The LR tests for both the patient (χ² = 779.49) and caregiver (χ² = 374.36) models were highly significant (P < 0.001), indicating that the full models fit significantly better than the null models. McFadden’s R² indicated a good fit (0.265) for the patient group and a modest fit (0.128) for the caregiver group, suggesting that the explanatory variables capture more choice variation among patients than among caregivers.³² The resulting estimates are largely consistent across models in terms of coefficient direction, magnitude, and statistical significance. A notable difference emerged for the side-effects attribute, which is statistically significant at the 1% level in the patient model but only marginally significant (10% level) in the caregiver model (Table 2).
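For reference, the two fit statistics quoted above have the following standard definitions, where $LL_{0}$ is the log-likelihood of the null model and $LL_{β}$ that of the estimated model. The symbols are introduced here for illustration, and the source does not state which null model (equal shares or constants only) was used.

$$LR = -2 \, [LL_{0} - LL_{β}], \qquad ρ^{2}_{McFadden} = 1 - \frac{LL_{β}}{LL_{0}}$$

Both statistics increase as the estimated model moves further from the null in log-likelihood terms, so a higher value in one group indicates that the attributes explain more of the observed choices there.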

Table 2

Mixed Logit Estimates for Patient, Caregiver, and Pooled Samples

Variable	Patient				Caregiver				Pooled
	Estimate	Standard Error	z	P	Estimate	Standard Error	z	P	Estimate	Standard Error	z	P
asc1	−2.395	0.227	−10.537	0.000	−1.157	0.187	−6.206	0.000	−1.668	0.142	−11.753	0.000
asc2	−2.327	0.228	−10.200	0.000	−1.299	0.189	−6.875	0.000	−1.716	0.143	−11.965	0.000
Survival rate	0.054	0.003	19.429	0.000	0.032	0.002	16.146	0.000	0.041	0.002	25.218	0.000
Side effects	−0.174	0.043	−4.097	0.000	−0.067	0.037	−1.818	0.069	−0.112	0.028	−4.074	0.000
Months gained	0.124	0.016	7.720	0.000	0.091	0.013	7.238	0.000	0.104	0.010	10.674	0.000
Cost	−0.031	0.007	−4.702	0.000	−0.026	0.006	−4.636	0.000	−0.027	0.004	−6.529	0.000
sd.months	0.124	0.026	4.715	0.000	0.002	0.050	0.033	0.973	0.063	0.025	2.549	0.011
sd.cost	0.003	0.022	0.142	0.887	0.000	0.018	0.009	0.992	0.001	0.014	0.083	0.934
Loglikelihood	−1,103.8				−1,312				−2,445.2
McFadden R²	0.265				0.128				0.246
No. of observations	4,500				4,500				9,000

ASC, alternative specific constant; SD, standard deviation of random parameter.

Among the attributes tested, only “months gained” exhibited significant preference heterogeneity, and only in the patient group, as indicated by a standard deviation term that is statistically significant at the 1% level. Across both groups, preferences aligned around a consistent pattern; that is, treatment profiles associated with increased survival rates and greater life extension (months gained) were favored, whereas higher costs and more severe side effects were disliked. Although these model-based comparisons suggest similar preferences between the 2 groups, visual inspection of coefficient estimates alone does not provide a statistically rigorous basis for asserting equivalence (see Figure 1).

Figure 1

Graphical comparison of preference (normalized) between patients and caregivers.

The Swait–Louviere LR test results (Table 3) provide valuable insights into the nature of heterogeneity between patient and caregiver groups. The taste differences, as indicated by the LR test (LR = 7.87, P > 0.1), were not statistically significant.

Table 3

Equality of Preferences (β) and Scale (λ) Test Results

		LL(Po)	LR-Test	Reject?	LL (Pooled)	LR-Test	Reject?
LL_p	LL_c	$λ_{p} \neq λ_{c}$	(8 df)	H₀: βp = βc	$λ_{p} = λ_{c}$	(1 df)	H₀: $λ_{p} = λ_{c}$
−512.26	−848.59	−1,431.85	7.87	No (P = 0.45 > 0.01)	−1,364.79	134.13	Yes (P < 0.001)

LL_p, LL_c: log-likelihood values from separate models for patients and caregivers, respectively; LL(Po): log-likelihood from a single model using pooled data from both groups; LR-Test: likelihood ratio test for preference (taste) equality between groups; df: degrees of freedom; H₀: βp = βc: null hypothesis that patient and caregiver preferences (β) are equal; Reject?: whether the null hypothesis is rejected at the 1% significance level.

In contrast, the LR test (LR = 134.128, P < 0.001) indicates pronounced scale heterogeneity between the 2 groups (see Figure 2). This result confirms the simulation setup in which utility coefficients were intentionally set equal across groups but the scale parameter differed.

Figure 2

Log-likelihood versus relative scale (λ) plot.
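The LR statistics in Table 3 can be checked by hand from the tabulated log-likelihoods. The sketch below is a verification aid, not the study's code. It assumes that −1,364.79 is the rescaled pooled fit and −1,431.85 the equal-scale pooled fit, because that pairing is the one that reproduces the reported statistics; the helper `chi2_sf` is written for this example and handles only 1 or an even number of degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half, term, total = x / 2.0, 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are handled in this sketch")


# Log-likelihoods as printed in Table 3.
ll_patient, ll_caregiver = -512.26, -848.59
# Assumed pairing (inferred from the arithmetic, not stated in the source).
ll_pooled_rescaled = -1364.79
ll_pooled_equal_scale = -1431.85

# Taste equality: rescaled pooled fit versus the sum of group-specific fits.
lr_taste = -2.0 * (ll_pooled_rescaled - (ll_patient + ll_caregiver))
# Scale equality: equal-scale pooled fit versus rescaled pooled fit.
lr_scale = -2.0 * (ll_pooled_equal_scale - ll_pooled_rescaled)

p_taste = chi2_sf(lr_taste, 8)
p_scale = chi2_sf(lr_scale, 1)
print(round(lr_taste, 2), round(p_taste, 2))
print(round(lr_scale, 2), p_scale < 0.001)
```

Under this pairing the statistics come out at about 7.88 and 134.12, which agree with the reported 7.87 and 134.13 up to rounding of the tabulated log-likelihoods, and the taste test has a P value of roughly 0.45 on 8 degrees of freedom.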

Scale-Invariant WTP Comparisons

Because WTP is a marginal rate of substitution, it inherently adjusts for scale within each group. Consequently, comparing WTP between patients and caregivers is a valid strategy for testing preference differences independent of scale heterogeneity. To formally assess this, the study applied a simulation-based convolution test that compares distributions of welfare measures (e.g., WTP) formed as nonlinear functions of estimated parameters across 2 independent samples without relying on normality. Using 40,000 simulated draws, the empirical distribution (see Figure 3) of the between-group difference was generated to obtain 2-sided P values and confidence intervals. After multiplicity correction, none of the between-group WTP differences were statistically significant (Table 4). The mean difference (patients − caregivers) for months gained was −$8,047 per month (95% confidence interval [CI]: −$20,922 to $55; P = 0.054); for side effects, $223 per level (95% CI: −$6,807 to $12,530; P = 0.87); and for survival rate, −$28,054 per 10% survival (95% CI: −$63,162 to $5,512; P = 0.31), all based on 40,000 simulated draws.

Figure 3

Willingness-to-pay distributions by attribute.

Table 4

Mean Difference in Willingness to Pay (Poe et al. Test)

Contrast	Attribute	Mean Difference	95% CI Lower	95% CI Upper	P Value
Patient–caregiver	Months gained ($ per 1 mo)	−8,047.08	−20,921.60	55.09	0.054
Patient–caregiver	Side effects ($ per 1 level)	223.10	−6,807.07	12,529.82	0.87
Patient–caregiver	Survival rate ($ per 10% survival)	−28,053.71	−63,162.08	5,512.22	0.31

CI, confidence interval.

Consistent with the SL procedure, which indicated no taste differences once scale heterogeneity was accommodated, these WTP contrasts corroborate the absence of meaningful preference differences between patients and caregivers.
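The simulation-based WTP comparison can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the complete combinatorial variant described by Poe et al. (whether the study used this variant is not stated), draws coefficients from independent normals rather than the joint distribution, and applies a Benjamini-Hochberg adjustment as one plausible reading of the multiplicity correction. All function names and numeric inputs are hypothetical.

```python
import random
from bisect import bisect_right


def poe_two_sided_p(draws_a, draws_b):
    """Complete combinatorial comparison of two independent sets of draws."""
    b_sorted = sorted(draws_b)
    n_b = len(b_sorted)
    # Count pairs (a_i, b_j) with a_i - b_j < 0, i.e. b_j strictly above a_i.
    negative = sum(n_b - bisect_right(b_sorted, a) for a in draws_a)
    share_negative = negative / (len(draws_a) * n_b)
    return 2.0 * min(share_negative, 1.0 - share_negative)


def bh_adjust(p_values):
    """Benjamini-Hochberg step-up adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def wtp_draws(mean_k, se_k, mean_cost, se_cost, n, rng):
    """WTP = -beta_k / beta_cost from independent normal coefficient draws."""
    return [
        -rng.gauss(mean_k, se_k) / rng.gauss(mean_cost, se_cost)
        for _ in range(n)
    ]


rng = random.Random(11)
# Hypothetical coefficient means and standard errors for two groups.
wtp_a = wtp_draws(0.12, 0.02, -0.030, 0.004, 5000, rng)
wtp_b = wtp_draws(0.09, 0.02, -0.026, 0.004, 5000, rng)
p_raw = poe_two_sided_p(wtp_a, wtp_b)
# The other two P values are placeholders standing in for further attributes.
print(p_raw, bh_adjust([p_raw, 0.20, 0.60]))
```

Sorting one set of draws and counting with `bisect_right` evaluates all pairwise differences without forming them explicitly, which keeps the comparison tractable at 40,000 draws per group.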

Discussion

Comparing health preferences across groups is central to valid inference and decision relevance. However, raw coefficient contrasts can confound true preference (taste) differences with variation in scale, potentially misattributing differences in choice consistency as substantive preference gaps. Mixed logit models were estimated on controlled simulated datasets for patients and caregivers to allow for unobserved heterogeneity in preferences. Although coefficient signs and magnitudes appeared qualitatively similar across groups, visual similarity alone does not establish equivalence.

To formally distinguish taste from scale, the study applied the Swait–Louviere (SL) 2-step procedure, which first estimates a relative scale factor and then tests equality of preference parameters across groups. Importantly, the SL procedure does not require a monetary attribute; it is applicable to any cross-group DCE comparison. Results from the SL test attributed between-group heterogeneity primarily to scale rather than taste, indicating that differences in estimated coefficients reflected variation in decision consistency rather than true preference divergence.

As a welfare-based cross-check, WTP estimates were compared between groups. When a cost/price attribute is available, WTP provides a complementary, decision-relevant check because it is a marginal rate of substitution and the common scale factor cancels out. Using the Poe et al. convolution test with 40,000 simulated draws, the study formally tested whether between-group WTP distributions differed. None of the between-group WTP differences were statistically significant, which reinforced the conclusion that patient and caregiver preferences are aligned once scale is accounted for and independently corroborated the SL result.

Interpreting scale heterogeneity is important for applied research. Scale captures variation in the unobserved component of utility (i.e., error variance) and is often viewed as reflecting differences in decision consistency or attentiveness. In empirical contexts, such variation can arise from survey burden, health literacy, or cognitive load. Although the present study uses simulated data and does not explore these underlying drivers, the results underscore the necessity of diagnosing and addressing scale heterogeneity before attributing observed differences to taste. This interpretation follows established guidance in the stated-preference literature that cross-group contrasts should explicitly consider scale to avoid overstating taste differences.

Limitations

These findings are based on simulated data with no real respondents. External drivers of scale heterogeneity and generalizability to specific patient populations were not examined. In practice, conclusions can depend on model specification, distributional assumptions, and simulation settings; the last of these was mitigated by using a large number of draws (40,000) for the Poe et al. test.

Conclusions

In summary, preference comparisons must be made on a scale-aware footing. This methodological study using simulated DCE data and mixed-logit models paired the Swait–Louviere 2-step with scale-invariant WTP contrasts (evaluated via the Poe et al. simulation test) to separate true taste differences from decision noise. Both approaches converged: between-group heterogeneity was attributable to scale, and WTP contrasts showed no statistically meaningful gaps, indicating alignment of preferences once the scale was accommodated. The study therefore recommends that health-preference analyses—and the policy decisions they inform—routinely include explicit scale diagnostics, that is, estimate relative scale factors, test taste equality under equalized scale, and report WTP results when a monetary attribute is present. Conclusions should be based on scale-adjusted evidence rather than raw coefficients or visual inspection alone. While these findings are illustrative and not intended for direct real-world inference, they provide a clear, reproducible blueprint for empirical applications to avoid scale-induced misinterpretation and to ground policy guidance in correctly identified preference signals. Future research should extend this methodological framework to real-world choice data, enabling the validation of the simulation-based findings under observed behavioral conditions.

Footnotes

Acknowledgements

The author conducted all aspects of this work independently.

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The author received no financial support for the research, authorship, and/or publication of this article.

Ethical Considerations

This study used computer-generated simulated data and did not involve human participants or animal subjects.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

ORCID iD

Solomon Tarfasa Faro

Data Availability

The simulated data and code generated during this study are available from the corresponding author upon reasonable request.

References

1. Barber, Bekker, Marti, Pavitt, Khambay, Meads. Development of a discrete-choice experiment (DCE) to elicit adolescent and parent preferences for hypodontia treatment. Patient. 2019;12(1):137–48. DOI: 10.1007/S40271-018-0338-0

2. Mühlbacher, Nübling. Analysis of physicians’ perspectives versus patients’ preferences: direct assessment and discrete choice experiments in the therapy of multiple myeloma. Eur J Health Econ. 2011;12(3):193–203. DOI: 10.1007/s10198-010-0218-6

3. Gonzalez. A guide to measuring and interpreting attribute importance. Patient. 2019;12(3):287–95. DOI: 10.1007/s40271-019-00360-3

4. Methods Guide | CDA-AMC. Available from: https://www.cda-amc.ca/methods-guide. [Accessed 8 November, 2025].

5. Regier, Bentley, Mitton, et al. Public engagement in priority-setting: results from a pan-Canadian survey of decision-makers in cancer control. Soc Sci Med. 2014;122:130–9. DOI: 10.1016/j.socscimed.2014.10.038

6. Faulkner, Sayuri Ii, Pakarinen, et al. Understanding multi-stakeholder needs, preferences and expectations to define effective practices and processes of patient engagement in medicine development: a mixed-methods study. Health Expect. 2021;24(2):601–16. DOI: 10.1111/hex.13207

7. Stamuli. Health outcomes in economic evaluation: who should value health? Br Med Bull. 2011;97(1):197–210. DOI: 10.1093/bmb/ldr001

8. Hausman. Proxy preferences and the values of children’s health states. PharmacoEconomics. 2024;42(10):1065–72. DOI: 10.1007/s40273-024-01415-6

9. Crossnohere, Fischer, Vroom, Furlong, Bridges JFP. A comparison of caregiver and patient preferences for treating Duchenne muscular dystrophy. Patient. 2022;15(5):577–88. DOI: 10.1007/s40271-022-00574-y

10. Zhang, Xie. Differences between physician and patient preferences for cancer treatments: a systematic review. BMC Cancer. 2023;23(1):1126. DOI: 10.1186/s12885-023-11598-4

11. Louviere, Flynn, Carson. Discrete choice experiments are not conjoint analysis. J Choice Model. 2010;3(3):57–72. DOI: 10.1016/S1755-5345(13)70014-9

12. Hossain, Saqib, Haq. Scale heterogeneity in discrete choice experiment: an application of generalized mixed logit model in air travel choice. Econ Lett. 2018;172:85–88. DOI: 10.1016/j.econlet.2018.08.037

13. Karim, Craig, Vass, Groothuis-Oudshoorn CGM. Current practices for accounting for preference heterogeneity in health-related discrete choice experiments: a systematic review. PharmacoEconomics. 2022;40(10):943–56. DOI: 10.1007/s40273-022-01178-y

14. Wright, Vass, Sim, Burton, Fiebig, Payne. Accounting for scale heterogeneity in healthcare-related discrete choice experiments when comparing stated preferences: a systematic review. Patient. 2018;11(5):475–88. DOI: 10.1007/s40271-018-0304-x

15. Jiang, Pullenayegum, Shaw, et al. Comparison of preferences and data quality between discrete choice experiments conducted in online and face-to-face respondents. Med Decis Making. 2023;43(6):667–79. DOI: 10.1177/0272989X231171912

16. Nakayama, Kobayashi, Okazaki, Imanaka, Yoshizawa, Mahlich. Patient preferences and urologist judgments on prostate cancer therapy in Japan. Am J Mens Health. 2018;12(4):1094–101. DOI: 10.1177/1557988318776123

17. Mühlbacher, Johnson. Choice experiments to quantify preferences for health and healthcare: state of the practice. Appl Health Econ Health Policy. 2016;14(3):253–66. DOI: 10.1007/s40258-016-0232-7

18. Hauber, González, Groothuis-Oudshoorn CGM, et al. Statistical methods for the analysis of discrete choice experiments: a report of the ISPOR conjoint analysis good research practices task force. Value Health. 2016;19(4):300–15. DOI: 10.1016/j.jval.2016.04.004

19. Ungar, Hadioonzadeh, Najafzadeh, Tsao, Dell, Lynd. Quantifying preferences for asthma control in parents and adolescents using best–worst scaling. Respir Med. 2014;108(6):842–51. DOI: 10.1016/j.rmed.2014.03.014

20. Veldwijk, DiSantostefano, Janssen, et al. Maximum acceptable risk estimation based on a discrete choice experiment and a probabilistic threshold technique. Patient. 2023;16(6):641–53. DOI: 10.1007/s40271-023-00643-w

21. Wang, Rowen, Chen, Mukuria, Street, Norman. Valuing health and wellbeing using discrete choice experiment: exploring feasibility, design effect and international preference similarity. Eur J Health Econ. 2026;27(2):339–53. DOI: 10.1007/s10198-025-01821-3

22. Bennett, Birol, eds. Choice Experiments in Developing Countries: Implementation, Challenges and Policy Implications. Cheltenham (UK): Edward Elgar; 2010.

23. Hess, Train. Correlation and scale in mixed logit models. J Choice Model. 2017;23:1–8. DOI: 10.1016/j.jocm.2017.03.001

24. Newman. Systematically heterogeneous covariance in network GEV models. In: Hess, Daly, eds. Choice Modelling: The State-of-the-Art and the State-of-Practice. Leeds (UK): Emerald Group Publishing Limited; 2010. p 237–58. DOI: 10.1108/9781849507738-010

25. McFadden, Zarembka. Conditional logit analysis of qualitative choice behavior. In: Zarembka, ed. Frontiers in Econometrics. New York: Academic Press; 1974. p 105–42. Available from: https://www.scirp.org/reference/referencespapers?referenceid=2646910. [Accessed 2 November, 2025].

26. Train. Discrete Choice Methods with Simulation. Cambridge (UK): Cambridge University Press; 2009.

27. Swait, Louviere. The role of the scale parameter in the estimation and comparison of multinomial logit models. J Mark Res. 1993;30(3):305. DOI: 10.2307/3172883

28. Poe, Giraud, Loomis. Computational methods for measuring the difference of empirical distributions. Am J Agric Econ. 2005;87(2):353–65. DOI: 10.1111/j.1467-8276.2005.00727.x

29. Benjamini, Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol. 1995;57(1):289–300. DOI: 10.1111/j.2517-6161.1995.tb02031.x

30. Fifer, Lybrand, Axford, Roach. Alignment of preferences in the treatment of multiple myeloma—a discrete choice experiment of patient, carer, physician, and nurse preferences. BMC Cancer. 2020;20(1):546. DOI: 10.1186/s12885-020-07018-6

31. Greene, Hensher. Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. Transportation. 2010;37(3):413–28. DOI: 10.1007/s11116-010-9259-z

32. Hoyos. The state of the art of environmental valuation with discrete choice experiments. Ecol Econ. 2010;69(8):1595–603. DOI: 10.1016/j.ecolecon.2010.04.011