Sample size and power for a stratified doubly randomized preference design

Abstract

The two-stage (or doubly) randomized preference trial design is an important tool for researchers seeking to disentangle the role of patient treatment preference on treatment response through estimation of selection and preference effects. Up until now, these designs have been limited by their assumption of equal preference rates and effect sizes across the entire study population. We propose a stratified two-stage randomized trial design that addresses this limitation. We begin by deriving stratified test statistics for the treatment, preference, and selection effects. Next, we develop a sample size formula for the number of patients required to detect each effect. The properties of the model and the efficiency of the design are established using a series of simulation studies. We demonstrate the applicability of the design using a study of Hepatitis C treatment modality, specialty clinic versus mobile medical clinic. In this example, a stratified preference design (stratified by alcohol/drug use) may more closely capture the true distribution of patient preferences and allow for a more efficient design than a design which ignores these differences (unstratified version).

Keywords

Sample size, power, two-stage trial, patient preference, stratified

1 Background

In the classic clinical trial setting, patients are randomized to one of multiple treatment arms, with the goal of assessing the efficacy of a particular treatment on a specified outcome. This effect is generally termed the treatment effect. However, this traditional design ignores the impact that a patient’s treatment preference may have on study outcomes. Rucker¹ proposed a two-stage trial design in order to measure the effect of patient preference. In this design, patients undergo a two-stage randomization process. First, patients are randomized to an option or random group. Those in the option group are given their preferred treatment (possibly after some informed decision-making process), while those in the random group are randomly assigned to a treatment through a second randomization. In addition to the treatment effect, we can estimate selection and preference effects from this design. The selection effect refers to the effect of a patient’s selection of his or her preferred treatment. Moreover, the preference effect is the additional change in outcome that results from the interaction between a patient’s preferred treatment and the treatment he/she actually receives.¹ In other words, this effect compares patients who receive their preferred treatment and those who do not. For example, when comparing a group therapy and medical intervention for depression, selection effects may be present if women tend to select the group therapy treatment option. In addition, preference effects may be manifested if patients who receive their desired treatment, either group therapy or medication, respond better than patients who do not. Estimating these effects may be important to investigators, particularly in cases where compliance may be related to a patient’s treatment preference. 
Additionally, patients may have a different psychological response to a treatment they deem to be more favorable,² especially in cases where they cannot be blinded to treatment assignment (e.g. surgical intervention and group therapy).

In order to use the model proposed by Rucker,¹ we must assume that the proportion of patients preferring a given treatment is constant across the entire patient population. For example, if we are considering a surgical versus a medical treatment for a heart condition, we would need to assume that the proportion of patients preferring surgery and the proportion preferring the medical intervention is the same for all individuals in the population, e.g. 30% and 70%, respectively. There are scenarios, however, when this assumption may be violated. For example, certain patients may have a higher affinity for a treatment; older patients may prefer a medical intervention versus a surgical intervention. A study by Clark et al.³ found that employed women were more likely to prefer a self-directed heart management system over a group format. This finding highlights that while several authors have made a call for the increased use of the two-stage randomized trial design to incorporate patient preferences,^4–13 there are still large gaps in methods available to design and ultimately analyze these trials. Therefore, to begin addressing these gaps, such as differential preference rates observed by Clark et al.,³ we propose a stratified version of Rucker’s two-stage trial design.¹ This new method will allow for the estimation of the treatment, selection, and preference effects separately within each stratum, as well as across the entire study population. Additionally, this method is expected to increase the efficiency of the design. By first stratifying patients, the observed variability in the specified outcome is expected to decrease, allowing for more precise estimates of each effect and a smaller overall sample size.

2 Motivating example

Hepatitis C virus (HCV) affects nearly 5 million people in the United States and results in $6.5 billion in disease-related costs annually.¹⁴ While there have been recent advances in HCV treatment, both in tolerability and duration, there are many infected people, especially drug and alcohol users, who do not seek the recommended treatment or have been denied treatment in the past. This trend could be influenced by many factors such as a lack of healthcare access or the social barriers and stigma that may be attached to these populations.

As an alternative to the traditional healthcare setting, mobile medical clinics (MMCs) have arisen as an attempt at removing many of the barriers that prevent patients from seeking treatment. National guidelines recommend treating HCV in a specialty hospital clinic; however, MMCs may provide a user-friendly alternative for patients with HCV. Since it would be all but impossible to blind patients to their healthcare setting, it is expected that patient preference for where they obtain their treatment will have a substantial impact on health-related quality of life (HRQoL), as well as play a major role in the initiation and continuation of treatment, and thus, their cure rate of HCV. The two-stage design first proposed by Rucker¹ is ideal to assess patient preference on these outcomes. However, we expect the proportion of patients preferring the MMC over a specialty hospital clinic to be substantially higher among patients who use alcohol or drugs. The current methods available for the design and analysis of two-stage clinical trials lack the ability to factor in stratification variables.^1,2,15

In the following paper, we propose an extension of Rucker’s preference design to allow for stratification. This method allows for the incorporation of important risk factors, such as alcohol and drug use status, that are expected to be related to a patient’s treatment preference or response. We begin by deriving a closed form sample size formula and demonstrating that the stratified study design can lead to a smaller sample size than its non-stratified counterpart.¹⁵ A series of simulation studies comparing the empirical Type I error with different nominal values was used to validate the assumptions of the model. Next, we demonstrate the efficiency of the stratified design by examining the study power. Finally, we demonstrate the applicability by designing a trial to assess the impact on HRQoL for patients seeking treatment for HCV.

3 Methods

3.1 Study design

For simplicity, we assume that all patients have a treatment preference (i.e. no undecided patients). Using similar notation as that proposed by Rucker, the response $y_{ijkl}$ for a patient k ( $k = 1, \dots, n$ ) in stratum l ( $l = 1, \dots, s)$ who receives treatment i $(i = 1, \dots, t)$ and prefers treatment j ( $j = 1, \dots, t$ ) is modeled using the following equation

\begin{matrix} y_{ijkl} = μ + μ_{l} + τ_{il} + ν_{jl} + π_{ijl} + ε_{ijkl} \end{matrix}

where $μ_{l}$ represents the overall mean in stratum l, $τ_{il}$ is the treatment effect of treatment i in stratum l, $ν_{jl}$ is the selection effect for preferred treatment j in stratum l, $π_{ijl}$ is the preference effect for treatment i and preferred treatment j in stratum l, and $ε_{ijkl}$ is the random error term. We assume the following: the $y_{ijkl}$ follow a normal distribution; the $ε_{ijkl}$ are normally distributed with constant variance; without loss of generality, there are only two treatments of interest; and the parameters satisfy the following constraints in each stratum

\begin{matrix} \sum_{l = 1}^{s} μ_{l} = 0 \\ τ_{1 l} + τ_{2 l} = 0 \text{ for all } l = 1, \dots, s \\ φ_{l} ν_{1 l} + (1 - φ_{l}) ν_{2 l} = 0 \text{ for all } l = 1, \dots, s \\ φ_{l} π_{i 1 l} + (1 - φ_{l}) π_{i 2 l} = 0 \text{ for all } l = 1, \dots, s \\ π_{1 jl} + π_{2 jl} = 0 \text{ for all } l = 1, \dots, s \end{matrix}

where $φ_{l}$ denotes the proportion of patients preferring treatment 1 in stratum l and $(1 - φ_{l})$ denotes the proportion of patients preferring treatment 2 in stratum l.

We use $μ_{il}$ to represent the mean response of patients randomly assigned to treatment i in stratum l. Similarly, $σ_{il}^{2}$ is used to represent the variance of patients randomly assigned to treatment i in stratum l. In addition, $μ_{ijl}$ and $σ_{ijl}^{2}$ are used to represent the mean response and variance of patients receiving treatment i and preferring treatment j in stratum l. We are able to estimate $μ_{11 l}$ and $μ_{22 l}$ from stratum l in the option arm. However, $μ_{12 l}$ and $μ_{21 l}$ are unobserved and can only be estimated from the following equations

\begin{matrix} μ_{12 l} = \frac{μ_{1 l} - φ_{l} μ_{11 l}}{1 - φ_{l}} \\ μ_{21 l} = \frac{μ_{2 l} - (1 - φ_{l}) μ_{22 l}}{φ_{l}} \end{matrix}

These equations are premised on the principles of randomization – we will achieve equipoise through randomization and thus participants in the random arm will have the same preference rate as those in the choice arm.

Using the following notation

\begin{matrix} μ_{l} = \frac{μ_{1 l} + μ_{2 l}}{2} \\ τ_{il} = μ_{il} - μ_{l} \\ ν_{jl} = \frac{μ_{1 jl} + μ_{2 jl}}{2} - μ_{l} \\ π_{ijl} = μ_{ijl} - τ_{il} - ν_{jl} - μ_{l} \end{matrix}

we can estimate the treatment, selection and preference effects within each stratum. Finally, we can construct stratum-specific estimators for the differences in selection, preference, and treatment effects, respectively¹

\begin{matrix} Δ ν_{l} = ν_{1 l} - ν_{2 l} = \frac{μ_{11 l} + μ_{21 l} - μ_{12 l} - μ_{22 l}}{2} \\ Δ π_{l} = \frac{(π_{11 l} + π_{22 l}) - (π_{12 l} + π_{21 l})}{2} = \frac{μ_{11 l} - μ_{21 l} - μ_{12 l} + μ_{22 l}}{2} \\ Δ τ_{l} = τ_{1 l} - τ_{2 l} = μ_{1 l} - μ_{2 l} \end{matrix}

Since the strata are independent, these estimators can be used to determine the difference in the preference, treatment, and selection effects separately in each stratum. We can also construct an overall test statistic for the preference, selection, and treatment effects that accounts for stratification.
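As an illustration of the estimators above, the following Python sketch computes the two unobserved means and the three stratum-specific effects from the quantities that are observable in a single stratum. The function name `stratum_effects` and its argument names are ours and do not appear in the original notation. The inputs in the usage line are the drug and alcohol user values from Table 2(a), with the specialty clinic arbitrarily labeled treatment 1 (an assumption made only for this example).

```python
def stratum_effects(mu1, mu2, mu11, mu22, phi):
    """Effects in one stratum from the observable means.

    mu1, mu2   : mean response in the random arm under treatments 1 and 2
    mu11, mu22 : mean response in the option arm among patients who chose
                 (and therefore received) treatment 1 or treatment 2
    phi        : proportion of patients preferring treatment 1
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    # Unobserved means, recovered from mu_i = phi * mu_i1 + (1 - phi) * mu_i2.
    mu12 = (mu1 - phi * mu11) / (1.0 - phi)
    mu21 = (mu2 - (1.0 - phi) * mu22) / phi
    return {
        "mu12": mu12,
        "mu21": mu21,
        "selection": (mu11 + mu21 - mu12 - mu22) / 2.0,
        "preference": (mu11 - mu21 - mu12 + mu22) / 2.0,
        "treatment": mu1 - mu2,
    }


# Table 2(a): specialty clinic = treatment 1, MMC = treatment 2 (our labeling).
users = stratum_effects(mu1=45, mu2=47, mu11=47, mu22=52, phi=0.3)
```

With these inputs the sketch gives a selection effect of about −6.9 and a preference effect of about 9.8, in agreement with Table 2(a) up to rounding.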

3.2 Stratified test statistic

Efficient testing of the preference, selection, and treatment effects in a stratified preference trial design requires adjustment of the test statistic to account for stratification. Adopting a method proposed by Rosenberger and Lachin,¹⁶ we construct an overall test statistic through a weighted sum of the stratum-specific test statistics. The proposed weights were chosen according to the proportion of patients in each stratum.

We let the random variable $m_{il}$ be the number of patients in the option arm of stratum l choosing treatment i and $n_{il}$ be the number of patients randomized to treatment i in stratum l. Similarly, $m_{l}$ and $n_{l}$ represent, respectively, the total number of patients in the option and random arms in stratum l. Given an overall sample size of N patients, the number of patients assigned to the choice (i.e. option) arm is given by $θ N$ , where θ denotes the proportion of patients assigned to the choice arm. If $ξ_{l}$ represents the proportion of patients in stratum l, $θ ξ_{l} N$ gives the number of patients in the choice arm within each stratum. Note that $\sum_{l = 1}^{s} ξ_{l} = 1$ . Figure 1 shows the distribution of patients among the various strata and treatment arms. We assume equal allocation to the two treatments for each stratum in the random arm.
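The allocation just described can be tabulated directly. The sketch below returns the expected (not necessarily integer) cell sizes per stratum, assuming equal allocation to the two treatments in the random arm as stated above. The function name `expected_allocation` and the dictionary keys are hypothetical labels chosen for this illustration.

```python
import math


def expected_allocation(N, theta, xi, phi):
    """Expected cell sizes for each stratum of the two-stage design.

    N     : overall sample size
    theta : proportion of patients assigned to the choice arm
    xi    : list of stratum proportions (must sum to 1)
    phi   : list of proportions preferring treatment 1, one per stratum
    """
    if not math.isclose(sum(xi), 1.0):
        raise ValueError("stratum proportions must sum to 1")
    cells = []
    for xi_l, phi_l in zip(xi, phi):
        m_l = theta * xi_l * N          # choice arm, stratum l
        n_l = (1 - theta) * xi_l * N    # random arm, stratum l
        cells.append({
            "choice_total": m_l,
            "choice_prefers_1": phi_l * m_l,
            "choice_prefers_2": (1 - phi_l) * m_l,
            "random_treatment_1": n_l / 2,  # equal allocation in the random arm
            "random_treatment_2": n_l / 2,
        })
    return cells
```

Note that the numbers choosing each treatment in the choice arm are random in practice; the values returned here are only their expectations.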

Figure 1.

Schematic of the stratified two-stage randomized preference design.

We calculate the stratum-specific test statistics using Rucker’s methods.¹ The observed response for an individual k in option group i of stratum l is denoted $x_{ikl}$ , and the observed response of an individual in random group i of stratum l is denoted $y_{ikl}$ , with $\bar{x_{il}}$ and $\bar{y_{il}}$ the corresponding group means. Let $z_{il} = \sum_{k = 1}^{m_{il}} x_{ikl} - m_{il} \bar{y_{il}}$ . Then the stratum-specific test statistics for the selection, preference, and treatment effects are given as follows

\begin{matrix} \hat{Δ ν_{l}} = \frac{z_{1 l} - z_{2 l}}{\sqrt{Var (z_{1 l} - z_{2 l})}} \\ \hat{Δ π_{l}} = \frac{z_{1 l} + z_{2 l}}{\sqrt{Var (z_{1 l} + z_{2 l})}} \\ \hat{Δ τ_{l}} = \frac{\bar{y_{1 l}} - \bar{y_{2 l}}}{\sqrt{\frac{Var (y_{1 kl})}{n_{1 l}} + \frac{Var (y_{2 kl})}{n_{2 l}}}} \end{matrix}

with

\begin{matrix} Var (z_{1 l} \pm z_{2 l}) = Var (z_{1 l}) + Var (z_{2 l}) \pm 2 Cov (z_{1 l}, z_{2 l}) \end{matrix}

and

\begin{matrix} Var (z_{il}) = m_{il} Var (x_{ikl}) + (1 + \frac{m_{l} - 1}{m_{l}} m_{il}) m_{il} \frac{Var (y_{ikl})}{n_{il}} + \frac{m_{1 l} m_{2 l}}{m_{l}} (\bar{x_{il}} - \bar{y_{il}})^{2} \\ Cov (z_{1 l}, z_{2 l}) = - \frac{m_{1 l} m_{2 l}}{m_{l}} (\bar{x_{1 l}} - \bar{y_{1 l}}) (\bar{x_{2 l}} - \bar{y_{2 l}}) \end{matrix}

The overall test statistic is computed as the weighted sum of the above stratum-specific statistics.

\begin{matrix} \hat{Δ ν} = \sum_{l = 1}^{s} w_{l} \hat{Δ ν_{l}} \\ \hat{Δ π} = \sum_{l = 1}^{s} w_{l} \hat{Δ π_{l}} \\ \hat{Δ τ} = \sum_{l = 1}^{s} w_{l} \hat{Δ τ_{l}} \end{matrix}

where

w_{l} = \frac{m_{l} + n_{l}}{N}

The stratum-specific test statistics are approximately normally distributed¹; therefore, we expect the overall test statistics for the stratified preference design to also be approximately normally distributed.
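A minimal Python sketch of the statistics in this section is given below. It transcribes the expressions for $z_{il}$ , $Var (z_{il})$ and $Cov (z_{1 l}, z_{2 l})$ and then forms the weighted sum with $w_{l} = (m_{l} + n_{l}) / N$ . Two points are assumptions on our part: sample variances are used as plug-in estimates of $Var (x_{ikl})$ and $Var (y_{ikl})$ , and the function names and the tuple layout are hypothetical. No rescaling beyond the weighted sum is applied.

```python
import math
from statistics import mean, variance


def stratum_statistics(x1, x2, y1, y2):
    """Selection, preference and treatment statistics for one stratum.

    x1, x2 : responses in the option arm for patients choosing treatment 1, 2
    y1, y2 : responses in the random arm for patients assigned treatment 1, 2
    Each list needs at least two observations so a sample variance exists.
    Returns (selection, preference, treatment, stratum size m_l + n_l).
    """
    m1, m2 = len(x1), len(x2)
    m = m1 + m2
    z, var_z, diff = [], [], []
    for x, y, m_i in ((x1, y1, m1), (x2, y2, m2)):
        d = mean(x) - mean(y)                  # x-bar_il minus y-bar_il
        diff.append(d)
        z.append(sum(x) - m_i * mean(y))       # z_il
        var_z.append(
            m_i * variance(x)
            + (1 + (m - 1) / m * m_i) * m_i * variance(y) / len(y)
            + m1 * m2 / m * d ** 2
        )
    cov = -m1 * m2 / m * diff[0] * diff[1]

    selection = (z[0] - z[1]) / math.sqrt(var_z[0] + var_z[1] - 2 * cov)
    preference = (z[0] + z[1]) / math.sqrt(var_z[0] + var_z[1] + 2 * cov)
    treatment = (mean(y1) - mean(y2)) / math.sqrt(
        variance(y1) / len(y1) + variance(y2) / len(y2)
    )
    return selection, preference, treatment, m + len(y1) + len(y2)


def overall_statistics(strata):
    """Weighted sum of the stratum statistics with w_l = (m_l + n_l) / N.

    strata : list of (x1, x2, y1, y2) tuples, one per stratum.
    Returns (selection, preference, treatment).
    """
    results = [stratum_statistics(*s) for s in strata]
    N = sum(r[3] for r in results)
    return tuple(sum(r[3] / N * r[k] for r in results) for k in range(3))
```

Each stratum is passed as a tuple of four lists, so a two-stratum trial would be analyzed with `overall_statistics([stratum_1, stratum_2])`.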

3.3 Sample size formula

To determine an appropriate sample size formula to account for stratification in the two-stage randomization design, the variance of each of the selection and preference effects was derived (details in Appendix 1). Under the assumptions of equal variances in all study groups within a given stratum (i.e. $σ_{11 l}^{2} = σ_{22 l}^{2} = σ_{1 l}^{2} = σ_{2 l}^{2} = σ_{l}^{2}$ for each stratum l), and that both the preference and selection effects are the same across all strata, the formula for the variance of the selection effect reduces to

\begin{matrix} Var (Δ ν) = \frac{1}{4 θ N} \sum_{l = 1}^{s} \frac{ξ_{l}}{φ_{l}^{2} (1 - φ_{l})^{2}} [σ_{l}^{2} + φ_{l} (1 - φ_{l}) [(2 φ_{l} - 1) Δ ν + Δ π]^{2} + 2 (\frac{θ}{1 - θ}) σ_{l}^{2} (φ_{l}^{2} + (1 - φ_{l})^{2})] \end{matrix}

To construct a sample size formula from these variances, we can use the standard sample size formula for detecting a difference in two means (δ) with power $1 - β$ and Type I error rate of α.¹⁷ Assuming the outcomes are approximately normally distributed, this formula is given by

\begin{matrix} N = \frac{2 var (δ) (Z_{1 - β} + Z_{1 - \frac{α}{2}})^{2}}{δ^{2}} \end{matrix}

Substituting the above variance for the stratified selection effect gives the necessary sample size to detect an effect of $Δ ν$ .

\begin{matrix} N_{ν} = \frac{(Z_{1 - α / 2} + Z_{1 - β})^{2}}{4 θ Δ ν^{2}} \sum_{l = 1}^{s} \frac{ξ_{l}}{φ_{l}^{2} (1 - φ_{l})^{2}} [σ_{l}^{2} + φ_{l} (1 - φ_{l}) [(2 φ_{l} - 1) Δ ν + Δ π]^{2} + 2 (\frac{θ}{1 - θ}) σ_{l}^{2} (φ_{l}^{2} + (1 - φ_{l})^{2})] \end{matrix}

Similar to the selection effect, we can derive the variance of the preference effect. Under the assumption of equal variances and effects across a given stratum, the preference effect variance is given by

\begin{matrix} Var (Δ π) = \frac{1}{4 θ N} \sum_{l = 1}^{s} \frac{ξ_{l}}{φ_{l}^{2} (1 - φ_{l})^{2}} [σ_{l}^{2} + φ_{l} (1 - φ_{l}) [(2 φ_{l} - 1) Δ π + Δ ν]^{2} + 2 (\frac{θ}{1 - θ}) σ_{l}^{2} (φ_{l}^{2} + (1 - φ_{l})^{2})] \end{matrix}

Using the standard sample size formula to detect the difference between two means, we can find the corresponding sample size to detect a preference effect of $Δ π$

\begin{matrix} N_{π} = \frac{(Z_{1 - α / 2} + Z_{1 - β})^{2}}{4 θ Δ π^{2}} \sum_{l = 1}^{s} \frac{ξ_{l}}{φ_{l}^{2} (1 - φ_{l})^{2}} [σ_{l}^{2} + φ_{l} (1 - φ_{l}) [(2 φ_{l} - 1) Δ π + Δ ν]^{2} + 2 (\frac{θ}{1 - θ}) σ_{l}^{2} (φ_{l}^{2} + (1 - φ_{l})^{2})] \end{matrix}

Finally, the variance of the treatment effect follows the same structure as a traditional stratified clinical design with an additional inflation factor of $\frac{1}{1 - θ}$ . As above, if we assume equal variances and treatment effects across all strata, this variance is given by

\begin{matrix} Var (Δ τ) = \frac{4}{N (1 - θ)} \sum_{l = 1}^{s} ξ_{l} σ_{l}^{2} \end{matrix}

The corresponding sample size needed to detect a treatment effect of $Δ τ$ is shown below.

N_{τ} = \frac{4 (Z_{1 - β} + Z_{1 - \frac{α}{2}})^{2}}{(1 - θ) Δ τ^{2}} \sum_{l = 1}^{s} ξ_{l} σ_{l}^{2}

Unless the main focus is on only one of these effects in particular (e.g. the preference effect), the sample size for the trial is chosen to be the largest of $N_{τ}$ , $N_{ν}$ and $N_{π}$ to ensure sufficient power to test all three effects. A corrected Type I error rate should be utilized to account for the testing of all three hypotheses. A Bonferroni corrected value may be one conservative choice for α, although other less conservative corrections may also be used.
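The three formulas in this section translate into a few lines of code. The sketch below is a direct transcription under the stated assumptions (equal variances within a stratum, common effects across strata). The function names are ours, and the values are returned unrounded because the rounding convention is left to the user. Which treatment is labeled 1 determines $φ_{l}$ and the signs of the effects, so inputs should follow a single consistent labeling.

```python
from statistics import NormalDist


def _z_term(alpha, power):
    """(Z_{1 - alpha/2} + Z_{1 - beta})^2, where power = 1 - beta."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def _stratum_sum(xi, phi, sigma2, theta, main, other):
    """Sum over strata shared by the selection and preference formulas.

    `main` is the effect being powered; `other` is the remaining one
    (e.g. main = delta nu, other = delta pi for the selection effect).
    """
    total = 0.0
    for xi_l, phi_l, s2_l in zip(xi, phi, sigma2):
        bracket = (
            s2_l
            + phi_l * (1 - phi_l) * ((2 * phi_l - 1) * main + other) ** 2
            + 2 * (theta / (1 - theta)) * s2_l * (phi_l ** 2 + (1 - phi_l) ** 2)
        )
        total += xi_l * bracket / (phi_l ** 2 * (1 - phi_l) ** 2)
    return total


def sample_sizes(xi, phi, sigma2, theta, d_tau, d_nu, d_pi,
                 alpha=0.05, power=0.80):
    """Unrounded N_tau, N_nu and N_pi for the stratified two-stage design."""
    z2 = _z_term(alpha, power)
    n_nu = (z2 / (4 * theta * d_nu ** 2)
            * _stratum_sum(xi, phi, sigma2, theta, d_nu, d_pi))
    n_pi = (z2 / (4 * theta * d_pi ** 2)
            * _stratum_sum(xi, phi, sigma2, theta, d_pi, d_nu))
    n_tau = (4 * z2 / ((1 - theta) * d_tau ** 2)
             * sum(x * s for x, s in zip(xi, sigma2)))
    return {"treatment": n_tau, "selection": n_nu, "preference": n_pi}
```

To power a trial for all three effects, one would take the largest of the three returned values and round up, passing a multiplicity-adjusted `alpha` if a correction such as Bonferroni is used.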

3.4 Example

We demonstrate the applicability of these methods using our motivating example. The hypothesis of the HCV study is that those receiving care at MMCs will have a greater increase in HRQoL as measured by the mental component summary (MCS) of the SF-12¹⁸ compared to those receiving care from the specialty clinic. In addition, those preferring a particular route of care (e.g. MMC) and receiving it will have an even greater improvement in HRQoL (preference effect). We assume that alcohol and drug users will prefer MMCs 70% of the time, while non-alcohol and non-drug users will prefer MMCs 50% of the time, with 30% of individuals being alcohol and drug users. We expect non-users to have a higher baseline MCS than users (45 and 40, respectively), but that they will experience the same increase in MCS. We assume that drug and alcohol users who receive treatment via the specialty clinic will have an average MCS of 45, while those receiving treatment via MMC will have an average MCS of 47. Additionally, we assume that patients using alcohol who prefer MMCs and get treatment via MMCs will have an additional 5-point increase in MCS. We also assume that patients who use alcohol and prefer the specialty clinic will have an additional 2-point increase in MCS. Finally, we assume a variance for MCS of 10 in both strata. Based on these assumptions, we can derive the means and effects for each of the two strata, drug and alcohol users (Table 2(a)) and non-drug and alcohol users (Table 2(b)).

Table 1.

Empirical Type I error (probability of rejecting the null hypothesis) for preference and selection effects from 10,000 simulations with $ξ_{l} = 0.5$ , $μ_{l} = τ_{l} = ν_{l} = π_{l} = 0$ for $l = 1, 2$ and N = 200.

		Test for preference		Test for selection
$φ_{1}$	$φ_{2}$	α = 0.05	α = 0.01	α = 0.05	α = 0.01
0.1	0.1	0.0530	0.0104	0.0534	0.0105
	0.2	0.0528	0.0136	0.0538	0.0126
	0.3	0.0512	0.0112	0.0495	0.0104
	0.4	0.0551	0.0120	0.0543	0.0122
	0.5	0.0488	0.0118	0.0533	0.0097
	0.6	0.0528	0.0120	0.0541	0.0114
	0.7	0.0513	0.0099	0.0522	0.0113
	0.8	0.0554	0.0093	0.0520	0.0113
	0.9	0.0472	0.0114	0.0525	0.0124
0.2	0.2	0.0558	0.0113	0.0525	0.0108
	0.3	0.0540	0.0125	0.0510	0.0100
	0.4	0.0531	0.0104	0.0543	0.0116
	0.5	0.0523	0.0107	0.0488	0.0117
	0.6	0.0525	0.0101	0.0497	0.0089
	0.7	0.0491	0.0110	0.0516	0.0100
	0.8	0.0531	0.0107	0.0523	0.0136
	0.9	0.0533	0.0112	0.0562	0.0121
0.3	0.3	0.0503	0.0098	0.0523	0.0116
	0.4	0.0480	0.0090	0.0476	0.0105
	0.5	0.0494	0.0102	0.0480	0.0103
	0.6	0.0482	0.0095	0.0502	0.0110
	0.7	0.0521	0.0087	0.0489	0.0102
	0.8	0.0521	0.0103	0.0534	0.0109
	0.9	0.0499	0.0082	0.0478	0.0111
0.4	0.4	0.0498	0.0109	0.0477	0.0086
	0.5	0.0506	0.0108	0.0511	0.0098
	0.6	0.0479	0.0088	0.0508	0.0100
	0.7	0.0507	0.0101	0.0507	0.0098
	0.8	0.0513	0.0099	0.0501	0.0099
	0.9	0.0547	0.0086	0.0520	0.0100
0.5	0.5	0.0501	0.0093	0.0491	0.0101
	0.6	0.0503	0.0102	0.0502	0.0106
	0.7	0.0506	0.0101	0.0524	0.0106
	0.8	0.0500	0.0101	0.0492	0.0118
	0.9	0.0544	0.0118	0.0520	0.0101
0.6	0.6	0.0502	0.0101	0.0485	0.0105
	0.7	0.0489	0.0100	0.0492	0.0107
	0.8	0.0514	0.0107	0.0486	0.0108
	0.9	0.0508	0.0098	0.0500	0.0092
0.7	0.7	0.0486	0.0109	0.0520	0.0100
	0.8	0.0495	0.0103	0.0553	0.0121
	0.9	0.0522	0.0096	0.0511	0.0121
0.8	0.8	0.0544	0.0097	0.0542	0.0089
	0.9	0.0511	0.0123	0.0524	0.0141
0.9	0.9	0.0527	0.0108	0.0531	0.0105

Table 2.

Derivation of the effect size estimates for mental component summary (MCS) by alcohol and drug use status.

	No choice (random)	Choice		No choice
Actual treatment	Mean	Preferred specialty	Preferred MMC	Preferred specialty	Preferred MMC
(a) Mean MCS and effect sizes for stratum of drug and alcohol users
Specialty clinic	45	47	–	47	(45−0.3 × 47)/0.7 = 44.1
MMC	47	–	52	(47−0.7 × 52)/0.3 = 35.3	52
				Average received preferred treatment = 49.5	Average did not receive preferred treatment = 39.7
				Preference effect = 49.5−39.7 = 9.8
				Average prefer specialty = 41.2	Average prefer MMC = 48.1
				Selection effect = 41.2−48.1 = −6.9
(b) Mean MCS and effect sizes for stratum of non-drug and alcohol users
Specialty clinic	50	52	–	52	(50−0.5 × 52)/0.5 = 48
MMC	52	–	57	(52−0.5 × 57)/0.5 = 47	57
				Average received preferred treatment = 54.5	Average did not receive preferred treatment = 47.5
				Preference effect = 54.5−47.5 = 7.0
				Average prefer specialty = 49.5	Average prefer MMC = 52.5
				Selection effect = 49.5−52.5 = −3

MCS: mental component summary; MMC: mobile medical clinic.

If we assume a Type I error rate of 5% and power the trial at both 80% and 90%, we can calculate the sample sizes to detect a treatment effect of $Δ τ = - 2$ , a preference effect of $Δ π = 7.8$ , and a selection effect of $Δ ν = - 4.2$ (calculated as a weighted average of the stratum-specific effects) (Table 3). The weighted average of the stratum-specific preference rates was also taken as the unstratified preference rate ( $φ = 0.56$ ). The variance of the unstratified design was calculated using a normal mixture model. Selecting the largest of the three sample sizes, we would need 156 subjects at 80% power and 210 subjects at 90% power for the stratified design as opposed to 260 and 346, respectively, for the unstratified design. This translates to an approximately 40% reduction in sample size for the stratified versus the unstratified design.
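For reference, the pooled design values follow from weighting each stratum by its share of the population (30% drug and alcohol users, 70% non-users) and using the stratum-specific effects in Table 2. Here φ is read as the proportion preferring the MMC, which is our interpretation of the preference rates stated in the example.

\begin{matrix} Δ π = 0.3 \times 9.8 + 0.7 \times 7.0 = 7.84 \approx 7.8 \\ Δ ν = 0.3 \times (- 6.9) + 0.7 \times (- 3.0) = - 4.17 \approx - 4.2 \\ φ = 0.3 \times 0.7 + 0.7 \times 0.5 = 0.56 \end{matrix}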

Table 3.

Required sample sizes for treatment, selection, and preference effect using a two-stage randomized design for HCV patients receiving treatment at either a specialty or MMC clinic, under the stratified and unstratified design.

	80% Power		90% Power
	Stratified	Unstratified	Stratified	Unstratified
$N_{τ}$	156	260	210	346
$N_{π}$	28	38	38	50
$N_{ν}$	140	172	186	232

3.5 Simulations

We used multiple sets of simulations to verify the properties of the stratified two-stage design. For all scenarios, we assumed an even distribution of subjects between the option and random arm ( $θ = 0.5$ ). We generated 10,000 simulations for each of the scenarios and all simulations were carried out using R version 3.2.2.

3.5.1 Type I error

For all scenarios, we allowed for two strata, varying the proportion of patients within stratum 1 ( $ξ_{1} = 0.1, 0.2, 0.3, 0.4, 0.5$ ). Data were generated under the normal distribution assuming a variance of 1 in each stratum. All stratum means were set to 0 to evaluate the performance of the model in the absence of treatment, preference, or selection effects. We varied the overall sample size N (200, 250, 300, 350, 400) and the preference rate $(φ_{l})$ in the two strata (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). The evaluation of the simulations was based on Type I error rates for the selection and preference effects. We present empirical Type I error results for nominal values of α = 0.01 and α = 0.05 in Table 1 for a sample size of N = 200 and an equal distribution of patients in the strata ( $ξ_{l} = 0.5$ for $l = 1, 2$ ).

As Table 1 shows, the empirical results are very close to the nominal values, thus demonstrating the tests perform well and that the assumptions of the proposed test are reasonable. Similar results were obtained when the proportion of patients in the first stratum ( $ξ_{1}$ ) was varied between 0.1 and 0.5 and the sample size was varied between N = 200 and N = 400 (data not shown).

3.5.2 Efficiency

We demonstrate the potential gains in power of the stratified preference design compared to the unstratified counterpart. First, the treatment effect was evaluated in the absence of preference and selection effects. We generated a treatment effect in one stratum twice that of the second stratum. To do this, we set $μ_{2} = 1$ in one stratum and $μ_{2} = 2$ in the second; $μ_{1}$ was varied from one to two times the value of $μ_{2}$ in increments of 0.1 in both strata. In addition, the proportion increase of receiving a preferred treatment in both strata was set to be 0 for both treatment options (i.e. no preference effect). With these specifications, the treatment effect was varied from 0 to 1.5 in increments of 0.15. We assumed equal allocation between the two strata ( $ξ_{l} = 0.5$ for all l).

Subsequently, the behavior of the preference and selection effects was investigated after removing the treatment effect. We set $μ_{1} = μ_{2} = 1$ in one stratum and $μ_{1} = μ_{2} = 2$ in the second stratum. The proportion increase in means of receiving a preferred treatment in both strata was set to be 0 for one treatment option. The proportion increase in means for the second treatment option was varied between 0 and 1 in increments of 0.1, keeping the value constant across both strata. These parameter values produced a selection and preference effect in the first stratum that was twice the effect in the second stratum. Under these conditions, the preference and selection effects were generated to be equal to each other within each stratum and to range from 0 to 1.5 in increments of 0.15. Again, the data were generated from a normal distribution with variance $σ_{l}^{2} = 1$ for all l, assuming equal allocation between the choice and random arms ( $θ = 0.5$ ) and between the two strata ( $ξ_{l} = 0.5$ for l = 1,2), as well as a preference rate of 0.5 in both strata ( $φ_{l} = 0.5$ for l = 1,2).

The results for an overall sample size of N = 200, 300, 400 for Type I error rates of 0.01 and 0.05 are shown in Figure 2. The solid lines indicate the power calculated after adjusting for stratification, while the dotted lines show the corresponding power when an unstratified design is used.

Figure 2.

Simulated power of stratified and unstratified trial designs for treatment, preference, and selection effects when stratum differences exist.

As the curves in Figure 2 show, using a stratified design results in improved power when different strata exhibit different preference, selection, or treatment effects. Additional simulations were run assuming three strata with equal distribution of subjects across the strata. These results are in accordance with the results presented for two strata although the data are not shown.

3.5.3 Sample size

Finally, to compare the efficiency of the stratified design with the unstratified case, the sample size for the preference effect for both designs was computed under a series of different scenarios. For all cases, one stratum was assumed to have a preference and selection effect of 0.5 ( $ν_{1} = 0.5$ , $π_{1} = 0.5$ ), while the preference and selection effect in the second stratum was increased from 0.5 to 4 in increments of 0.5. The allocation between the two strata was varied ( $ξ_{1} = 0.1, 0.2, 0.3, 0.4, 0.5$ ), along with the preference rate in each stratum ( $φ_{l} = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9$ ). In all cases, a significance level of 0.05 and a power of 80% were used.

In the first scenario, the treatment effect and overall mean were assumed to be 0 in both strata ( $μ_{l} = 0$ , $τ_{l} = 0$ for l = 1,2) to evaluate the efficiency of the design when only preference and selection effects are present. In addition, a variance of $σ_{l}^{2} = 1$ was assumed within each stratum. The corresponding variance of the entire unstratified sample was derived assuming a normal mixture model. Figure 3 presents the log of the ratio of the required sample size under the stratified model to the sample size under the unstratified model (i.e. the stratified model is more efficient when the log of the ratio is below 0). The case where the individuals are distributed equally between the two strata is presented (i.e. $ξ_{l} = 0.5$ for $l = 1, 2$ ); varying the distribution of individuals across the strata produced similar results (not shown). The stratum 1 preference rate is varied across panels (a)–(d) (0.1, 0.3, 0.5 and 0.7, respectively), while the stratum 2 preference rate is varied from 0.1 to 0.9 within each panel.
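The text does not spell out the mixture calculation used for the unstratified variance. One standard reading is the law of total variance for a finite mixture: the pooled variance equals the weighted within-stratum variance plus the weighted squared deviation of the stratum means from the pooled mean. The sketch below assumes that reading; `mixture_variance` is a name of our choosing.

```python
import math


def mixture_variance(xi, means, variances):
    """Variance of a finite mixture with weights xi (law of total variance).

    xi        : stratum proportions (must sum to 1)
    means     : stratum means
    variances : within-stratum variances
    """
    if not math.isclose(sum(xi), 1.0):
        raise ValueError("stratum proportions must sum to 1")
    pooled_mean = sum(w * m for w, m in zip(xi, means))
    within = sum(w * v for w, v in zip(xi, variances))
    between = sum(w * (m - pooled_mean) ** 2 for w, m in zip(xi, means))
    return within + between
```

When the stratum means coincide, the between-stratum term vanishes and the pooled variance is simply the weighted average of the stratum variances; the further apart the means, the more the unstratified variance is inflated.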

Figure 3.

Ratio of required sample size for the preference effect for the stratified and unstratified designs with no overall mean or treatment effect with ξ_l = 0.5, Δπ_l = 0.5 in stratum 1 for 80% power and α = 0.05. The stratum 1 preference rate is varied across panels (a)–(d); the stratum 2 preference rate is varied within each panel.

As can be seen in Figure 3(a), the stratified design tends to do poorly (compared to the unstratified design) when the preference rate in one of the strata is extreme (0.1). Even when the preference rate in one stratum is more moderate (e.g. 0.3, 0.5, 0.7) (Figure 3(b)–(d)), the stratified model does not perform as well as the unstratified model when the preference rate in the other stratum is extreme (0.1 or 0.9). Only when the preference rate is similar in both strata (Figure 3(a)) or when the preference rates are both moderate (Figure 3(b)–(d)) is the stratified version more efficient. When there is no overall treatment effect, the efficiency of the stratified design is most clearly seen when large differences exist between the preference effect sizes, and consequently, the means, of the two strata.

In the second scenario, the role of the treatment effect was considered by assuming a treatment effect of 1 in one stratum and 2 in the other ( $τ_{1} = 1, τ_{2} = 2$ ). A variance of $σ_{l}^{2} = 1$ was assumed in each stratum. Again, the results are shown for equal distribution between the two strata, but similar results were obtained for other allocations (data not shown).

Once again, we see that the unstratified design is most often more efficient than the stratified design when the preference rate in one stratum is extreme (Figure 4(a)). However, with the introduction of a stratum-specific treatment effect, the variance of the unstratified design is inflated and the stratified design becomes more efficient for moderate preference rates ( $φ_{l}$ between 0.2 and 0.8) (Figure 4(b)–(d)). We also see that the stratified design is more efficient when the means of the two strata are highly disparate for almost all combinations of the stratum 1 and stratum 2 preference rates (Figure 4(a)–(d)).

Figure 4.

Ratio of required sample size for the preference effect for stratified and unstratified designs with stratum differences in treatment effect with ξ_l = 0.5, Δπ_l = 0.5 in stratum 1 at 80% power and α = 0.05. The stratum 1 preference rate is varied across panels (a)–(d); the stratum 2 preference rate is varied within each panel.

Finally, the efficiency of the design was investigated when the stratum variances are unequal ( $σ_{1}^{2} = 1, σ_{2}^{2} = 0.5$ ). Both the overall mean and treatment effect were set to 0 for both strata. Once again, we assume equal allocation between the two strata.

In general, Figure 5 illustrates similar results: the stratified design tends to perform best when the preference rates in both strata are moderate and the differences in stratum means, and the corresponding variances, are high.

Figure 5.

Ratio of required sample size for the preference effect for stratified and unstratified designs with stratum differences in variance with φ_l = 0.5, Δπ_l = 0.5 in stratum 1 at 80% power and α = 0.05. The stratum 1 preference rate is varied across panels (a)–(d); the stratum 2 preference rate is varied within each panel.

Of note, we also evaluated the selection effect under the same conditions as the preference effect and observed similar results (data not shown).

4 Discussion

The doubly randomized two-stage trial design proposed by Rucker¹ is an ideal design for addressing the inclusion of patient preference in making treatment decisions. This design enables the separation of treatment effects from the effects that result from patients choosing their treatment. In addition, the two-stage randomized trial design more accurately reflects clinical experience than the completely randomized design. These trials are especially relevant in behavioral intervention studies where, depending on the arm, different demands may be made on a participant’s time. The motivation of a patient to follow a treatment may be influenced by a preference a patient has before beginning any course of action.¹⁹ Allowing the patient to choose the intervention that is most suitable, in terms of level of commitment and willingness to participate in the task, may lead to enhanced motivation and thus efficacy of the intervention.³,¹⁹ Additionally, the accurate estimation of these effects may be important when compliance and adherence to a treatment may be related to a patient’s outcome.⁹ In cases where equivalence has already been demonstrated among multiple treatments, accurate estimation of the selection and preference effects may help clinicians to better treat patients. For example, in relapsing–remitting multiple sclerosis, shared decision-making models promote better outcomes and are especially important for patients for whom multiple reasonable treatment options exist, with none clearly outperforming the others.²⁰ However, the original design fails to account for differences in preference rates among different population subgroups. In particular, there may be different factors that cause certain people to have a higher preference for one treatment over another. For example, in the HCV study discussed above, patients who use drugs or alcohol are expected to have a stronger preference for MMCs versus treatment in a traditional healthcare setting. Additionally, these factors may result in different treatment effect responses. In this case, a stratified preference trial design will be able to account for these differing rates while increasing efficiency, reducing the required sample size, and offering methods to calculate the treatment, preference, and selection effects both within each stratum and overall.

The stratified preference trial design may be especially applicable in cases where the effect of two treatments is likely to be similar. In these scenarios, preference and selection effects may play a vital role in prescribing a treatment to a certain patient. In the example of a trial investigating HCV treatments, the differences in treatment effect between specialty hospital clinics and MMCs may be extremely small. However, because of various barriers that may prevent people using alcohol or drugs from initiating or adhering to a hospital treatment, allowing patients a choice in where to get treatment may allow for a significant increase in HRQoL, as measured by the SF-12. Further, it could have a substantial impact on the overall treatment goal of achieving sustained virologic response.

Our results indicate that not accounting for subgroups with different treatment responses and preference rates increases variability, decreases power, and requires a greater sample size. Using a stratified version of a preference design minimizes these effects. As our results indicate, making use of a stratified design when there may be differences in preference rates or treatment responses allows for an increase in efficiency over the unadjusted design and analysis. This increase in efficiency translates to a smaller sample size needed to estimate each of the treatment, selection, and preference effects. We have also shown that the stratified design is not always more efficient than the unstratified design, especially under scenarios of extreme preference rates. In these situations, we would not recommend a stratified preference design, or we would recommend the selection of a different stratification variable.
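The link between efficiency and required sample size can be illustrated with a generic normal-approximation calculation. This is a sketch only, not the stratified sample size formula developed in this paper: it assumes a two-sided z-test of a single mean-type effect, for which the required sample size is proportional to the variance and inversely proportional to the squared effect size. The function names `z_multiplier` and `required_n`, the variance values 1.25 and 1.0, and the effect size 0.5 are hypothetical.

```python
from statistics import NormalDist


def z_multiplier(alpha=0.05, power=0.80):
    """Return (z_{1-alpha/2} + z_{power})^2 for a two-sided z-test."""
    std_normal = NormalDist()
    return (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)) ** 2


def required_n(variance, effect, alpha=0.05, power=0.80):
    """Generic sample size for detecting `effect` given `variance` (assumed form)."""
    return z_multiplier(alpha, power) * variance / effect ** 2


# Hypothetical variances for two competing designs, same effect size.
n_larger_variance = required_n(variance=1.25, effect=0.5)
n_smaller_variance = required_n(variance=1.0, effect=0.5)

# With alpha, power, and effect held fixed, the sample size ratio
# reduces to the variance ratio.
ratio = n_larger_variance / n_smaller_variance
print(round(ratio, 4))
```

At 80% power and α = 0.05 the multiplier is about 7.85, and because it cancels in the ratio, any reduction in variance achieved by a design carries over proportionally to the required sample size under this approximation.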

One of the simplifying assumptions used in the construction of the sample size formula is that the preference, selection, and treatment effects are the same across all strata. The next step in this work is to create a test to determine the validity of this assumption and to further generalize these methods to incorporate situations with unequal effect sizes across the strata. Another important consideration for future development is the optimal allocation between the choice and random arms in the initial randomization. In this paper, equal randomization between the arms was assumed; however, previous work has indicated that this allocation ratio may not be optimal.² Future work may focus on determining the optimal allocation for the stratified design. In addition, future work is needed to extend these methods to other outcome distributions, such as binomial and survival outcomes. The methods proposed in this paper assume normally distributed, continuous data. Further, this study assumes that all patients in the choice arm have a treatment preference. In the future, these methods may also be extended to allow for the possibility of a patient not having a preference between the studied treatments. Finally, it is an important assumption of preference trial designs that the process of being randomized to either a choice or random arm does not influence a patient’s preference and response.¹ This assumption may cause concern in cases where being randomized to a particular treatment, as opposed to being allowed to choose a treatment, affects a patient’s response.

Footnotes

Acknowledgements

The authors would like to thank Dr Peter Peduzzi for his thoughtful comments.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by Yale’s Clinical and Translational Science Award (NIH: UL1TR000142).

References

1. Rucker. A two-stage trial design for testing treatment, self-selection and treatment preference effects. Stat Med 1989; 8: 477–485.

2. Walter, Turner, Macaskill, et al. Optimal allocation of participants for the estimation of selection preference and treatment effects in the two-stage randomised design. Stat Med 2012; 31: 1307–1322.

3. Clark, Janz, Dodge, et al. The effect of patient choice of intervention on health outcomes. Contempor Clin Trials 2008; 29: 679–686.

4. Bradley. Designing medical and educational intervention studies: a review of some alternatives to conventional randomized controlled trials. Diabetes Care 1993; 16: 509–518.

5. Halpern. Evaluating preference effects in partially unblinded, randomized clinical trials. J Clin Epidemiol 2003; 56: 109–115.

6. Long, Little, Lin. Causal inference in hybrid intervention trials involving treatment choice. J Am Stat Assoc 2008; 103: 474–484.

7. Main. What is the best airway clearance technique in cystic fibrosis? Paediatr Respirat Rev 2013; 145: 10–12.

8. McPherson, Britton, Wennberg. Are randomized controlled trials controlled? Patient preferences and unblind trials. J R Soc Med 1997; 90: 652–656.

9. McPherson, Britton. Preferences and understanding their effects on health. Q Health Care 2001; 10(Suppl. I): i61–i66.

10. Shang. Understanding patient values and the manifestations in clinical research with traditional Chinese medicine – with practical suggestion for trial design and implementation. Evid Complement Alternat Med 2013; 2013: 847273–847273.

11. Prady, Burch, Crouch, et al. Insufficient evidence to determine the impact of patient preferences on clinical outcomes in acupuncture trials: a systematic review. J Clin Epidemiol 2013; 66: 308–318.

12. Sidani, Fox, Epstein. Conducting a two-stage preference trial: utility and challenges. Int J Nurs Stud 2015; 52: 1017–1024.

13. Silverman, Altman. Patients’ preferences and randomised trials. Lancet 1996; 347: 171–174.

14. Razavi, Elkhoury, Elbasha, et al. Chronic hepatitis C virus (HCV) disease burden and cost in the United States. Hepatology 2013; 57: 2164–2170.

15. Turner, Walter, Macaskill, et al. Sample size and power when designing a randomized trial for the estimation of treatment, selection, and preference effects. Med Decis Mak 2014; 34: 711–719.

16. Rosenberger, Lachin. Randomization in clinical trials: theory and practice. New York: Wiley & Sons, 2002.

17. Friedman, Furberg, DeMets. Fundamentals of clinical trials, 3rd ed. St. Louis, MO: Mosby – Year Book, 1996.

18. Ware, Kosinski, Keller. A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996; 34: 220–233.

19. Brewin, Bradley. Patient preferences and randomised clinical trials. Br Med J 1989; 299: 313–315.

20. Barclay L. Multiple sclerosis discovery forum, http://www.msdiscovery.org/news/news_briefs/12714-evaluating-patient-preferences-key-ms-decision-making (2014, accessed 20 October 2016).