Abstract
The two-stage (or doubly) randomized preference trial design is an important tool for researchers seeking to disentangle the role of patient treatment preference on treatment response through estimation of selection and preference effects. Up until now, these designs have been limited by their assumption of equal preference rates and effect sizes across the entire study population. We propose a stratified two-stage randomized trial design that addresses this limitation. We begin by deriving stratified test statistics for the treatment, preference, and selection effects. Next, we develop a sample size formula for the number of patients required to detect each effect. The properties of the model and the efficiency of the design are established using a series of simulation studies. We demonstrate the applicability of the design using a study of Hepatitis C treatment modality, specialty clinic versus mobile medical clinic. In this example, a stratified preference design (stratified by alcohol/drug use) may more closely capture the true distribution of patient preferences and allow for a more efficient design than a design which ignores these differences (unstratified version).
1 Background
In the classic clinical trial setting, patients are randomized to one of multiple treatment arms, with the goal of assessing the efficacy of a particular treatment on a specified outcome. This effect is generally termed the treatment effect. However, this traditional design ignores the impact that a patient’s treatment preference may have on study outcomes. Rucker 1 proposed a two-stage trial design in order to measure the effect of patient preference. In this design, patients undergo a two-stage randomization process. First, patients are randomized to an option or random group. Those in the option group are given their preferred treatment (possibly after some informed decision-making process), while those in the random group are randomly assigned to a treatment through a second randomization. In addition to the treatment effect, we can estimate selection and preference effects from this design. The selection effect refers to the effect of a patient’s selection of his or her preferred treatment. Moreover, the preference effect is the additional change in outcome that results from the interaction between a patient’s preferred treatment and the treatment he/she actually receives. 1 In other words, this effect compares patients who receive their preferred treatment and those who do not. For example, when comparing a group therapy and medical intervention for depression, selection effects may be present if women tend to select the group therapy treatment option. In addition, preference effects may be manifested if patients who receive their desired treatment, either group therapy or medication, respond better than patients who do not. Estimating these effects may be important to investigators, particularly in cases where compliance may be related to a patient’s treatment preference. 
Additionally, patients may have a different psychological response to a treatment they deem to be more favorable, 2 especially in cases where they cannot be blinded to treatment assignment (e.g. surgical intervention and group therapy).
To use the model proposed by Rucker, 1 we must assume that the proportion of patients preferring a given treatment is constant across the entire patient population. For example, if we are considering a surgical versus a medical treatment for a heart condition, we would need to assume that the proportion of patients preferring surgery and the proportion preferring the medical intervention are the same for all individuals in the population, e.g. 30% and 70%, respectively. There are scenarios, however, in which this assumption may be violated. For example, certain patients may have a higher affinity for a treatment; older patients may prefer a medical intervention over a surgical intervention. A study by Clark et al. 3 found that employed women were more likely to prefer a self-directed heart management system over a group format. This finding highlights that, while several authors have called for increased use of the two-stage randomized trial design to incorporate patient preferences,4–13 there are still large gaps in the methods available to design and ultimately analyze these trials. Therefore, to begin addressing these gaps, such as the differential preference rates observed by Clark et al., 3 we propose a stratified version of Rucker’s two-stage trial design. 1 This new method will allow for the estimation of the treatment, selection, and preference effects separately within each stratum, as well as across the entire study population. Additionally, this method is expected to increase the efficiency of the design. By first stratifying patients, the observed variability in the specified outcome is expected to decrease, allowing for more precise estimates of each effect and a smaller overall sample size.
2 Motivating example
Hepatitis C virus (HCV) affects nearly 5 million people in the United States and results in $6.5 billion in disease-related costs annually. 14 While there have been recent advances in HCV treatment, both in tolerability and duration, there are many infected people, especially drug and alcohol users, who do not seek the recommended treatment or have been denied treatment in the past. This trend could be influenced by many factors such as a lack of healthcare access or the social barriers and stigma that may be attached to these populations.
As an alternative to the traditional healthcare setting, mobile medical clinics (MMCs) have arisen as an attempt at removing many of the barriers that prevent patients from seeking treatment. National guidelines recommend treating HCV in a specialty hospital clinic; however, MMCs may provide a user-friendly alternative for patients with HCV. Since it would be all but impossible to blind patients to their healthcare setting, it is expected that patient preference for where they obtain their treatment will have a substantial impact on health-related quality of life (HRQoL), as well as play a major role in the initiation and continuation of treatment, and thus, their cure rate of HCV. The two-stage design first proposed by Rucker 1 is ideal to assess patient preference on these outcomes. However, we expect the proportion of patients preferring the MMC over a specialty hospital clinic to be substantially higher among patients who use alcohol or drugs. The current methods available for the design and analysis of two-stage clinical trials lack the ability to factor in stratification variables.1,2,15
In the following paper, we propose an extension of Rucker’s preference design to allow for stratification. This method allows for the incorporation of important risk factors, such as alcohol and drug use status, that are expected to be related to a patient’s treatment preference or response. We begin by deriving a closed form sample size formula and demonstrating that the stratified study design can lead to a smaller sample size than its non-stratified counterpart. 15 A series of simulation studies comparing the empirical Type I error with different nominal values was used to validate the assumptions of the model. Next, we demonstrate the efficiency of the stratified design by examining the study power. Finally, we demonstrate the applicability by designing a trial to assess the impact on HRQoL for patients seeking treatment for HCV.
3 Methods
3.1 Study design
For simplicity, we assume that all patients have a treatment preference (i.e. no undecided patients). Using similar notation as that proposed by Rucker, the response
We use
These equations are premised on the principles of randomization – we will achieve equipoise through randomization and thus participants in the random arm will have the same preference rate as those in the choice arm.
Using the following notation
Since the strata are independent, these estimators can be used to determine the difference in the preference, treatment, and selection effects separately in each stratum. We can also construct an overall test statistic for the preference, selection, and treatment effects that accounts for stratification.
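As a rough illustration of stratum-level estimation, the sketch below computes point estimates of the three effects within a single stratum. It assumes one common parameterization of the two-stage design, in which the selection and preference effects are half-contrasts of the four preference-by-treatment means and the random-arm means are mixtures weighted by the preference rate. The function name, argument names, and this scaling are assumptions made for illustration and may differ from the notation used in this paper.

```python
from statistics import fmean


def stratum_effects(choice_a, choice_b, random_a, random_b):
    """Point estimates of treatment, selection, and preference effects in one stratum.

    choice_a / choice_b: outcomes of option-arm patients who chose treatment A / B.
    random_a / random_b: outcomes of random-arm patients assigned to treatment A / B.
    Assumes every patient has a preference and that the random arm shares the
    option arm's preference rate (illustrative assumptions).
    """
    m1, m2 = len(choice_a), len(choice_b)
    if m1 == 0 or m2 == 0:
        raise ValueError("both treatments must be chosen by at least one patient")
    m = m1 + m2
    phi_hat = m1 / m                      # estimated preference rate for A
    xbar_a, xbar_b = fmean(random_a), fmean(random_b)

    # Excess of the choosers' outcome totals over what the random arm predicts.
    d1 = sum(choice_a) - m1 * xbar_a
    d2 = sum(choice_b) - m2 * xbar_b
    denom = 2 * m * phi_hat * (1 - phi_hat)

    return {
        "treatment": xbar_a - xbar_b,
        "selection": (d1 - d2) / denom,
        "preference": (d1 + d2) / denom,
    }


# Hypothetical data for one stratum.
print(stratum_effects([3.0, 3.0], [2.0, 2.0], [2.0, 2.0], [1.5, 1.5]))
```

Calling the function once per stratum gives the stratum-specific estimates; combining them across strata is a separate step.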
3.2 Stratified test statistic
Efficient testing of the preference, selection, and treatment effects in a stratified preference trial design requires adjustment of the test statistic to account for stratification. Adopting a method proposed by Rosenberger and Lachin, 16 we construct an overall test statistic through a weighted sum of the stratum-specific test statistics. The proposed weights were chosen according to the proportion of patients in each stratum.
We let the random variable
Schematic of the stratified two-stage randomized preference design.
We calculate the stratum-specific test statistics using Rucker’s methods. 1
The observed response for an individual k in option group i of stratum l is denoted
The overall test statistic is computed as the weighted sum of the above stratum-specific statistics.
The stratum-specific test statistics are approximately normally distributed; 1 therefore, we expect the overall test statistics for the stratified preference design to also be approximately normally distributed.
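The weighting idea can be sketched in a few lines. The example below shows one standard way to combine stratum-specific estimates into a single approximately normal statistic, using weights equal to the proportion of patients in each stratum. The function name and inputs are hypothetical, and the exact statistic used in this paper may be scaled differently.

```python
from math import sqrt
from statistics import NormalDist


def stratified_z(estimates, variances, stratum_sizes):
    """Combine stratum-specific effect estimates into one z statistic.

    estimates: estimated effect in each stratum.
    variances: estimated variance of each stratum's estimate.
    stratum_sizes: number of patients in each stratum (used as weights).
    Returns the z statistic and a two-sided p-value under a normal approximation.
    """
    total = sum(stratum_sizes)
    weights = [n / total for n in stratum_sizes]

    # Weighted sum of the estimates; strata are independent, so variances add.
    pooled = sum(w * e for w, e in zip(weights, estimates))
    pooled_var = sum(w ** 2 * v for w, v in zip(weights, variances))

    z = pooled / sqrt(pooled_var)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical two-stratum example.
z, p = stratified_z(estimates=[1.0, 1.0], variances=[0.25, 0.25], stratum_sizes=[50, 50])
print(round(z, 3), round(p, 4))
```

Because the weights sum to one, the numerator is a weighted average of the stratum estimates, so a large stratum contributes proportionally more to the overall test.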
3.3 Sample size formula
To determine an appropriate sample size formula that accounts for stratification in the two-stage randomization design, the variance of each of the selection and preference effects was derived (details in Appendix 1). Under the assumptions of equal variances in all study groups within a given stratum (i.e.
To construct a sample size formula from these variances, we can use the standard sample size formula for detecting a difference in two means (δ) with power β and Type I error rate of α. 17
Assuming the outcomes are approximately normally distributed, this formula is given by
Substituting the above variance for the stratified selection effect gives the necessary sample size to detect an effect of
Similar to the selection effect, we can derive the variance of the preference effect. Under the assumption of equal variances and effects across a given stratum, the preference effect variance is given by
Using the standard sample size formula to detect the difference between two means, we can find the corresponding sample size to detect a preference effect of
Finally, the variance of the treatment effect follows the same structure as a traditional stratified clinical design with an additional inflation factor of
The corresponding sample size needed to detect a treatment effect of
Unless the main focus is on only one of these effects in particular (e.g. the preference effect), the sample size for the trial is chosen to be the largest of
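The sample size logic of this section can be sketched generically. The example below assumes the usual normal-approximation form, in which the required total sample size is the squared sum of two standard normal quantiles multiplied by a per-patient variance term and divided by the squared effect size. The `unit_variance` argument (the variance of the effect estimator multiplied by the total sample size) is a hypothetical stand-in for the stratified variance expressions, which are not reproduced here, and the numeric inputs are made up.

```python
from math import ceil
from statistics import NormalDist


def required_n(unit_variance, delta, alpha=0.05, power=0.80):
    """Total sample size to detect an effect of size delta (two-sided test).

    unit_variance: variance of the effect estimator multiplied by N
                   (assumed input; depends on the design and effect).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil((z_alpha + z_power) ** 2 * unit_variance / delta ** 2)


def trial_size(unit_variances, deltas, alpha=0.05, power=0.80):
    """Largest of the effect-specific sample sizes (keys must match)."""
    return max(
        required_n(unit_variances[effect], deltas[effect], alpha, power)
        for effect in deltas
    )


# Hypothetical variance terms and effect sizes.
variances = {"treatment": 1.0, "selection": 4.0, "preference": 2.0}
effects = {"treatment": 0.5, "selection": 0.5, "preference": 0.5}
print(trial_size(variances, effects))
```

Taking the maximum ensures the trial is powered for whichever effect is hardest to detect; when only one effect is of primary interest, `required_n` can be called for that effect alone.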
3.4 Example
Empirical Type I error (probability of rejecting the null hypothesis) for preference and selection effects from 10,000 simulations with
Derivation of the effect size estimates for mental component summary (MCS) by alcohol and drug use status.
MCS: mental component summary; MMC: mobile medical clinic.
Required sample sizes for treatment, selection, and preference effect using a two-stage randomized design for HCV patients receiving treatment at either a specialty or MMC clinic, under the stratified and unstratified design.
3.5 Simulations
We used multiple sets of simulations to verify the properties of the stratified two-stage design. For all scenarios, we assumed an even distribution of subjects between the option and random arm (
3.5.1 Type I error
For all scenarios, we allowed for two strata, varying the proportion of patients within stratum 1 (
As Table 1 shows, the empirical results are very close to the nominal values, thus demonstrating that the tests perform well and that the assumptions of the proposed test are reasonable. Similar results were obtained when the proportion of patients in the first stratum (
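The Monte Carlo recipe behind an empirical Type I error estimate can be sketched as follows. For brevity, this example checks only a stratified treatment-effect test in the random arm, not the preference or selection tests reported in Table 1. It assumes standard normal outcomes, two strata, and proportional weights, and all names, defaults, and settings are illustrative rather than those used in the study.

```python
import random
from math import sqrt
from statistics import NormalDist


def mean_and_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def empirical_type1(n_total=200, p_stratum1=0.5, alpha=0.05, n_sims=2000, seed=1):
    """Fraction of null simulations in which the stratified z-test rejects."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    n1 = round(n_total * p_stratum1)
    sizes = [n1, n_total - n1]
    weights = [n / n_total for n in sizes]

    rejections = 0
    for _ in range(n_sims):
        numerator, variance = 0.0, 0.0
        for w, n in zip(weights, sizes):
            half = n // 2
            # Null hypothesis: both treatments share the same mean.
            arm_a = [rng.gauss(0.0, 1.0) for _ in range(half)]
            arm_b = [rng.gauss(0.0, 1.0) for _ in range(n - half)]
            mean_a, var_a = mean_and_var(arm_a)
            mean_b, var_b = mean_and_var(arm_b)
            numerator += w * (mean_a - mean_b)
            variance += w ** 2 * (var_a / len(arm_a) + var_b / len(arm_b))
        if abs(numerator / sqrt(variance)) > crit:
            rejections += 1
    return rejections / n_sims


print(empirical_type1())
```

If the test is well calibrated, the returned proportion should be close to the nominal `alpha`, up to Monte Carlo error that shrinks as `n_sims` grows.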
3.5.2 Efficiency
We demonstrate the potential gains in power of the stratified preference design compared to the unstratified counterpart. First, the treatment effect was evaluated in the absence of preference and selection effects. We generated a treatment effect in one stratum twice that of the second stratum. To do this, we set
Subsequently, the behavior of the preference and selection effects was investigated after removing the treatment effect. We set
The results for an overall sample size of N = 200, 300, 400 for Type I error rates of 0.01 and 0.05 are shown in Figure 2. The solid lines indicate the power calculated after adjusting for stratification, while the dotted lines show the corresponding power when an unstratified design is used.
Simulated power of stratified and unstratified trial designs for treatment, preference, and selection effects when stratum differences exist.
As the curves in Figure 2 show, using a stratified design results in improved power when different strata exhibit different preference, selection, or treatment effects. Additional simulations were run assuming three strata with equal distribution of subjects across the strata. These results are in accordance with the results presented for two strata although the data are not shown.
3.5.3 Sample size
Finally, to compare the efficiency of the stratified design with the unstratified case, the sample size for the preference effect for both designs was computed under a series of different scenarios. For all cases, one stratum was assumed to have a preference and selection effect of 0.5 (
In the first scenario, the treatment effect and overall mean were assumed to be 0 in both strata (
Ratio of required sample size for the preference effect for the stratified and unstratified designs with no overall mean or treatment effect with φ_l = 0.5, Δπ_l = 0.5 in stratum 1 for 80% power and α = 0.05. The stratum 1 preference rate is varied across panels (a)–(d); the stratum 2 preference rate is varied within each panel.
As can be seen in Figure 3(a), the stratified design tends to do poorly (compared to the unstratified design) when the preference rate in one of the strata is extreme (0.1). Even when the preference rate in one stratum is more moderate (e.g. 0.3, 0.5, 0.7) (Figure 3(b)–(d)), the stratified design does not perform as well as the unstratified design when the preference rate in the other stratum is extreme (0.1 or 0.9). Only when the preference rate is similar in both strata (Figure 3(a)) or when the preference rates are both moderate (Figure 3(b)–(d)) is the stratified version more efficient. When there is no overall treatment effect, the efficiency of the stratified design is most clearly seen when large differences exist between the preference effect sizes, and consequently, the means, of the two strata.
In the second scenario, the role of the treatment effect was considered by assuming a treatment effect of 1 in one stratum and 2 in the other (
Once again, we see that the unstratified design is most often more efficient than the stratified design when the preference rate in one stratum is extreme (Figure 4(a)). However, with the introduction of a stratum-specific treatment effect, the variance of the unstratified design is inflated and the stratified design becomes more efficient for moderate preference rates (
Ratio of required sample size for the preference effect for stratified and unstratified designs with stratum differences in treatment effect with φ_l = 0.5, Δπ_l = 0.5 in stratum 1 at 80% power and α = 0.05. The stratum 1 preference rate is varied across panels (a)–(d); the stratum 2 preference rate is varied within each panel.
Finally, the efficiency of the design was investigated when the stratum variances are unequal (
In general, Figure 5 illustrates similar results: the stratified design tends to perform best when the preference rates in both strata are moderate and the differences in stratum means, and the corresponding variances, are high.
Ratio of required sample size for the preference effect for stratified and unstratified designs with stratum differences in variance with φ_l = 0.5, Δπ_l = 0.5 in stratum 1 at 80% power and α = 0.05. The stratum 1 preference rate is varied across panels (a)–(d); the stratum 2 preference rate is varied within each panel.
Of note, we also evaluated the selection effect under the same conditions as the preference effect and observed similar results (data not shown).
4 Discussion
The doubly randomized two-stage trial design proposed by Rucker 1 is an ideal design for incorporating patient preference into treatment decisions. This design enables the separation of treatment effects from the effects that result from patients choosing their treatment. In addition, the two-stage randomized trial design more accurately reflects clinical experience than the completely randomized design. These trials are especially relevant in behavioral intervention studies where, depending on the arm, different demands may be made on a participant’s time. The motivation of a patient to follow a treatment may be influenced by a preference the patient has before beginning any course of action. 19 Allowing the patient to choose the intervention that is most suitable, in terms of level of commitment and willingness to participate in the task, may lead to enhanced motivation and thus efficacy of the intervention.3,19 Additionally, the accurate estimation of these effects may be important when compliance and adherence to a treatment may be related to a patient’s outcome. 9 In cases where equivalence has already been demonstrated among multiple treatments, accurate estimation of the selection and preference effects may help clinicians to better treat patients. For example, in relapsing–remitting multiple sclerosis, shared decision-making models promote better outcomes and are especially important for patients for whom multiple reasonable treatment options exist, with none clearly outperforming the others. 20 However, the original design fails to account for differences in preference rates among different population subgroups. In particular, there may be different factors that cause certain people to have a higher preference for one treatment over another. For example, in the HCV study discussed above, patients who use drugs or alcohol are expected to have a stronger preference for MMCs versus treatment in a traditional healthcare setting. 
Additionally, these factors may result in different treatment effect responses. In this case, a stratified preference trial design will be able to account for these differing rates while increasing efficiency, reducing the required sample size, and offering methods to calculate the treatment, preference, and selection effects both within each stratum and overall.
The stratified preference trial design may be especially applicable in cases where the effect of two treatments is likely to be similar. In these scenarios, preference and selection effects may play a vital role in prescribing a treatment to a certain patient. In the example of a trial investigating HCV treatments, the differences in treatment effect between specialty hospital clinics and MMCs may be extremely small. However, because of various barriers that may prevent people using alcohol or drugs from initiating or adhering to a hospital treatment, allowing patients a choice in where to get treatment may allow for a significant increase in HRQoL, as measured by the SF-12. Further, it could have a substantial impact on the overall treatment goal of achieving sustained virologic response.
Our results indicate that not accounting for subgroups with different treatment responses and preference rates increases variability, decreases power, and requires a greater sample size. Using a stratified version of a preference design minimizes these effects. Making use of a stratified design when there may be differences in preference rates or treatment responses allows for an increase in efficiency over the unadjusted design and analysis. This increase in efficiency translates to a smaller sample size needed to estimate each of the treatment, selection, and preference effects. We have also shown that the stratified design is not always more efficient than the unstratified design, especially under scenarios of extreme preference rates. In these situations, we would not recommend a stratified preference design, or we would recommend the selection of a different stratification variable.
One of the simplifying assumptions used in the construction of the sample size formula is that the preference, selection, and treatment effects are the same across all strata. The next step in this work is to create a test to determine the validity of this assumption and to further generalize these methods to incorporate situations with unequal effect sizes across the strata. Another important consideration for future development is the optimal allocation between the choice and random arms in the initial randomization. In this paper, equal randomization between the arms was assumed; however, previous work has indicated that this allocation ratio may not be optimal. 2 Future work may focus on determining the optimal allocation for the stratified design. In addition, future work is needed to extend these methods to other outcome distributions, such as binomial and survival outcomes. The methods proposed in this paper assume normally distributed, continuous data. Further, this study assumes that all patients in the choice arm have a treatment preference. In the future, these methods may also be extended to allow for the possibility of a patient not having a preference between the studied treatments. Finally, it is an important assumption of preference trial designs that the process of being randomized to either a choice or random arm does not influence a patient’s preference and response. 1 This assumption may cause concern in cases where being randomized to a particular treatment, as opposed to being allowed to choose a treatment, affects a patient’s response.
Footnotes
Acknowledgements
The authors would also like to thank Dr Peter Peduzzi for his thoughtful comments.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by Yale’s Clinical and Translational Science Award (NIH: UL1TR000142).
