Sample size adjustment based on promising interim results and its application in confirmatory clinical trials

Abstract

Background:

For a carefully planned and well-designed Phase 3 confirmatory trial, there is still a potential risk of failing to meet the study objective due to possible differences between Phase 2 and Phase 3 studies. As illustrated by the ENGAGE trial, a potential sample size increase at an interim analysis can mitigate the risk for an otherwise underpowered study. Many approaches for sample size adjustment (SSA) require certain modifications to the conventional statistical method, such as changing critical values or using a weighted Z-statistic for final hypothesis testing. Without modification, the type I error rate can be inflated, primarily because of sample size increases made after nonpromising interim observations that are close to the null (no treatment effect). As illustrated by the TOPICAL trial, increasing the sample size after a nonpromising interim result could waste limited resources on an ineffective treatment. The modifications in these approaches therefore sacrifice flexibility and interpretability to accommodate sample size increases that are themselves unnecessary.

Purpose:

To discuss and illustrate the appropriateness of SSA based on promising interim results, that is, conditional power being greater than 50% (or CDL approach), in a carefully planned and well-designed Phase 3 confirmatory trial.

Methods:

Two clinical trials are used to illustrate the clinical setting for the CDL approach and appropriateness of its application. Operating characteristics are assessed and compared to other methods using numeric computation. Hypothetical trials based on real clinical data are used to illustrate the approach.

Results:

The CDL approach for SSA leads to a small increase in expected sample size, resulting in a small power gain versus the fixed design. This indicates that adding SSA will not, on average, substantially affect the budget at the portfolio level. However, when the interim result is promising, the CDL approach can dramatically increase the conditional power, thereby mitigating the risk of an otherwise underpowered study.

Limitations:

Implementation challenges of the SSA methods are not in the scope of this paper. SSA is not intended to replace careful design of a confirmatory trial; instead, it can mitigate the risk for a well-designed trial.

Conclusions:

The CDL approach for SSA based on promising interim results, that is, conditional power being greater than 50%, is particularly useful in mitigating the risk for a carefully planned and well-designed Phase 3 confirmatory trial. No modification to the conventional statistical procedure is necessary while the type I error rate is controlled. Such a feature of “no interference,” or no change to the conventional statistical procedure with or without sample size adjustment, is important for the interpretation of a confirmatory trial. Similar to the fixed design, carefully planned and well-designed group sequential studies can also benefit from SSA to mitigate the risk of failing to meet the study objective.

Keywords

Sample size adjustment, promising, interim analysis, conditional power

Introduction

The success of a pivotal Phase 3 study to confirm the efficacy and safety of an experimental drug is critical for regulatory registration and post-approval commercial promotion. Because of its importance, a confirmatory trial should be carefully planned and well-designed. However, even designed with high confidence, a pivotal Phase 3 study can still miss statistical significance due to unexpected reasons such as differences in patient populations between Phases 2 and 3 including inclusion/exclusion criteria, concomitant medications, surrogate versus clinical endpoints, and participating sites. In some cases, the final test yields a p value that is close to the nominal level but fails to reach the required statistical significance, as illustrated in the following example.

Example 1 (Effective aNticoaGulation with factor xA next GEneration in Atrial Fibrillation–Thrombolysis in Myocardial Infarction (ENGAGE AF-TIMI) 48 Trial): This was a randomized, double-blind trial including more than 21,000 patients with atrial fibrillation.¹ Patients were randomized to high dose edoxaban, low dose edoxaban, or warfarin. The primary efficacy endpoint was stroke or systemic embolic event and the study planned to accrue 672 events. For our purpose, we consider the comparison between high dose edoxaban and warfarin. The primary objective was to demonstrate noninferiority of edoxaban versus warfarin. If noninferiority was established, superiority would also be tested. As reported by Giugliano et al.,² the high dose edoxaban met the noninferiority criteria but in the subsequent superiority test, the study failed to show superiority with p = 0.08. While the study was a success in showing noninferiority for the high dose edoxaban, the failure to further establish superiority over standard of care could be a disadvantage in the marketplace post approval when competing with other products with a superiority claim. In fact, after the study results were announced, a competitor issued a press release stating that its product continued to be the only oral anticoagulant which showed superior ischaemic stroke reduction versus warfarin.³

To reduce the risk, a potential sample size increase at an interim analysis if accumulating data show a lower than expected treatment effect has been discussed.^4–9 Many approaches require certain modifications to the conventional statistical method, such as changing critical values^4–6 or using a weighted Z-statistic for final hypothesis testing.^8,9 These approaches consider the general scenario where a sample size increase can occur regardless of the actual interim results. Without modification to the conventional method, the type I error rate will be inflated,⁴ primarily because of sample size increases made after nonpromising interim observations that are close to the null (no treatment effect). The modifications in the above approaches, therefore, are intended to account for this scenario at the cost of flexibility and/or interpretability. Increasing the sample size when interim results indicate no or very small effect may not be necessary and could waste limited resources. As in Example 2, if the sample size were increased based on a low 32% conditional power at an interim analysis with 50% of the data, the additional resources would have been wasted because there was no evidence of benefit. We argue against increasing the sample size when interim results indicate no or very small effect; modifications in the above approaches that are intended to account for such a scenario are therefore not necessary.

Example 2 (TOPICAL Trial): TOPICAL was a double-blind, randomized, placebo-controlled, phase 3 trial to compare erlotinib and placebo in patients with advanced non-small-cell lung cancer.¹⁰ The primary efficacy endpoint was overall survival. Assuming a hazard ratio of 0.75, the study planned to enroll 664 patients (or 550 events) to achieve 90% power with two-sided significance level of 0.05. When 50% of the target number of events were available, the observed hazard ratio was 0.87 with 95% confidence interval (CI) of (0.68, 1.10) and the conditional power under the current trend was 32%.¹¹ At the end of the study, the hazard ratio was 0.94 with 95% CI of (0.81, 1.10) which failed to show superior efficacy compared to placebo.¹⁰

Sample size adjustment (SSA) based on the promising interim result principle¹² will consider increasing the sample size only if the interim results are “promising,” defined as a conditional power under the current trend of greater than 50%. The type I error rate will not be inflated using the conventional statistical procedure. The definition of “promising” as conditional power being greater than 50% is generally practical and sufficient for a well-designed confirmatory trial. In recent Phase 3 studies with planned SSA at an interim analysis, the maximum sample size increase was generally moderate, between 40% and 60%.^13–15 Such an increase, if the intention is to maintain the target power, requires interim results to at least meet the definition of a promising conditional power of >50%. This 50% conditional power principle is further extended by Gao et al.¹⁶ and Mehta and Pocock.¹⁷ The 50% conditional power approach and its extensions, together called promising zone approaches, are compared and evaluated by Wang et al.¹⁸ and Menon et al.¹⁹ A hybrid approach incorporating this “promising” idea but using the weighted statistic has also been proposed.²⁰ Despite the advancement of methodology research, use of SSA in clinical trial practice remains controversial. In the recent Food and Drug Administration (FDA) draft guidance on adaptive design,²¹ sample size re-estimation based on unblinded interim analysis is considered less well-understood. The guidance recommends further research and applications to gain better knowledge and more experience. Subsequently, Wang et al.¹⁸ introduce the concept of a “twilight zone,” which refers to the high uncertainty when designing the confirmatory trial based on limited Phase 2 data in a learn-and-confirm type of design. They advise against use of sample size adaptation in such a setting. Instead, they endorse sample size re-estimation in exploratory trials such as Phase 2 studies where a substantial increase in sample size is reasonable.

Instead of new methodological developments, the main objective of this article is to identify a meaningful clinical setting for the applications of the SSA methods: carefully planned and well-designed Phase 3 confirmatory trials. Examples of clinical trials are then used to illustrate the appropriateness of such applications. The clinical setting is introduced in the next section. SSA methods using conventional unweighted statistics¹² and weighted statistics^8,9 will be briefly reviewed in “Review of methods” section. We further characterize the SSA methods in “Comparison of methods” section. Hypothetical trials based on real clinical data are used to illustrate the methods in “Applications” section and recommendations are provided in “Conclusion and discussion” section.

Clinical scenario for SSA application

We consider application of SSA methods in Phase 3 confirmatory trials. Instead of looking at the “twilight zone”¹⁸ or a wide range of uncertain design parameters, we consider confirmatory trials that are carefully planned and well-designed. This is in fact the typical clinical setting in the traditional paradigm of drug development.

Prior to committing a big investment in a large Phase 3 trial, the sponsor will usually spend time and resources in understanding the disease, characterizing the compound in development, predicting the safety and efficacy profile based on pre-clinical data, assessing how the investigational compound may affect and be affected by the human body, establishing proof of concept, obtaining a preliminary but reasonably robust estimate of efficacy in relevant patient populations, and determining an appropriate dose for confirmatory trials. There is increasing interest in having an early read of which compounds are likely to succeed, and making the hard decisions about which compounds to terminate.²² Therefore, collecting robust data to aid in early decisions before Phase 3 is important and will continue to be important for sponsors. For the design of confirmatory Phase 3 studies, additional cautions are usually taken to ensure high confidence in study success. For example, a conservative treatment effect and/or high control rate may be assumed for Phase 3, and a high statistical power such as 80% or 90% is usually used. In the ISENTRESS program,²³ the decision of Go/No-Go to Phase 3 was based on extensive knowledge about the disease, well-understood animal models using in vitro and in vivo data, established proof-of-concept in patients, and favorable efficacy and safety data from two Phase 2 dose ranging trials. In the Phase 2 dose ranging trial in the same patient population, a treatment difference of 55% points (70% vs 15% for active vs placebo, respectively) in response rate was observed across all dose groups.²⁴ In anticipation of the use of newly approved drugs in the background therapy in Phase 3, the placebo rate was assumed to be 50% and the treatment effect was assumed to be 20% points.²⁵ Each of the two Phase 3 studies was then powered at 90%.

The cautionary steps may provide some cushion in case assumptions are wrong. However, they could be arbitrary and not necessarily adequate. SSA based on promising interim results could be particularly useful in mitigating the unexpected risk.

Review of methods

For our purpose, we consider a one-sample normal response $X_{i}$ for patient $i$ with mean $θ$ and variance $σ^{2} = 1$. This can be easily extended to a two-sample test as in the “Applications” section. We consider the hypothesis $H_{0}$: $θ = 0$ versus $H_{1}$: $θ = θ_{a} > 0$. Let $N_{0}$ be the original sample size to achieve a target power of $(1 - β)$ at the one-sided significance level of $α$, determined by $N_{0} = {(z_{α} + z_{β})}^{2} / θ_{a}^{2}$, where $z_{α}$ and $z_{β}$ are the upper $α$- and $β$-quantiles of the standard normal distribution, respectively, and $θ_{a}$ is the assumed treatment effect. At an interim analysis with data from $n$ patients or information time $t = n / N_{0}$, define the sample mean as ${\hat{θ}}_{1} = n^{- 1} \sum_{i = 1}^{n} X_{i}$ and the normalized statistic as $Z_{1} = n^{1 / 2} {\hat{θ}}_{1}$. The sample size may be increased from $N_{0}$ to $N = (1 + r (z_{1})) N_{0}$, where $r (z_{1}) \geq 0$ is a function of the interim data.
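As a quick numerical check of the sample size formula above, the following minimal Python sketch evaluates $N_{0}$ using only the standard library. The function name `original_sample_size` and the effect size $θ_{a} = 0.25$ are illustrative assumptions, not values taken from this article.

```python
from statistics import NormalDist


def original_sample_size(theta_a: float, alpha: float = 0.025, beta: float = 0.10) -> float:
    """N0 = (z_alpha + z_beta)^2 / theta_a^2 for a one-sample normal test with unit variance."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # upper alpha-quantile
    z_beta = std_normal.inv_cdf(1.0 - beta)    # upper beta-quantile
    return (z_alpha + z_beta) ** 2 / theta_a ** 2


# Hypothetical effect size, chosen only for illustration.
n0 = original_sample_size(theta_a=0.25)
print(f"N0 = {n0:.1f} (rounded up to a whole number of patients in practice)")
```

Halving the assumed effect quadruples $N_{0}$, which is why an observed effect well below $θ_{a}$ translates into a large required increase.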

Cui, Hung, and Wang

The weighted statistic $Z_{2, N}^{(W)} = \sqrt{t} Z_{1} + \sqrt{1 - t} {(N - n)}^{- 1 / 2} \sum_{i = n + 1}^{N} X_{i}$ is used for the hypothesis test and compared to the conventional critical value. This method, often referred to as the Cui, Hung, and Wang (CHW) method,⁸ preserves the overall type I error. The weighted statistic puts less weight on patients who entered the study after the interim analysis if the sample size is increased based on the interim analysis. It is desirable, however, that uniform weight be assigned to all subjects before and after the sample size is increased, thereby following the “one patient one vote” rule.²⁰

Chen, DeMets, and Lan

At the interim analysis, the conditional power under the current trend is defined as

ρ (t, z_{1}) = \Pr (Z_{2, N_{0}} > z_{α} | t, Z_{1} = z_{1}, θ = {\hat{θ}}_{1}) = Φ (z_{1} / \sqrt{t (1 - t)} - z_{α} / \sqrt{1 - t})

The normalized statistic $Z_{2, N_{0}} = N_{0}^{- 1 / 2} \sum_{i = 1}^{N_{0}} X_{i}$ is the conventional unweighted Z-statistic. If the interim result is promising, that is, a conditional power of $> 50\%$, increasing the sample size will not inflate the overall type I error and will not require any modification to the final analysis. The conventional unweighted Z-statistic and critical value can be used at the final analysis.¹² The definition of promising, that is, conditional power of >50%, is equivalent to ${\hat{θ}}_{1} \geq θ_{a} z_{α} / (z_{α} + z_{β})$, which means the observed treatment effect should not be too far away from the expected effect. For example, taking $α = 0.025$ and $β = 0.05$ (a target power of 95%), this lower bound implies that the observed effect has to be at least 54% of the expected effect in order to have a promising interim result for a sample size increase. An observed treatment effect that is much smaller than expected will require a dramatic sample size increase to maintain the target power.
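The conditional power formula and the 50% threshold can be checked numerically. The sketch below is a minimal illustration using Python's standard library; the function names are assumptions made here for readability. It evaluates $ρ (t, z_{1})$ at the boundary $z_{1} = z_{α} \sqrt{t}$ and computes the ratio $z_{α} / (z_{α} + z_{β})$ for a one-sided $α = 0.025$ and a target power of 95%.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_power(t: float, z1: float, alpha: float = 0.025) -> float:
    """Conditional power under the current trend, keeping the original sample size N0."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(z1 / sqrt(t * (1.0 - t)) - z_alpha / sqrt(1.0 - t))


def promising_fraction(alpha: float, beta: float) -> float:
    """Smallest ratio of observed to assumed effect that gives 50% conditional power."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    return z_alpha / (z_alpha + z_beta)


# At z1 = z_alpha * sqrt(t) the argument of Phi is zero, so the conditional power is 50%.
t = 0.5
z_boundary = STD_NORMAL.inv_cdf(0.975) * sqrt(t)
cp_at_boundary = conditional_power(t, z_boundary)

# Target power of 95% corresponds to beta = 0.05.
fraction_95 = promising_fraction(alpha=0.025, beta=0.05)
print(f"CP at boundary: {cp_at_boundary:.3f}; promising fraction: {fraction_95:.3f}")
```

Note that the threshold on the observed effect does not depend on the information time $t$, only on $α$ and $β$.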

Comparison of methods

Consider a simple SSA plan with the sample size increment $r (z_{1}) \geq 0$ as a function of the interim unweighted z-statistic $z_{1}$ . At an interim analysis, the sample size may be increased to $N = (1 + r (z_{1})) N_{0}$ to achieve the target power when the conditional power is greater than 50% but less than the target power. Potential decisions could be

N = \begin{cases} (1 + r (z_{1})) N_{0}, & z_{1} \in G, \text{ where } G = (z_{α} \sqrt{t}, \; z_{α} \sqrt{t} + z_{β} \sqrt{t (1 - t)}) \\ N_{0}, & \text{otherwise, that is, } z_{1} \in G^{c} \end{cases}

$G$ is the region for sample size increase and $G^{c}$ is the complement of $G$. The upper bound of the region is defined as $Z_{U} = \max_{z_{1}} \{ρ (t, z_{1}) < (1 - β)\}$. The lower bound is defined as $Z_{L} = \min_{z_{1}} \{ρ (t, z_{1}) > 50\%\}$. Define $Z_{2, N} = N^{- 1 / 2} \sum_{i = 1}^{N} X_{i}$ and $t_{r}^{*} = n / N = t / (1 + r (z_{1}))$. The definition of $G$ is equivalent to a promising zone of conditional power between 50% and the target power of $(1 - β)$, which guarantees preservation of the type I error rate provided the sample size is increased only within this region $G$.¹²

Based on equation (8) of Mehta and Pocock,¹⁷ $r (z_{1})$ can be derived as

r (z_{1}) = \frac{t}{z_{1}^{2}} {[\frac{z_{α} - z_{1} \sqrt{t}}{\sqrt{1 - t}} + z_{β}]}^{2} + t - 1

This implies that if the sample size increment is determined by the above formula, the conditional power evaluated at ${\hat{θ}}_{1}$ with the new sample size will be $(1 - β)$ .
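The decision rule and the increment formula above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the cap argument `r_max` (standing in for a maximum increment $R_{max}$) are assumptions made here. A useful sanity check is that $r (z_{1})$ falls to zero at the upper bound of $G$, where the conditional power already equals the target.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def promising_zone(t: float, alpha: float = 0.025, beta: float = 0.10) -> tuple[float, float]:
    """Bounds of G on the z1 scale: conditional power between 50% and 1 - beta."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_b = STD_NORMAL.inv_cdf(1.0 - beta)
    lower = z_a * sqrt(t)
    upper = lower + z_b * sqrt(t * (1.0 - t))
    return lower, upper


def increment(z1: float, t: float, alpha: float = 0.025, beta: float = 0.10) -> float:
    """r(z1) from the displayed formula (valid for z1 > 0)."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_b = STD_NORMAL.inv_cdf(1.0 - beta)
    bracket = (z_a - z1 * sqrt(t)) / sqrt(1.0 - t) + z_b
    return t / z1 ** 2 * bracket ** 2 + t - 1.0


def planned_increment(z1: float, t: float, r_max: float,
                      alpha: float = 0.025, beta: float = 0.10) -> float:
    """Increase only inside G, capped at r_max; otherwise keep N0 (increment 0)."""
    lower, upper = promising_zone(t, alpha, beta)
    if lower < z1 < upper:
        return min(increment(z1, t, alpha, beta), r_max)
    return 0.0


zone_lower, zone_upper = promising_zone(t=0.5)
r_at_upper = increment(zone_upper, t=0.5)
print(f"G = ({zone_lower:.3f}, {zone_upper:.3f}); r at upper bound = {r_at_upper:.2e}")
```

Capping at `r_max` reflects the practical limit on the maximum increase discussed in the text; near the lower bound of $G$ the uncapped increment exceeds 100% for these settings.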

The following comparisons are based on numerical computations which involve normal integration. R code can be provided per request.

Expected sample size

Denote the true treatment effect $θ = f θ_{a}$, where $f \geq 0$ represents the fraction of treatment effect over the originally predicted effect for sample size planning. The expected sample size (ESS) of SSA, denoted as $N_{ESS}$, is defined as $N_{ESS} = E \{N \mid θ = f θ_{a}\}$. The ratio $N_{ESS} / N_{0}$ is $1 + \int_{z_{1} \in G} r (z_{1}) d Φ (z_{1} - \sqrt{t} f (z_{α} + z_{β}))$, which is always greater than or equal to 1. Details of the derivation of the formula can be found in Appendix 1. Figure 1 shows the ESS as a ratio to the original sample size when the maximum sample size increment $R_{max}$ is set to be 50%, 100%, and 200% with a significance level of 0.025 and power of 90%. As shown in Figure 1, if we increase the sample size to achieve the target power only when the conditional power at the interim analysis is greater than 50%, the ESS increase is small relative to the original sample size. This is especially true when the interim analysis is planned at a relatively late stage, such as when 80% of subjects have completed the endpoint (right panel of the figure). In other words, adding SSA will not, on average, substantially affect the budget at the portfolio level.
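The ESS ratio can be approximated by simple numerical integration over $G$. The sketch below is an assumption-laden illustration rather than the authors' R code: it uses the trapezoidal rule, and it applies the cap $R_{max}$ by integrating $\min (r (z_{1}), R_{max})$, which is an assumption about how the cap enters the formula.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ess_ratio(f: float, t: float, r_max: float,
              alpha: float = 0.025, beta: float = 0.10, steps: int = 2000) -> float:
    """Approximate N_ESS / N0 = 1 + integral over G of min(r(z1), r_max) dPhi(z1 - mu1)."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_b = STD_NORMAL.inv_cdf(1.0 - beta)
    lower = z_a * sqrt(t)
    upper = lower + z_b * sqrt(t * (1.0 - t))
    mu1 = sqrt(t) * f * (z_a + z_b)  # mean of Z1 when theta = f * theta_a
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        z1 = lower + i * h
        bracket = (z_a - z1 * sqrt(t)) / sqrt(1.0 - t) + z_b
        r = min(t / z1 ** 2 * bracket ** 2 + t - 1.0, r_max)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * r * STD_NORMAL.pdf(z1 - mu1)
    return 1.0 + total * h


ratio_mid = ess_ratio(f=0.8, t=0.5, r_max=1.0)
ratio_late = ess_ratio(f=0.8, t=0.8, r_max=1.0)
print(f"ESS ratio at t = 0.5: {ratio_mid:.3f}; at t = 0.8: {ratio_late:.3f}")
```

Under these assumptions the ratio stays close to 1 and is smaller for the later interim analysis, in line with the pattern described for Figure 1.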

Figure 1.

Expected sample size (ESS) as a ratio to original sample size.

Unconditional statistical power

We compare the unconditional power of four approaches: fixed design with original sample size; fixed design with ESS; SSA using Chen, DeMets, and Lan (CDL) unweighted approach; and SSA using CHW weighted Z approach. The statistical power is computed for a true treatment effect $θ = f θ_{a}$ . The formula for the calculations can be found in Appendix 1. Figure 2 compares the unconditional power for the four methods under varying assumptions on different time of interim analysis $t$ and maximum magnitude of sample size increment $R_{max}$ . We consider one-sided $α = 0.025$ and target power of 90%. Across the scenarios, a fixed design with the original sample size has lower power compared to others, as it has smaller sample size. The two SSA methods CDL and CHW and the fixed design with ESS all have similar power. The differences in unconditional power are very small, especially around $f = 1$ on the X-axis which is the clinical setting in consideration. For t = 0.5 and the maximum sample size increment $R_{max} = 100 %$ , the power gain of the SSA methods versus the fixed design with the original sample size is about 4% if the true treatment effect is $f = 0.8$ . This gain in power is based on an ∼13% increase in ESS as shown in Figure 1.
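A rough numerical sketch of the comparison between the fixed design and the CDL unweighted approach is given below. It is not the authors' code: the integration grid, the function names, and the way the cap $R_{max}$ is applied are assumptions. Conditional on $Z_{1} = z_{1}$, the final unweighted statistic is written as $\sqrt{t^{*}} z_{1} + \sqrt{1 - t^{*}} W$ with $t^{*} = t / (1 + r)$, where $W$ is the standardized sum of the post-interim observations.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_fixed(f: float, alpha: float = 0.025, beta: float = 0.10) -> float:
    """Power of the fixed design with N0 when the true effect is f * theta_a."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_b = STD_NORMAL.inv_cdf(1.0 - beta)
    return STD_NORMAL.cdf(f * (z_a + z_b) - z_a)


def power_cdl(f: float, t: float, r_max: float,
              alpha: float = 0.025, beta: float = 0.10, steps: int = 20000) -> float:
    """Unconditional power of the unweighted test with SSA restricted to G."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_b = STD_NORMAL.inv_cdf(1.0 - beta)
    drift = f * (z_a + z_b)          # theta * sqrt(N0)
    lower = z_a * sqrt(t)
    upper = lower + z_b * sqrt(t * (1.0 - t))
    mu1 = sqrt(t) * drift            # mean of Z1
    start, stop = mu1 - 8.0, mu1 + 8.0
    h = (stop - start) / steps
    total = 0.0
    for i in range(steps + 1):
        z1 = start + i * h
        r = 0.0
        if lower < z1 < upper:
            bracket = (z_a - z1 * sqrt(t)) / sqrt(1.0 - t) + z_b
            r = min(t / z1 ** 2 * bracket ** 2 + t - 1.0, r_max)
        t_star = t / (1.0 + r)                   # interim information fraction after SSA
        mean_w = drift * sqrt(1.0 + r - t)       # mean of the post-interim increment W
        threshold = (z_a - sqrt(t_star) * z1) / sqrt(1.0 - t_star)
        reject = STD_NORMAL.cdf(mean_w - threshold)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * reject * STD_NORMAL.pdf(z1 - mu1)
    return total * h


fixed = power_fixed(f=0.8)
cdl = power_cdl(f=0.8, t=0.5, r_max=1.0)
print(f"fixed design: {fixed:.3f}; CDL with SSA: {cdl:.3f}")
```

Setting `f=0.0` in `power_cdl` gives the type I error rate of the unweighted test under this rule, which offers a direct check that restricting increases to $G$ does not inflate it.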

Figure 2.

Power comparisons between SSA methods and fixed design. Figure 2 shows that when we increase sample size to achieve the target power, only if the interim results are promising (i.e. conditional power between 50% and 90% (target power)), the expected sample size increase is small relative to the original size; furthermore, this increase is even smaller when we conduct the interim analysis at a late stage (e.g. t = 0.8). In practice, we usually limit the maximum increase to 100%, and if we apply the sample size adjustment rules proposed above, the expected sample size increase would be less than 8% for t = 0.8 and 13% for t = 0.5.

Conditional operating characteristics

In this section, we study the operating characteristics of SSA using CDL unweighted statistics in terms of ESS and power, conditional on the interim result being promising (i.e. $z_{1} \in G$ ). Details of the derivation of the formula can be found in “Conditional ESS” and “Conditional statistical power” sections of Appendix 1. We consider one-sided $α = 0.025$ and target power of 90%, under a different time of interim analysis $t$ and maximum magnitude of sample size increment $R_{max}$ . Figure 3 shows the conditional ESS, and Figure 4 shows the conditional power for different scenarios. From the two figures, we can see that when the interim result is promising, the SSA method can dramatically increase the power, with a reasonable ESS increase. For example, when $f = 0.8$ and $R_{max} = 50 %$ , the SSA design will increase the power from 73.7% to 89.5% with a 30.3% increase of the ESS, in the promising zone. This is generally true under different target power.

Figure 3.

Expected sample size (ESS), conditional on promising interim results.

Figure 4.

Power comparison with fixed design, conditional on promising interim results.

Applications

We first extend the notation to a survival endpoint. Under the proportional hazards assumption, $λ_{1} (u) = λ_{0} (u) e^{- δ}$ , where $λ_{1} (u)$ and $λ_{0} (u)$ are the hazard functions at time $u$ for active and control, respectively. The parameter of interest $δ = - [\log {λ_{1} (u)} - \log {λ_{0} (u)}]$ is positive if the experimental treatment is more efficacious than the control, and $e^{- δ}$ is the hazard ratio. Assume equal randomization in the study. The estimate $\hat{δ} = (2 / \sqrt{D_{0}}) Z ~ N (δ, 4 / D_{0})$ , where $D_{0}$ is the number of events for both groups combined and $Z$ is the normalized logrank statistic.²⁶ Other design modifications to account for special considerations for a time-to-event endpoint can also be found in the research by Cook.²⁷

In the following hypothetical examples motivated by Examples 1 and 2, testing procedures may be different from those used in the examples. The joint normal distribution holds asymptotically, which is generally true in a large confirmatory trial. In the examples below, the sample size increment is determined by the observed treatment effect in order to maintain a target power.

SSA for effective therapy

In Example 1, the hazard ratio was 0.87 with 95% CI of (0.73, 1.04) based on 633 events.² Using these data, we consider a hypothetical study with a target number of $D_{0} = 633$ events, which provides 80% power to show superiority at a two-sided 0.05 level based on an assumed treatment effect of $δ_{a} = 0.22$ or hazard ratio of 0.80. At the end of the study, the hazard ratio is 0.87 or $\hat{δ} = - \log (0.87) = 0.14$ . The two-sided p value is $2 * [1 - Φ (\sqrt{D_{0}} \hat{δ} / 2)] = 0.08$ . The study fails to reject the hypothesis.
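The arithmetic in this hypothetical study can be reproduced with the normal approximation $\hat{δ} \sim N (δ, 4 / D_{0})$. The sketch below is illustrative; the function names are assumptions, and the event-count formula is the one implied by that approximation for a two-sided test.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p_value(hazard_ratio: float, events: float) -> float:
    """Two-sided p value from the observed hazard ratio and the total number of events."""
    delta_hat = -log(hazard_ratio)
    z = sqrt(events) * delta_hat / 2.0
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))


def events_for_power(hazard_ratio: float, alpha_two_sided: float = 0.05,
                     power: float = 0.80) -> float:
    """Events needed under var(delta_hat) = 4 / D with equal randomization."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha_two_sided / 2.0)
    z_b = STD_NORMAL.inv_cdf(power)
    return 4.0 * (z_a + z_b) ** 2 / log(hazard_ratio) ** 2


p_final = two_sided_p_value(hazard_ratio=0.87, events=633)
d_planned = events_for_power(hazard_ratio=0.80)
print(f"p = {p_final:.3f}; events for 80% power at HR 0.80: {d_planned:.0f}")
```

The required event count comes out close to the 633 events used in the hypothetical study, consistent with the stated 80% power.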

We consider SSA at an interim analysis with $t = 0.5$. Under the proportional hazards assumption, the standardized logrank test statistics at the interim and final analyses have a joint normal distribution. The conditional distribution of ${\hat{δ}}_{1}$, the observed treatment effect at the interim analysis, given the final observed treatment effect $\hat{δ}$ is normal with mean $\hat{δ}$ and variance $4 (1 - t) / (D_{0} t)$. In our case, the conditional distribution of ${\hat{δ}}_{1} | \hat{δ}$ is normal with mean 0.14 and variance $4 \times (1 - 0.5) / (633 \times 0.5) = 0.08^{2}$.

If the conditional power at the interim analysis is less than 50% or greater than 80%, the study will continue with the original $D_{0} = 633$; if the conditional power is between 50% and 80%, the target number of cases will be increased to $D$ to maintain the target power of 80%. As shown in “Promising zone of CP between (50%, 80%)” subsection of Appendix 1, the promising zone defined as the conditional power between 50% and 80% is equivalent to ${\hat{δ}}_{1} \in (0.16, 0.20)$ or an observed hazard ratio between 0.82 and 0.86. Based on the CDL approach, the 95% CI for the treatment effect is $- \log (0.87) \pm 1.96 \sqrt{4 / D}$. In order to have the lower bound of the CI above 0, the new sample size $D$ should be no less than $[2 \times 1.96 / \log (0.87)]^{2}$ or 792; that is, at least a 25% increase to ensure a successful study at the end. As shown in the “SS increase to reach statistical significance” subsection of Appendix 1, this sample size requirement is equivalent to ${\hat{δ}}_{1} \leq 0.19$ or an observed hazard ratio not smaller than 0.83. Table 1 provides a summary of possible interim observations, corresponding sample size decisions, and the final results. Without SSA, the study would fail to reject the hypothesis with a p value of 0.08. When SSA is considered, there is a chance that the study can be saved with an increased sample size. When the hazard ratio is between 0.83 and 0.86, the sample size will be increased and the magnitude of increase, above 25%, is sufficient to reach statistical significance with p < 0.05. The probability for this outcome is 0.14. The ESS increase is less than 13%.
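The key numbers in this paragraph can be reproduced approximately as follows. This is a sketch under stated assumptions: the promising zone bounds are obtained by rescaling the bounds of $G$ to the log hazard ratio scale via ${\hat{δ}}_{1} = 2 z_{1} / \sqrt{t D_{0}}$, and the probability uses the rounded cutoffs 0.16 and 0.19 quoted in the text, since the Appendix 1 derivation of the 0.19 cutoff is not reproduced here. Variable names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
D0 = 633          # planned number of events
T = 0.5           # information time of the interim analysis
HR_FINAL = 0.87   # hazard ratio assumed to be observed at the end of the study

z_alpha = STD_NORMAL.inv_cdf(0.975)  # two-sided 0.05
z_beta = STD_NORMAL.inv_cdf(0.80)    # 80% target power

# Promising zone (conditional power between 50% and 80%) on the log hazard ratio scale.
delta_lower = 2.0 * z_alpha / sqrt(D0)
delta_upper = 2.0 * (z_alpha + z_beta * sqrt(1.0 - T)) / sqrt(D0)

# Events needed for the lower 95% confidence limit to exceed 0 when HR = 0.87.
d_min = (2.0 * z_alpha / log(HR_FINAL)) ** 2

# Distribution of the interim estimate given the final estimate.
interim = NormalDist(mu=-log(HR_FINAL), sigma=sqrt(4.0 * (1.0 - T) / (D0 * T)))

# Probability that the interim estimate lands where the increase is at least 25%
# (rounded cutoffs 0.16 and 0.19 taken from the text).
prob_save = interim.cdf(0.19) - interim.cdf(0.16)

print(f"zone: ({delta_lower:.3f}, {delta_upper:.3f}); D_min = {d_min:.1f}; "
      f"P(save) = {prob_save:.3f}")
```

Using unrounded cutoffs shifts the probability slightly, so the value should be read as approximate.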

Table 1.

Hypothetical trial 1: SSA versus No SSA (assume observed HR = 0.87 at the end of study).

Observed treatment effect ${\hat{δ}}_{1}$ at interim analysis		Projected results: CDL				Projected results: No SSA
Distribution	Observed range	Sample size increase?	End-of-study p value	Probability to save the study	Expected sample size increase	p value
${\hat{δ}}_{1} \sim N (0.14, 0.08^{2})$	${\hat{δ}}_{1} \leq 0.16$ (HR ≥ 0.86)	No	0.08	14%	<13%	0.08
	$0.16 < {\hat{δ}}_{1} < 0.19$ (0.83 < HR < 0.86)	Increase by at least 25%	<0.05
	$0.19 < {\hat{δ}}_{1} < 0.20$ (0.82 < HR < 0.83)	Increase by 0% to 25%	0.05–0.08
	${\hat{δ}}_{1} \geq 0.20$ (HR ≤ 0.82)	No	0.08

SSA: sample size adjustment; HR: hazard ratio; CDL: Chen, DeMets, and Lan.

No SSA for ineffective therapy

In Example 2, the hazard ratio was 0.94 with a 95% CI of (0.81, 1.10) based on 657 deaths.¹⁰ At the interim analysis with $t = 0.5$ , the observed hazard ratio was 0.87.¹¹ Using these data, we consider a hypothetical trial with a target number of $D_{0} = 657$ events, which has 90% power to show superiority based on assumed hazard ratio of 0.78. At the end of the study, the hazard ratio is 0.94 with 95% CI of (0.81, 1.10). The experimental therapy is concluded to be ineffective.

Let us now assume an observed hazard ratio of 0.87 at the interim analysis with 50% of the target number of events. As shown in “Promising zone of CP between (50%, 80%)” subsection of Appendix 1, the conditional power is 40% in this hypothetical trial. The CDL approach would recommend not increasing the sample size as it is below 50%. This would be a correct decision as the additional investment would have been wasted. A more aggressive decision would be stopping the trial for futility. Note that if a decision is made to increase the sample size, the increase in sample size to ensure that the conditional power with the new sample size $D$ remains at 90% would require $D$ = 2024 deaths, that is, more than triple the original sample size ( $D / D_{0} = 3.1$ ). This further supports the argument that caution should be taken when relaxing the cutoff of conditional power of >50% for the promising zone.
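The conditional power and the required number of deaths quoted above can be approximated with a short calculation. The sketch below assumes the unweighted statistic and the current trend (true effect equal to the interim estimate); with $m = D / n$, the conditional power for a final event count $D$ is then $Φ ((z_{1} m - z_{α} \sqrt{m}) / \sqrt{m - 1})$, which reduces to the formula for $ρ (t, z_{1})$ when $D = D_{0}$. The function names and the bisection search are illustrative assumptions, not the Appendix 1 derivation.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def interim_z(hazard_ratio: float, interim_events: float) -> float:
    """Normalized interim statistic from the observed hazard ratio."""
    return sqrt(interim_events) * (-log(hazard_ratio)) / 2.0


def conditional_power(z1: float, interim_events: float, final_events: float,
                      alpha: float = 0.025) -> float:
    """Conditional power of the unweighted test under the current trend."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha)
    m = final_events / interim_events
    return STD_NORMAL.cdf((z1 * m - z_a * sqrt(m)) / sqrt(m - 1.0))


def events_for_conditional_power(z1: float, interim_events: float, planned_events: float,
                                 target: float, max_ratio: float = 50.0) -> float:
    """Bisection for the final event count whose conditional power reaches the target."""
    lo, hi = planned_events, planned_events * max_ratio
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if conditional_power(z1, interim_events, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


D0 = 657
n_interim = 0.5 * D0
z1 = interim_z(hazard_ratio=0.87, interim_events=n_interim)
cp_original = conditional_power(z1, n_interim, D0)
d_needed = events_for_conditional_power(z1, n_interim, D0, target=0.90)
print(f"CP = {cp_original:.2f}; deaths needed = {d_needed:.0f}; ratio = {d_needed / D0:.2f}")
```

The bisection relies on the conditional power increasing with $D$ when $z_{1} > 0$, which holds for the values used here.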

Conclusion and discussion

We consider the application of SSA methods in carefully planned and well-designed Phase 3 confirmatory trials. There is still a potential risk of failing to meet the study objective due to possible differences between Phase 2 and Phase 3 studies. The CDL approach, which allows a sample size increase when the interim results are promising, is particularly useful in mitigating the risk. The requirement of conditional power of >50% for possible SSA is not a binding rule as that of a binding futility boundary. There is no modification of the efficacy boundary or other parameters in the CDL approach, while for a binding futility boundary the critical value for the efficacy test is lowered to compensate for the early futility stopping. The type I error rate is strictly controlled without modification to the conventional statistical procedure. The conventional unweighted test statistics and critical values can be used without any change. Such a feature of “no interference,” or no change to the conventional statistical procedure with or without SSA, is important for confirmatory Phase 3 trials. It is easier for clinical interpretation based on the “one patient one vote” principle, and statistical inference (estimation and CI) based on maximum likelihood estimation will be consistent with the unweighted test statistics. The SSA methods do not require lowering the final statistical significance level in order to control the type I error rate (i.e. no change to the final critical value), which also means not lowering the criteria for success, therefore maintaining a consistent threshold for required evidence in establishing efficacy for experimental drugs. By requiring promising interim results in order to increase the sample size, the CDL approach can reduce the chance of mistakenly increasing sample size for an ineffective treatment. 
While the unconditional power is not the primary focus of the CDL method, it provides a small power gain versus a fixed design with the original sample size, based on a small increase in ESS. The small increase in ESS indicates that adding SSA based on promising interim results will not, on average, substantially affect the budget at the portfolio level with multiple programs. Given the potential benefit of saving a pivotal trial, we recommend that SSA based on interim results always be considered and, if appropriate, be used in the design of the confirmatory trial. Because SSA based on promising interim results is not expected to substantially improve unconditional power, the SSA methods, especially the CDL approach, are not intended to re-design or dramatically change the ongoing trial. The new sample size is usually calculated so that, after the increase, the conditional power is maintained at a high level. The goal is to ensure that the ongoing study can meet its objective at the end of the study if the treatment is in fact efficacious. Achieving a high unconditional power is therefore not the primary focus of the SSA methods; the primary goal is to maintain a high conditional power under the current trend, and both SSA methods aim to maintain the conditional power at the target level. Additional characteristics based on conditional probabilities for the SSA methods have been discussed.¹⁸ SSA and group sequential designs are intended to address different issues in large Phase 3 trials. Similar to fixed designs, carefully planned and well-designed group sequential studies can also benefit from SSA to mitigate the risk of failing to meet the study objective. The CDL approach can be extended to group sequential studies without changing the conventional group sequential approaches.¹²

Some authors caution about early trends with small numbers of patients and events, since such results are generally unstable and any decisions based on them could be incorrect.²⁸ To help obtain a robust decision on SSA, we recommend an interim analysis with at least 50% of the originally planned data available to reduce the chance of action based on data noise. Some authors argue that this decision can be delayed until more data are available and the estimate is more robust.¹⁶ However, an interim analysis very close to the planned end of the study may also limit the potential value of the method as the conditional power would tend to be close to either 1 or 0.²⁹ Our numerical study shows similar characteristics up to an information time of 0.8. However, this should also take other factors such as operational feasibility into consideration.

Footnotes

Appendix 1 Acknowledgements

The authors are full-time employees of their corresponding affiliations. They are grateful to the Associate Editor and the two reviewers for their helpful comments, which greatly improved the quality of this article.

Declaration of conflicting interests

The authors declare that there is no conflict of interest.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

1. Ruff, Giugliano, Antman. Evaluation of the novel factor Xa inhibitor edoxaban compared with warfarin in patients with atrial fibrillation: design and rationale for the Effective aNticoaGulation with factor xA next GEneration in Atrial Fibrillation–Thrombolysis in Myocardial Infarction study 48 (ENGAGE AF–TIMI 48). Am Heart J 2010; 160: 635–641.
2. Giugliano, Ruff, Braunwald. Edoxaban versus warfarin in patients with atrial fibrillation. N Engl J Med 2013; 369: 2093–2104.
3. Business Wire. Pradaxa® (dabigatran etexilate) 150 mg bid continues to be the only oral anticoagulant which showed superior ischaemic stroke reduction vs. warfarin in its pivotal study RE-LY®, http://www.businesswire.com/news/home/20131125005798/en/Pradaxa%C2%AE-dabigatran-etexilate-150mg-bid-continues-oral (2013, accessed 8 April 2015).
4. Proschan, Hunsberger. Design extension of studies based on conditional power. Biometrics 1995; 51: 1315–1324.
5. Shih, Xie. A sample size adjustment procedure for clinical trials based on conditional power. Biostatistics 2002; 3: 277–287.
6. Jennison, Turnbull. Mid-course sample size modification in clinical trials based on the observed treatment effect. Stat Med 2003; 22: 971–993.
7. Denne. Sample size recalculation using conditional power. Stat Med 2001; 20: 2645–2660.
8. Cui, Hung HMJ, Wang. Modification of sample size in group sequential clinical trials. Biometrics 1999; 55: 321–324.
9. Fisher. Self-designing clinical trials. Stat Med 1998; 17: 1551–1562.
10. Lee, Khan, Upadhyay. First-line erlotinib in patients with advanced non-small-cell lung cancer unsuitable for chemotherapy (TOPICAL): a double-blind, placebo-controlled, phase 3 trial. Lancet Oncol 2012; 13: 1161–1170.
11. Jitlal, Khan, Lee. Stopping clinical trials early for futility: retrospective analysis of several randomised clinical studies. Br J Cancer 2012; 107: 910–917.
12. Chen YHJ, DeMets, Lan KKG. Increasing the sample size when the unblinded interim result is promising. Stat Med 2004; 23: 1023–1038.
13. Heger, Voss, Knebel. Prevention of abdominal wound infection (PROUD trial, DRKS00000390): study protocol for a randomized controlled trial. Trials 2011; 12: 245.
14. Leonardi, Mahaffey, White. Rationale and design of the Cangrelor versus standard therapy to acHieve optimal Management of Platelet InhibitiON PHOENIX trial. Am Heart J 2012; 163: 768–776.e2.
15. Ravandi, Ritchie, Sayar. VALOR, an adaptive design, pivotal phase 3 trial of vosaroxin or placebo in combination with cytarabine in first relapsed or refractory acute myeloid leukemia. J Clin Oncol 2012; 30(suppl; abstr TPS6637).
16. Gao, Ware, Mehta. Sample size re-estimation for adaptive sequential designs. J Biopharm Stat 2008; 18: 1184–1196.
17. Mehta, Pocock. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Stat Med 2011; 30: 3267–3284.
18. Wang, Hung HMJ, O’Neill. Paradigms for adaptive statistical information designs: practical experiences and strategies. Stat Med 2012; 31: 3011–3023.
19. Menon, Massaro, Pencina. Comparison of operating characteristics of commonly used sample size re-estimation procedures in a two-stage design. Comm Stat Simulat Comput 2013; 42: 1140–1152.
20. Wang, Hung HMJ. A conditional adaptive weighted test method for confirmatory trials. Ther Innov Regul Sci 2014; 48: 51–55.
21. US Food and Drug Administration. Draft guidance for industry on adaptive design clinical trials for drugs and biologics, http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM201790.pdf (2010, accessed 8 April 2015).
22. Ringel, Tollman, Hersch. Does size matter in R&D productivity? If not, what does? Nat Rev Drug Discov 2013; 12: 901–902.
23. Nguyen, Isaacs, Teppler. Raltegravir, the first HIV-1 integrase strand transfer inhibitor in the HIV armamentarium. Ann N Y Acad Sci 2011; 1222: 83–89.
24. Grinsztejn, Nguyen, Katlama. Safety and efficacy of the HIV-1 integrase inhibitor raltegravir (MK-0518) in treatment-experienced patients with multidrug-resistant virus: a phase II randomized controlled trial. Lancet 2007; 369: 1261–1269.
25. Steigbigel, Cooper, Kumar. Raltegravir with optimized background therapy for resistant HIV-1 infection. N Engl J Med 2008; 359: 339–354.
26. Schoenfeld. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 1981; 68: 316–319.
27. Cook. Methods for mid-course corrections in clinical trials with survival outcomes. Stat Med 2003; 22: 3431–3447.
28. DeMets, Pocock, Julian. The agonising negative trend in monitoring of clinical trials. Lancet 1999; 354: 1983–1988.
29. Bauer, Koenig. The reassessment of trial perspectives from interim data—a critical view. Stat Med 2006; 25: 23–36.