A Practical Perspective: Application of the Generalized Approach for Adaptive Design

Abstract

Background:

Adaptive design methodology has been well studied for continuous, binary, and survival outcomes for decades. However, for complicated endpoints such as recurrent hospitalization in the joint frailty setting and composite endpoint in the win-ratio setting, adaptive design is not intuitive because of sophistication in existing methods to perform sample size re-estimation.

Methods:

The objective of this paper is to propose a practical generalized approach that implements conditional power calculation and sample size re-estimation at the interim stage through approximation, so that sample size re-estimation becomes easy to understand and to carry out.

Results:

Through simulations on representative complex situations, the proposed method can maintain the planned statistical power by sample size re-estimation while controlling the type I error.

Conclusion:

The proposed adaptive approach is easy to implement in general sample size re-estimation situations. Its validity can be verified through simulation under varying scenarios. In summary, this approach offers a transparent communication channel with regulatory agencies to facilitate clinical trial development regardless of the complexity of the underlying situations.

Keywords

adaptive; conditional power; Finkelstein-Schoenfeld; Anderson-Gill

Background

Unlike conventional nonadaptive trials, an adaptive statistical design provides the flexibility to modify the trial design midstream, based on an interim assessment and with statistical rigor, to increase the likelihood of trial success. The typical trial modification is sample size re-estimation with a single interim analysis. To facilitate an adaptive design, the data from the first stage and the second stage should be independent. For a comprehensive survey of the literature on adaptive design, one can refer to the FDA guidance document on adaptive design issued in September 2018.¹

One of the common tools for implementing an adaptive design is conditional power. When the conditional power at the interim stage is lower than desired, the overall sample size can be increased to bring the conditional power to a desired level (for instance, 80% or 90%). However, if the conditional power at the interim is too low, the required sample size increase may be prohibitive and, even worse, increasing the sample size in this setting may lead to claiming a nonmeaningful treatment effect or, in some extreme cases, inflating the type I error. The publications by Chen et al,² Mehta and Pocock,³ and Gao et al⁴ suggest optimal adaptive designs when the conditional power is promising. The advantage of an adaptive design given a promising interim result is that the final statistical test can be performed through the original conventional approach, without weighting the test statistics of the two stages by weights determined by the planned sample size. In other words, the promising zone approach can maintain or boost the probability of trial success given that the interim result is clinically meaningful and promising, and the final statistical test can be performed in the traditional way, as if there had been no interim assessment or trial modification.

The majority of methods in the current literature calculate the interim conditional power and the re-estimated sample size from explicit formulas that do not depend on the specific form of the underlying test statistic. Lan and Wittes generalized the method through Brownian motion.⁵ The interim calculation in the method by Mehta and Pocock³ is based only on the interim Z value. Nevertheless, the Lan and Wittes method was not fully expanded into sample size re-estimation, and the Mehta and Pocock approach still requires some sophistication. Bauer et al⁶ provided a contemporary overview of the literature on sample size re-estimation in their 2016 paper. Although that paper offers an in-depth history and comprehensive coverage of the methodology for adaptive design, it lacks a simplified illustration for implementation. For complicated endpoints such as recurrent hospitalization in the joint frailty setting and composite endpoints in the win-ratio setting, an intuitive and simple implementation of sample size re-estimation is not readily available, so practitioners without specialized expertise can find the adaptive strategy challenging to implement. This paper proposes a practical generalized adaptive approach for a wide range of situations.

Methodology

Suppose the initial plan divides the study population into two cohorts with sample sizes $n_{i}$, $i = 1, 2$. The information associated with each cohort is $I_{i} = n_{i} / n$, where $n = n_{1} + n_{2}$. The combined test statistic⁷ is usually of the form

$$\sqrt{I_{1}}\, Z_{1} + \sqrt{I_{2}}\, Z_{2} \qquad (1)$$

where $Z_{i}$, $i = 1, 2$, are normalized test statistics for various types of endpoints. For example, they can be the Wald test statistics from the joint frailty model⁸ or the Finkelstein and Schoenfeld test statistics⁹ in the composite endpoint setting. The joint frailty model is a statistical approach that models two parallel statistical processes and their correlation so that challenging issues such as competing risk and informative censoring can be appropriately addressed. The Finkelstein and Schoenfeld method uses the hierarchical structure of a composite endpoint to score each pair of trial subjects, obtains an aggregate score, and then performs the statistical test through a test statistic normalized by the mean and variance of the aggregate score. To illustrate, assume the test is one-sided: $H_{0}: \mu = 0$ vs $H_{1}: \mu > 0$, where $\mu$ is the mean of a general test statistic, such as a log hazard ratio or relative risk. The larger the test statistic, the larger the treatment effect. Let $\alpha$ denote the 1-sided type I error rate and $Z_{1 - \alpha} = \Phi^{-1}(1 - \alpha)$, where $\Phi(\cdot)$ represents the cumulative distribution function of the standard normal distribution.
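As a minimal sketch of the combined statistic and the one-sided test described above, the following Python fragment may help. The function names and the numeric inputs are illustrative assumptions, not part of the original analysis (which was run in R).

```python
from math import sqrt
from statistics import NormalDist


def combined_z(z1: float, z2: float, n1: int, n2: int) -> float:
    """Weighted combination sqrt(I1)*Z1 + sqrt(I2)*Z2 with I_i = n_i / (n1 + n2)."""
    n = n1 + n2
    return sqrt(n1 / n) * z1 + sqrt(n2 / n) * z2


def rejects_null(z: float, alpha: float = 0.025) -> bool:
    """One-sided test: reject H0 when the statistic exceeds Z_{1-alpha}."""
    return z > NormalDist().inv_cdf(1 - alpha)


# Illustrative (made-up) stage-wise Z values and cohort sizes.
z_combined = combined_z(z1=1.5, z2=1.8, n1=200, n2=100)
print(round(z_combined, 3), rejects_null(z_combined))
```

Because the weights satisfy $I_{1} + I_{2} = 1$, the combined statistic is again standard normal under the null hypothesis when the two stage-wise statistics are independent standard normal variables.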

At the interim, $Z_{1}$ is realized as $z_{1}$. The conditional power can be calculated by the following formula under the alternative hypothesis:

$$P\left(Z_{2} > \frac{Z_{1 - \alpha} - \sqrt{I_{1}}\, z_{1}}{\sqrt{I_{2}}}\right)$$

Under the alternative hypothesis, in most situations, it is expected that $Z_{i}$, $i = 1, 2$, approximately follow the normal distribution $N(\mu_{i}, 1)$, where $\mu_{i}$ is the mean of $Z_{i}$ and is proportional to $\sqrt{n_{i}}$. Taking $\mu_{1}$ as $z_{1}$, the approximate estimate for $\mu_{2}$ is $\frac{\sqrt{n_{2}}}{\sqrt{n_{1}}} z_{1}$. Therefore, the conditional power is approximately $\Phi\left(\frac{\sqrt{n_{2}}}{\sqrt{n_{1}}} z_{1} - \frac{Z_{1 - \alpha} - \sqrt{I_{1}}\, z_{1}}{\sqrt{I_{2}}}\right)$. Of note, if the alternative hypothesis is in a different direction, the above formula needs to be modified.
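The approximation above can be sketched in a few lines of Python. The function name `conditional_power` and the example inputs are assumptions made for illustration; only the formula itself comes from the text.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(z1: float, n1: int, n2: int, alpha: float = 0.025) -> float:
    """Approximate conditional power given the interim value z1.

    Uses mu_2 ~ sqrt(n2 / n1) * z1 and I_i = n_i / (n1 + n2), for a
    one-sided test in which larger statistics favor treatment.
    """
    std_normal = NormalDist()
    n = n1 + n2
    i1, i2 = n1 / n, n2 / n
    z_crit = std_normal.inv_cdf(1 - alpha)
    mu2_hat = sqrt(n2 / n1) * z1                      # projected mean of Z2
    threshold = (z_crit - sqrt(i1) * z1) / sqrt(i2)   # value Z2 must exceed
    return std_normal.cdf(mu2_hat - threshold)


# Illustrative (made-up) interim result: z1 = 1.5 with 200 of 300 subjects.
cp = conditional_power(z1=1.5, n1=200, n2=100)
print(round(cp, 3))
```

Only the interim Z value and the two cohort sizes enter the calculation, which is what makes the approach independent of the form of the underlying test statistic.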

Based on the result from Chen et al,² when the conditional power is 50% or higher, the sample size adjustment for the second stage can be flexible and the final test can be performed in the conventional way without inflating the type I error. Supposing the initial planned power is $1 - \beta$, the following sample size re-estimation scheme can be utilized:

If the interim conditional power is below 50% or above $1 - \beta$, continue without adjustment.

If the interim conditional power is between 50% and $1 - \beta$ inclusive (promising zone), replace $n_{2}$ by $\tilde{n}_{2}$, where $\tilde{n}_{2}$ is the value of $n_{2}$ that solves the equation $\Phi\left(\frac{\sqrt{n_{2}}}{\sqrt{n_{1}}} z_{1} - \frac{Z_{1 - \alpha} - \sqrt{I_{1}}\, z_{1}}{\sqrt{I_{2}}}\right) = 1 - \beta$.
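The two-branch scheme above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the weights $I_{1}$ and $I_{2}$ are recomputed from the candidate second-stage size (consistent with a conventional final test), the equation is solved by a simple upward integer search, and the cap `n2_max` is a hypothetical safeguard that does not appear in the text.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(z1: float, n1: int, n2: int, alpha: float = 0.025) -> float:
    """Approximate conditional power with I_i = n_i / (n1 + n2)."""
    n = n1 + n2
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    threshold = (z_crit - sqrt(n1 / n) * z1) / sqrt(n2 / n)
    return _STD_NORMAL.cdf(sqrt(n2 / n1) * z1 - threshold)


def reestimate_n2(z1, n1, n2_planned, alpha=0.025, power=0.80, n2_max=None):
    """Return the second-stage size under the promising-zone rule.

    Assumption: weights are recomputed for each candidate n2.
    n2_max is a hypothetical cap on the increase (default 10x planned).
    """
    if n2_max is None:
        n2_max = 10 * n2_planned
    cp = conditional_power(z1, n1, n2_planned, alpha)
    if cp < 0.5 or cp > power:
        return n2_planned                 # outside the promising zone
    n2 = n2_planned
    # Smallest integer n2 whose conditional power reaches the target.
    while n2 < n2_max and conditional_power(z1, n1, n2, alpha) < power:
        n2 += 1
    return n2


# Illustrative (made-up) interim result in the promising zone.
new_n2 = reestimate_n2(z1=1.8, n1=200, n2_planned=100)
print(new_n2, round(conditional_power(1.8, 200, new_n2), 3))
```

An integer search is used instead of a closed-form root because, under the recomputed-weights assumption, $n_{2}$ enters the equation through both the projected mean and the weights.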

In case a more precise promising zone is preferred to the convenient rule of conditional power > 50%, the Mehta and Pocock method can be utilized.

If the sponsor has the resources and the desire to increase the likelihood of trial success when the interim result is promising, the upper limit of the promising zone in terms of conditional power can be set at a higher value (eg, 90% vs 80%), and the lower limit of the promising zone can be extended if the value calculated by the Mehta and Pocock method is smaller than the 50% conditional power.³

Simulation Results

For generality, the simulation focuses on 2-arm studies and is performed under the following frameworks. Simulations were performed using the R software.

Finkelstein-Schoenfeld Framework

For the power simulation, assume yearly recurrent hospitalization rates of 60% for the control arm and 48% for the treatment arm. The yearly mortality rates for the control and treatment arms are 28% and 21%, respectively. Quality of life (QoL) improvement at the required analysis visit has a mean of 1 and a standard deviation of 1 for the control arm, and a mean of 1.5 and a standard deviation of 1 for the treatment arm. The QoL improvement is normalized for ease of simulation, and a larger QoL measure is better. For the type I error simulation, the recurrent hospitalization and mortality rates are set at 60% and 28%, respectively, for both treatment groups, while the mean of the QoL measure is set at 1. The yearly attrition rate is 7.5%, and enrollment is set at 5 subjects per month for simplicity and illustration. The final analysis time is when the last subject reaches 12-month follow-up. The recurrent hospitalizations and death are connected through a gamma frailty $\gamma \sim \mathrm{gamma}(2, 1/2)$ accompanied by a frailty exponent of $\alpha = 1$ (not to be confused with the type I error rate):

$$\text{Death} \sim \exp(\lambda_{d} \times \gamma^{\alpha}); \quad \text{rehospitalization} \sim \exp(\lambda_{h} \times \gamma)$$

where $\exp(\cdot)$ represents the exponential distribution, and $\lambda_{d}$ and $\lambda_{h}$ denote the hazard parameters for the death and recurrent hospitalization processes, respectively. Suppose the initial planned sample size is 300, with a 2:1 randomization ratio between the treatment and control arms. The interim assessment is conducted when the 200th subject reaches 12-month follow-up. If the conditional power is below 50% or above 80%, one continues to the final analysis without adjustment. Otherwise, the total sample size is increased to 450 as a simple measure to protect trial integrity (avoiding back-engineering).
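The shared-frailty data generation described above might be sketched as below. Several choices here are assumptions made only for illustration: gamma(2, ½) is read as shape 2 and scale ½ (mean 1), the yearly rates are plugged in directly as hazards per year, rehospitalizations are generated as a homogeneous Poisson process, and enrollment, attrition, and QoL are omitted. The function and field names are hypothetical.

```python
import random


def simulate_subject(lambda_d, lambda_h, follow_up=1.0, frailty_exponent=1.0, rng=random):
    """Simulate death and rehospitalizations for one subject.

    Assumptions: frailty ~ Gamma(shape=2, scale=1/2); death time is
    exponential with rate lambda_d * frailty**exponent; gaps between
    rehospitalizations are exponential with rate lambda_h * frailty.
    """
    frailty = rng.gammavariate(2.0, 0.5)          # shape, scale
    death_time = rng.expovariate(lambda_d * frailty ** frailty_exponent)
    end = min(death_time, follow_up)              # observation stops at death
    n_hosp = 0
    t = rng.expovariate(lambda_h * frailty)
    while t < end:
        n_hosp += 1
        t += rng.expovariate(lambda_h * frailty)
    return {
        "death_time": death_time if death_time <= follow_up else None,
        "n_hosp": n_hosp,
    }


# Illustrative control-arm subject, treating 28% and 60% as yearly hazards.
rng = random.Random(2024)
print(simulate_subject(lambda_d=0.28, lambda_h=0.60, rng=rng))
```

Because the same frailty multiplies both hazards, subjects who are prone to rehospitalization are also more likely to die early, which is the source of the correlation between the two processes.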

The Finkelstein and Schoenfeld method considers the hierarchy of death, recurrent hospitalizations, and normalized QoL improvement to rank each pair of trial subjects. For each pair of subjects, time to mortality is compared first to determine a winner. If the comparison is indeterminate because both subjects survive, the numbers of recurrent hospitalizations are compared. If there is still a tie, the normalized QoL improvements are compared. The winner is scored as 1, the loser as –1, and a tie as 0. The mean and variance of the aggregate score over all pairs of subjects are used to construct a Z test statistic to assess the treatment effect. For the adaptive scheme to work, the aggregate scores for the interim cohort and the remaining cohort are calculated independently within the respective cohorts. The implementation of the adaptive strategy relies on the value of the interim Z test statistic.
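A simplified sketch of the pairwise scoring and the resulting Z statistic follows. It assumes complete 12-month follow-up for every subject (no censoring or staggered entry), represents a survivor by `death_time = None`, and uses the permutation variance usually associated with this test; the dictionary keys and function names are hypothetical.

```python
from math import sqrt


def compare(a, b):
    """Return 1 if subject a wins against b, -1 if a loses, 0 for a tie."""
    da, db = a["death_time"], b["death_time"]
    # Level 1: mortality. Surviving, or dying later, wins.
    if da is None and db is not None:
        return 1
    if da is not None and db is None:
        return -1
    if da is not None and db is not None and da != db:
        return 1 if da > db else -1
    # Level 2: fewer recurrent hospitalizations wins.
    if a["n_hosp"] != b["n_hosp"]:
        return 1 if a["n_hosp"] < b["n_hosp"] else -1
    # Level 3: larger normalized QoL improvement wins.
    if a["qol"] != b["qol"]:
        return 1 if a["qol"] > b["qol"] else -1
    return 0


def fs_z(treatment, control):
    """Z statistic from per-subject scores summed over the treatment arm."""
    subjects = list(treatment) + list(control)
    n, n_t, n_c = len(subjects), len(treatment), len(control)
    # Score of each subject against every other subject in the pooled sample.
    scores = [
        sum(compare(s, other) for j, other in enumerate(subjects) if j != i)
        for i, s in enumerate(subjects)
    ]
    t_stat = sum(scores[:n_t])
    variance = n_t * n_c / (n * (n - 1)) * sum(u * u for u in scores)
    if variance == 0:
        raise ValueError("all pairwise comparisons are ties")
    return t_stat / sqrt(variance)


# Tiny made-up example with two subjects per arm.
trt = [{"death_time": None, "n_hosp": 0, "qol": 2.0},
       {"death_time": None, "n_hosp": 1, "qol": 1.0}]
ctl = [{"death_time": 0.5, "n_hosp": 2, "qol": 0.0},
       {"death_time": None, "n_hosp": 1, "qol": 0.5}]
z_fs = fs_z(trt, ctl)
print(round(z_fs, 3))
```

For the adaptive scheme, a function of this kind would be applied separately to the interim cohort and to the remaining cohort so that the two Z values are computed from disjoint sets of subjects.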

With simulation iterations set at 2000 for power and 10,000 for type I error, Table 1 presents the simulation results. In Table 1, “No Overflow” means that only the information used at the interim assessment is used for the final test through formula (1), while “Overflow” means that all information available for the interim cohort at the final data cut is used for the final test. To illustrate, suppose that in the survival setting the interim assessment is performed on day 200 in calendar time and the final test is performed on day 400. “No Overflow” means that data for the interim cohort are used only up to day 200 at the final test, even though by day 400 subjects in the interim cohort have accumulated data beyond day 200. On the other hand, “Overflow” means that the data available beyond day 200 for the interim cohort are used for the final test.

Table 1.

Simulated Power and Type I Error.^a

Method	Nonadaptive Power, %	Adaptive Power (No Overflow), %	Adaptive Power (Overflow), %	Adaptive Type I Error (No Overflow)	Adaptive Type I Error (Overflow)
Finkelstein-Schoenfeld (FS)	61.6	65.8	65.6	0.025	0.0246
Anderson-Gill (AG)	63.6	65.9	66.9	0.029	0.028

^a In the promising zone, for the FS method, the nonadaptive power is 61.4%, while the powers for “No Overflow” and “Overflow” are 85.2% and 84% respectively; for the AG method, the nonadaptive power is 63.7%, while the powers for “No Overflow” and “Overflow” are 81.1% and 88% respectively.

Though the overall power increase is moderate (from 61.6% to 65.8% or 65.6%), in the promising zone (footnote to Table 1) the power increase is substantial: without adaptation, the power is 61.4%; with adaptation, the powers are 85.2% and 84% under “No Overflow” and “Overflow,” respectively.

Under the Finkelstein-Schoenfeld framework, the type I error is controlled at the nominal level of 0.025 for both the “No Overflow” and “Overflow” cases.

Anderson-Gill Recurrent Events

The simulation assumptions for the Anderson-Gill¹⁰ model follow those for the Finkelstein-Schoenfeld method, with one exception: the yearly recurrent hospitalization rates for the control and treatment arms are set at 60% and 45%, respectively, so that the adaptive feature is more likely to be triggered.

The primary endpoint is recurrent hospitalization while death is a competing risk. The Anderson-Gill Cox regression with robust variance estimate is utilized for treatment effect estimation.

With simulation iterations set at 2000 for power and 10,000 for type I error, the simulation results are shown in Table 1. The trend is consistent with that observed under the Finkelstein and Schoenfeld framework. The type I error under the Anderson-Gill model is slightly above the nominal level of 0.025. This could be due to random variation in the simulation or to some subtle systematic bias, and further investigation may be warranted. When the correlation between recurrent hospitalization and death is strong, the joint frailty model analysis may be preferable to the Anderson-Gill method. Limited simulation results using the joint frailty model yield trends similar to those observed above.
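One way to gauge whether random variation alone could explain the excess is to compare the simulated type I error rates with the Monte Carlo standard error of a proportion. The sketch below uses only the values reported in Table 1; the function name is an assumption.

```python
from math import sqrt


def mc_standard_error(p: float, n_sim: int) -> float:
    """Binomial standard error of a simulated rejection rate."""
    return sqrt(p * (1 - p) / n_sim)


NOMINAL = 0.025
N_SIM = 10_000                      # iterations used for type I error
se = mc_standard_error(NOMINAL, N_SIM)

# Type I error rates taken from Table 1.
observed = {
    "FS, No Overflow": 0.025,
    "FS, Overflow": 0.0246,
    "AG, No Overflow": 0.029,
    "AG, Overflow": 0.028,
}
excess = {label: (rate - NOMINAL) / se for label, rate in observed.items()}
for label, z in excess.items():
    print(f"{label}: {z:+.2f} standard errors from nominal")
```

With 10,000 iterations the standard error is roughly 0.0016, so the Anderson-Gill values sit about 2 to 2.6 standard errors above 0.025. This is suggestive rather than conclusive, which is in line with the call for further investigation.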

Discussions

Sample size re-estimation based on the interim result through a rigorous statistical scheme can help achieve trial success in a more robust way. The statistical methodology is mature in the general setting where the endpoint of interest is binary, continuous, or time-to-event. However, in complex situations where the endpoint is recurrent hospitalization in the joint frailty setting or a composite endpoint in the win-ratio setting, the existing literature may not offer an intuitive and easy-to-implement perspective for practitioners without specialized expertise.

The proposed approach is practical and simple: it uses only the Z value at the interim and the planned sample size to approximate the conditional power, so that sample size re-estimation can be readily performed regardless of the complexity of the underlying situation. Though the Lan and Wittes B-value method generalized the conditional power calculation through the elegant Brownian motion process, it was not fully expanded into sample size re-estimation and did not simplify the trial design modification process. The Gao et al and the Mehta and Pocock methods did generalize but still require some sophistication. The proposed approach offers a practical perspective that greatly simplifies sample size re-estimation for a wide variety of scenarios. In fact, the conditional power formula in this paper matches that from Lan and Wittes and that from Mehta and Pocock after some simple algebraic manipulation.
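The kind of algebraic manipulation referred to above can be sketched as follows, using only $I_{1} + I_{2} = 1$ and $\sqrt{n_{2}/n_{1}} = \sqrt{I_{2}/I_{1}}$. The final rewriting in B-value notation assumes the usual definition $B(t) = \sqrt{t}\, Z(t)$ with $t = I_{1}$; it is offered as an illustration rather than as a reproduction of the cited derivations.

```latex
\begin{align*}
\sqrt{\frac{I_{2}}{I_{1}}}\, z_{1}
  - \frac{Z_{1-\alpha} - \sqrt{I_{1}}\, z_{1}}{\sqrt{I_{2}}}
&= \frac{\dfrac{I_{2}}{\sqrt{I_{1}}}\, z_{1} + \sqrt{I_{1}}\, z_{1} - Z_{1-\alpha}}{\sqrt{I_{2}}} \\
&= \frac{\dfrac{I_{1} + I_{2}}{\sqrt{I_{1}}}\, z_{1} - Z_{1-\alpha}}{\sqrt{I_{2}}} \\
&= \frac{z_{1} / \sqrt{I_{1}} - Z_{1-\alpha}}{\sqrt{1 - I_{1}}}
 = \frac{B(t)/t - Z_{1-\alpha}}{\sqrt{1 - t}},
 \qquad t = I_{1},\ B(t) = \sqrt{t}\, z_{1}.
\end{align*}
```

The conditional power is $\Phi$ applied to this quantity, so it depends on the data only through $z_{1}$ and the information fraction $I_{1}$.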

The key assumption is that the interim Z test statistics and the second-stage Z test statistics behave similarly in an asymptotic sense. Though the enrollment pattern, event distribution pattern, nonconstant risk ratio between treatment arms, overflow of information for the interim cohort, and other factors can render the two Z test statistics different, the simulation in this paper has not yet revealed any negative consequence of the proposed approach. It would be of interest to see a tangible negative effect through more simulations performed in more extreme situations.

In practice, it is usually sufficient to define the promising zone as conditional power >50%. In case a more precise promising zone is desired, the Mehta and Pocock method can be used. This approach can be easily extended to single-arm studies and the group sequential setting. The proposed method in this paper calculates the conditional power assuming the true treatment effect aligns with the observed interim test statistic $z_{1}$. Some practitioners may prefer maintaining the original assumed treatment effect when calculating the conditional power. In such cases, one needs to rely on alternative approaches to obtain the expectation of $Z_{2}$ (eg, simulation).

Since adaptive designs using conditional error¹¹ can have certain favorable properties, the idea of this paper may be worth expanding in that direction. Besides the simulations performed on the complex situations to verify the validity of the proposed method, the utility of this approach has also been confirmed in the most commonly encountered settings, for endpoints including continuous, binary, and time-to-event, and for trial design types including superiority and non-inferiority. In this sense, the proposed method provides a unified and, importantly, simplified approach to facilitate adaptive sample size re-estimation.

Concluding Remarks

Footnotes

Acknowledgment

The authors thank the two referees for their valuable comments that improved the manuscript.

Declaration of Conflicting Interests

No potential conflicts of interest were declared.

Funding

No financial support of the research, authorship, and/or publication of this article was declared.

ORCID iD

Jin Wang, PhD

Ethical Approval

This article does not contain any studies with human or animal subjects performed by any of the authors.

References

1. FDA. Adaptive designs for clinical trials of drugs and biologics guidance for industry. FDA Guidance Documents. September 2018.

2. Chen, DeMets, Lan KKG. Increasing the sample size when the interim results are promising. Stat Med. 2004;23:1023–1038.

3. Mehta, Pocock. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Stat Med. 2011;30:3267–3284.

4. Gao, Ware, Mehta. Sample size re-estimation for adaptive sequential design in clinical trials. J Biopharm Stat. 2008;18:1184–1196.

5. Lan KKG, Wittes. The consultant’s forum the B-value: a tool for monitoring data. Biometrics. 1988;44:579–585.

6. Bauer, Bretz, Dragalin, Konig, Wassmer. Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls. Stat Med. 2016;35:325–347.

7. Cui, Hung, Wang. Modification of sample size in group sequential clinical trials. Biometrics. 1999;55:853–857.

8. Rondeau, Mathoulin-Pelissier, Jacqmin-Gadda, Brouste, Soubeyran. Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events. Biostatistics. 2007;8:708–721.

9. Finkelstein, Schoenfeld. Combining mortality and longitudinal measures in clinical trials. Stat Med. 1999;18:1341–1354.

10. Anderson, Gill. Cox’s regression model for counting processes: a large sample study. Ann Stat. 1982;10:1100–1120.

11. Muller, Schafer. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics. 2001;57:886–891.