Clinician-trialist rounds: 18. Should young (and old!) clinician-trialists perform single-arm Phase II futility trials?

Today’s case

Having completed formal training in both stroke neurology and clinical epidemiology, you have been recruited to a dream job^* with a group of outstanding clinical neurologists who wish to launch a major new program of RCTs, focusing initially on neuroprotective agents for improving the short-term outcomes of stroke patients. The first agent on their list of prospects has performed well in Phase I trials, and its safety and dosing have been sorted out in Phase II trials. Excited at the prospect of your first big trial, you do all the right things: you read all about the drug, meet repeatedly with your senior clinical colleagues to develop a PICOT question for a Phase III trial, perform a systematic literature review of prior neuroprotective trials, and invite a statistician-trialist from across campus to join as a co-investigator before you write the first draft of your protocol.

Your literature review produces a Mikado-like ‘short, sharp shock’ when you discover that over a hundred previous Phase III neuroprotective trials have failed to demonstrate clinically important benefits, and you realize that the time required to enroll and follow sufficient participants for your trial would consume at least 3 years of your maximum effort, with shaky prospects for a positive result.

Disheartened, you almost cancel your appointment with your prospective statistician co-investigator, but keep it out of courtesy and curiosity. Lucky for you that you did, because she suggests a scientific strategy that would drastically reduce the time and effort required to determine whether you were backing a ‘loser’ drug or a possible ‘winner’.

Your PubMed search for ‘acute ischemic stroke’, with filters for ‘Phase III trials’, ‘systematic review’, and ‘humans’, unearthed the systematic review by Chelsea Kidwell and colleagues [1] of English-language ‘controlled acute ischemic stroke clinical treatment trials’, in which they studied 114 Phase III RCTs that tested neuroprotective agents, either alone (88) or in combination with rheologic/antithrombotic agents (26). Alas, they documented that only 3 of them had generated a statistically significant (but still probably not clinically important) benefit! Accordingly, the ‘prior probability’ that your Phase III trial of yet another neuroprotective agent would generate a clinically and statistically significant benefit took a nose-dive, forcing you to rethink how you should direct your scarce time and energy for the next few years, and to question whether you should put all your eggs into such a fragile Phase III basket.

Here is where the early recruitment of a statistician co-investigator provides not only access to a higher level of statistical expertise, experience, and judgment but also a wider array of statistical strategies for identifying and solving challenges in the design, execution, analysis, and interpretation of any trial: in this case, strategies for helping your first big trial avoid the same, dismal fate as that mass of prior neuroprotective trials. Indeed, your new statistical colleague introduces you to the work of two other biostatisticians, Yuko Palesch and Barbara Tilley, who led a group of us in exploring whether it might have been possible to avoid not only that unsuccessful expenditure of participants’ good will and well-being, but also their study teams’ efforts and the consumption of tens of millions of research dollars in those ‘negative’ Phase III neuroprotective trials [2].

Borrowing a strategy sometimes employed by oncology trialists, the Palesch–Tilley team explored whether these ineffective neuroprotective agents could have been exposed as ‘futile’ before they ever got to Phase III. The strategy they employed was the single-arm ‘Phase II Futility Trial’, and its concept and execution are delightfully simple (especially to the non-statistician): having designated the event rate among patients on active treatment that would demonstrate efficacy (e.g., the experimental event rate stated in the corresponding Phase III trial protocol), one simply treats everyone and compares their event rate against this efficacy criterion. If treated patients suffer events at a rate above that criterion for efficacy, the treatment is judged ‘futile’, and it is not carried forward into a Phase III trial.
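For readers who like to see the comparison written out, here is a minimal sketch of one way to make it: a one-sided exact binomial test of the observed event rate in treated patients against the efficacy criterion. This is an illustration, not the Palesch–Tilley team's actual procedure. The function names, the alpha of 0.10, and the numbers in the usage lines are all assumptions invented for the example.

```python
from math import comb


def binomial_upper_tail(events: int, n: int, p: float) -> float:
    """Probability of seeing `events` or more events among n patients
    when the true event rate is p (X ~ Binomial(n, p))."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(events, n + 1))


def is_futile(events: int, n: int, target_event_rate: float, alpha: float = 0.10):
    """Hypothetical decision rule (an assumption, not from the article).

    Null hypothesis: the treatment meets the efficacy criterion,
    i.e. its true event rate is no higher than target_event_rate.
    The treatment is called 'futile' when the observed number of events
    would be surprisingly high if the null hypothesis were true.
    Returns (futile?, one-sided p-value).
    """
    p_value = binomial_upper_tail(events, n, target_event_rate)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Made-up numbers: efficacy criterion of a 45% event rate,
    # 15 events observed among 19 treated patients.
    futile, p_value = is_futile(events=15, n=19, target_event_rate=0.45)
    print(f"futile={futile}, one-sided p={p_value:.4f}")
```

An exact binomial tail is used here only because single-arm samples are small and it needs nothing beyond the standard library; a real protocol would have its decision rule and any interim looks specified by the trial statistician.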

When the Palesch–Tilley team applied this strategy to the datasets of 6 Phase III neuroprotective trials, the results were staggering. For example, the protocol for a Phase III trial of fosphenytoin specified that demonstrating a clinically important benefit in disability and dependence at 3 months (based on the modified Rankin scale) would require about 600 patients (300 treated and 300 controls). After 4 years of recruitment, 462 patients had been entered, and an interim analysis stopped the trial for a lack of efficacy (P = 0.87). However, applying the single-arm Phase II futility strategy of comparing the Rankin scores of only the successively admitted treated (experimental) participants against the scores they’d have to achieve for the treatment to meet the efficacy target specified in the Phase III protocol, the treatment would have been declared futile after just 19 treated participants (3% of the projected sample size specified in the Phase III protocol and 4% of the number of participants enrolled before the Phase III trial was abandoned). For two other agents, the Phase II futility strategy would have stopped the corresponding Phase III trials after 28% and 29% of their enrollments at the time they were terminated, respectively, corresponding to 18% and 17% of the sample sizes specified in their Phase III protocols. The three other Phase III neuroprotective trials they assessed all ‘passed’ their Phase II futility trial hurdles, and included the only Phase III trial that generated worthwhile improvements in some of its outcomes. More recently, single-arm Phase II futility trials have been used to decide which among a wide array of ‘promising’ treatments for stroke and Parkinson disease ought to be discarded [3], and which are worth carrying on to Phase III.
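The percentages quoted for the fosphenytoin example follow directly from the counts given above (19 treated participants, a projected sample of about 600, and 462 enrolled at termination). A short check, with a helper name that is purely illustrative:

```python
def percent_of(part: int, whole: int) -> int:
    """Return part as a percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    treated_until_futile = 19   # treated participants needed to declare futility
    planned_sample_size = 600   # about 300 treated + 300 controls in the protocol
    enrolled_at_stop = 462      # participants entered before the trial was stopped

    print(percent_of(treated_until_futile, planned_sample_size))  # 3
    print(percent_of(treated_until_futile, enrolled_at_stop))     # 4
```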

Phase II futility trialists tell me that they expose ineffective treatments with about ¼ of the participants required to reach the same negative conclusion in a Phase III trial [4]. Note, however, that the roles of the alpha and beta errors are different in futility trials, where the Type I (alpha) error becomes the chance of calling an effective treatment futile (and is typically adjusted to reduce the risk of rejecting a truly efficacious treatment as ‘futile’), and the Type II (beta) error becomes the chance of failing to identify a futile treatment.
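To make the reversed roles of the two errors concrete, the sketch below computes both for a single-arm design using exact binomial probabilities. Everything numerical in it is an assumption chosen for illustration (40 treated patients, an event rate of 45% if the treatment is effective, 65% if it is futile, alpha of 0.10), as are the function names.

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def futility_error_rates(n: int, effective_rate: float, futile_rate: float,
                         alpha: float = 0.10):
    """Hypothetical single-arm futility design (all inputs are assumptions).

    The treatment is declared futile when `critical` or more events are seen.
    `critical` is the smallest count that keeps the Type I error at or below alpha.
    """
    critical = next(c for c in range(n + 2) if upper_tail(c, n, effective_rate) <= alpha)
    # Type I (alpha): calling a truly effective treatment futile.
    type_i = upper_tail(critical, n, effective_rate)
    # Type II (beta): failing to identify a truly futile treatment.
    type_ii = 1.0 - upper_tail(critical, n, futile_rate)
    return critical, type_i, type_ii


if __name__ == "__main__":
    critical, type_i, type_ii = futility_error_rates(40, 0.45, 0.65)
    print(f"declare futile at >= {critical} events; "
          f"Type I = {type_i:.3f}, Type II = {type_ii:.3f}")
```

Lowering alpha in this sketch raises the critical count, which protects effective treatments from being discarded at the price of letting more futile ones through.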

Phase II futility trials are most applicable when the beneficial effects of therapy are objective and occur early, and when there are a large number of promising treatments competing for our attention. Three caveats must be borne in mind when interpreting a ‘negative’ single-arm Phase II futility trial, that is, one that doesn’t label a treatment as futile. First, it does not provide sufficient evidence for the efficacy of that treatment; such a conclusion requires a Phase III trial. Second, a ‘negative’ Phase II futility trial is no guarantee that a subsequent Phase III trial of its treatment will generate a positive result. Third, just as in a protocol for a Phase III trial, your hypothesized ‘Experimental Event Rate’ against which the Phase II futility trial outcome is compared is only as good as your literature review, clinical expertise, and therapeutic judgment.

Back to today’s case

With your new statistician co-investigator, you design a single-arm Phase II futility trial of your candidate neuroprotective agent, explain it well enough to your boss and clinical colleagues to gain their understanding and support, explain it well enough to a granting agency to get it funded (they are awed by your small budget), and gain the collaboration of enough clinical colleagues to recruit sufficient participants (about a quarter of the number required for a Phase III trial of the same agent) to generate a clear conclusion. Alas, the conclusion you reach is that it would be futile to carry it on to a Phase III trial. Your disappointment is more than offset by the praise you receive from clinical colleagues for protecting their time, effort, and patients from a futile Phase III trial, by their nomination of the next neuroprotective agent to test with your growing team of enthusiastic investigators, and by your recruitment of an energetic and effective trial staff (bridge-funded until your next successful grant application by your appreciative chair). In addition, by not putting all your eggs in a doomed Phase III basket, you’ve had time to first-author an update on the Kidwell systematic review, start an N-of-1 trial service [5], co-author a clinically oriented review of futility trials with your statistician-colleague, and lead-author a neurology-trainee study on the accuracy and precision of retinal-vein pulsation as a non-invasive measure of increased intracranial pressure.

As usual, this Round isn’t over yet, because our discussion period has just begun. Rounders who have other or contrary thoughts about Phase II futility trials, or have questions or comments about the ones presented here, are encouraged to send them to the Editors, with a copy to Dave at sackett@bmts.com. He’ll summarize them in a later round.

Footnotes

* 25% clinical: 75% research; your choice of mentor; associate membership in the department of clinical epidemiology; and ready access to friendly, established trialists and statisticians across campus.

References

1. Kidwell, Liebeskind, Starkman, Saver. Trends in acute ischemic stroke trials through the 20th century. Stroke 2001; 32: 1349–59.

2. Palesch, Tilley, Sackett, Johnston, Woolson. Applying a phase II futility study design to therapeutic stroke trials. Stroke 2005; 36: 2410–14.

3. Tilley, Palesch, Kieburtz. Optimizing the ongoing search for new treatments for Parkinson disease: Using futility designs. Neurology 2006; 66: 628–33.

4. Palesch. Personal communication to the author. May 2013.

5. Sackett. Clinician-trialist rounds 4: Why not do an N-of-1 RCT? Clin Trials 2011; 8: 350–52.