Adaptive Seamless Design for Establishing Pharmacokinetic and Efficacy Equivalence in Developing Biosimilars

Abstract

Background:

Recently, numerous pharmaceutical sponsors have expressed a great deal of interest in the development of biosimilars, which requires clinical trials to demonstrate that the pharmacokinetics (PK) and clinical efficacy of the test and reference products are equivalent. Pharmacodynamic (PD) markers may be used to evaluate efficacy if relevant ones are available. In their absence, however, the associated clinical trials must be designed with efficacy measures as the primary endpoint.

Methods:

In this study, we propose a novel adaptive seamless PK and efficacy design with an efficient framework to remedy the risk of misspecification of efficacy parameters and to discontinue the trial evaluating the efficacy for futility based on the PK evaluation. Here, we consider the clinical development of biosimilars including their evaluation in patients rather than healthy volunteers under a situation where both PK and efficacy parameters are required to demonstrate equivalence. The original idea of the proposed method was to organize a clinical trial that includes the statistical analysis of PK as an interim analysis, with sample size recalculation of the efficacy data.

Results:

Our simulation study indicated that the proposed design would allow trials to be more efficient than with the classical design.

Conclusions:

This proposal provides appealing advantages, such as a shorter time period, additional cost savings, and a smaller number of patients required.

Keywords

adaptive seamless PK and efficacy design, biosimilar, infliximab, clinical trials, sample size recalculation

Introduction

Adaptive seamless designs are rapidly becoming more attractive to numerous pharmaceutical sponsors in every phase of the clinical trial process, including adaptive seamless phase I/II or adaptive seamless phase II/III trials. In the context of confirmatory clinical trials, numerous statisticians interested in these methodologies have focused on superiority trials.^1–3 However, other types of trials, such as non-inferiority and equivalence between test and reference products, have also been attractive to the pharmaceutical industry. In particular, equivalence trials, called bioequivalence trials, are applied in the development of generic products. To confirm bioequivalence, pharmacokinetic (PK) parameters, such as area under the concentration-time curve (AUC) and maximum serum concentration (Cmax), are typically evaluated in healthy volunteers using two one-sided tests (TOST)⁴ with log-transformed values. Although some methods, such as the adaptive sample size recalculation, have been proposed for bioequivalence trials,^5–8 these are rarely used in the actual clinical setting. Because such trials are conducted by recruiting generally between 20 and 50 volunteers, inclusion of interim analyses in the trials is usually not reasonable.

While numerous pharmaceutical sponsors have expressed interest in bioequivalence trials, considerable interest in developing biosimilars is also growing.⁹ There is no unified definition for biosimilars; however, according to the guideline of the European Medicines Agency, a biosimilar is defined as a biological medicinal product that contains an active substance similar to that of the original previously authorized biological medicinal product.¹⁰ Biosimilars differ from generic chemical products, for example, with respect to the complexity and heterogeneity of the molecular structure.^9,11 A reduction in healthcare costs for patients can be expected if a biosimilar is approved by regulators and placed on the market. However, characteristically, a larger number of subjects is required to investigate and clinically develop a biosimilar than is required for the development of a generic product because there are regulatory requirements that encourage sponsors to provide pharmacodynamic (PD) or efficacy data in addition to PK data.^10,12 That is, there must be no clinically meaningful difference between the test and reference products. Moreover, as far as the PK data go, a PK trial is sometimes conducted using a parallel-group clinical trial design instead of a crossover design owing to a high risk of immunogenicity.¹⁰ In this instance, a large number of patients would be required to declare PK equivalence. For instance, a trial that was primarily aimed at confirming the PK equivalence was conducted with 250 recruited patients based on the assumption that the coefficient of variation (CV) was 50%.¹³

In anticipation of the impending expiration of a number of patents for biological medicinal products in numerous countries, some statistical methods have been developed for evaluating biosimilarity. Methods for assessing biosimilarity with respect to variability between the test and reference products have been investigated.^14–18 As a biosimilarity measure, a biosimilar index based on the concept of reproducibility probability has been proposed and discussed.^19–21 Chiu et al²² discussed the use of a Bayesian method that uses prior information. Pan et al²³ proposed a Bayesian group sequential design that incorporates information adaptively using a calibrated power prior. Chow et al²⁴ proposed methods for assessing biosimilarity based on the assumption that a biomarker is predictive of the clinical outcome. Li et al²⁵ proposed a biosimilarity trial design for evaluating clinical efficacy with asymmetrical margins. Liao and Darken²⁶ developed a method for assessing biosimilarity by comparability of critical quality attributes. Furthermore, a three-arm parallel design, which consists of one test and two reference products from two different batches, was provided to investigate biosimilarity.²⁷ When the three-arm parallel design is employed, the approach with the use of the frequency estimator criterion was also proposed to assess biosimilarity.²⁸ In summary, most methods currently focus on one specified trial. Therefore, little methodology that enables the performance of multiple trials seamlessly has been developed.

Here, two main trials served as motivating examples for confirming PK and efficacy in establishing the equivalence of a biosimilar of the innovator infliximab (Remicade).^13,29 These trials indicated that a clinical trial to establish the efficacy equivalence is often required in case any relevant PD markers are unavailable and that a PK trial could be conducted in patients rather than healthy volunteers. Although these trials were conducted separately, they each had a primary endpoint that was intrinsically set to determine PK and efficacy equivalence, respectively.

Based on this motivating example, in this study, we considered the clinical development of biosimilars with an emphasis on the necessity of demonstrating equivalence between the test and reference products by including both PK and efficacy as primary endpoints. We assumed that patients with the same disease conditions were targeted to provide equivalence data for the PK and efficacy. Methods using adaptive seamless designs, which allow sample size recalculation based on interim data, were applied in this setting. The adaptive seamless PK and efficacy design, which incorporates trials to establish both PK and efficacy equivalence, allows trials to be more efficient than classical trial designs.

This paper is structured as follows. First, we introduce a motivating example for the development of one biosimilar. Next, we propose a novel adaptive seamless PK and efficacy design and then lay the framework of our proposed design in the section “Methods.” The section “Simulation Study” presents a simulation study. Finally, in the section “Discussion and Conclusions,” we conclude the paper with a discussion.

Motivating Example

First, we introduce a motivating example for the development of a biosimilar to the innovator infliximab (Remicade), which is a monoclonal antibody against tumor necrosis factor-alpha that is used to treat patients with active rheumatoid arthritis who have shown an inadequate response to methotrexate.³⁰ However, a major hurdle for patient access is the high medication cost. Therefore, there is a lot of interest in developing a biosimilar of the innovator monoclonal antibody. Biosimilars of infliximab (Inflectra and Remsima) were recently launched in several markets across numerous countries.³¹ During the development of Remsima, two main clinical trials were performed. The trials were PK and phase III studies that demonstrated equivalence using PK and efficacy endpoints, respectively.^13,29 Both trials were designed as randomized, double-blind, parallel-group clinical trials, and their primary endpoints were the AUC and Cmax for the PK trial and the American College of Rheumatology 20% (ACR20) improvement³² for the efficacy trial.

For the PK study, the required sample size was calculated to be 196 based on the equivalence margin of 80% to 125%, power of 90%, and TOST with a significance level of 5% under the assumption of a geometric mean ratio of 1.00 and a CV of 50%. In addition to PK, efficacy and safety were also compared in this study. In contrast, the sample size of the phase III study was 468, which was the sample size required to achieve 80% power to meet the equivalence margin within ±15% for ACR20 at a specific time point under the TOST with a significance level of 2.5%, assuming an expected response rate of 50% in both groups. As secondary endpoints, additional efficacy, immunogenicity, safety, PK, and PD were assessed; however, no adjustments for multiplicity were performed.
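As a rough check on these figures, the following sketch recomputes both sample sizes with a normal approximation. The approximation, the even split of the type II error between the two one-sided tests, and the function names are assumptions made for illustration; the cited trials may have used a different (for example, t-based) calculation.

```python
from math import ceil, log
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_arm_tost_lognormal(cv, margin_ratio, alpha, power):
    """Per-arm size for a parallel-group TOST on the log scale, true ratio 1.00.

    Normal approximation (assumption): with a true ratio of 1.00 the type II
    error is split evenly between the two one-sided tests.
    """
    sigma2 = log(1.0 + cv ** 2)   # log-scale variance implied by the CV
    theta = log(margin_ratio)     # equivalence margin on the log scale
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(1 - (1 - power) / 2)
    return ceil(2 * sigma2 * z_sum ** 2 / theta ** 2)


def n_per_arm_tost_binary(p, delta, alpha, power):
    """Per-arm size for a TOST on a difference of proportions, true difference 0."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(1 - (1 - power) / 2)
    return ceil(2 * p * (1 - p) * z_sum ** 2 / delta ** 2)


pk_total = 2 * n_per_arm_tost_lognormal(cv=0.50, margin_ratio=1.25, alpha=0.05, power=0.90)
eff_total = 2 * n_per_arm_tost_binary(p=0.50, delta=0.15, alpha=0.025, power=0.80)
print(pk_total, eff_total)
```

Under these assumptions the efficacy calculation gives a total of 468, matching the reported value, and the PK calculation gives 194, slightly below the reported 196. A gap of this size would be expected if the original calculation was based on the t distribution rather than the normal approximation.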

Consequently, equivalence for the primary PK and efficacy endpoints was established in each trial. Each confidence interval (CI) was in the range of the corresponding equivalence margins. For the PK parameters, the geometric mean ratios (90% CI) were 1.05 (0.94-1.16) and 1.02 (0.95-1.09) for the AUC and Cmax, respectively. As an efficacy parameter, the ACR20 response rates for each group were 60.9% and 58.6%, and the difference in the response rates (95% CI) was 2% (−6% to 10%) for the intention-to-treat population, whereas the ACR20 response rates were 73.4% and 69.7%, with a difference of 4% (−4% to 12%) for the per-protocol population.

Note that each observed between-group difference varied from the prespecified settings. In confirmatory clinical trials, misspecifications often occur regardless of careful planning. For example, the observed geometric mean ratio of the AUC deviated from the prespecified value of 1.00, whereas the observed response rates for each group in the per-protocol population differed greatly from the prespecified value of 50%. Even if a sponsor in one country designs a trial based on trials conducted in other countries, the observed values from that trial will not necessarily be consistent with those of the other countries owing to reasons such as possible race- or measurement technique–related differences between countries.^29,33 These misspecifications have been accounted for by recruiting more patients than required, which allows for drop-outs or exclusions from the analysis populations. In addition, although the observed response rates for each group deviated from the expected value of 50%, this value was a conservative setting because the CIs tend to be widest at a response rate of 50%. Because of these remedies, the targeted sample sizes for both trials were in effect set to approximately 1.15 to 1.20 times the required sample size. Instead, to compensate for the risk of failing to demonstrate equivalence, a sponsor could consider increasing the sample size in midcourse, that is, sample size recalculation within an adaptive seamless design.

Methods

In this section, we consider the clinical development of biosimilars under a randomized parallel-group design with two arms, test (T) and reference (R) products. As in the motivating example of Remsima,^13,29 we assume that a crossover design is not a feasible option for the trial design because of the associated problem of carryover effects, although crossover designs are often used in the development of biosimilars when the study population consists of healthy volunteers.^34,35 Based on the motivating example involving two trials, a single trial is conducted to evaluate both the PK and efficacy, and their confirmations constitute the interim and final analyses, respectively. To compensate for the misspecification of parameters at the planning stage, the interim analysis also serves the final efficacy confirmation, which requires more patients than the PK confirmation. Figure 1 shows the proposed framework of the adaptive seamless PK and efficacy design. Note that the objective of the trial is to determine the equivalence of both PK and efficacy.

Figure 1.

Proposed adaptive seamless pharmacokinetic (PK) and efficacy design. SSR, sample size recalculation.

Collecting Both PK and Efficacy Data From the First Stage

Here, we considered that not only PK data but also efficacy data could be obtained from the PK trial. To achieve this, we assumed that both the PK and efficacy trials should be conducted with patients who have a common disease and are receiving similar dosage regimens.

The strength of this study is that it constitutes a clinical trial that includes the statistical analysis of PK as an interim analysis of the efficacy data. Therefore, we consider a two-stage design based on the assumption that both the PK data and efficacy data are obtainable from the first stage, as described in Figure 1. The data from the first stage can subsequently be used to test the PK equivalence and arrive at an interim decision for the efficacy equivalence, whereas only the efficacy data are obtainable from the second stage to test the efficacy equivalence.

Sample Size Adjustment for Efficacy Endpoint of Interim Analysis

The interim analysis conducted in an unblinded fashion allows for the option to recalculate sample size for the subsequent stage if the interim result indicates a potential benefit for sample size recalculation. Note that the interim decision for the subsequent stage is only based on the efficacy data of the interim analysis.

Let Z₁ and Z₂ denote the test statistics at the interim and final analyses, respectively. Without any interim analysis of the first-stage data, the hypothesis test would be conducted conventionally, rejecting the null hypothesis if $Z_{2} > z_{α}$ at the one-sided significance level α, where z_α is the corresponding critical value, as in a superiority clinical trial. In addition, the sample size is estimated to achieve 1 − β power under an alternative hypothesis.

Mehta and Pocock³⁶ have proposed methods in which the sample size is increased if the interim results are promising and the decision is made based on the conditional power. Recently, a thorough investigation of this approach has been conducted and reported in several articles in the context of a superiority trial.^37,38 In this study, we consider this approach in evaluating the equivalence of an efficacy endpoint. The conditional power given by $Z_{1} = z_{1}$ with the observed difference between the groups derived from the interim efficacy data is expressed using the following equation³⁹:

\begin{array}{l} C P (z_{1}, {\tilde{N}}_{2}) = \Pr (Z_{2} > z_{α} | Z_{1} = z_{1}) \\ = 1 - Φ (\frac{z_{α} \sqrt{N_{2}} - z_{1} \sqrt{N_{1}}}{\sqrt{{\tilde{N}}_{2}}} - \frac{z_{1} \sqrt{{\tilde{N}}_{2}}}{\sqrt{N_{1}}}), \end{array}

where N₁ and N₂ are the planned sample sizes for the interim and final analysis, respectively, and ${\tilde{N}}_{2}$ is the increment in the sample size during the second stage, denoted by ${\tilde{N}}_{2} = N_{2} - N_{1}$ .
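A minimal sketch of the conditional power, written in the form 1 − Φ((z_α√N₂ − z₁√N₁)/√Ñ₂ − z₁√Ñ₂/√N₁), may help to fix the notation. The function name and the interim values of z₁ are illustrative assumptions and are not taken from the simulation study.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def conditional_power(z1, n1, n2, alpha):
    """Conditional power given Z1 = z1 under the planned sample sizes N1 and N2."""
    n2_tilde = n2 - n1                      # planned second-stage increment
    z_alpha = Z.inv_cdf(1 - alpha)          # upper critical value, reject if Z2 > z_alpha
    a = (z_alpha * sqrt(n2) - z1 * sqrt(n1)) / sqrt(n2_tilde)
    return 1 - Z.cdf(a - z1 * sqrt(n2_tilde) / sqrt(n1))


# Illustrative interim statistics only.
for z1 in (0.8, 1.2, 1.6):
    print(z1, round(conditional_power(z1, n1=200, n2=480, alpha=0.025), 3))
```

The conditional power increases with the interim statistic z₁, which is what allows the interim result to be classified as unfavorable, promising, or favorable.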

The promising zone approach arrives at the interim decision by defining the region that represents the promising zone as follows:

Case 1 (Promising): $c p_{\min} \leq C P (z_{1}, {\tilde{N}}_{2}) < 1 - β$ → Increase the sample size to $N_{2}^{*}$ ;

Case 2 (Otherwise, ie, favorable or unfavorable): → Continue to the planned N₂,

where cp_min is the prespecified lower bound of the conditional power. If the interim conditional power is deemed promising, the sample size is increased to $N_{2}^{*} = \min (N_{2}' (z_{1}), N_{\max})$ , where $N_{2}^{*}$ does not exceed $N_{\max}$ and where $N_{2}' (z_{1})$ consists of the sum of N₁ and ${\tilde{N}}_{2}' (z_{1})$ , which is the increment in the sample size when the sample size is increased from the planned N₂. To satisfy the $C P (z_{1}, {\tilde{N}}_{2}) = 1 - β$ on the condition of the promising zone, the increased sample size is derived as

{\tilde{N}}_{2}' (z_{1}) = (\frac{N_{1}}{z_{1}^{2}}) {[\frac{z_{α} \sqrt{N_{2}} - z_{1} \sqrt{N_{1}}}{\sqrt{N_{2} - N_{1}}} - z_{β}]}^{2},

with the restriction that the maximum sample size increase is N_max. Thereafter, the critical value for the final analysis, in exchange for z_α, can be adjusted to

z' (z_{1}, N_{2}^{*}) = \frac{1}{\sqrt{N_{2}^{*}}} [\sqrt{\frac{N_{2}^{*} - N_{1}}{{\tilde{N}}_{2}}} (z_{α} \sqrt{N_{2}} - z_{1} \sqrt{N_{1}}) + z_{1} \sqrt{N_{1}}],

which holds that $\Pr \{ Z_{2}^{*} > z' (z_{1}, N_{2}^{*}) \} = α$ , where $Z_{2}^{*}$ is the test statistic for the final analysis using $N_{2}^{*}$ instead of N₂.⁴⁰ When the sample size is increased to $N_{2}^{*}$ , the power for performing the final analysis is greater under the critical value $z' (z_{1}, N_{2}^{*})$ than it is under the z_α.

The final analysis would also be conducted using $Z_{2}^{*} > z_{α}$ , similar to the conventional hypothesis testing and in line with the rule that the sample size is only increased if the interim conditional power is deemed promising. Note that the type I error rate would be inflated if the decision rule of the promising zone approach is not adhered to.⁴¹
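The promising zone rule, the recalculated increment, and the adjusted critical value can be sketched together as follows. The function names and the interim value z₁ = 1.4 are hypothetical. The sketch also assumes that z_β denotes the lower β quantile of the standard normal distribution, which is the convention under which the expression with "− z_β" brings the conditional power to exactly 1 − β.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def cp_with_increment(z1, n1, n2, alpha, increment):
    """Conditional power when the second stage enrols `increment` patients."""
    z_alpha = Z.inv_cdf(1 - alpha)
    a = (z_alpha * sqrt(n2) - z1 * sqrt(n1)) / sqrt(n2 - n1)
    return 1 - Z.cdf(a - z1 * sqrt(increment) / sqrt(n1))


def recalculated_increment(z1, n1, n2, alpha, beta):
    """Second-stage increment that brings the conditional power to 1 - beta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(beta)  # assumption: lower quantile, negative for beta < 0.5
    a = (z_alpha * sqrt(n2) - z1 * sqrt(n1)) / sqrt(n2 - n1)
    return (n1 / z1 ** 2) * (a - z_beta) ** 2


def promising_zone_n2(z1, n1, n2, alpha, beta, cp_min, n_max):
    """Return N2* in Case 1 (promising) and the planned N2 in Case 2."""
    cp = cp_with_increment(z1, n1, n2, alpha, n2 - n1)
    if z1 > 0 and cp_min <= cp < 1 - beta:
        increment = ceil(recalculated_increment(z1, n1, n2, alpha, beta))
        return min(n1 + increment, n_max)
    return n2


def adjusted_critical_value(z1, n1, n2, n2_star, alpha):
    """Critical value z'(z1, N2*) used in place of z_alpha after an increase."""
    z_alpha = Z.inv_cdf(1 - alpha)
    core = z_alpha * sqrt(n2) - z1 * sqrt(n1)
    return (sqrt((n2_star - n1) / (n2 - n1)) * core + z1 * sqrt(n1)) / sqrt(n2_star)


# Hypothetical interim result with N1 = 200, N2 = 480, and N_max = 2.0 * N2.
n2_star = promising_zone_n2(z1=1.4, n1=200, n2=480, alpha=0.025, beta=0.20,
                            cp_min=0.33, n_max=960)
print(n2_star, round(adjusted_critical_value(1.4, 200, 480, n2_star, 0.025), 3))
```

When the sample size is left at the planned N₂, the adjusted critical value reduces to z_α, so the adjustment only takes effect in Case 1.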

With an equivalence trial, the interim result, z₁, for efficacy is constructed using the following null hypotheses for the efficacy endpoint:

H_{0}^{Eff} : H_{0, U}^{Eff} \cup H_{0, L}^{Eff},

that is,

H_{0, U}^{Eff} : p_{T} - p_{R} \geq Δ and H_{0, L}^{Eff} : p_{T} - p_{R} \leq - Δ,

where p_g is the response rate for the group $g \in \{T, R\}$ and Δ is a prespecified equivalence margin of efficacy. Moreover, the alternative hypotheses for the efficacy are shown by

H_{A}^{Eff} : H_{A, U}^{Eff} \cap H_{A, L}^{Eff},

that is,

H_{A, U}^{Eff} : p_{T} - p_{R} < Δ and H_{A, L}^{Eff} : p_{T} - p_{R} > - Δ .
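The two one-sided tests for the efficacy hypotheses above can be sketched as follows. The unpooled standard error, the function name, and the patient counts are assumptions chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def tost_binary(x_t, n_t, x_r, n_r, delta, alpha):
    """Two one-sided z tests for |p_T - p_R| < delta (unpooled SE, an assumption)."""
    p_t, p_r = x_t / n_t, x_r / n_r
    se = sqrt(p_t * (1 - p_t) / n_t + p_r * (1 - p_r) / n_r)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    z_upper = (delta - (p_t - p_r)) / se   # large values reject H0,U: p_T - p_R >= delta
    z_lower = ((p_t - p_r) + delta) / se   # large values reject H0,L: p_T - p_R <= -delta
    return z_upper > z_crit and z_lower > z_crit


# Hypothetical responder counts, not data from the cited trials.
print(tost_binary(x_t=120, n_t=200, x_r=118, n_r=200, delta=0.15, alpha=0.025))
```

Equivalence is declared only when both one-sided null hypotheses are rejected, which corresponds to the intersection form of the alternative hypothesis.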

PK Equivalence Including Remedy for Insufficient Sample Size

We shall begin this section by describing the hypotheses for the PK evaluation. For the PK equivalence, the null hypotheses for PK are constructed as follows:

H_{0}^{PK} : H_{0, U}^{PK} \cup H_{0, L}^{PK},

that is,

H_{0, U}^{PK} : X_{T} - X_{R} \geq θ and H_{0, L}^{PK} : X_{T} - X_{R} \leq - θ

where X_g is the log-transformed mean of the PK parameters such as the AUC or Cmax for group $g \in \{T, R\}$ and θ is the prespecified equivalence margin of the PK endpoint. Thus, the alternative hypotheses for the PK are as follows:

H_{A}^{PK} : H_{A, U}^{PK} \cap H_{A, L}^{PK},

that is,

H_{A, U}^{PK} : X_{T} - X_{R} < θ and H_{A, L}^{PK} : X_{T} - X_{R} > - θ .

In the bioequivalence trial, the θ is traditionally set as the log-transformed 1.25, which corresponds to a range of 80% to 125% under the $H_{A}^{PK}$ .⁴² Furthermore, θ is usually set in a similar way in the development of biosimilars. In this study, we consider a biosimilar development process where it is necessary to demonstrate the equivalence of both the PK and efficacy endpoints. Hence, the null and alternative hypotheses H₀ and H_A, respectively, are expressed as

H_{0} : H_{0}^{PK} \cup H_{0}^{Eff} \quad \text{vs.} \quad H_{A} : H_{A}^{PK} \cap H_{A}^{Eff} .

Note that the H₀ is rejected only if the equivalence for both the PK and efficacy is established. This meets the requirement based on several guidelines in which both the PK and efficacy are required to show equivalence when developing biosimilars.^10,12
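Because a TOST at level α is equivalent to checking that the 100(1 − 2α)% CI lies within the equivalence margins, the combined decision can be illustrated with the intervals reported in the motivating example. The function names are hypothetical, and requiring both AUC and Cmax to pass is an assumption made here for illustration.

```python
from math import log


def pk_equivalent(ci_low_ratio, ci_high_ratio, margin_ratio=1.25):
    """PK TOST via the CI of the geometric mean ratio, compared on the log scale."""
    theta = log(margin_ratio)   # log(1.25); -theta corresponds to a ratio of 0.80
    return -theta < log(ci_low_ratio) and log(ci_high_ratio) < theta


def efficacy_equivalent(ci_low_diff, ci_high_diff, delta=0.15):
    """Efficacy TOST via the CI of the difference in response rates."""
    return -delta < ci_low_diff and ci_high_diff < delta


# Intervals reported in the motivating example:
# 90% CIs for AUC and Cmax, 95% CI for ACR20 (intention-to-treat population).
pk_ok = pk_equivalent(0.94, 1.16) and pk_equivalent(0.95, 1.09)
eff_ok = efficacy_equivalent(-0.06, 0.10)
reject_h0 = pk_ok and eff_ok   # H0 is rejected only if both equivalences hold
print(pk_ok, eff_ok, reject_h0)
```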

Figure 2 shows the detailed framework used to declare the equivalence of both the PK and efficacy within the adaptive seamless PK and efficacy design as described in Figure 1. With respect to controlling the type I error rate for PK and efficacy equivalence under each significance level, a fixed sequence testing procedure⁴³ is incorporated into this framework. That is, the efficacy equivalence is tested only if the PK equivalence is declared. As shown in Figure 1, the design organizes a single clinical trial in which the statistical analysis of PK serves as an interim analysis of the efficacy data, because both PK data and efficacy data can be obtained from the first N₁ patients. If PK equivalence fails to be declared, subsequent collection of efficacy data is discontinued for futility. On the other hand, the collection of efficacy data is continued if PK equivalence is declared. Thereafter, sample size recalculation from N₂ to $N_{2}^{*}$ is conducted following the promising zone approach in Case 1, as shown in Figure 2, while the enrolment is continued to the planned N₂ in Case 2.
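The interim decision flow described above can be summarized in a short sketch. The enumeration and function names are hypothetical, and the PK decision and the conditional power are taken as inputs rather than computed.

```python
from enum import Enum


class Decision(Enum):
    STOP_FOR_FUTILITY = "PK not EQ: discontinue efficacy data collection"
    INCREASE_TO_N2_STAR = "PK EQ, Case 1: increase sample size to N2*"
    CONTINUE_TO_N2 = "PK EQ, Case 2: continue to planned N2"


def interim_decision(pk_equivalent, conditional_power, cp_min, beta):
    """Fixed sequence: efficacy is pursued only once PK equivalence is declared."""
    if not pk_equivalent:
        return Decision.STOP_FOR_FUTILITY
    if cp_min <= conditional_power < 1 - beta:
        return Decision.INCREASE_TO_N2_STAR
    return Decision.CONTINUE_TO_N2


def trial_success(pk_equivalent, efficacy_equivalent_at_final):
    """H0 is rejected only if both PK and efficacy equivalence are declared."""
    return pk_equivalent and efficacy_equivalent_at_final


# Hypothetical interim result: PK equivalence declared, conditional power 0.45.
print(interim_decision(True, 0.45, cp_min=0.33, beta=0.20).value)
```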

Figure 2.

Framework that determines equivalence of both pharmacokinetic (PK) and efficacy. “EQ” and “not EQ” denote where equivalence is declared and not declared, respectively. SSR, sample size recalculation; TOST, two one-sided tests.

Simulation Study

Here, we evaluated the efficiency of the adaptive seamless design proposed in this study and present its operating characteristics as a function of design parameters such as the geometric mean ratio and CV in the PK trial and the response rates for each group in the efficacy trial. The assumptions made for the PK and efficacy trial designs and the simulation setting are presented in the subsection “Simulation Setting.” In the subsection “Simulation Results,” we demonstrate the effectiveness of the proposed adaptive seamless PK and efficacy design based on the power and expected sample size.

Simulation Setting

First, we consider a randomized, parallel-group clinical trial with two arms, which are the test (T) and reference (R) product arms. We assume that T is the biosimilar and that the sample size recalculation for the efficacy endpoint under the adaptive seamless PK and efficacy design, as described in the section “Methods,” is conducted at the interim analysis, which also serves as the statistical analysis of the PK data.

Based on the examples,^13,29 we suppose that the overall planned sample size is set as $N_{2} = 480$ and the interim analysis is conducted after $N_{1} = 200$ patients are recruited in the trial. The set of $N_{2} = 480$ and $N_{1} = 200$ corresponds to 80% power for efficacy, assuming response rates of 50%, and to 90% power for the PK, assuming a geometric mean ratio of 1.00 and a CV of 50%.^13,29 In addition, an underestimated sample size, $N_{1} = 120$ , was also considered. The equivalence margins are set at ±15% for efficacy, which is a binary endpoint, and at a range of 80% to 125% for the PK, using one-sided significance levels of 2.5% and 5% for efficacy and PK, respectively.

To assess the sample size recalculation for efficacy equivalence, the lower probability of the conditional power cp_min is set at 50% or 33%.^36,44 We further assume that the magnitude of the sample size increase is set as $r_{\max} = 2.0$ , which denotes the ratio of N_max to N₂. The assessment of the power of the efficacy is performed using the true difference between the response rates of the product groups denoted as δ of 0% to 5%.

For reference, the power and expected sample size are also illustrated when a fixed design is used in exchange for the adaptive seamless PK and efficacy design. Regarding the expected sample size using the fixed design, we consider that the PK and efficacy trials are conducted separately, suggesting that the efficacy data are not available for the PK confirmation. That is, the PK trial is assumed to be conducted and the efficacy trial is subsequently conducted separately under the fixed design. Hence, if the PK equivalence fails to be declared, the efficacy trial is not implemented.
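The expected sample size of the fixed design can be approximated with a small Monte Carlo sketch. It assumes log-normally distributed PK data, equal allocation, a normal critical value in place of the t distribution, and a fixed-design expected sample size of N₁ plus N₂ weighted by the probability of declaring PK equivalence. The function names, seed, and number of replications are illustrative and do not reproduce the simulation code used in the study.

```python
import random
from math import log, sqrt
from statistics import NormalDist


def simulate_pk_power(n1, cv=0.50, gmr=1.00, margin_ratio=1.25, alpha=0.05,
                      n_sim=20000, seed=1):
    """Monte Carlo estimate of P(PK equivalence declared) with N1 patients in total."""
    rng = random.Random(seed)
    n = n1 // 2                           # patients per arm
    sigma = sqrt(log(1 + cv ** 2))        # log-scale SD implied by the CV
    theta = log(margin_ratio)
    z = NormalDist().inv_cdf(1 - alpha)   # normal critical value (assumption)
    df = 2 * n - 2
    hits = 0
    for _ in range(n_sim):
        diff = rng.gauss(log(gmr), sigma * sqrt(2 / n))      # observed log-mean difference
        s2 = sigma ** 2 * rng.gammavariate(df / 2, 2) / df   # pooled variance (chi-square draw)
        half = z * sqrt(2 * s2 / n)
        hits += (-theta < diff - half) and (diff + half < theta)  # 90% CI within margins
    return hits / n_sim


def expected_n_fixed(n1, n2, pk_power):
    """Fixed design: PK trial of N1, then a separate efficacy trial of N2 if PK is EQ."""
    return n1 + pk_power * n2


power = simulate_pk_power(n1=200)
print(round(power, 3), round(expected_n_fixed(200, 480, power), 1))
```

With N₁ = 200 the estimated probability of declaring PK equivalence should be close to the planned 90%, so the expected sample size of the fixed design is somewhat below N₁ + N₂ = 680.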

Simulation Results

We evaluated the efficiency of the adaptive seamless PK and efficacy design proposed based on the type I error rate, power, and expected sample size. The calculation was based on 500,000 simulations with respect to the type I error rate and 10,000 with respect to power and expected sample size to display the operating characteristics.

Table 1 shows the probabilities of rejecting the null hypotheses $H_{0}^{PK}$ or $H_{0}^{Eff}$ . As shown in Table 1, the type I error rates of the PK and efficacy endpoints are controlled under the significance levels of 5% and 2.5%, respectively. In particular, the type I error rate of the efficacy endpoint is confirmed to be controlled even when it includes a sample size recalculation as long as multiplicity adjustments are included as described in the subsection “Sample Size Adjustment for Efficacy Endpoint of Interim Analysis,” in contrast with that without multiplicity adjustments. Therefore, the probability of rejecting the null hypothesis $H_{0} : H_{0}^{PK} \cup H_{0}^{Eff}$ is also controlled owing to the fixed sequence testing procedure for PK and efficacy equivalence, as described in the subsection “PK Equivalence Including Remedy for Insufficient Sample Size.”

Table 1.

Type I Error Rates.^a

cp_min	N₁	Rejecting $H_{0}^{PK}$	Rejecting $H_{0}^{Eff}$ With Multiplicity Adjustments	Rejecting $H_{0}^{Eff}$ Without Multiplicity Adjustments
0.33	200	0.050	0.024	0.029
	120	0.050	0.023	0.027
0.50	200	0.050	0.023	0.029
	120	0.050	0.023	0.027

^aCalculation was performed under the assumption that $N_{2} = 480, r_{\max} = 2.0$ , and an expected response rate of 50% as a function of lower probability of conditional power (cp_min) and planned sample size until interim analysis (N₁). Regarding the probability of rejecting $H_{0}^{Eff}$ , sample size recalculation is incorporated.

The comparison of powers between the fixed design and the adaptive seamless PK and efficacy design in the final analysis is shown in Table 2. These powers are defined as the probability of rejecting $H_{0} : H_{0}^{PK} \cup H_{0}^{Eff}$ when using the adaptive seamless PK and efficacy design. The result revealed that the design is more powerful when using the adaptive seamless PK and efficacy design owing to the incorporation of sample size recalculation. Note that the adaptive seamless PK and efficacy design would minimize the risk of misspecification of the prespecified parameters of the efficacy endpoint. In particular, it is possible to compensate for the power using this design under the situation where the conditional power falls within the range of the promising zone in the interim analysis of the efficacy and sample size is increased.

Table 2.

Powers of Adaptive Seamless Pharmacokinetic (PK) and Efficacy and Fixed Designs.^a

cp_min	N₁	δ	Fixed Design	Adaptive Seamless PK and Efficacy Design
0.33	200	0.00	0.740	0.771
		0.01	0.731	0.763
		0.02	0.700	0.740
		0.03	0.658	0.701
		0.04	0.599	0.645
		0.05	0.530	0.575
	120	0.00	0.523	0.547
		0.01	0.519	0.544
		0.02	0.499	0.528
		0.03	0.467	0.489
		0.04	0.425	0.454
		0.05	0.376	0.399
0.50	200	0.00	0.740	0.761
		0.01	0.731	0.755
		0.02	0.700	0.726
		0.03	0.658	0.681
		0.04	0.599	0.627
		0.05	0.530	0.552
	120	0.00	0.523	0.528
		0.01	0.519	0.529
		0.02	0.499	0.511
		0.03	0.467	0.475
		0.04	0.425	0.434
		0.05	0.376	0.385

^aCalculation was performed under the assumption that $N_{2} = 480, r_{\max} = 2.0$ , and an expected response rate of 50% as a function of lower probability of conditional power (cp_min), planned sample size until interim analysis (N₁), and difference between response rates (δ).

We showed that the power is improved by the use of the framework for PK and the sample size recalculation under the promising zone approach for efficacy in the adaptive seamless PK and efficacy design. The gain in power is attributable to the increase in sample size. Table 3 shows the expected sample size for achieving the corresponding power illustrated in Table 2. Note that the expected sample sizes for the fixed design do not exactly correspond to the sum of the interim and final sample sizes (that is, N₁ and N₂) because the efficacy trial is not implemented when PK equivalence fails to be declared in the preceding PK trial. When N₁ is set at 200, the approach that uses the adaptive seamless PK and efficacy design is clearly more efficient than the approach that uses the fixed design; the expected sample size is decreased because the evaluations of PK and efficacy are conducted separately in the latter design. Hence, it is necessary to set a sample size that has sufficient power. Furthermore, the increases in power and expected sample size are larger when cp_min is set at a lower value.

Table 3.

Expected Sample Sizes Corresponding to Powers in Adaptive Seamless Pharmacokinetic (PK) and Efficacy and Fixed Designs.^a

cp_min	N₁	δ	Fixed Design	Adaptive Seamless PK and Efficacy Design
0.33	200	0.00	635.6	530.4
		0.01	635.6	529.8
		0.02	635.6	530.4
		0.03	635.6	531.7
		0.04	635.6	527.5
		0.05	635.6	524.1
	120	0.00	427.7	406.7
		0.01	427.7	405.0
		0.02	427.7	406.4
		0.03	427.7	404.1
		0.04	427.7	403.0
		0.05	427.7	401.6
0.50	200	0.00	635.6	501.0
		0.01	635.6	500.3
		0.02	635.6	500.9
		0.03	635.6	499.7
		0.04	635.6	498.8
		0.05	635.6	495.5
	120	0.00	427.7	372.6
		0.01	427.7	371.5
		0.02	427.7	371.4
		0.03	427.7	371.1
		0.04	427.7	370.6
		0.05	427.7	369.5

^aCalculation was performed under the assumption that $N_{2} = 480, r_{\max} = 2.0$ , and an expected response rate of 50% as a function of lower probability of conditional power (cp_min), planned sample size until interim analysis (N₁), and difference between response rates (δ).
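If the fixed-design expected sample size is read as N₁ plus N₂ multiplied by the probability of declaring PK equivalence, which is an interpretation of the simulation setting rather than a stated formula, the values in Table 3 imply the following PK success probabilities and reductions. The helper name is hypothetical.

```python
def implied_pk_success_probability(expected_n, n1, n2):
    """Solve expected_n = n1 + p * n2 for p (assumed fixed-design relationship)."""
    return (expected_n - n1) / n2


# Fixed-design expected sample sizes reported in Table 3 (N2 = 480).
p_200 = implied_pk_success_probability(635.6, n1=200, n2=480)
p_120 = implied_pk_success_probability(427.7, n1=120, n2=480)

# Reduction in expected sample size at delta = 0.00 and cp_min = 0.33 (Table 3).
saving_200 = 635.6 - 530.4
saving_120 = 427.7 - 406.7

print(round(p_200, 4), round(p_120, 4), round(saving_200, 1), round(saving_120, 1))
```

Under this reading, the implied probability for N₁ = 200 is close to the planned PK power of 90%, whereas N₁ = 120 gives a markedly lower probability, consistent with its description as an underestimated sample size.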

Discussion and Conclusions

The aim of this study was to propose a novel adaptive seamless PK and efficacy design for establishing the equivalence of both PK and efficacy in the clinical trial phases of biosimilar development. The proposed design allows sponsors to develop biosimilars over shorter time periods, leads to additional cost savings, and requires fewer patients, and it would therefore be an appealing strategy for the efficient implementation of clinical trials. This enhanced process would consequently accelerate product approval by regulatory agencies. Note that there is still controversy about statistical analysis for biosimilar development, even though related guidelines^10,12 have been issued by the regulatory agencies. For instance, the choice of the margin, primary endpoint, and primary time point for efficacy remains an issue and a challenge in biosimilar development. Hence, consultation with regulatory agencies is required before applying the proposed design, which is specific to biosimilar development and offers benefits even considering these issues and challenges.

Our proposed framework is an attractive option with respect to the total trial period, as mentioned in the subsection “PK Equivalence Including Remedy for Insufficient Sample Size.” For efficacy equivalence, sample size recalculation was incorporated in the adaptive seamless PK and efficacy design to compensate for the risk of misspecifying the efficacy parameters. The efficacy part is terminated early if PK equivalence is not declared. As shown in Table 2, sample size recalculation improved the power of the efficacy part, but not dramatically, because the preplanned sample size was already considered adequate to achieve the target power of 80%. Note that the sample size should be estimated carefully at the planning stage; a trial should not be deliberately set up with an underestimated sample size and insufficient power in reliance on the sample size recalculation alone. When the planned sample size provides insufficient power, the expected sample size under the adaptive seamless PK and efficacy design is larger than that under a fixed design; when the planned sample size provides sufficient power, it is smaller. The promising zone approach also allows the trial to avoid committing further resources when the interim result is clearly unpromising, so that the additional investment can be reallocated. However, a downside of sample size recalculation is that statisticians associated with the sponsor can infer the interim conditional power from the additional sample size to be enrolled in the subsequent stage. Therefore, the sample size recalculation at the interim analysis should be conducted in strict adherence to the operating procedures specified in the interim analysis charter, under an independent data monitoring committee. This step would prevent operational bias and maintain trial integrity, even in an equivalence design, as the US Food and Drug Administration has cautioned.^45
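The conditional power and promising zone ideas discussed above can be illustrated with a minimal sketch. It is not the design evaluated in this paper: for simplicity it treats a single one-sided z-test rather than the two one-sided hypotheses of an equivalence trial, it computes conditional power under the current trend, and the function names and zone thresholds (`cp_min`, `cp_target`) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(z1, n1, n2, alpha=0.025):
    """Conditional power under the current trend for a one-sided z-test.

    z1    : interim z-statistic observed after n1 subjects
    n1    : cumulative sample size at the interim analysis
    n2    : planned cumulative sample size at the final analysis
    alpha : one-sided significance level (illustrative default)
    """
    if not 0 < n1 < n2:
        raise ValueError("require 0 < n1 < n2")
    remaining = n2 - n1
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    # Value the second-stage statistic must exceed for the final test to reject.
    threshold = (z_alpha * sqrt(n2) - z1 * sqrt(n1)) / sqrt(remaining)
    # Expected second-stage statistic if the interim trend continues.
    drift = z1 * sqrt(remaining) / sqrt(n1)
    return 1 - _STD_NORMAL.cdf(threshold - drift)


def classify_zone(cp, cp_min=0.36, cp_target=0.80):
    """Assign an interim result to a zone (thresholds are assumptions)."""
    if cp < cp_min:
        return "unfavorable"  # no sample size increase
    if cp < cp_target:
        return "promising"    # candidate for a sample size increase
    return "favorable"        # planned sample size already sufficient


if __name__ == "__main__":
    for z1 in (0.0, 1.5, 3.0):
        cp = conditional_power(z1, n1=50, n2=100)
        print(f"z1={z1:.1f}  CP={cp:.3f}  zone={classify_zone(cp)}")
```

Under this sketch, only interim results in the promising zone would trigger an increase in sample size, which is why the size of the increase can reveal the interim conditional power to anyone who knows the rule.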

Note that this work is limited to the specific situation in which no relevant PD markers are available for measuring efficacy in clinical trials. In addition, the proposed adaptive seamless PK and efficacy design is restricted to the case in which both the PK and efficacy trials are conducted in patients. This restriction reflects the premise of this study, namely that biosimilar development trials are often conducted in patients rather than in healthy volunteers. Although healthy volunteers are used in most applications for biosimilars,^35 biosimilar development is increasingly likely to target patients owing to the high molecular complexity of biosimilars. For example, a biosimilar of trastuzumab (Herceptin), which like biosimilar infliximab is a monoclonal antibody, has been developed.^46 The proposed adaptive seamless PK and efficacy design would also be applicable to the development of biosimilar trastuzumab because the PK and phase III studies^47,48 were conducted in patients with human epidermal growth factor receptor 2 (HER2)–positive metastatic breast cancer to demonstrate equivalence for the PK and efficacy endpoints, respectively. A further, potentially controversial issue is the assumption that the equivalence data for PK and efficacy are both obtained from patients who have the same disease and are treated with similar dosage regimens. Note that in the motivating example described in the subsection “Motivating Example,” the patients targeted in the PK study had ankylosing spondylitis,^13 whereas those in the efficacy study had rheumatoid arthritis.^29 However, there is also an example in which the PK study itself targeted patients with rheumatoid arthritis.^33 Moreover, our framework was constructed using parallel-group designs, following the two main trials performed in the development of Remsima.^13,29 Because crossover designs are often used for PK trials, the adaptive seamless PK and efficacy design could be extended to combine a PK evaluation using a crossover design with an efficacy trial using a parallel-group design. In this case, the adaptive sequential designs proposed for PK confirmation^5–8 would be an additional option for the PK trial. As a further practical consideration, multiplicity across multiple PK endpoints would need to be addressed in addition to the fixed sequence testing procedure between the PK and efficacy endpoints, because two PK endpoints, AUC and Cmax, are often evaluated in PK trials.^49 In addition, several types of AUC are often set as primary endpoints. For instance, the AUCs from time zero to predicted infinity and from time zero to the last measurable concentration were assessed as primary endpoints, in addition to Cmax, in the PK study for the development of biosimilar adalimumab.^50 Further, whereas AUC and Cmax are frequently set as primary PK endpoints, other PK parameters, such as tmax, volume of distribution, and half-life, should be set as secondary PK endpoints.^10 In the motivating PK trial,^13 nine parameters were set as secondary endpoints, whereas AUC and Cmax were set as primary endpoints. Although multiplicity for secondary PK endpoints is not usually addressed and only the primary PK endpoints need to demonstrate equivalence statistically, reporting these secondary PK parameters is necessary to conclude biosimilarity in practice.
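The gating logic described above, in which efficacy is tested only after PK equivalence has been declared for the primary PK endpoints, can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis used in this paper: it uses a normal approximation, requires every primary PK endpoint to pass, and all function names, margins (0.80 to 1.25 for PK ratios, plus or minus 0.15 for the efficacy difference), and significance levels are hypothetical choices.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ci_within(estimate, se, lower, upper, alpha):
    """Two one-sided tests at level alpha via the (1 - 2*alpha) confidence
    interval: equivalence is declared if the interval lies inside the margins."""
    z = _STD_NORMAL.inv_cdf(1 - alpha)
    return lower < estimate - z * se and estimate + z * se < upper


def fixed_sequence(pk_endpoints, efficacy, pk_alpha=0.05, eff_alpha=0.025,
                   pk_margin=(0.80, 1.25), eff_margin=0.15):
    """Test PK first; proceed to efficacy only if all PK endpoints pass.

    pk_endpoints : dict mapping endpoint name -> (log geometric mean ratio, SE)
    efficacy     : (difference in response rates, SE)
    """
    lo, hi = log(pk_margin[0]), log(pk_margin[1])
    pk_ok = all(ci_within(est, se, lo, hi, pk_alpha)
                for est, se in pk_endpoints.values())
    if not pk_ok:
        # PK equivalence not declared: the efficacy hypothesis is not tested.
        return {"pk": False, "efficacy": None}
    diff, se = efficacy
    return {"pk": True,
            "efficacy": ci_within(diff, se, -eff_margin, eff_margin, eff_alpha)}


if __name__ == "__main__":
    # Hypothetical summary statistics, for illustration only.
    pk = {"AUC": (log(1.02), 0.05), "Cmax": (log(1.05), 0.06)}
    print(fixed_sequence(pk, efficacy=(0.02, 0.04)))
```

How multiplicity across the PK endpoints should be handled in a given trial remains a design decision to be agreed with regulators; the sketch only shows where such a rule would sit in the sequence.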

In conclusion, we proposed a novel method for developing biosimilars using an adaptive seamless design that enables sample size recalculation based on interim data and combines the trials needed to establish both PK and efficacy equivalence. Furthermore, the proposed design allows clinical trials to be conducted more efficiently than with conventional designs, thereby reducing costs, saving time, and providing an attractive option for pharmaceutical sponsors.

Footnotes

Acknowledgments

The authors thank the two reviewers for their helpful comments that greatly improved the paper.

Conference Presentation

This work was partially presented at the XXVIIIth International Biometric Conference in Victoria, Canada, July 10-15, 2016.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Bauer, Bretz, Dragalin, König, Wassmer. Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls. Stat Med. 2016;35:325–347.
2. Bretz, Koenig, Brannath, Glimm, Posch. Adaptive designs for confirmatory clinical trials. Stat Med. 2009;28:1181–1217.
3. Uozumi, Hamada. Interim decision-making strategies in adaptive designs for population selection using time-to-event endpoints. J Biopharm Stat. 2017;27:84–100.
4. Schuirmann. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm. 1987;15:657–680.
5. Montague, Potvin, DiLiberti, Hauck, Parr, Schuirmann. Additional results for “Sequential design approaches for bioequivalence studies with crossover designs.” Pharm Stat. 2012;11:8–13.
6. Potvin, DiLiberti, Hauck, Parr, Schuirmann, Smith. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008;7:245–262.
7. Audet, DiLiberti. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016;15:15–27.
8. Zheng, Zhao, Wang. Modifications of sequential designs in bioequivalence trials. Pharm Stat. 2015;14:180–188.
9. Chow. Biosimilars: Design and Analysis of Follow-on Biologics. Boca Raton, FL: Chapman & Hall/CRC; 2013.
10. European Medicines Agency. Guideline on similar biological medicinal products containing biotechnology-derived proteins as active substance: non-clinical and clinical issues. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2015/01/WC500180219.pdf. Published 2014. Accessed December 23, 2016.
11. Berghout. Clinical programs in the development of similar biotherapeutic products: rationale and general principles. Biologicals. 2011;39:293–296.
12. Food and Drug Administration. Guidance for industry: scientific considerations in demonstrating biosimilarity to a reference product. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM291128.pdf. Published 2015. Accessed December 23, 2016.
13. Park, Hrycaj, Jeka. A randomised, double-blind, multicentre, parallel-group, prospective study comparing the pharmacokinetics, safety, and efficacy of CT-P13 and innovator infliximab in patients with ankylosing spondylitis: the PLANETAS study. Ann Rheum Dis. 2013;72:1605–1612.
14. Belleli, Fisch, Renard, Woehling, Gsteiger. Assessing switchability for biosimilar products: modelling approaches applied to children’s growth. Pharm Stat. 2015;14:341–349.
15. Hsieh, Chow, Liu, Hsiao, Chi. Statistical test for evaluation of biosimilarity in variability of follow-on biologics. J Biopharm Stat. 2010;20:75–89.
16. Yang, Zhang, Chow, Chi. An adapted F-test for homogeneity of variability in follow-on biological products. Stat Med. 2013;32:415–423.
17. Zhang, Yang, Chow, Chi. Nonparametric tests for evaluation of biosimilarity in variability of follow-on biologics. J Biopharm Stat. 2014;24:1239–1253.
18. Zhang, Yang, Chow, Endrenyi, Chi. Impact of variability on the choice of biosimilarity limits in assessing follow-on biologics. Stat Med. 2013;32:424–433.
19. Chow, Yang, Starr, Chiu. Statistical methods for assessing interchangeability of biosimilars. Stat Med. 2013;32:442–448.
20. Hsieh, Chow, Yang, Chi. The evaluation of biosimilarity index based on reproducibility probability for assessing follow-on biologics. Stat Med. 2013;32:406–414.
21. Yang, Lai. Estimation and approximation approaches for biosimilar index based on reproducibility probability. J Biopharm Stat. 2014;24:1298–1311.
22. Chiu, Liu, Chow. Applications of the Bayesian prior information to evaluation of equivalence of similar biological medicinal products. J Biopharm Stat. 2014;24:1254–1263.
23. Pan, Yuan, Xia. A calibrated power prior approach to borrow information from historical data with application to biosimilar clinical trials [published online December 23, 2016]. J Royal Stat Soc C: App. doi:10.1111/rssc.12204.
24. Chow, Tse, Chi. Statistical methods for assessment of biosimilarity using biomarker data. J Biopharm Stat. 2010;20:90–105.
25. Liu, Wood, Johri. Statistical considerations in biosimilar clinical efficacy trials with asymmetrical margins. Stat Med. 2013;32:393–405.
26. Liao, Darken. Comparability of critical quality attributes for establishing biosimilarity. Stat Med. 2013;32:462–469.
27. Kang, Chow. Statistical assessment of biosimilarity based on relative distance between follow-on biologics. Stat Med. 2013;32:382–392.
28. Zhang, Chow. Frequency estimator for assessing of follow-on biologics. J Biopharm Stat. 2014;24:1280–1297.
29. Yoo, Hrycaj, Miranda. A randomised, double-blind, parallel-group study to demonstrate equivalence in efficacy and safety of CT-P13 compared with innovator infliximab when coadministered with methotrexate in patients with active rheumatoid arthritis: the PLANETRA study. Ann Rheum Dis. 2013;72:1613–1620.
30. Maini, St Clair, Breedveld. Infliximab (chimeric anti-tumour necrosis factor alpha monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomised phase III trial. ATTRACT Study Group. Lancet. 1999;354:1932–1939.
31. Schellekens, Lietzan, Faccin, Venema. Biosimilar monoclonal antibodies: the scientific basis for extrapolation. Expert Opin Biol Ther. 2015;15:1633–1646.
32. Felson, Anderson, Boers. American College of Rheumatology. Preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum. 1995;38:727–735.
33. Takeuchi, Yamanaka, Tanaka. Evaluation of the pharmacokinetic equivalence and 54-week efficacy and safety of CT-P13 and innovator infliximab in Japanese patients with rheumatoid arthritis. Mod Rheumatol. 2015;25:817–824.
34. Nagasaki, Ando. Clinical development and trial design of biosimilar products: a Japanese perspective. J Biopharm Stat. 2014;24:1165–1172.
35. Wang, Chow. Development of biosimilars—pharmacokinetic and pharmacodynamic considerations. J Biopharm Stat. 2010;20:46–61.
36. Mehta, Pocock. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Stat Med. 2011;30:3267–3284.
37. Chen, Lan. Sample size adjustment based on promising interim results and its application in confirmatory clinical trials. Clin Trials. 2015;12:584–595.
38. Jennison, Turnbull. Adaptive sample size modification in clinical trials: start small then ask for more? Stat Med. 2015;34:3793–3810.
39. Jennison, Turnbull. Group Sequential Methods With Applications to Clinical Trials. Boca Raton, FL: Chapman & Hall/CRC; 2000.
40. Gao, Ware, Mehta. Sample size re-estimation for adaptive sequential design in clinical trials. J Biopharm Stat. 2008;18:1184–1196.
41. Proschan, Hunsberger. Designed extension of studies based on conditional power. Biometrics. 1995;51:1315–1324.
42. Food and Drug Administration. Guidance for industry: statistical approaches to establishing bioequivalence. http://www.fda.gov/downloads/Drugs/Guidances/ucm070244.pdf. Published 2001. Accessed December 23, 2016.
43. Wiens. A fixed sequence Bonferroni procedure for testing multiple endpoints. Pharm Stat. 2003;2:211–215.
44. Chen, DeMets, Lan. Increasing the sample size when the unblinded interim result is promising. Stat Med. 2004;23:1023–1038.
45. Food and Drug Administration. Guidance for industry: adaptive design clinical trials for drugs and biologics, draft. http://www.fda.gov/downloads/Drugs/Guidances/ucm201790.pdf. Published 2010. Accessed December 23, 2016.
46. Cortés, Curigliano, Diéras. Expert perspectives on biosimilar monoclonal antibodies in breast cancer. Breast Cancer Res Treat. 2014;144:233–239.
47. Krasnozhon, Bondarenko. Phase I/IIb clinical trial comparing PK and safety of trastuzumab and its biosimilar candidate CT-P6. SG-BCC. 2013; St Gallen, Switzerland; abstract 268.
48. Odarchenko, Grecea. Double-blind, randomized, parallel group, phase III study to demonstrate equivalent efficacy and comparable safety of CT-P6 and trastuzumab, both in combination with paclitaxel, in patients with metastatic breast cancer (MBC) as first-line treatment. J Clin Oncol. 2013;31:suppl; abstract 629.
49. Hua, D’Agostino Sr. Multiplicity adjustments in testing for bioequivalence. Stat Med. 2015;34:215–231.
50. Wynne, Altendorfer, Sonderegger. Bioequivalence, safety and immunogenicity of BI 695501, an adalimumab biosimilar candidate, compared with the reference biologic in a randomized, double-blind, active comparator phase I clinical study (VOLTAIRE®-PK) in healthy subjects. Expert Opin Investig Drugs. 2016;25:1361–1370.