Inference for the treatment effect in multiple-period cluster randomised trials when random effect correlation structure is misspecified

Abstract

Multiple-period cluster randomised trials, such as stepped wedge or cluster cross-over trials, are being conducted with increasing frequency. In the design and analysis of these trials, it is necessary to specify the form of the within-cluster correlation structure, and a common assumption is that the correlation between the outcomes of any pair of subjects within a cluster is identical. More complex models that allow for correlations within a cluster to decay over time have recently been suggested. However, most software packages cannot fit these models. As a result, practitioners may choose a simpler model. We analytically examine the impact of incorrectly omitting a decay in correlation on the variance of the treatment effect estimator and show that misspecification of the within-cluster correlation structure can lead to incorrect conclusions regarding estimated treatment effects for stepped wedge and cluster crossover trials.

Keywords

Analysis of variance, cluster randomised trial, intracluster correlation, model misspecification, stepped wedge

1 Introduction

It is now well recognised that the correlation between subjects from the same cluster must be accounted for in the analysis of data from cluster randomised trials.^1,6 The standard error of the estimated treatment effect depends on the variance components, and failure to properly account for within-cluster correlation structure may lead to confidence intervals for the treatment effect estimate that are of incorrect width. When cluster randomised trials are conducted over multiple periods, the entire within-cluster correlation structure, which incorporates correlations between observations taken in the same and different trial periods, must be considered.

A simplifying assumption that can be made for the within-cluster correlation structure for multiple-period cluster randomised trials is that correlations between subjects’ outcomes do not depend on the timing of measurements: the correlation between outcomes from any pair of subjects from the same cluster is identical. In a recent systematic review, the majority of stepped wedge cluster randomised trials with protocols published between 1987 and 2014 were found to have made this assumption for power calculations.⁷ In the broader context of multiple-period cluster randomised trials, this assumption may be made less frequently; see e.g. Feldman and McKinlay⁴ and Murray et al.⁸ for earlier papers discussing alternative models. The underlying statistical model for outcomes incorporates this assumption through the inclusion of a cluster-level random intercept. Recommended practice requires the inclusion of fixed effects for periods.⁹ In the stepped wedge context, this model is usually referred to as the Hussey and Hughes model.¹⁰ More recently, in the context of stepped wedge trials, several authors have relaxed the assumption of constant correlation within clusters, considering in detail particular variants of the general model presented in Feldman and McKinlay.⁴ Girling and Hemming⁵ and Hooper et al.¹¹ incorporated a random interaction between period and cluster, thus allowing the correlation between subjects within a cluster from the same period to differ from the correlation between subjects in the same cluster measured in different periods. There are thus two intra-cluster correlations associated with this model: within-period intra-cluster correlations and between-period intra-cluster correlations. More complex within-cluster correlation structures were introduced by Kasza et al.,¹² allowing for the correlation between the outcomes of any pair of subjects in the same cluster but in different periods to diminish as the time between periods increases. The statistical models allowing for this correlation decay include a structured covariance matrix for the cluster-period random interaction, and imply that the between-period intra-cluster correlation depends on the distance between the periods in which observations being compared were measured.

While the impact of ignoring clustering on inference for the treatment effect in single-period cluster randomised trials has previously been considered, e.g. Moerbeek et al.,¹³ the impact of within-cluster correlation structure misspecification on the analysis and planning of multiple-period cluster randomised trials has not. We investigate the impact of within-cluster correlation structure misspecification on the estimation of variance components, and on the estimation of a treatment effect and its associated standard error. The impact of misspecification of the correlation structure at the planning stage of a multiple-period cluster randomised trial, when the values of the various within-cluster correlations are assumed to be known without error, has previously been considered (see e.g. Kasza et al.¹²). Here our interest is in the impact of misspecifying that structure when fitting a model to a dataset. Moerbeek¹⁴ has considered estimators of variance components when random intercepts for clusters or for periods within clusters are incorrectly excluded from the outcome model: we extend this work to cross-classified data structures and a model that allows for a decay in correlations over time, and discuss the implications of variance component mis-estimation and misspecification for estimating treatment effects and planning cluster randomised trials.

Our focus in this paper is the misspecification of mixed-model ANOVAs: we do not consider models that incorporate random coefficients, for example for time or treatment. Such more complex models have been considered by other authors, e.g. Hemming et al.¹⁵ and Murray et al.⁸ Models of the type that we consider are frequently used within the stepped wedge and cluster cross-over literature in particular, and thus an investigation of the properties of these models is needed to allow a more in-depth understanding of these designs.

In Section 2, we consider the estimation of variance components using ANOVA formulas for two-way cross classified mixed models with and without random interaction terms. In Section 3, we investigate the impact of omitting a decay in correlation on the estimation of variance components and on the variance of the treatment effect in multiple-period parallel cluster randomised trials with post-baseline measurements only, cluster cross-over and stepped wedge designs. We provide an R Shiny app² to allow users to investigate the impact of correlation structure misspecification for scenarios corresponding to various parameter configurations, available at https://monash-biostat.shinyapps.io/MisspecCorrStruct. Code for local implementation of the app can be downloaded from https://github.com/jkasza/MisspecCorrStruct. In addition to the designs considered in this paper, the R Shiny app allows for parallel designs with baseline measurements, and for users to upload design matrices.

2 Mixed effects models and ANOVA estimators

In order to investigate the impact of within-cluster correlation structure misspecification on the estimation of variance components, we consider three models for a continuous outcome Y_kti, where in each cluster $k = 1, \dots, K$ at each time point $t = 1, \dots, T$ , subjects $i = 1, \dots, m$ are measured (a total of $K \times T \times m$ measurements). These models are listed below in increasing order of complexity of within-cluster correlation structure.

Model 1. This is the two-way crossed classification model with fixed period effects and no interaction between period and cluster:

Y_{kti} = μ + β_{t} + α_{k} + ε_{kti}, α_{k} \sim N (0, σ_{1 α}^{2}), ε_{kti} \sim N (0, σ_{1 ε}^{2})

with

\sum_{t = 1}^{T} β_{t} = 0

for identifiability of fixed effects. This model, referred to as the Hussey and Hughes model, is frequently used for stepped wedge trial sample size calculation and analysis.^7,10 The key assumption of this model is that all observations from the same cluster are equally correlated, with

corr (Y_{kti}, Y_{ksj}) = \frac{σ_{1 α}^{2}}{σ_{1 α}^{2} + σ_{1 ε}^{2}} .

Model 2. This model is the two-way crossed classification model with fixed period effects and an interaction between period and cluster:

Y_{kti} = μ + β_{t} + α_{k} + γ_{kt} + ε_{kti}, α_{k} \sim N (0, σ_{2 α}^{2}), γ_{kt} \sim N (0, σ_{γ}^{2}), ε_{kti} \sim N (0, σ_{2 ε}^{2})

again with

\sum_{t = 1}^{T} β_{t} = 0

for identifiability of fixed effects. We do not consider the model with summation restrictions on the γ_kt terms. Models with this correlation structure have been considered by several authors, e.g. Girling and Hemming⁵ and Hooper et al.,¹¹ and this model was referred to as the Hooper/Girling model in Kasza et al.¹² The key feature of this model with random interactions is that the correlation between any pair of observations from the same cluster in the same period,

corr (Y_{kti}, Y_{ktj}) = \frac{σ_{2 α}^{2} + σ_{γ}^{2}}{σ_{2 α}^{2} + σ_{γ}^{2} + σ_{2 ε}^{2}},

differs from the correlation between any pair of observations from the same cluster but in different periods,

corr (Y_{kti}, Y_{ksj}) = \frac{σ_{2 α}^{2}}{σ_{2 α}^{2} + σ_{γ}^{2} + σ_{2 ε}^{2}}, s \neq t.

Model 3. This extends Model 2 and supposes that within each cluster, the dependence between random effects for each period decays over time:

$Y_{kti} = μ + β_{t} + γ_{kt} + ε_{kti}, γ_{k} = (γ_{k 1}, \dots, γ_{kT})' \sim N (0, Σ), cov (γ_{kt}, γ_{ks}) = σ_{3 α}^{2} r^{| t - s |}, ε_{kti} \sim N (0, σ_{3 ε}^{2})$ where r is a constant less than or equal to 1, and $\sum_{t = 1}^{T} β_{t} = 0$ . This model implies that the correlation between any pair of observations from the same cluster decays the further apart in time they are observed: $corr (Y_{kti}, Y_{ksj}) = r^{| t - s |} \frac{σ_{3 α}^{2}}{σ_{3 α}^{2} + σ_{3 ε}^{2}}$ . Models with this correlation structure have been considered in Kasza et al.¹²
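The three correlation structures above can be compared side by side with a short Python sketch. The function name `within_cluster_corr` and its argument names are illustrative choices made here, not notation from the paper; the formulas are the correlation expressions given for Models 1 to 3.

```python
def within_cluster_corr(model, t, s, var_cluster, var_error,
                        var_interaction=0.0, r=1.0):
    """Correlation between two distinct subjects from the same cluster,
    measured in periods t and s, under Model 1, 2 or 3.

    var_cluster     : sigma_alpha^2 of the chosen model
    var_error       : sigma_epsilon^2 of the chosen model
    var_interaction : sigma_gamma^2 (used by Model 2 only)
    r               : decay parameter (used by Model 3 only)
    """
    if model == 1:
        # Exchangeable: the same correlation for every pair of periods.
        return var_cluster / (var_cluster + var_error)
    if model == 2:
        # The interaction variance is shared only within a period.
        shared = var_cluster + (var_interaction if t == s else 0.0)
        return shared / (var_cluster + var_interaction + var_error)
    if model == 3:
        # Correlation decays geometrically with the distance |t - s|.
        return r ** abs(t - s) * var_cluster / (var_cluster + var_error)
    raise ValueError("model must be 1, 2 or 3")


# Illustrative values only: periods 1 and 3 of the same cluster.
print(within_cluster_corr(1, 1, 3, 0.05, 0.95))
print(within_cluster_corr(2, 1, 3, 0.04, 0.95, var_interaction=0.01))
print(within_cluster_corr(3, 1, 3, 0.05, 0.95, r=0.8))
```

With these illustrative inputs, Model 1 returns the same value for every pair of periods, Model 2 returns one value for the same period and a second, smaller value for any two different periods, and Model 3 returns a value that shrinks as the periods move apart.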

Models 1–3 are appropriate for repeated cross-sectional designs, where measurements are taken from distinct sets of subjects in each period. Inclusion of subject-specific random intercepts allows for closed or open cohort designs, as described by Copas et al.,³ where some or all subjects may contribute measurements in multiple periods.

We consider balanced datasets with an equal number of observations in each cluster in each period. Under this assumption, closed form ANOVA estimators of the variance components of Models 1 and 2 (the two-way crossed classification models without and with a random interaction) are available (see e.g. Chapter 4 of Searle et al.¹⁶); these estimators are displayed in Table 1. No such closed form formulas are available for Model 3, the correlation decay model. In fact, availability of methods and statistical software to fit Model 3 is limited: although SAS permits the inclusion of random effects with this correlation structure, such models cannot currently be fit in Stata or R. This lack of software, together with the fact that most, if not all, of the currently available estimates of within-period and between-period intra-cluster correlations are based on Models 1 or 2, means that it is necessary to determine the impact of assuming Model 1 or 2 when Model 3 has the most appropriate correlation structure.

Table 1.

Mean squares and ANOVA estimators for the variance components of the two-way crossed classification models with and without interactions.

	Mean square	ANOVA estimators
1. No interaction
Cluster	$MSK (1) = \frac{1}{K - 1} \sum_{k = 1}^{K} T m ({\bar{Y}}_{k • •} - {\bar{Y}}_{• • •})^{2}$	${\hat{σ}}_{1 α}^{2} = \frac{MSK (1) - MSE (1)}{Tm}$
Residual	$MSE (1) = \frac{1}{KTm - K - T + 1} \sum_{k = 1}^{K} \sum_{t = 1}^{T} \sum_{i = 1}^{m} (Y_{kti} - {\bar{Y}}_{k • •} - {\bar{Y}}_{• t •} + {\bar{Y}}_{• • •})^{2}$	${\hat{σ}}_{1 ε}^{2} = MSE (1)$
2. Interaction
Cluster	$MSK (2) = \frac{1}{K - 1} \sum_{k = 1}^{K} T m ({\bar{Y}}_{k • •} - {\bar{Y}}_{• • •})^{2}$	${\hat{σ}}_{2 α}^{2} = \frac{MSK (2) - MSKT (2)}{Tm}$
Period × cluster	$MSKT (2) = \frac{1}{(K - 1) (T - 1)} \sum_{k = 1}^{K} \sum_{t = 1}^{T} m ({\bar{Y}}_{kt •} - {\bar{Y}}_{k • •} - {\bar{Y}}_{• t •} + {\bar{Y}}_{• • •})^{2}$	${\hat{σ}}_{γ}^{2} = \frac{MSKT (2) - MSE (2)}{m}$
Residual	$MSE (2) = \frac{1}{KT (m - 1)} \sum_{k = 1}^{K} \sum_{t = 1}^{T} \sum_{i = 1}^{m} (Y_{kti} - {\bar{Y}}_{kt •})^{2}$	${\hat{σ}}_{2 ε}^{2} = MSE (2)$
${\bar{Y}}_{kt •} = \frac{1}{m} \sum_{i = 1}^{m} Y_{kti}$, ${\bar{Y}}_{k • •} = \frac{1}{mT} \sum_{t = 1}^{T} \sum_{i = 1}^{m} Y_{kti}$, ${\bar{Y}}_{• t •} = \frac{1}{mK} \sum_{k = 1}^{K} \sum_{i = 1}^{m} Y_{kti}$, ${\bar{Y}}_{• • •} = \frac{1}{mTK} \sum_{k = 1}^{K} \sum_{t = 1}^{T} \sum_{i = 1}^{m} Y_{kti}$.
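The estimators in Table 1 can be computed directly from a balanced dataset. The following Python sketch is one possible implementation, assuming the data are held in a nested list `y[k][t][i]` (cluster, period, subject); the function name `anova_estimates` and the dictionary keys are illustrative choices made here. Period fixed effects are handled implicitly through the period means. ANOVA estimates of this kind can be negative in a given sample, and the sketch returns them unmodified.

```python
def anova_estimates(y):
    """ANOVA variance component estimates (Table 1) from balanced data,
    where y[k][t][i] is the outcome of subject i in period t of cluster k."""
    K, T, m = len(y), len(y[0]), len(y[0][0])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    cell = [[mean(y[k][t]) for t in range(T)] for k in range(K)]   # cluster-period means
    clus = [mean(cell[k]) for k in range(K)]                        # cluster means
    per = [mean(cell[k][t] for k in range(K)) for t in range(T)]    # period means
    grand = mean(clus)                                              # overall mean
    idx = [(k, t) for k in range(K) for t in range(T)]

    msk = T * m * sum((c - grand) ** 2 for c in clus) / (K - 1)
    mskt = m * sum((cell[k][t] - clus[k] - per[t] + grand) ** 2
                   for k, t in idx) / ((K - 1) * (T - 1))
    mse2 = sum((v - cell[k][t]) ** 2
               for k, t in idx for v in y[k][t]) / (K * T * (m - 1))
    mse1 = sum((v - clus[k] - per[t] + grand) ** 2
               for k, t in idx for v in y[k][t]) / (K * T * m - K - T + 1)

    return {
        "mean_squares": {"MSK": msk, "MSKT": mskt, "MSE1": mse1, "MSE2": mse2},
        "model1": {"var_cluster": (msk - mse1) / (T * m), "var_error": mse1},
        "model2": {"var_cluster": (msk - mskt) / (T * m),
                   "var_interaction": (mskt - mse2) / m,
                   "var_error": mse2},
    }


# Tiny made-up dataset: K = 3 clusters, T = 2 periods, m = 2 subjects.
y = [[[1.0, 2.0], [2.0, 4.0]],
     [[0.0, 1.0], [3.0, 3.0]],
     [[2.0, 2.0], [5.0, 1.0]]]
print(anova_estimates(y))
```

A useful internal check is that the Model 1 residual sum of squares equals the sum of the Model 2 interaction and residual sums of squares, which is why fitting Model 1 folds any cluster-period variation into the residual mean square.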

3 Implications of correlation structure misspecification

3.1 Expected values of variance component estimators

We explore the consequences of misspecifying the correlation structure on variance component estimation by calculating the expected values of the estimators in Table 1, assuming outcomes Y_kti are distributed according to each of the models described in Section 2. Table 2 lists these expected values. Derivations are provided in Appendix 1.

Table 2.

Expected values of variance component estimators in Table 1 for outcomes distributed according to the two-way crossed classification model without and with an interaction between cluster and period, and correlation decay models.

	True (data-generating) model
Fitted model	Model 1: No interaction	Model 2: Interaction	Model 3: Correlation decay
Model 1
$E [{\hat{σ}}_{1 α}^{2}]$	$σ_{1 α}^{2}$	$σ_{2 α}^{2} + \frac{K (m - 1)}{KTm - T - K + 1} σ_{γ}^{2}$	$σ_{3 α}^{2} [\frac{Km - 1}{T (KTm - K - T + 1)} \sum_{t = 1}^{T} \sum_{s = 1}^{T} r^{| t - s |} - \frac{K - 1}{KTm - K - T + 1}]$
$E [{\hat{σ}}_{1 ε}^{2}]$	$σ_{1 ε}^{2}$	$σ_{2 ε}^{2} + \frac{(KT - K - T + 1) m}{KTm - K - T + 1} σ_{γ}^{2}$	$σ_{3 ε}^{2} + σ_{3 α}^{2} [\frac{(K - 1) Tm}{KTm - K - T + 1} - \frac{m (K - 1)}{T (KTm - K - T + 1)} \sum_{t = 1}^{T} \sum_{s = 1}^{T} r^{| t - s |}]$
Model 2
$E [{\hat{σ}}_{2 ε}^{2}]$	$σ_{1 ε}^{2}$	$σ_{2 ε}^{2}$	$σ_{3 ε}^{2}$
$E [{\hat{σ}}_{γ}^{2}]$	0	$σ_{γ}^{2}$	$σ_{3 α}^{2} (\frac{T}{T - 1} - \frac{1}{T (T - 1)} \sum_{t = 1}^{T} \sum_{s = 1}^{T} r^{| t - s |})$
$E [{\hat{σ}}_{2 α}^{2}]$	$σ_{1 α}^{2}$	$σ_{2 α}^{2}$	$σ_{3 α}^{2} (\frac{1}{T (T - 1)} \sum_{t = 1}^{T} \sum_{s = 1}^{T} r^{| t - s |} - \frac{1}{T - 1})$

Table 2 indicates that applying the no interaction estimators to data without an interaction, or the interaction estimators to data with an interaction, results in unbiased estimation of variance components. When the interaction estimators are applied to data without an interaction, the estimators of the subject-level error and cluster-level random effect variances are unbiased, and the expected value of the estimator of the cluster-period level random effect variance is zero. These are standard results: the ANOVA estimators for the variance components are unbiased¹⁶ and overspecification does not result in bias. However, as is expected, when the assumed correlation structure omits a decay in the correlation between cluster-period level random effects over time, the expected values of the variance component estimators compensate for that misspecification. As observed in Moerbeek,¹⁴ where the misspecification of two-way nested models by omission of a level of clustering was considered, when the cluster-period level is omitted (i.e. Model 1 is fitted to data generated according to Model 2), the variation due to the cluster-period level is spread across the subject-level and cluster-level variance component estimates. When T is large, most of the cluster-period-level variability will be added to the subject-level error variance.
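How the cluster-period variance is redistributed when Model 1 is fitted to Model 2 data can be read off the two multipliers of $σ_{γ}^{2}$ in the Model 2 column of Table 2. The short Python sketch below evaluates them; the function name `interaction_variance_shares` and the example values of K, T and m are illustrative choices made here.

```python
def interaction_variance_shares(K, T, m):
    """Multipliers of sigma_gamma^2 in the expected values of the Model 1
    estimators when Model 2 generated the data (Table 2).

    Returns (share added to the cluster variance estimate,
             share added to the error variance estimate).
    """
    D = K * T * m - K - T + 1
    to_cluster = K * (m - 1) / D
    to_error = (K - 1) * (T - 1) * m / D
    return to_cluster, to_error


# Illustrative configuration: 10 clusters, 100 subjects per cluster-period.
for T in (4, 10, 20):
    print(T, interaction_variance_shares(K=10, T=T, m=100))
```

As T grows with K and m held fixed, the first multiplier shrinks and the second grows, which is the pattern described above for large T.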

When applied to the model with a correlation decay, the expected values of no interaction and interaction variance component estimators depend on the decay parameter r, and the design parameters T, K, and m. When r = 1, there is no decay in the correlation, the correlation decay model collapses to the model with no interaction, and the no interaction and interaction estimators are thus unbiased for the correlation decay variance components.
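The final column of Table 2 can be evaluated numerically to see how the expected values move with r. The Python sketch below transcribes those formulas; the function names `decay_sum` and `expected_under_decay` and the dictionary keys are illustrative choices made here.

```python
def decay_sum(T, r):
    """Double sum over t, s = 1..T of r^|t - s|."""
    return sum(r ** abs(t - s) for t in range(T) for s in range(T))


def expected_under_decay(K, T, m, r, var3_cluster, var3_error):
    """Expected values of the Model 1 and Model 2 ANOVA estimators when the
    data follow Model 3 (final column of Table 2)."""
    S = decay_sum(T, r)
    D = K * T * m - K - T + 1
    return {
        "model1_var_cluster": var3_cluster * ((K * m - 1) * S / (T * D) - (K - 1) / D),
        "model1_var_error": var3_error + var3_cluster * (
            (K - 1) * T * m / D - m * (K - 1) * S / (T * D)),
        "model2_var_error": var3_error,
        "model2_var_interaction": var3_cluster * (T / (T - 1) - S / (T * (T - 1))),
        "model2_var_cluster": var3_cluster * (S / (T * (T - 1)) - 1 / (T - 1)),
    }


# Illustrative values: no decay (r = 1) versus moderate decay (r = 0.8).
print(expected_under_decay(K=10, T=5, m=20, r=1.0, var3_cluster=0.05, var3_error=0.95))
print(expected_under_decay(K=10, T=5, m=20, r=0.8, var3_cluster=0.05, var3_error=0.95))
```

At r = 1 the double sum equals T², and the expressions reduce to the Model 3 variance components, matching the statement above. For any r, the expected Model 2 cluster and interaction components sum to $σ_{3 α}^{2}$, so the within-period covariance is preserved on average while its split between the two components depends on r and T.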

3.2 Correlation structure misspecification and treatment effect variance estimation

To investigate the impact of correlation structure misspecification on the treatment effect estimator variance, we include $θ X_{kt}$ in the fixed components of Models 1–3 in Section 2, where X_kt is the binary treatment indicator for cluster k in period t. The parameter θ is the treatment effect, and, assuming a known within-cluster correlation structure and cluster randomised trial design, can be estimated using generalised least squares. The variance of the estimator $\hat{θ}$ , var $(\hat{θ})$ , is of interest at the design and analysis stages of a trial: a formula for var $(\hat{θ})$ given a general within-cluster covariance matrix is provided in Kasza et al.¹²

In this section, we seek to determine for which trial designs and values of T, K, m and r the variance components estimated using the two-way crossed with or without interaction ANOVA formulas sufficiently capture the correlation structure of the correlation decay model when estimating the variance of the treatment effect estimator. We compare the value of var $(\hat{θ})$ obtained given a known decaying within-cluster correlation structure to an estimator of this variance obtained by plugging in the expected values of the variance components when Model 3 is the true data-generating model, but Model 1 or 2 is fitted to the data (given in the final column of Table 2). We denote these estimators of var $(\hat{θ})$ by ${\hat{V}}_{1}$ and ${\hat{V}}_{2}$ , with the hats indicating these are estimates of the true variance of $\hat{θ}$ based on estimates of variance components. The “true” value of var $(\hat{θ})$ , obtained under Model 3, is denoted by V₃.
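The comparison just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it computes the generalised least squares variance of the treatment effect from cluster-period means, with one fixed effect per period, and plugs the expected values from the final column of Table 2 into the Model 1 and Model 2 covariance structures. The helper names (`inverse`, `treatment_variance`, `means_cov`, `variance_ratios`) are hypothetical, and the matrix inverse is a plain Gauss-Jordan routine used only to keep the example free of external libraries.

```python
def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def treatment_variance(design, cov):
    """GLS variance of the treatment effect estimator.

    design[k][t] is the 0/1 treatment indicator X_kt; cov is the T x T
    covariance matrix of one cluster's period means (common to all clusters).
    """
    T = len(cov)
    w = inverse(cov)
    p = T + 1  # T period means plus the treatment effect
    info = [[0.0] * p for _ in range(p)]
    for x in design:
        z = [[float(t == j) for j in range(T)] + [float(x[t])] for t in range(T)]
        for a in range(p):
            for b in range(p):
                info[a][b] += sum(z[t][a] * w[t][s] * z[s][b]
                                  for t in range(T) for s in range(T))
    return inverse(info)[T][T]


def means_cov(T, m, var_error, shared, same_period_extra=0.0, r=1.0):
    """cov[t][s] = shared * r^|t-s| + (same_period_extra + var_error/m) * [t == s]."""
    return [[shared * r ** abs(t - s)
             + (same_period_extra + var_error / m if t == s else 0.0)
             for s in range(T)] for t in range(T)]


def variance_ratios(design, m, r, var3_cluster, var3_error):
    """Return (V1_hat / V3, V2_hat / V3) when Model 3 generated the data."""
    K, T = len(design), len(design[0])
    S = sum(r ** abs(t - s) for t in range(T) for s in range(T))
    D = K * T * m - K - T + 1
    # Final column of Table 2: expected values of the misspecified estimators.
    e1_cluster = var3_cluster * ((K * m - 1) * S / (T * D) - (K - 1) / D)
    e1_error = var3_error + var3_cluster * (K - 1) * m * (T - S / T) / D
    e2_cluster = var3_cluster * (S - T) / (T * (T - 1))
    e2_inter = var3_cluster * (T * T - S) / (T * (T - 1))
    v3 = treatment_variance(design, means_cov(T, m, var3_error, var3_cluster, r=r))
    v1 = treatment_variance(design, means_cov(T, m, e1_error, e1_cluster))
    v2 = treatment_variance(design, means_cov(T, m, var3_error, e2_cluster, e2_inter))
    return v1 / v3, v2 / v3


# Illustrative stepped wedge layout with T = 4 periods and three sequences.
sw = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
print(variance_ratios(sw, m=100, r=0.8, var3_cluster=0.05, var3_error=0.95))
```

Working with cluster-period means rather than individual outcomes is a simplification that relies on the balanced design and on the treatment indicator varying only at the cluster-period level. With r = 1 both ratios equal one, and for a two-period cross-over layout the second ratio equals one for any r, because Models 2 and 3 imply the same covariance matrix when T = 2.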

To allow for comparability between Models 1 to 3, we set

σ_{1 ε}^{2} = σ_{2 ε}^{2} = σ_{3 ε}^{2}; σ_{2 α}^{2} = r σ_{1 α}^{2}; σ_{γ}^{2} = (1 - r) σ_{1 α}^{2}; σ_{3 α}^{2} = σ_{1 α}^{2} .

This parameterisation ensures that $var (Y_{kti})$ and the within-period covariance $cov (Y_{kti}, Y_{ktj})$ are the same across the three models, and thus that the within-period intra-cluster correlation is the same in all three considered models. Were the original parameterisations in Section 2 used for comparisons, direct comparability between the three models could not be guaranteed. Table 3 displays the within-period and between-period intra-cluster correlations for each of the three models in terms of these shared parameters.

Table 3.

Within-period and between-period intra-cluster correlations (ICCs) for the three considered models, setting $σ_{1 ε}^{2} = σ_{2 ε}^{2} = σ_{3 ε}^{2}; σ_{2 α}^{2} = r σ_{1 α}^{2}; σ_{γ}^{2} = (1 - r) σ_{1 α}^{2}; σ_{3 α}^{2} = σ_{1 α}^{2}$ and $ρ = \frac{σ_{1 α}^{2}}{σ_{1 ε}^{2} + σ_{1 α}^{2}}$ .

	Within-period ICC	Between-period ICC
Model	$corr (Y_{kti}, Y_{ktj})$	$corr (Y_{kti}, Y_{ksj}), s \neq t$
1	ρ	ρ
2	ρ	$r \times ρ$
3	ρ	$r^{| t - s |} \times ρ$

ICC: intra-cluster correlation.
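The entries of Table 3 follow from substituting the shared parameterisation into the covariances implied by each model. The worked steps below are a sketch of that substitution, using the notation of Section 2; dividing each covariance by the common variance gives the tabulated correlations.

```latex
\begin{align*}
\text{Models 1--3:} \quad & var (Y_{kti}) = σ_{1 α}^{2} + σ_{1 ε}^{2}, \\
\text{Model 1:} \quad & cov (Y_{kti}, Y_{ksj}) = σ_{1 α}^{2} \quad \text{for all } t, s, \\
\text{Model 2, } s = t \text{:} \quad & cov (Y_{kti}, Y_{ktj}) = σ_{2 α}^{2} + σ_{γ}^{2}
  = r σ_{1 α}^{2} + (1 - r) σ_{1 α}^{2} = σ_{1 α}^{2}, \\
\text{Model 2, } s \neq t \text{:} \quad & cov (Y_{kti}, Y_{ksj}) = σ_{2 α}^{2} = r σ_{1 α}^{2}, \\
\text{Model 3:} \quad & cov (Y_{kti}, Y_{ksj}) = σ_{3 α}^{2} r^{| t - s |} = σ_{1 α}^{2} r^{| t - s |}.
\end{align*}
```

In Model 2 the total variance is $σ_{2 α}^{2} + σ_{γ}^{2} + σ_{2 ε}^{2} = σ_{1 α}^{2} + σ_{1 ε}^{2}$, so all three models share the same denominator, and the between-period correlations of Models 2 and 3 coincide only for adjacent periods.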

We consider three types of multiple-period cluster randomised trials: parallel and cluster cross-over (CRXO) designs over T periods, where $⌊ \frac{T}{2} ⌋$ clusters are assigned to each of the two treatment sequences (the $⌊ n ⌋$ notation indicates the largest integer less than or equal to n), and stepped wedge designs, where one cluster is assigned to each of the T – 1 treatment sequences. In each of the considered trial types, m subjects are assumed to have their outcome measured in each cluster in each period. For the cluster cross-over designs, half of the clusters implement the control condition in the first period and the intervention condition in the second period; the other half first implement the intervention before switching to the control. In both arms, clusters switch back and forth between the two conditions in each period so that a design with T periods has T – 1 switches.
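The three design types can be written as 0/1 treatment matrices with one row per cluster and one column per period. The Python sketch below is one way to generate them; the function names are illustrative choices made here, and the stepped wedge layout assumes the standard pattern in which every cluster starts in the control condition and one sequence crosses to the intervention at each subsequent period, which the text does not spell out.

```python
def parallel_design(T, clusters_per_sequence=None):
    """All-control and all-intervention sequences, floor(T/2) clusters each by default."""
    n = T // 2 if clusters_per_sequence is None else clusters_per_sequence
    return [[0] * T for _ in range(n)] + [[1] * T for _ in range(n)]


def crxo_design(T, clusters_per_sequence=None):
    """Cluster cross-over: clusters alternate conditions every period (T - 1 switches)."""
    n = T // 2 if clusters_per_sequence is None else clusters_per_sequence
    control_first = [t % 2 for t in range(T)]
    intervention_first = [1 - x for x in control_first]
    return ([list(control_first) for _ in range(n)]
            + [list(intervention_first) for _ in range(n)])


def stepped_wedge_design(T):
    """One cluster per sequence, T - 1 sequences; sequence j switches after period j."""
    return [[0] * j + [1] * (T - j) for j in range(1, T)]


for row in stepped_wedge_design(4):
    print(row)
```

Each function returns a list of rows that can be passed to any routine expecting the indicators X_kt, for example a generalised least squares variance calculation.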

For each design type and combination of $T = 4, 5, \dots, 20$ and $r = 0, 0.05, 0.1, \dots, 1$ , setting $σ_{1 ε}^{2} + σ_{1 α}^{2} = 1, σ_{1 α}^{2} = 0.05$ and m = 100, we calculate ${\hat{V}}_{1} / V_{3}$ and ${\hat{V}}_{2} / V_{3}$ . The formulas in Table 2 indicate that were the number of clusters assigned to each treatment sequence increased or decreased from $⌊ \frac{T}{2} ⌋, {\hat{V}}_{2} / V_{3}$ would remain unchanged and ${\hat{V}}_{1} / V_{3}$ would be changed only minimally. Results for this set of configurations are displayed in Figures 1 and 2: readers can investigate results for other values of $σ_{1 α}^{2}$ and m at https://monash-biostat.shinyapps.io/MisspecCorrStruct.

Figure 1.

The ratio of the variances of the treatment effect estimator obtained under the one-way layout and under the decaying structure (Model 1 versus Model 3), when the correct model has the decaying within-cluster correlation structure for a range of decay parameters r, and numbers of periods T, with number of subjects per period, m = 100, $σ_{1 ε}^{2} = 0.95$ and $σ_{1 α}^{2} = 0.05$ . White indicates that the ratio is equal to 1; red indicates ratios greater than 1, with greater intensity indicating larger values; blue indicates ratios less than 1, with greater intensity indicating smaller values.

Figure 2.

The ratio of the variances of the treatment effect estimator obtained under the two-way nested layout and under the decaying structure (Model 2 versus Model 3), when the correct model has the decaying within-cluster correlation structure for a range of decay parameters r, and numbers of periods T, with number of subjects per period, m = 100, $σ_{1 ε}^{2} = 0.95$ and $σ_{1 α}^{2} = 0.05$ . White indicates that the ratio is equal to 1; red indicates ratios greater than 1, with greater intensity indicating larger values; blue indicates ratios less than 1, with greater intensity indicating smaller values.

Figures 1 and 2 both confirm that when r = 1 (implying no decay in the correlation between subjects in the same cluster over time), $V_{3} = {\hat{V}}_{1} = {\hat{V}}_{2}$ . Figure 1 indicates that when Model 3 (with the random effect correlation decay structure) is correct, application of the model without a random interaction between period and cluster to estimate variance components leads to underestimation of var $(\hat{θ})$ when the study has a stepped wedge or CRXO design, and slight overestimation when the study has a parallel design. Figure 2 shows that when the model with a random interaction is used to estimate variance components instead of using the correct correlation decay structure, overestimation of var $(\hat{θ})$ occurs for both the CRXO and parallel designs (although the overestimation is slight for the parallel design), while for the stepped wedge design, var $(\hat{θ})$ may be over or underestimated, depending on the combination of r, T and m. For combinations corresponding to values of r and T that would be likely to occur in practice ( $0.8 < r < 1$ and T < 10), ${\hat{V}}_{2} / V_{3}$ is greater than 1, indicating overestimation of the treatment effect variance when estimates of variance components are based on Model 2 instead of Model 3.

4 Discussion

The results in Section 3 have implications for both inferences on the treatment effect and for trial planning, but the implications depend on the study design. The impact of correlation structure misspecification is much smaller for the parallel design than for the stepped wedge (SW) and CRXO designs. In the SW and CRXO designs, treatment varies at the cluster-period level, and within-cluster between-period comparisons are required to estimate the treatment effect: these comparisons rely on the correlations between subjects in the same cluster in different periods, so misspecification of the structure of the within-cluster covariance matrix has a greater impact for these designs than for the parallel design. We did not consider parallel designs with pre-intervention baseline measurements, although users can investigate the impact of misspecification for such designs in our online app. Whether results for parallel designs with baseline measurements more closely resemble those for parallel designs or stepped wedge designs will depend on the proportion of the periods in the design that are dedicated to pre-intervention versus post-intervention measures. If the proportion is very small, the results will more closely resemble the parallel design results; as the proportion increases, the results will more closely resemble the stepped wedge results. In the parallel design without any pre-intervention baseline measurements, treatment varies only at the cluster level. As has been observed by Moerbeek,¹⁴ ignoring a level of clustering lower than that at which the exposure of interest varies has no impact on inference for the effect of that treatment or any treatments that vary at a higher level of the structure. Thus, our results align with those of Moerbeek:¹⁴ misspecifying the cluster-period level correlation structure has minimal impact on the variance of treatment effects at a higher level. Similarly, if the study being considered were a parallel cluster randomised design with each subject measured in each period, instead of the repeated cross-sectional design we have considered, misspecifying the within-subject correlation structure (as distinct from the within-cluster correlation structure) would have minimal impact on the variance of the treatment effect estimator, provided that random effects for higher levels of the data hierarchy are included in the model.

When the number of periods in a study is large, including a fixed term for each period as we have done may result in a less efficient estimator for the treatment effect than a model where a linear term for time is included. In practice, including continuous terms for time may lead to more efficient estimation of the treatment effect, which will then lead to additional considerations regarding how random effects for time should be included in the model. Murray et al.⁸ considered three models in their investigation of the issue in the context of parallel cluster randomised trials: the first was similar to our Model 2; the second included a random cluster-period interaction but a linear fixed effect for time; and the third included random coefficients for a linear term for time. They concluded that incorrectly omitting a random coefficient for time led to an inflation of Type I error rates. We have not investigated scenarios where a linear term for time and random coefficients for time are incorrectly excluded or included, given our focus on the impact of omitting a decay in the within-cluster correlation structure. Given the lack of closed-form estimators for the decaying correlation model, it is difficult to determine how inference for the treatment effect would be impacted were a decaying correlation structure incorrectly included in a model instead of a random coefficient for time (or vice-versa). Further work comparing these two models is required.

If the ratios in Figures 1 and 2 are less than one, then the variance of the treatment effect estimate will be underestimated when the model is misspecified. At the study design stage, this implies that a sample size smaller than that required under the true model will be sought; at the analysis stage, confidence intervals for the treatment effect will be narrower than they should be, inflating the risk of a Type I error. If correlation structure misspecification leads to overestimation of the true variance (i.e. the ratio is greater than 1), sample sizes will be inflated relative to those that would be obtained had the correlation decay model been used to obtain variance component estimates. At the analysis stage, confidence intervals will be wider than required, inflating the risk of a Type II error. As ρ or m, the number of subjects in each cluster-period, increases, so too does the impact of within-cluster correlation structure misspecification. In Table 4, we provide a short summary of the implications for the confidence interval width when Model 1 or Model 2 is incorrectly assumed instead of Model 3.
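The link between a variance ratio and confidence interval performance can be made concrete with a short Python sketch. It assumes, as a simplification not made in the paper, that the treatment effect estimator is normally distributed and unbiased and that the interval is a Wald interval built from the misspecified variance treated as known; the function name `actual_coverage` is an illustrative choice made here.

```python
from statistics import NormalDist


def actual_coverage(variance_ratio, nominal=0.95):
    """Actual coverage of a nominal Wald interval when the variance used to
    build it equals variance_ratio times the true variance of the estimator."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(0.5 + nominal / 2)
    # The interval half-width is scaled by sqrt(variance_ratio).
    return 2 * std_normal.cdf(z * variance_ratio ** 0.5) - 1


for ratio in (0.5, 0.8, 1.0, 1.25, 2.0):
    print(ratio, round(actual_coverage(ratio), 3))
```

Under these assumptions a ratio of 0.5 gives coverage of roughly 0.83 for a nominal 95% interval, corresponding to a two-sided Type I error rate of roughly 0.17 instead of 0.05, while ratios above one give coverage above the nominal level.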

Table 4.

Summary of the implications for the width of the confidence interval of the treatment effect when Model 3 is correct, but Model 1 or Model 2 is incorrectly assumed, for parallel, CRXO and stepped wedge designs.

	Design type
Assumed model	Parallel	CRXO	Stepped wedge
1	Too wide	Too narrow	Too narrow
2	Too wide	Too wide	Too narrow OR too wide

We have assumed balanced data for our derivations and comparative studies. When data are not balanced, formulae for variance components do not have closed forms, and as pointed out by Moerbeek,¹⁴ it is difficult to determine the impact of within-cluster correlation structure misspecification on the estimation of variance components without undertaking numerical simulation. Some of the results in Figures 1 and 2 correspond to cluster randomised trials with small numbers of clusters. Although our conclusions would not change were trials with larger numbers of clusters considered, we caution against cluster randomised trials with small numbers of clusters, for the reasons described in Taljaard et al.¹⁷

By casting multiple period cluster randomised designs as classical experimental designs and applying ANOVA-based calculations, we have derived results concerning model misspecification using models often applied in practice. Further, we have provided a web-based application to allow researchers to investigate the impact of model misspecification for their particular multiple-period cluster randomised trials.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Health and Medical Research Council of Australia Project Grant ID 1108283.

Appendix 1

References

1. Campbell, Piaggio, Elbourne. Consort 2010 statement: extension to cluster randomised trials. BMJ 2012; 345: e5661–e5661.

2. Chang W, Cheng J, Allaire JJ, et al. Shiny: web application framework for R, 2017. R package version 1.1.0. https://CRAN.R-project.org/package=shiny.

3. Copas, Lewis, Thompson, et al. Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches. Trials 2015; 16: 1–12.

4. Feldman, McKinlay. Cohort versus cross-sectional design in large field trials: precision, sample size, and a unifying model. Stat Med 1994; 13: 61–78.

5. Girling, Hemming. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med 2016; 35: 2149–2166.

6. Hayes, Moulton. Cluster randomised trials, Boca Raton, FL: Chapman & Hall/CRC, 2009.

7. Martin, Taljaard, Girling, et al. Systematic review finds major deficiencies in sample size methodology and reporting for stepped-wedge cluster randomised trials. BMJ Open 2016; 6.

8. Murray, Hannan, Wolfinger, et al. Analysis of data from group-randomized trials with repeat observations on the same groups. Stat Med 1998; 17: 1581–1600.

9. Hemming, Haines, Chilton, et al. The stepped-wedge cluster randomised trial: rationale, design, analysis and reporting. BMJ 2015; 350: h391–h391.

10. Hussey, Hughes. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 2007; 28: 182–191.

11. Hooper, Teerenstra, de Hoop, et al. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med 2016; 35: 4718–4728.

12. Kasza, Hemming, Hooper, et al. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Meth Med Res 2017. DOI: 10.1177/0962280217734981.

13. Moerbeek, van Breukelen GJP, Berger MPF. A comparison between traditional methods and multilevel regression for the analysis of multicenter intervention studies. J Clin Epidemiol 2003; 56: 341–350.

14. Moerbeek. The consequences of ignoring a level of nesting in multilevel analysis. Multivariate Behav Res 2004; 39: 129–149.

15. Hemming, Taljaard, Forbes. Analysis of cluster randomised stepped wedge trials with repeated cross-sectional samples. Trials 2017; 18: 101–101.

16. Searle, Casella, McCulloch. Variance components, New York, NY: John Wiley & Sons, Inc., 1992.

17. Taljaard, Teerenstra, Ivers, et al. Substantial risks associated with few clusters in cluster randomized and stepped wedge designs. Clin Trials 2016; 183: 459–463.