A comparison of power analysis methods for evaluating effects of a predictor on slopes in longitudinal designs with missing data

Abstract

In many longitudinal studies, the primary interest is the effect of a binary or continuous predictor variable on the rate of change of the outcome, i.e. the slope. Sample size determination for these studies, however, is complicated by the expectation that missing data will occur due to missed visits, early drop out, and staggered entry. Despite the availability of methods for assessing power in longitudinal studies with missing data, the impact on power of the magnitude and distribution of missing data in the study population remains poorly understood. As a result, simple but erroneous alterations of the sample size formulae for complete/balanced data are commonly applied. These ‘naive’ approaches include the average sum of squares and average number of subjects methods. The goal of this article is to explore in greater detail the effect of missing data on study power and to compare the performance of the naive sample size methods with a correct maximum likelihood-based method, using both mathematical and simulation-based approaches. Two different longitudinal aging studies are used to illustrate the methods.

Keywords

compound symmetry; intraclass correlation; linear mixed effects model; monotone missing; sample size

1 Introduction

The primary goal in many longitudinal studies is to evaluate the effect of a binary or continuous predictor variable on the rate of change in the outcome. For example, in a proposed randomized clinical trial of the effect of cognitive remediation (CR) on mobility in sedentary seniors, the objective was to estimate the effect of the intervention on the change in normal gait velocity. Another example is from the Einstein Aging Study (EAS), in which interest centered on the effect of cerebral vascular function measured at baseline, a continuous variable, on the rate of cognitive decline.

When there are no missing data, simple and widely used methods are available1 for computing the power and sample size of such studies. In reality, however, drop out, missed visits and staggered entry result in missing data. In the above CR trial, 20% drop out is expected at the post-intervention visit. Missing data are also a concern in the EAS since subjects must undergo five annual cognitive assessments.

Approaches for assessing the power of longitudinal and cluster designs have been extensively investigated, both for balanced designs2–17 and for designs that are unbalanced due to missing data.18–31 Specifically, methods for unbalanced clustered designs have been proposed for between-group comparisons of overall means,18–19 means at a specific time point during follow-up,20–21 slopes,22–26 and other aspects of longitudinal data, which may include testing of slopes as a special case.27–31 Both maximum likelihood estimate (MLE) and generalized estimating equation (GEE)32 methods have been considered in these approaches.

In spite of the availability of these methods, it remains unclear how the extent and distribution of missing data actually affect power in longitudinal studies. Consequently, simple but naive alterations of conventional approaches for the complete data case are often used in practice. We consider two such methods, the average sum of squares (ASQ) and the average number of subjects (ANS) methods, and evaluate in detail how they perform relative to a correct MLE-based method for linear mixed effects (LME) models.33

The remainder of this article is organized as follows. In Section 2, we describe a correct MLE-based method, evaluate the impact of missing data on the power estimated with this approach, and consider how the ASQ and ANS methods deviate from the MLE-based method. Next, the simulation studies to compare the performance of the different approaches are described in Section 3. In Section 4 the methods are applied to two different studies in the elderly. Finally, we conclude the article with a discussion in Section 5.

2 Method

2.1 Background and notation

Consider a longitudinal study of m independent subjects, indexed by i, with n planned visits, indexed by j. Here, n is considered fixed so that the asymptotic properties of the estimators apply for large m. For subject i, i = 1, … , m, let Z_i denote the time-independent predictor, Y_ij the j^th outcome measure of subject i, and t_ij the time of the j^th measurement, j = 1, … , n. For complete data, $Y_{i}^{c} = (Y_{i1}, \dots, Y_{in})^{T}$, where superscript T denotes matrix transposition. Let $X_{i}^{c}$ be the n × p design matrix for the fixed effects, where p is the dimension of the parameter β in the linear mixed effects model

Y_{i}^{c} = X_{i}^{c} β + ε_{i}

where ε_i is the n × 1 residual term, possibly consisting of random effects and measurement error, assumed to follow a multivariate normal distribution with mean 0 and n × n variance–covariance matrix V. At the design stage, t_ij is assumed fixed with value denoted by t_j, t = (t₁, … , t_n), and V is a known n × n variance–covariance matrix. A common structure for V is compound symmetry with common variance σ² and a correlation matrix C with a common correlation coefficient ρ, also referred to as the intraclass correlation coefficient, i.e. V = σ²C, and

C = (1 - ρ) I_{n} + ρ 1_{n} 1_{n}^{T}

where I_n is the n × n identity matrix and 1_n is the n × 1 vector of ones. This is also the variance–covariance structure of Y conditional on X when ε_ij = u_i + e_ij, where u_i is the random intercept with variance $σ_{u}^{2}$ and e_ij is the independent measurement error with variance $σ_{e}^{2}$, in which case

ρ = \frac{σ_{u}^{2}}{σ_{u}^{2} + σ_{e}^{2}}
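A small numerical check of this equivalence may help. The sketch below assumes illustrative variance components σ_u² = 0.25 and σ_e² = 0.75 (hypothetical values, not from either study) and uses hypothetical helper names; it builds the covariance implied by the random-intercept model and compares it entry by entry with σ²C.

```python
def cs_matrix(n, sigma2, rho):
    """Compound symmetry: sigma2 * ((1 - rho) * I_n + rho * 1_n 1_n^T)."""
    return [[sigma2 * (1.0 if j == l else rho) for l in range(n)] for j in range(n)]


def random_intercept_cov(n, sigma_u2, sigma_e2):
    """Cov(u_i + e_ij, u_i + e_il): sigma_u2 + sigma_e2 if j == l, else sigma_u2."""
    return [[sigma_u2 + (sigma_e2 if j == l else 0.0) for l in range(n)] for j in range(n)]


# Illustrative (hypothetical) variance components.
sigma_u2, sigma_e2 = 0.25, 0.75
sigma2 = sigma_u2 + sigma_e2            # common variance sigma^2
rho = sigma_u2 / (sigma_u2 + sigma_e2)  # intraclass correlation

V_ri = random_intercept_cov(5, sigma_u2, sigma_e2)
V_cs = cs_matrix(5, sigma2, rho)
same = all(abs(a - b) < 1e-12 for ra, rb in zip(V_ri, V_cs) for a, b in zip(ra, rb))
print(rho, same)  # 0.25 True
```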

In many longitudinal studies, researchers are often interested in the effect of the predictor Z on the slope or rate of change in the outcome, which can be evaluated in the following model

Y_{ij} = β_{0} + β_{1} Z_{i} + β_{2} t_{ij} + β_{3} Z_{i} t_{ij} + ε_{ij}

(1)

In model (1), the j^th row of the design matrix $X_{i}^{c}$ is X_ij = (1, Z_i, t_ij, Z_it_ij). The parameters β₀ and β₂ are the intercept and slope for Z_i = 0; β₁ measures the association between Z and the outcome Y at t_j = 0, i.e. at baseline; and β₃ measures the effect of Z on the slope. If Z is binary, e.g. Z_i = 0 for the control and 1 for the intervention group, then β₃ is the expected difference in slope between the intervention and control groups; if Z is continuous, β₃ is the expected difference in the slope of the outcome per 1-unit increase in Z.

In reality, longitudinal data are often unbalanced due to missing data. Denote R_i = (R_i1, … , R_in)^T as the observation indicator for subject i, where R_ij = 1 if Y_ij is observed and 0 otherwise. For simplicity, we assume that P(R_i1 = 1) = 1, since the data are usually fully observed at baseline, and that P(R_in = 1) > 0, because otherwise the design reduces to one with a smaller planned (maximum) number of measurements. The missing data pattern consists of all the possible values of R_i. For example, a common missing data pattern in longitudinal studies is monotone missingness, where R_ik ≥ R_il for k < l, k, l = 1, … , n. In this case, the missing data pattern is fully described by the number of observed measurements, $K_{i} = \sum_{j = 1}^{n} R_{ij}$. If the probability of missing Y_{i,j+1} given that Y_ij is observed is pd_j = P(R_{i,j+1} = 0 | R_ij = 1, Z_i), j = 1, … , n − 1, then P(K = 1 | Z) = pd₁, $P(K = k | Z) = \prod_{j = 1}^{k - 1} (1 - pd_{j})\, pd_{k}$, k = 2, … , n − 1, and $P(K = n | Z) = \prod_{j = 1}^{n - 1} (1 - pd_{j})$.
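The recursion for P(K = k | Z) is easy to mechanize. The following Python sketch is illustrative only: the function name `k_distribution` is hypothetical, Z is held fixed (so the drop out probabilities are those for one value of Z), and the constant 30% drop out rate is an example value.

```python
def k_distribution(pd):
    """P(K = k), k = 1..n, under monotone missingness.

    pd[j - 1] is the conditional drop out probability
    pd_j = P(R_{i,j+1} = 0 | R_ij = 1, Z) for j = 1, ..., n - 1, so n = len(pd) + 1.
    """
    probs = []
    still_observed = 1.0  # P(R_ij = 1); baseline is always observed
    for d in pd:
        probs.append(still_observed * d)  # last observed at visit j
        still_observed *= 1.0 - d
    probs.append(still_observed)          # observed at all n visits
    return probs


# Example: n = 5 with a constant 30% drop out rate per interval.
p = k_distribution([0.3, 0.3, 0.3, 0.3])
mean_k = sum(k * pk for k, pk in enumerate(p, start=1))
print([round(x, 4) for x in p], round(mean_k, 4))
# [0.3, 0.21, 0.147, 0.1029, 0.2401] 2.7731
```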

Let q denote the number of possible values that R_i can take. For a monotone missing data pattern, q = n; in the extreme case where each component of R_i except the first can be either 0 or 1, q = 2ⁿ⁻¹. Denote by Y_i the observed outcome vector, by X_i its corresponding design matrix, and by V_i the variance of Y_i, which is the corresponding sub-matrix of V. The missing data process is assumed to be non-informative, i.e. the missing data mechanism is missing at random (MAR) and the parameter that governs the missing data process is distinct from β. Under this assumption, the missing data process is ignorable and inference for β can be based on the observed data.34

The parameter β = (β₀, β₁, β₂, β₃)^T in the linear mixed effects model (1) is estimated by solving the following estimating equation

U(β) = \sum_{i=1}^{m} X_{i}^{T} V_{i}^{-1} (Y_{i} - X_{i} β) = 0

The MLE of β can be expressed as

\hat{β} = \left( \sum_{i=1}^{m} X_{i}^{T} \hat{V}_{i}^{-1} X_{i} \right)^{-1} \sum_{i=1}^{m} X_{i}^{T} \hat{V}_{i}^{-1} Y_{i}

where $\hat{V}$ is the estimate of V. The asymptotic variance of $\hat{β}$ is

Σ_{β} = \frac{1}{m} \left\{ E\left( X_{i}^{T} V_{i}^{-1} X_{i} \right) \right\}^{-1}

(2)

Expression (2) provides the basis for inference about β. For example, Tu28 proposed a power analysis method for testing any linear combination of the parameters, Aβ, where A is a known full-rank s × 4 matrix, based on the variance of $A\hat{β}$, which can be expressed as AΣ_βA^T. Here, we focus on power analysis for testing β₃, which is the parameter of interest in most longitudinal studies. The variance of $\hat{β}_{3}$ is the (4, 4)^th element of Σ_β, denoted V_b.

From the variance expression (2), in which the expectation is taken over the joint distribution of R_i and Z_i, it can be seen that Σ_β depends on the distribution of R_i given Z_i. We assume that this distribution, marginal over the outcome, is known, with P(R_i = r_l | Z_i = z) = p_l(z), l = 1, … , q. This does not imply that R_i is independent of the observed outcome Y_i (i.e. that the data are missing completely at random, MCAR). The missing data process is assumed MAR and thus can depend on the observed outcome Y_i, but for power analysis it is not necessary to specify the full missing data process, P(R_i | Z_i, Y_i); only the marginal distribution of R_i | Z_i matters.35

Given the planned observation times t = (t₁, … , t_n), the residual variance–covariance matrix V, the distribution of Z, and the missing data distribution P(R|Z), the variance of $\hat{β}_{3}$, V_b, can be calculated using expression (2). Here, we focus on the monotone missing data pattern, in which the distribution of missing data is expressed by the random variable K. For a given K = k, X_i is a k × 4 matrix whose j^th row is (1, Z_i, t_j, Z_it_j), j = 1, … , k. Denote by $φ_{lj}^{(k)}$ the (l, j)^th element of $V_{i}^{-1}$, l, j = 1, … , k, let $t^{(k)} = (t_{1}, \dots, t_{k})^{T}$, and define $a_{k} = 1_{k}^{T} V_{i}^{-1} 1_{k} = \sum_{l=1}^{k} \sum_{j=1}^{k} φ_{lj}^{(k)}$, $b_{k} = 1_{k}^{T} V_{i}^{-1} t^{(k)} = \sum_{l=1}^{k} \sum_{j=1}^{k} φ_{lj}^{(k)} t_{j}$, and $c_{k} = t^{(k)T} V_{i}^{-1} t^{(k)} = \sum_{l=1}^{k} \sum_{j=1}^{k} φ_{lj}^{(k)} t_{l} t_{j}$. Then Σ_β in (2) can be expressed as

Σ_{β} = \frac{1}{m} \begin{pmatrix} E(a_{k}) & E(Z_{i} a_{k}) & E(b_{k}) & E(Z_{i} b_{k}) \\ E(Z_{i} a_{k}) & E(Z_{i}^{2} a_{k}) & E(Z_{i} b_{k}) & E(Z_{i}^{2} b_{k}) \\ E(b_{k}) & E(Z_{i} b_{k}) & E(c_{k}) & E(Z_{i} c_{k}) \\ E(Z_{i} b_{k}) & E(Z_{i}^{2} b_{k}) & E(Z_{i} c_{k}) & E(Z_{i}^{2} c_{k}) \end{pmatrix}^{-1}

If K and Z are independent, i.e. the distribution of missing data does not depend on Z, then Σ_β simplifies to

Σ_{β} = \frac{1}{m} \begin{pmatrix} E(a_{k}) B & E(b_{k}) B \\ E(b_{k}) B & E(c_{k}) B \end{pmatrix}^{-1}

where

B = \begin{pmatrix} 1 & E(Z) \\ E(Z) & E(Z^{2}) \end{pmatrix}

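and σ_z² = E(Z²) − {E(Z)}² is the variance of Z. Because this matrix is the Kronecker product of the 2 × 2 matrix with entries E(a_k), E(b_k), E(b_k), E(c_k) and B, its inverse is the Kronecker product of the two inverses; taking the (4, 4)^th element gives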

V_{b} = \frac{E(a_{k})}{m σ_{z}^{2} \left[ E(a_{k}) E(c_{k}) - \{E(b_{k})\}^{2} \right]}

(3)

Similar derivations have been shown in the GEE setting using an independent working correlation matrix.23 Formula (3) can be applied to any variance structure. For example, if a random intercept and random slope model is used, with variances $σ_{0}^{2}$ and $σ_{1}^{2}$ for the random intercept and slope, respectively, correlation ρ₀₁ between the two random effects (written ρ₀₁ here to distinguish it from the intraclass correlation ρ), and random error variance $σ_{e}^{2}$, then the j^th diagonal element of V is $σ_{0}^{2} + t_{j}^{2} σ_{1}^{2} + 2 ρ_{01} σ_{0} σ_{1} t_{j} + σ_{e}^{2}$, j = 1, … , n, and the (j, l)^th off-diagonal element of V is $σ_{0}^{2} + t_{j} t_{l} σ_{1}^{2} + ρ_{01} σ_{0} σ_{1} (t_{j} + t_{l})$ for j ≠ l, j, l = 1, … , n. Application of (3) is straightforward, since the inverse of a matrix is easily calculated in most statistical software for any given V and distribution of K.

To calculate the sample size m needed, or the power 1 − γ, for testing the null hypothesis H₀: β₃ = 0 versus H₁: β₃ = d using a two-sided test at level α, we solve the equation $d^{2} = V_{b} (Z_{1-α/2} + Z_{1-γ})^{2}$ for m or for γ, where V_b is proportional to 1/m and Z_p is the 100p^th percentile of the standard normal distribution.
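As a sketch of how (3) and this equation could be evaluated for a general V, the code below inverts the leading k × k block of V for each possible K, accumulates E(a_k), E(b_k), E(c_k), and then solves the equation for m or for power. It assumes monotone missingness with K independent of Z and uses a plain Gauss-Jordan inverse rather than any particular statistics package; the function names and the random intercept/slope parameters (SDs 1.0 and 0.3, correlation `r01` = 0.2, error variance 0.5), the 20% drop out rate, and d = 0.1 are hypothetical.

```python
import math
from statistics import NormalDist


def invert(a):
    """Inverse of a small nonsingular matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def intercept_slope_cov(t, s0, s1, r01, se2):
    """V for a random intercept (SD s0) and slope (SD s1) with correlation r01, plus error variance se2."""
    return [[s0 ** 2 + tj * tl * s1 ** 2 + r01 * s0 * s1 * (tj + tl) + (se2 if j == l else 0.0)
             for l, tl in enumerate(t)] for j, tj in enumerate(t)]


def per_subject_var_b(t, V, p, var_z):
    """m * V_b from formula (3); assumes monotone missingness with K independent of Z."""
    ea = eb = ec = 0.0
    for k, pk in enumerate(p, start=1):
        if pk == 0:
            continue
        phi = invert([row[:k] for row in V[:k]])  # V_i^{-1} for the first k visits
        idx = range(k)
        ea += pk * sum(phi[l][j] for l in idx for j in idx)
        eb += pk * sum(phi[l][j] * t[j] for l in idx for j in idx)
        ec += pk * sum(phi[l][j] * t[l] * t[j] for l in idx for j in idx)
    return ea / (var_z * (ea * ec - eb ** 2))


def sample_size(v1, d, alpha=0.05, power=0.9):
    """Smallest m with d^2 >= (v1 / m) * (Z_{1-alpha/2} + Z_{1-gamma})^2."""
    z = NormalDist().inv_cdf
    return math.ceil(v1 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


def power_at(m, v1, d, alpha=0.05):
    """Approximate power Phi(|d| / sqrt(v1 / m) - Z_{1-alpha/2})."""
    nd = NormalDist()
    return nd.cdf(abs(d) / math.sqrt(v1 / m) - nd.inv_cdf(1 - alpha / 2))


t = [0, 1, 2, 3, 4]
p_missing = [0.2, 0.16, 0.128, 0.1024, 0.4096]  # constant 20% drop out per interval
V = intercept_slope_cov(t, s0=1.0, s1=0.3, r01=0.2, se2=0.5)  # hypothetical values
v1 = per_subject_var_b(t, V, p_missing, var_z=1.0)
m = sample_size(v1, d=0.1)
print(m, round(power_at(m, v1, d=0.1), 3))
```

Because V_b is proportional to 1/m, the sketch works with the per-subject quantity m·V_b and reuses it for both the sample size and the power calculation.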

We consider the compound symmetry variance structure because it is common and closed form formulae for V_b can be easily obtained. We first examine power based on the asymptotic properties of the MLE for the parameter of interest, β₃, and evaluate how power depends on the distribution of the missing data in Section 2.2. We then examine two alternative power calculation methods commonly used in practice in Section 2.3. Although we only consider the unadjusted model (1), the methods can be easily extended using the approach of Hsieh et al.36 for controlling for confounders.

2.2 MLE-based method and impact of missing data

2.2.1 MLE-based method for power analysis with missing data

The power for testing β₃ in model (1) in the presence of missing data has been studied,22–31 and explicit formulae for sample size and power calculations are available,22–26 but not as a function of the distribution of K for linear mixed effects models. We express the power and sample size formulae in terms of the distribution of K so that they can be more easily compared with the conventional methods considered here. We assume that the missing data pattern is monotone, so that the distribution of R is equivalently expressed by the number of measurements K. The distribution p = (p₁, … , p_n) of K, with p_k = P(K = k | Z = z), k = 1, … , n, is assumed known and, as in (3), not to depend on z. The variance matrix V is assumed to have a compound symmetry structure with common variance σ² and correlation coefficient ρ.

Since the inverse of a k × k compound symmetry matrix with variance σ² and correlation coefficient ρ is $\frac{1}{σ^{2}(1 - ρ)} \left( I_{k} - \frac{ρ}{1 - ρ + kρ} 1_{k} 1_{k}^{T} \right)$, the asymptotic variance V_b of $\hat{β}_{3}$ can be expressed as

V_{b} = \frac{σ^{2}}{m Δ σ_{z}^{2}}

(4)

where

Δ \equiv Δ(n, p, t, ρ) = E(S_{K}^{2}) / (1 - ρ) + δ,

S_{K}^{2} = \sum_{j=1}^{K} t_{j}^{2} - \left( \sum_{j=1}^{K} t_{j} \right)^{2} / K,

δ = E\left\{ \frac{\left( \sum_{j=1}^{K} t_{j} \right)^{2}}{K (1 - ρ + Kρ)} \right\} - \left\{ E\left( \frac{\sum_{j=1}^{K} t_{j}}{1 - ρ + Kρ} \right) \right\}^{2} \Big/ E\left( \frac{K}{1 - ρ + Kρ} \right).

As shown in Appendix 1, δ is non-negative and equals zero only when K ≡ n, i.e. when there are no missing data and the design is balanced.
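The closed form for Δ can be checked numerically against the matrix route. From (3), V_b = 1/(mσ_z²[E(c_k) − {E(b_k)}²/E(a_k)]), so comparing with (4) gives Δ = σ²[E(c_k) − {E(b_k)}²/E(a_k)]. The sketch below (σ² = 1, compound symmetry, illustrative distributions of K; function names are hypothetical) computes Δ both ways and also reports δ, which should be non-negative and zero for complete data.

```python
def delta_components(t, p, rho):
    """Return (E(S_K^2) / (1 - rho), delta) under compound symmetry; Delta is their sum."""
    es2 = e1 = e2 = e3 = 0.0
    for k, pk in enumerate(p, start=1):
        tk = t[:k]
        tot = sum(tk)
        s2 = sum(x * x for x in tk) - tot * tot / k
        w = 1.0 / (1 - rho + k * rho)
        es2 += pk * s2
        e1 += pk * tot * tot * w / k   # E{(sum t)^2 / (K (1 - rho + K rho))}
        e2 += pk * tot * w             # E{(sum t) / (1 - rho + K rho)}
        e3 += pk * k * w               # E{K / (1 - rho + K rho)}
    return es2 / (1 - rho), e1 - e2 * e2 / e3


def delta_from_formula_3(t, p, rho):
    """Delta = E(c_k) - E(b_k)^2 / E(a_k), using the explicit CS inverse with sigma^2 = 1."""
    ea = eb = ec = 0.0
    for k, pk in enumerate(p, start=1):
        g = rho / (1 - rho + k * rho)
        idx = range(k)
        phi = [[((1.0 if l == j else 0.0) - g) / (1 - rho) for j in idx] for l in idx]
        ea += pk * sum(phi[l][j] for l in idx for j in idx)
        eb += pk * sum(phi[l][j] * t[j] for l in idx for j in idx)
        ec += pk * sum(phi[l][j] * t[l] * t[j] for l in idx for j in idx)
    return ec - eb ** 2 / ea


t = [0, 1, 2, 3, 4]
for p in ([0, 0, 0, 0.4, 0.6], [0.3, 0.21, 0.147, 0.1029, 0.2401], [0, 0, 0, 0, 1]):
    first, delta = delta_components(t, p, rho=0.5)
    print(round(first + delta, 4), round(delta_from_formula_3(t, p, rho=0.5), 4), round(delta, 4))
```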

It can also be shown that δ is much smaller than the first component of Δ, $E(S_{K}^{2}) / (1 - ρ)$ (see Appendix 2 for the proof).

When there are no missing data, δ = 0 and $E(S_{K}^{2}) = S_{n}^{2} = \sum_{j=1}^{n} t_{j}^{2} - (\sum_{j=1}^{n} t_{j})^{2} / n$, so $Δ = S_{n}^{2} / (1 - ρ)$ and (4) reduces to

V_{b}^{full} = \frac{(1 - ρ) σ^{2}}{m S_{n}^{2} σ_{z}^{2}}

(5)

The sample size m needed to achieve power 1 − γ to detect H₁: β₃ = d, using a two-sided α level test, is calculated by

m = \frac{σ^{2} (Z_{1-α/2} + Z_{1-γ})^{2}}{Δ d^{2} σ_{z}^{2}}

(6)

Equivalently, if the sample size m is given, then the power 1 − γ can be calculated as

1 - γ = Φ\left( \frac{σ_{z}}{σ} \sqrt{d^{2} m Δ} - Z_{1-α/2} \right) \equiv Φ(c - Z_{1-α/2})

(7)

where $c = \frac{σ_{z}}{σ} \sqrt{d^{2} m Δ}$ and Φ is the standard normal distribution function. When there are no missing data, i.e. P(K = n) = 1, (6) and (7) reduce to the familiar complete data formulae

m = \frac{(1 - ρ) σ^{2} (Z_{1-α/2} + Z_{1-γ})^{2}}{S_{n}^{2} d^{2} σ_{z}^{2}}

(8)

1 - γ = Φ\left( \frac{σ_{z}}{σ} \sqrt{\frac{d^{2} m S_{n}^{2}}{1 - ρ}} - Z_{1-α/2} \right)

(9)

If the predictor Z is binary with mean μ_z, e.g. as in a longitudinal study comparing two treatment groups, $σ_{z}^{2} = μ_{z} (1 - μ_{z})$ .

To illustrate how the crucial term Δ in equations (4), (6), and (7) is calculated, we give an example for n = 2. Suppose t = (0, 1) (e.g. t = 0 and 1 indicate baseline and follow-up, respectively), the intraclass correlation coefficient is ρ = 0.5, and the drop out rate at follow-up is 30%, so p = (0.30, 0.70). Then 1 − ρ + Kρ equals 1 for K = 1 and 1.5 for K = 2. Since $S_{1}^{2} = 0$ and $S_{2}^{2} = 0.5$, we have $E(S_{K}^{2}) = 0 \times 0.30 + 0.5 \times 0.70 = 0.35$, $E\{(\sum_{j=1}^{K} t_{j})^{2} / (K(1 - ρ + Kρ))\} = 0 \times 0.3 + 1^{2} / (2 \times 1.5) \times 0.7 = 0.233$, $E\{\sum_{j=1}^{K} t_{j} / (1 - ρ + Kρ)\} = 0 \times 0.3 + (1 / 1.5) \times 0.7 = 0.467$, and $E\{K / (1 - ρ + Kρ)\} = 1 \times 0.3 + (2 / 1.5) \times 0.7 = 1.233$; hence δ = 0.233 − 0.467²/1.233 = 0.056 and Δ = 0.35/0.5 + 0.056 = 0.756. If σ² = 1 and μ_z = 0.5, so that $σ_{z}^{2} = 0.25$, then the sample size needed to detect d = 0.5 under the alternative hypothesis with 90% power using a two-sided test at α = 0.05 is, from (6), $m = σ^{2} (Z_{1-α/2} + Z_{1-γ})^{2} / (Δ d^{2} σ_{z}^{2}) = (1.96 + 1.28)^{2} / (0.756 \times 0.5^{2} \times 0.25) ≈ 222.2$, which is rounded up to 223.
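The same calculation can be scripted. This sketch (hypothetical function names; σ² = 1 and σ_z² = 0.25 as in the example) evaluates Δ without intermediate rounding, which gives δ ≈ 0.057 and Δ = 28/37 ≈ 0.757 rather than the hand-rounded 0.056 and 0.756; with exact normal quantiles the required sample size is still 223.

```python
import math
from statistics import NormalDist


def big_delta(t, p, rho):
    """Delta(n, p, t, rho) = E(S_K^2)/(1 - rho) + delta under compound symmetry."""
    es2 = e1 = e2 = e3 = 0.0
    for k, pk in enumerate(p, start=1):
        tot = sum(t[:k])
        w = 1.0 / (1 - rho + k * rho)
        es2 += pk * (sum(x * x for x in t[:k]) - tot * tot / k)
        e1 += pk * tot * tot * w / k
        e2 += pk * tot * w
        e3 += pk * k * w
    return es2 / (1 - rho) + e1 - e2 * e2 / e3


def mle_sample_size(t, p, rho, d, sigma2=1.0, var_z=1.0, alpha=0.05, power=0.9):
    """Formula (6), rounded up to the next integer."""
    z = NormalDist().inv_cdf
    big_d = big_delta(t, p, rho)
    return math.ceil(sigma2 * (z(1 - alpha / 2) + z(power)) ** 2 / (big_d * d ** 2 * var_z))


def mle_power(t, p, rho, d, m, sigma2=1.0, var_z=1.0, alpha=0.05):
    """Formula (7): Phi(c - Z_{1-alpha/2}) with c = (sigma_z / sigma) sqrt(d^2 m Delta)."""
    nd = NormalDist()
    c = math.sqrt(var_z / sigma2) * math.sqrt(d ** 2 * m * big_delta(t, p, rho))
    return nd.cdf(c - nd.inv_cdf(1 - alpha / 2))


t, p, rho = [0, 1], [0.3, 0.7], 0.5
print(round(big_delta(t, p, rho), 4))  # 0.7568
m = mle_sample_size(t, p, rho, d=0.5, var_z=0.25)
print(m, round(mle_power(t, p, rho, d=0.5, m=m, var_z=0.25), 3))  # 223 0.901
```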

2.2.2 Impact of missing data

Expressions (4), (6), and (7) show that missing data affect the power for testing β₃ through the function Δ, which depends on the maximum number of measurements n, the correlation coefficient ρ, the measurement times t, and the distribution p of the number of measurements K. For a given distribution of K, efficiency and power increase as n and ρ increase, and as the time intervals between repeated measurements increase. Less missing data results in a larger Δ and thus a more efficient estimate of β₃ and greater power. Studies with a higher proportion of drop out at each time interval, or with the same proportions of drop out occurring earlier (e.g. 40% drop out at the first time interval versus 40% drop out at the last time interval), clearly have more missing data and thus smaller Δ and power.

However, without calculating Δ, the impact of the severity of missing data is not obvious when n > 2 and the drop out rate varies across visits. In practice, the proportion of subjects with complete data, p_n, or the mean number of repeated measures, E(K), is commonly used as a summary of missing data. Although a large p_n or E(K) intuitively seems better for power, the entire distribution of K must be known in order to calculate Δ and hence the power or sample size. To illustrate this for studies with n = 5 and t = (0, 1, 2, 3, 4), Table 1 lists values of Δ for different values of p, together with the asymptotic power (Asym) from the MLE-based method and the empirical power (Emp) from 2000 simulations for detecting β₃ = −0.02 with m = 500, σ² = σ_z² = 1, and correlation coefficient ρ = 0.25, 0.50, and 0.75. The first and last rows are the complete data cases with n = 5 and n = 4, respectively, included as references. Rows 2–3 correspond to the least and the most missing data given p₅ = 0.6, i.e. all subjects who drop out before the last visit are last observed at the 4th and the 1st visit, respectively; rows 4–5 show the same contrast for p₅ = 0.2. Rows 6–9 are examples of distributions of K with E(K) = 4, listed in decreasing order of Δ. For example, row 3 (E(K) = 3.4) has a larger Δ and higher power than row 4 (E(K) = 4.2) at every ρ. The empirical and asymptotic power estimates are similar.

Table 1.

Δ and power for some distributions of K from studies with n = 5.

		ρ = 0.25			ρ = 0.50			ρ = 0.75
p	E(K)	Δ	Asym	Emp	Δ	Asym	Emp	Δ	Asym	Emp
(0,0,0,0,1)	5	13.33	0.372	0.368	20	0.516	0.534	40	0.807	0.799
(0,0,0,0.4,0.6)	4.6	10.81	0.312	0.330	16.10	0.434	0.451	32.07	0.717	0.707
(0.4,0,0,0,0.6)	3.4	9.26	0.275	0.271	13.14	0.367	0.368	25.04	0.609	0.607
(0,0,0,0.8,0.2)	4.2	8.10	0.246	0.250	12.07	0.342	0.345	24.05	0.592	0.596
(0.8,0,0,0,0.2)	1.8	3.90	0.141	0.130	4.94	0.167	0.193	8.76	0.262	0.252
(0.25,0,0,0,0.75)	4	10.88	0.314	0.326	15.83	0.428	0.425	30.79	0.699	0.699
(0.15,0,0.05,0.3,0.5)	4	9.42	0.279	0.269	13.75	0.381	0.382	26.91	0.641	0.649
(0,0,0.2,0.6,0.2)	4	7.42	0.229	0.220	10.96	0.316	0.332	21.72	0.550	0.554
(0,0,0.1,0.8,0.1)	4	7.05	0.220	0.221	10.48	0.304	0.288	20.86	0.533	0.537
(0,0,0,1,0)	4	6.67	0.210	0.218	10	0.293	0.277	20	0.516	0.509
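Under the assumptions of Table 1 (t = (0, 1, 2, 3, 4), σ² = σ_z² = 1, β₃ = −0.02, m = 500, two-sided α = 0.05), the Δ and Asym columns can be recomputed from (4) and (7). The sketch below is a stdlib-only illustration with hypothetical helper names; it does not reproduce the empirical (Emp) column, which requires simulation.

```python
import math
from statistics import NormalDist


def big_delta(t, p, rho):
    """Delta = E(S_K^2)/(1 - rho) + delta under compound symmetry."""
    es2 = e1 = e2 = e3 = 0.0
    for k, pk in enumerate(p, start=1):
        tot = sum(t[:k])
        w = 1.0 / (1 - rho + k * rho)
        es2 += pk * (sum(x * x for x in t[:k]) - tot * tot / k)
        e1 += pk * tot * tot * w / k
        e2 += pk * tot * w
        e3 += pk * k * w
    return es2 / (1 - rho) + e1 - e2 * e2 / e3


def asym_power(delta_value, m=500, beta3=-0.02, alpha=0.05):
    """Formula (7) with sigma^2 = sigma_z^2 = 1."""
    nd = NormalDist()
    return nd.cdf(abs(beta3) * math.sqrt(m * delta_value) - nd.inv_cdf(1 - alpha / 2))


t = [0, 1, 2, 3, 4]
table1_p = [
    (0, 0, 0, 0, 1), (0, 0, 0, 0.4, 0.6), (0.4, 0, 0, 0, 0.6), (0, 0, 0, 0.8, 0.2),
    (0.8, 0, 0, 0, 0.2), (0.25, 0, 0, 0, 0.75), (0.15, 0, 0.05, 0.3, 0.5),
    (0, 0, 0.2, 0.6, 0.2), (0, 0, 0.1, 0.8, 0.1), (0, 0, 0, 1, 0),
]
for p in table1_p:
    cells = []
    for rho in (0.25, 0.50, 0.75):
        d_val = big_delta(t, p, rho)
        cells += [f"{d_val:.2f}", f"{asym_power(d_val):.3f}"]
    print(p, *cells)
```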

2.2.3 Adjusting for confounders

To adjust for a set of confounders, W, and the interactions between W and time when examining the effect of the main predictor on the slope in model (1), we can follow the method of Hsieh et al.36 and simply replace $σ_{z}^{2}$ with $σ_{z}^{2}(1 - R_{z|w}^{2})$, the conditional variance of Z given W, in all the power calculation formulae, where $R_{z|w}^{2}$ is the squared multiple correlation coefficient from regressing Z on W.
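Because σ_z² enters the sample size formula (6) only through its denominator, the substitution amounts to a simple inflation factor on the sample size. The one-line derivation below makes this explicit, with m denoting the unadjusted sample size from (6); the value R²_{z|w} = 0.2 in the example is purely illustrative.

m_{adj} = \frac{σ^{2} (Z_{1-α/2} + Z_{1-γ})^{2}}{Δ d^{2} σ_{z}^{2} (1 - R_{z|w}^{2})} = \frac{m}{1 - R_{z|w}^{2}}, \qquad \text{e.g. } R_{z|w}^{2} = 0.2 \Rightarrow m_{adj} = 1.25\, m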

2.3 Alternative methods

In practice, alternative approaches that simply modify the complete data formula (8) or (9) are often used.

2.3.1 ASQ method

Since the power analysis formulae for complete data, (8) and (9), depend on n only through $S_{n}^{2}$ , it seems reasonable to replace $S_{n}^{2}$ in (8) or (9) by its expectation $E (S_{K}^{2})$ in the presence of missing data; that is

m^{ASQ} = \frac{(1 - ρ) σ^{2} (Z_{1-α/2} + Z_{1-γ})^{2}}{E(S_{K}^{2}) d^{2} σ_{z}^{2}}

and the power estimate 1 − γ^ASQ is

1 - γ^{ASQ} = Φ\left( \frac{σ_{z}}{σ} \sqrt{\frac{d^{2} m E(S_{K}^{2})}{1 - ρ}} - Z_{1-α/2} \right)

This method is referred to as the ASQ approach. We have shown that δ ≥ 0, which implies that

E(S_{K}^{2}) \leq E(S_{K}^{2}) + (1 - ρ) δ

Since the ASQ method ignores the non-negative component δ in the denominator of V_b, it is conservative: the sample size is always overestimated, or the power underestimated, when there are missing data and ρ < 1. However, because δ is less than $E(S_{K}^{2}) / (1 - ρ)$ and almost ignorable when the drop out rate is not too high, the ratio of the sample size estimates from the ASQ and MLE-based methods,

m^{ASQ} / m = 1 + (1 - ρ) δ / E(S_{K}^{2}) \equiv r^{ASQ},

is at least 1 but less than 2, and is close to 1 when the drop out rate is small or moderate. In terms of power for a given sample size m, the ASQ power estimate, $1 - γ^{ASQ} = Φ(c / \sqrt{r^{ASQ}} - Z_{1-α/2})$, is less than or equal to the MLE-based power estimate, 1 − γ = Φ(c − Z_{1−α/2}). This can be seen intuitively by noting that, under the ASQ method, subjects with only one measurement do not contribute to the power calculation, since $S_{1}^{2} = 0$.

We examined the ratio of the sample size estimates from the ASQ and MLE-based methods, m^ASQ/m, in longitudinal studies with n = 2 and n = 5, various values of the correlation coefficient ρ, and a constant drop out rate. Results are shown in Figures 1 and 2 for n = 2 and n = 5, respectively. The figures show that the ratio m^ASQ/m is always greater than or equal to 1, increases as ρ decreases or as the drop out rate increases, and increases faster with the drop out rate when ρ is smaller. As the drop out rate approaches 1, the ratio reaches a limit less than 2 for both n = 2 and n = 5, i.e. ASQ overestimates the sample size by less than 100%. When ρ is large and the drop out rate is moderate or small, the ratio is close to 1, and thus the ASQ method is a good approximation to the MLE-based method. For example, when ρ = 0.8 and the drop out rate is 30% in the case of n = 2, ASQ overestimates the sample size by only about 3%.
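The ratio r^ASQ plotted in Figures 1 and 2 can be computed directly. In this sketch (hypothetical function names), the drop out rate is constant across intervals, visits are equally spaced at t = (0, 1, …, n − 1), and V is compound symmetric; it returns m^ASQ/m = 1 + (1 − ρ)δ/E(S_K²).

```python
def constant_dropout(n, pd):
    """P(K = k), k = 1..n, for a constant drop out rate pd per interval."""
    return [pd * (1 - pd) ** (k - 1) for k in range(1, n)] + [(1 - pd) ** (n - 1)]


def asq_ratio(t, p, rho):
    """r_ASQ = m_ASQ / m = 1 + (1 - rho) * delta / E(S_K^2) under compound symmetry."""
    es2 = e1 = e2 = e3 = 0.0
    for k, pk in enumerate(p, start=1):
        tot = sum(t[:k])
        w = 1.0 / (1 - rho + k * rho)
        es2 += pk * (sum(x * x for x in t[:k]) - tot * tot / k)
        e1 += pk * tot * tot * w / k
        e2 += pk * tot * w
        e3 += pk * k * w
    delta = e1 - e2 * e2 / e3
    return 1 + (1 - rho) * delta / es2


for n in (2, 5):
    t = list(range(n))
    for rho in (0.25, 0.5, 0.8):
        ratios = [asq_ratio(t, constant_dropout(n, pd), rho) for pd in (0.1, 0.3, 0.5, 0.9)]
        print(n, rho, [round(r, 3) for r in ratios])
```

For n = 2, ρ = 0.8 and a 30% drop out rate this gives a ratio of about 1.031, in line with the roughly 3% overestimation quoted above.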

Figure 1.

Ratio of sample size calculated using the ASQ method versus the MLE-based method for n = 2. ASQ, average sum of squares; MLE, maximum likelihood estimate.

Figure 2.

Ratio of sample size calculated using the ASQ method versus the MLE-based method for n = 5. ASQ, average sum of squares; MLE, maximum likelihood estimate.

2.3.2 ANS method

Another commonly used approach for computing power with missing data is to equate the study to a balanced longitudinal study without missing data that has the same total number of repeated measurements, $\sum_{i=1}^{m} K_{i}$, whose expected value is m E(K).17 We call this the ANS method. To calculate power, m in the complete data formula (9) is replaced with $m^{*} = \sum_{i=1}^{m} K_{i} / n \approx m E(K) / n$, which gives the ANS power estimate

1 - γ^{ANS} = Φ\left( \frac{σ_{z}}{σ} \sqrt{\frac{d^{2} m E(K) S_{n}^{2}}{n (1 - ρ)}} - Z_{1-α/2} \right)

To calculate sample size, m* is first determined using the complete data formula (8) and then the sample size is calculated as nm*/E(K); that is, the result from (8) is multiplied by n/E(K), which yields

m^{ANS} = \frac{n (1 - ρ) σ^{2} (Z_{1-α/2} + Z_{1-γ})^{2}}{E(K) S_{n}^{2} d^{2} σ_{z}^{2}}

The function $S_{k}^{2}$, the realization of $S_{K}^{2}$ for K = k, increases rapidly with k; for equally spaced visits it is a cubic function of k. For example, if n = 5 and t_j = j − 1, j = 1, … , 5, then $S_{k}^{2} = (k^{3} - k) / 12$, k = 1, … , 5, i.e. $(S_{1}^{2}, S_{2}^{2}, S_{3}^{2}, S_{4}^{2}, S_{5}^{2}) = (0, 0.5, 2, 5, 10)$. Hence the contribution to $E(S_{K}^{2})$, the dominant component of Δ in the denominator of V_b, from a subject with n measurements is much larger than that from a subject with k < n measurements. Applying the complete data formula for a balanced study with the same total number of measurements implicitly assumes that, per measurement, subjects with k < n measurements contribute to V_b as much as subjects with n measurements. This results in overestimated power or underestimated sample size, and the discrepancy from the MLE-based estimate grows with the amount of missing data, especially for larger n, as the differences between $S_{n}^{2}$ and the other $S_{k}^{2}$, k < n, increase with n.
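A quick exact check of the cubic form, using Python's `fractions.Fraction` to avoid rounding; the visit times t_j = j − 1 follow the example in the text, and the helper name `s2` is hypothetical.

```python
from fractions import Fraction


def s2(times):
    """S_k^2 = sum of t_j^2 minus (sum of t_j)^2 / k, exact for integer times."""
    total = sum(times)
    return sum(x * x for x in times) - Fraction(total * total, len(times))


n = 5
t = list(range(n))  # t_j = j - 1, j = 1, ..., n
values = [s2(t[:k]) for k in range(1, n + 1)]
print([float(v) for v in values])  # [0.0, 0.5, 2.0, 5.0, 10.0]

# The closed form (k^3 - k) / 12 holds for every k with unit spacing.
assert all(s2(list(range(k))) == Fraction(k ** 3 - k, 12) for k in range(1, 21))
```

With these times, a subject observed at all five visits contributes 20 times as much to E(S_K²) as a subject observed at only the first two.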

We examined the ratio of the sample sizes calculated using the ANS and MLE-based methods, $m^{ANS} / m = n (1 - ρ) Δ / \{E(K) S_{n}^{2}\} \equiv r^{ANS}$, for n = 2 and n = 5, various values of ρ, and a constant drop out rate. The results, shown in Figures 3 and 4 for n = 2 and n = 5, respectively, indicate that the ratio m^ANS/m is always less than or equal to 1, decreases as ρ increases and as the drop out rate increases, and decreases faster with the drop out rate when ρ is larger and when n is larger. As the drop out rate approaches 1, the ratio approaches 0. This implies that the ANS method can severely underestimate the sample size and should therefore be avoided.

Figure 3.

Ratio of sample size calculated using the ANS method versus the MLE-based method for n = 2. ANS, average number of subjects; MLE, maximum likelihood estimate.

Figure 4.

Ratio of sample size calculated using the ANS method versus the MLE-based method for n = 5. ANS, average number of subjects; MLE, maximum likelihood estimate.

When the sample size m is fixed, the ANS power estimate, $1 - γ^{ANS} = Φ(c / \sqrt{r^{ANS}} - Z_{1-α/2})$, is greater than or equal to the MLE-based power estimate, 1 − γ = Φ(c − Z_{1−α/2}).
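The ratio r^ANS and the two power estimates can be compared with a short sketch. The assumptions are as before: compound symmetry, equally spaced visits t = (0, …, n − 1), a constant drop out rate, and hypothetical function names; `c` is the quantity defined after (7), and the value c = 2.0 in the example is arbitrary.

```python
import math
from statistics import NormalDist


def constant_dropout(n, pd):
    """P(K = k), k = 1..n, for a constant drop out rate pd per interval."""
    return [pd * (1 - pd) ** (k - 1) for k in range(1, n)] + [(1 - pd) ** (n - 1)]


def mle_delta(t, p, rho):
    """Delta = E(S_K^2)/(1 - rho) + delta under compound symmetry."""
    es2 = e1 = e2 = e3 = 0.0
    for k, pk in enumerate(p, start=1):
        tot = sum(t[:k])
        w = 1.0 / (1 - rho + k * rho)
        es2 += pk * (sum(x * x for x in t[:k]) - tot * tot / k)
        e1 += pk * tot * tot * w / k
        e2 += pk * tot * w
        e3 += pk * k * w
    return es2 / (1 - rho) + e1 - e2 * e2 / e3


def ans_ratio(t, p, rho):
    """r_ANS = m_ANS / m = n (1 - rho) Delta / (E(K) S_n^2)."""
    n = len(t)
    mean_k = sum(k * pk for k, pk in enumerate(p, start=1))
    s2n = sum(x * x for x in t) - sum(t) ** 2 / n
    return n * (1 - rho) * mle_delta(t, p, rho) / (mean_k * s2n)


def mle_and_ans_power(c, r_ans, alpha=0.05):
    """MLE power Phi(c - z) and ANS power Phi(c / sqrt(r_ANS) - z) for the same c."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(c - z), nd.cdf(c / math.sqrt(r_ans) - z)


for n in (2, 5):
    t = list(range(n))
    r = ans_ratio(t, constant_dropout(n, 0.3), rho=0.75)
    print(n, round(r, 3), [round(x, 3) for x in mle_and_ans_power(2.0, r)])
```

With ρ = 0.75 and a 30% drop out rate, the ratios come out at about 0.856 (n = 2) and 0.629 (n = 5), i.e. the ANS method asks for roughly 14% and 37% fewer subjects than the MLE-based method.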

2.3.3 Other methods

Another method commonly used in practice is to use only the subset of subjects with all n measurements (K = n) and then apply the complete data formula. Because only a proportion p_n of the subjects is used, this approach is clearly too conservative, especially when p_n is small and n is moderate or large. In the case of n = 2, it is equivalent to the ASQ method. As this method has already been studied in the literature,24 we do not consider it further in this article.
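The equivalence for n = 2 follows in one line from (8). Write m^{full} for the complete-data sample size from (8) and m^{cc} for the completers-only sample size; the completers-only approach needs p_2 m^{cc} = m^{full} subjects with both measurements, while S_1² = 0 gives E(S_K²) = p_1 S_1² + p_2 S_2² = p_2 S_2² in the ASQ formula:

m^{ASQ} = \frac{(1 - ρ) σ^{2} (Z_{1-α/2} + Z_{1-γ})^{2}}{p_{2} S_{2}^{2} d^{2} σ_{z}^{2}} = \frac{m^{full}}{p_{2}} = m^{cc}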

3 Simulation studies

3.1 Simulation study 1

In the first simulation study, the missing data were generated completely at random, and our goal was to examine the performance of the MLE-based method and of the conventional ASQ and ANS methods. We considered longitudinal studies with two and five visits, representing short and moderate lengths of follow-up, respectively. For n = 2, the planned follow-up times were t = (0, 1), representing baseline and follow-up visits; for n = 5, t = (0, 1, 2, 3, 4), representing baseline and four follow-up visits at equal intervals (e.g. annually). We considered values of 0.25, 0.50, and 0.75 for the correlation coefficient ρ, representing low, medium, and high levels of correlation, respectively. The predictor variable, Z, was generated from a Bernoulli distribution with P(Z = 1) = 0.5 for n = 2, and from a standard normal distribution for n = 5. The repeatedly measured outcome Y was generated from model (1) with β₀ = β₁ = β₂ = 0 and σ² = 1. The first (baseline) measure Y₁ was observed for every subject. For n = 2, the observation indicator R₂ for the follow-up measure Y₂ was generated from a Bernoulli distribution with P(R₂ = 1 | Z, Y₁) = 1 − p_d. For n = 5, a constant drop out rate p_d was used, with P(R_ij = 1 | R_{i,j−1} = 1, Z_i, Y_i1, … , Y_{i,j−1}) = 1 − p_d, j = 2, … , 5. Drop out rates p_d of 0.1, 0.2, and 0.3, which are commonly observed in longitudinal studies, were considered. We evaluated the power for sample sizes m of 100, 200, 300, 400, and 500, with β₃ = −0.3 and −0.02 for n = 2 and n = 5, respectively. Empirical power estimates from 2000 simulations (Emp) were obtained and compared with the power estimates from the MLE-based method (Asym) and the ASQ and ANS methods.
Sample size estimates from the different methods for target powers of 80% and 90% were also examined, with β₃ = 0.3 and 0.04 for n = 2 and n = 5, respectively, along with the empirical power from 2000 simulations (Emp) at the estimated sample sizes. Extreme drop out rates of 0.5, 0.7, and 0.9 were also considered; for these, power estimates were assessed with m = 500 for n = 2 and m = 10000 for n = 5, and sample size estimates were evaluated for a target power of 80% with β₃ = 0.3 and 0.05 for n = 2 and n = 5, respectively. The situation with no missing data was also included as a reference; in that case all three power analysis methods reduce to the complete data formulae.
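A much-simplified version of this simulation design can be sketched in plain Python. To stay self-contained, the sketch departs from the study in stated ways: it estimates β by generalized least squares with the true compound-symmetry V rather than fitting an LME model with estimated variance components, uses a Wald z-test, draws Z from Bernoulli(0.5) only, generates drop out completely at random with a constant rate, and uses 200 rather than 2000 replicates. All function names and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def one_trial(rng, m, t, rho, beta3, pd, sigma2=1.0):
    """Simulate one data set from model (1) (beta0 = beta1 = beta2 = 0); True if H0: beta3 = 0 is rejected."""
    n = len(t)
    info = [[0.0] * 4 for _ in range(4)]  # sum of X_i^T V_i^{-1} X_i
    score = [0.0] * 4                     # sum of X_i^T V_i^{-1} Y_i
    scale = 1.0 / ((1 - rho) * sigma2)
    for _ in range(m):
        z = 1.0 if rng.random() < 0.5 else 0.0
        u = rng.gauss(0.0, math.sqrt(rho * sigma2))  # random intercept
        k = 1
        while k < n and rng.random() >= pd:          # next visit observed with probability 1 - pd
            k += 1
        x = [[1.0, z, t[j], z * t[j]] for j in range(k)]
        y = [beta3 * z * t[j] + u + rng.gauss(0.0, math.sqrt((1 - rho) * sigma2))
             for j in range(k)]
        g = rho / (1 - rho + k * rho)  # V_k^{-1} = scale * (I - g 1 1^T)
        colsum = [sum(row[c] for row in x) for c in range(4)]
        ysum = sum(y)
        for r in range(4):
            score[r] += scale * (sum(x[j][r] * y[j] for j in range(k)) - g * colsum[r] * ysum)
            for c in range(4):
                info[r][c] += scale * (sum(x[j][r] * x[j][c] for j in range(k))
                                       - g * colsum[r] * colsum[c])
    beta = solve(info, score)
    var_b3 = solve(info, [0.0, 0.0, 0.0, 1.0])[3]  # (4, 4) element of info^{-1}
    return abs(beta[3]) / math.sqrt(var_b3) > NormalDist().inv_cdf(0.975)


def empirical_power(reps, seed=2024, **design):
    """Proportion of simulated data sets in which H0: beta3 = 0 is rejected."""
    rng = random.Random(seed)
    return sum(one_trial(rng, **design) for _ in range(reps)) / reps


emp = empirical_power(200, m=500, t=[0, 1], rho=0.5, beta3=-0.3, pd=0.2)
print(emp)
```

With m = 500, ρ = 0.5, p_d = 20% and β₃ = −0.3, the estimate should land near the corresponding Table 2 entries (Emp 0.871, Asym 0.868), up to a Monte Carlo error of roughly ±0.05 with 200 replicates.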

Results on the performance of the different power estimates are given in Table 2. Overall, the MLE-based power estimate is close to the empirical power. The ASQ power estimate is similar to the empirical estimate when p_d = 10%. The tendency of ASQ to underestimate power becomes more apparent as the proportion of missing data grows, especially for small ρ. For example, for n = 2 with p_d = 20% or 30%, ASQ underestimates power by more than 10% relative to the empirical estimate when m = 100 and ρ = 0.25. For n = 5 with p_d = 30%, the underestimation of power by the ASQ method reaches a maximum of 22% when ρ = 0.25. Under the smallest missing data proportion considered, p_d = 10%, the ANS power estimate is similar to the empirical power when n = 2 and slightly greater when n = 5. The overestimation by the ANS method becomes more severe as the drop out rate increases, especially for large ρ and n. For example, for n = 2, power is overestimated by 13% when ρ = 0.75 and m = 100 or 200; for n = 5 and ρ = 0.75, the overestimation ranges from 15% to 29% when p_d = 20%, and strikingly exceeds 45% when p_d = 30% and m ≥ 200.

Table 2.

Empirical power from 2000 simulations and power estimates using the MLE-based (Asym), the ASQ and ANS methods under different drop out rate p_d, correlations ρ, and sample size m.

		p_d = 0		p_d = 10%				p_d = 20%				p_d = 30%
ρ	m	Emp	Asym	Emp	Asym	ASQ	ANS	Emp	Asym	ASQ	ANS	Emp	Asym	ASQ	ANS
n = 2, Z is Bernoulli with mean 0.5, β₃ = −0.3
0.25	100	0.244	0.231	0.226	0.219	0.212	0.222	0.222	0.206	0.194	0.212	0.198	0.192	0.175	0.203
	200	0.420	0.410	0.384	0.388	0.376	0.393	0.373	0.364	0.341	0.376	0.325	0.337	0.305	0.358
	300	0.567	0.564	0.535	0.536	0.521	0.543	0.486	0.505	0.475	0.521	0.467	0.470	0.427	0.498
	400	0.675	0.688	0.675	0.659	0.642	0.666	0.639	0.625	0.591	0.642	0.572	0.585	0.536	0.617
	500	0.787	0.782	0.749	0.754	0.738	0.761	0.734	0.721	0.688	0.738	0.688	0.682	0.630	0.714
0.50	100	0.322	0.323	0.298	0.302	0.296	0.309	0.293	0.280	0.268	0.296	0.258	0.256	0.240	0.282
	200	0.558	0.564	0.547	0.531	0.521	0.543	0.488	0.495	0.475	0.521	0.452	0.454	0.427	0.498
	300	0.729	0.738	0.708	0.704	0.693	0.716	0.659	0.664	0.642	0.693	0.625	0.618	0.585	0.668
	400	0.855	0.851	0.832	0.822	0.812	0.833	0.790	0.786	0.765	0.812	0.748	0.742	0.709	0.790
	500	0.924	0.918	0.901	0.897	0.889	0.905	0.871	0.868	0.851	0.889	0.828	0.831	0.801	0.871
0.75	100	0.559	0.564	0.520	0.526	0.521	0.543	0.483	0.485	0.475	0.521	0.442	0.440	0.427	0.498
	200	0.865	0.851	0.818	0.817	0.812	0.833	0.768	0.776	0.765	0.812	0.733	0.725	0.709	0.790
	300	0.952	0.957	0.929	0.939	0.937	0.948	0.906	0.914	0.908	0.937	0.881	0.880	0.867	0.923
	400	0.990	0.989	0.980	0.982	0.981	0.985	0.970	0.970	0.967	0.981	0.948	0.951	0.944	0.975
	500	0.994	0.997	0.995	0.995	0.995	0.996	0.989	0.990	0.989	0.995	0.985	0.982	0.978	0.992
n = 5, Z is standard normal, β₃ = −0.02
0.25	100	0.125	0.109	0.106	0.093	0.090	0.097	0.086	0.080	0.074	0.087	0.071	0.068	0.062	0.078
	200	0.180	0.177	0.156	0.146	0.138	0.153	0.132	0.119	0.109	0.133	0.101	0.097	0.086	0.117
	300	0.248	0.244	0.209	0.197	0.186	0.208	0.163	0.158	0.142	0.178	0.129	0.125	0.109	0.154
	400	0.301	0.309	0.253	0.248	0.234	0.262	0.195	0.196	0.175	0.223	0.160	0.152	0.132	0.192
	500	0.375	0.372	0.293	0.298	0.281	0.315	0.224	0.234	0.209	0.267	0.188	0.179	0.154	0.229
0.50	100	0.147	0.143	0.116	0.117	0.114	0.125	0.092	0.096	0.092	0.110	0.085	0.079	0.074	0.098
	200	0.241	0.244	0.187	0.193	0.186	0.208	0.159	0.151	0.142	0.178	0.130	0.117	0.109	0.154
	300	0.337	0.341	0.274	0.267	0.257	0.288	0.204	0.205	0.192	0.245	0.177	0.155	0.143	0.210
	400	0.434	0.432	0.343	0.338	0.327	0.367	0.257	0.258	0.241	0.311	0.187	0.192	0.176	0.265
	500	0.519	0.516	0.417	0.407	0.393	0.440	0.314	0.310	0.290	0.375	0.240	0.230	0.209	0.319
0.75	100	0.249	0.244	0.188	0.189	0.186	0.208	0.147	0.146	0.142	0.178	0.128	0.113	0.109	0.154
	200	0.437	0.432	0.321	0.332	0.327	0.367	0.269	0.248	0.241	0.311	0.183	0.183	0.176	0.265
	300	0.587	0.591	0.447	0.463	0.456	0.509	0.344	0.348	0.337	0.435	0.254	0.253	0.242	0.371
	400	0.709	0.716	0.585	0.578	0.570	0.629	0.425	0.441	0.428	0.546	0.322	0.321	0.307	0.470
	500	0.808	0.807	0.692	0.674	0.666	0.726	0.522	0.526	0.512	0.640	0.374	0.386	0.370	0.558

ASQ, average sum of squares; ANS, average number of subjects.

Sample size estimates for a given power were also compared across the different approaches; the results are given in Table 3. Overall, the empirical power at the sample size estimated by the MLE-based method (Asym) is close to the target power. The sample size estimate from the ASQ method is close to that from the MLE-based method when ρ is large and p_d is small. The ASQ overestimation of sample size reaches maxima of 13% and 23% for n = 2 and n = 5, respectively, when p_d = 30%; correspondingly, the empirical power at the ASQ sample size exceeds the target power in these settings. The ANS method underestimates sample size compared with the MLE-based method, especially when p_d and ρ are larger. The most severe underestimation occurs at the highest drop out rate, p_d = 30%, and the highest correlation considered, ρ = 0.75: 14% for n = 2 and 37% for n = 5. Correspondingly, the empirical power at the ANS sample size falls below the target, e.g. 59% instead of 80% for n = 5, p_d = 30%, and ρ = 0.75.

Table 3.

Sample size estimates (empirical power from 2000 simulations) using the MLE-based (Asym), the ASQ and ANS methods under different drop out rates p_d, correlations ρ and power.

		Power = 0.80			Power = 0.90
ρ	p_d	Asym	ASQ	ANS	Asym	ASQ	ANS
n = 2, Z is Bernoulli with mean 0.5, β₃ = 0.3
0.25	0	524 (0.808)	524 (0.793)	524 (0.803)	701 (0.905)	701 (0.903)	701 (0.907)
	0.1	560 (0.789)	582 (0.815)	551 (0.787)	750 (0.897)	779 (0.915)	738 (0.892)
	0.2	606 (0.805)	655 (0.806)	582 (0.765)	810 (0.905)	876 (0.918)	779 (0.888)
	0.3	664 (0.812)	748 (0.856)	616 (0.774)	889 (0.897)	1001 (0.928)	825 (0.883)
0.5	0	349 (0.798)	349 (0.785)	349 (0.800)	467 (0.898)	467 (0.897)	467 (0.905)
	0.1	378 (0.806)	388 (0.804)	368 (0.797)	506 (0.898)	519 (0.905)	492 (0.890)
	0.2	415 (0.788)	437 (0.818)	388 (0.773)	555 (0.898)	584 (0.911)	519 (0.894)
	0.3	461 (0.805)	499 (0.823)	411 (0.755)	618 (0.896)	668 (0.919)	550 (0.861)
0.75	0	175 (0.803)	175 (0.795)	175 (0.798)	234 (0.897)	234 (0.903)	234 (0.894)
	0.1	192 (0.807)	194 (0.810)	184 (0.774)	257 (0.909)	260 (0.909)	246 (0.888)
	0.2	213 (0.791)	219 (0.814)	194 (0.755)	285 (0.896)	292 (0.907)	260 (0.873)
	0.3	240 (0.803)	250 (0.822)	206 (0.733)	322 (0.891)	334 (0.911)	275 (0.861)
n = 5, Z is standard normal, β₃ = 0.04
0.25	0	368 (0.789)	368 (0.780)	368 (0.783)	493 (0.896)	493 (0.891)	493 (0.902)
	0.1	480 (0.777)	516 (0.818)	450 (0.766)	642 (0.896)	691 (0.915)	602 (0.870)
	0.2	646 (0.804)	745 (0.864)	548 (0.730)	864 (0.905)	997 (0.927)	733 (0.859)
	0.3	904 (0.807)	1111 (0.874)	664 (0.664)	1211 (0.901)	1487 (0.950)	889 (0.783)
0.5	0	246 (0.783)	246 (0.791)	246 (0.805)	329 (0.891)	329 (0.898)	329 (0.903)
	0.1	330 (0.791)	344 (0.806)	300 (0.775)	442 (0.889)	461 (0.915)	401 (0.877)
	0.2	458 (0.790)	497 (0.852)	365 (0.717)	613 (0.902)	665 (0.920)	489 (0.845)
	0.3	660 (0.801)	741 (0.840)	443 (0.625)	883 (0.891)	991 (0.924)	593 (0.734)
0.75	0	123 (0.797)	123 (0.788)	123 (0.792)	165 (0.902)	165 (0.888)	165 (0.897)
	0.1	169 (0.775)	172 (0.810)	150 (0.741)	226 (0.881)	231 (0.900)	201 (0.866)
	0.2	240 (0.787)	249 (0.810)	183 (0.678)	321 (0.890)	333 (0.897)	245 (0.805)
	0.3	352 (0.788)	371 (0.818)	222 (0.586)	471 (0.899)	496 (0.907)	297 (0.722)

ASQ, average sum of squares; ANS, average number of subjects.
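For the two-visit design, the direction of these discrepancies can be seen in closed form. The expressions below are our own sketch, not the paper's formula (6) or its Δ function: they assume visits at t = 0 and 1, a compound-symmetric covariance with variance σ² and intraclass correlation ρ, a drop out proportion p_d at the second visit that is independent of Z, and our reading of the ASQ and ANS approximations. Under these assumptions the large-sample variance of the estimated slope effect is approximately

```latex
\operatorname{Var}(\hat\beta_3) \approx \frac{\sigma^2(1-\rho)}{m\,\operatorname{Var}(Z)}\, f,
\qquad
f_{\mathrm{MLE}} = \frac{2-(1-\rho)\,p_d}{1-p_d}, \quad
f_{\mathrm{ASQ}} = \frac{2}{1-p_d}, \quad
f_{\mathrm{ANS}} = \frac{2}{1-p_d/2},
\qquad
\frac{m_{\mathrm{ASQ}}}{m_{\mathrm{MLE}}} = \frac{2}{2-(1-\rho)\,p_d}.
```

Because f_ASQ − f_MLE = (1 − ρ)p_d/(1 − p_d) ≥ 0, the ASQ approximation is conservative, and more so for small ρ and large p_d. For ρ = 0.25 and p_d = 0.3 the ratio m_ASQ/m_MLE is 2/1.775 ≈ 1.127, close to 748/664 ≈ 1.127 in Table 3; for ρ = 0.75 and p_d = 0.3, f_ANS/f_MLE ≈ 0.856, close to the ANS-to-Asym ratio 206/240 ≈ 0.858.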

Results under extremely high drop out rates are given in Table 4 (power for given sample sizes) and Table 5 (sample size for a given power). As with low-to-moderate drop out rates, the MLE-based power estimates are close to the empirical power estimates for given sample sizes, and the empirical power at the MLE-based sample size estimate is close to the target power. The tendency of the ASQ method to underestimate power for a given sample size and to overestimate sample size for a given power is more evident, but the magnitude of the discrepancies is limited. For example, in all scenarios with ρ = 0.25, ASQ underestimates power by 12% to 27% (Table 4), and for a given power the sample size is overestimated by at most 56%, when n = 5, p_d = 90%, and ρ = 0.25 (Table 5). In contrast, the ANS method dramatically overestimates power for a given sample size and underestimates sample size for a given power, especially for n = 5. For example, for n = 5 with p_d = 90% and ρ = 0.75, the ANS power estimate is 100% at m = 10000, whereas the empirical power is only 21% (Table 4); for a target power of 80%, the ANS sample size estimate is 97% smaller than the MLE-based estimate, and the empirical power at the ANS sample size is only 8.6% (Table 5).

Table 4.

Empirical power from 2000 simulations and power estimates using the MLE-based (Asym), the ASQ and ANS methods under extremely large constant drop out rates p_d.

	p_d = 50%				p_d = 70%				p_d = 90%
ρ	Emp	Asym	ASQ	ANS	Emp	Asym	ASQ	ANS	Emp	Asym	ASQ	ANS
n = 2 (m = 500), Z is Bernoulli with mean 0.5, β₃ = −0.3
0.25	0.562	0.575	0.491	0.660	0.400	0.416	0.323	0.598	0.180	0.185	0.137	0.528
0.50	0.714	0.718	0.660	0.828	0.530	0.525	0.451	0.772	0.214	0.225	0.184	0.701
0.75	0.935	0.934	0.918	0.984	0.774	0.776	0.738	0.969	0.375	0.357	0.323	0.940
n = 5 (m = 10000), Z is standard normal, β₃ = −0.02
0.25	0.864	0.871	0.754	0.995	0.427	0.432	0.313	0.974	0.121	0.114	0.088	0.931
0.50	0.951	0.942	0.900	0.999	0.510	0.522	0.438	0.998	0.136	0.133	0.111	0.988
0.75	0.998	0.997	0.996	1.000	0.763	0.764	0.723	1.000	0.210	0.198	0.181	1.000

ASQ, average sum of squares; ANS, average number of subjects.

Table 5.

Sample size estimates (empirical power from 2000 simulations) given 80% power using the MLE-based (Asym), the ASQ and ANS methods under extremely large constant drop out rates p_d.

	p_d = 50%			p_d = 70%			p_d = 90%
ρ	Asym	ASQ	ANS	Asym	ASQ	ANS	Asym	ASQ	ANS
n = 2, Z is Bernoulli with mean 0.5, β₃ = 0.3
0.25	851 (0.798)	1047 (0.879)	698 (0.724)	1287 (0.796)	1745 (0.899)	806 (0.609)	3467 (0.788)	5233 (0.935)	952 (0.309)
0.50	611 (0.799)	698 (0.839)	466 (0.678)	960 (0.789)	1163 (0.867)	537 (0.546)	2704 (0.802)	3489 (0.890)	635 (0.269)
0.75	328 (0.785)	349 (0.812)	233 (0.661)	531 (0.789)	582 (0.822)	269 (0.518)	1548 (0.793)	1745 (0.837)	318 (0.260)
n = 5, Z is standard normal, β₃ = 0.05
0.25	1315 (0.780)	1795 (0.907)	608 (0.484)	3924 (0.786)	5793 (0.922)	827 (0.255)	21997 (0.792)	34375 (0.938)	1060 (0.109)
0.50	1006 (0.786)	1197 (0.856)	406 (0.411)	3093 (0.791)	3862 (0.874)	551 (0.212)	17557 (0.796)	22917 (0.907)	707 (0.088)
0.75	555 (0.798)	599 (0.817)	203 (0.385)	1750 (0.793)	1931 (0.811)	276 (0.191)	10146 (0.796)	11459 (0.836)	354 (0.086)

ASQ, average sum of squares; ANS, average number of subjects.

3.2 Simulation study 2

As introduced in section 2.1, the linear mixed effects model assumes that the missing data mechanism is MAR, i.e. R can depend on the observed outcomes Y, but the efficiency and power of the MLE depend only on the marginal distribution of the missing data, p = P(R|Z). To examine the performance of the MLE-based method under MAR, we performed a second simulation study in which Z and Y were generated as in the first simulation study but the observation indicator R was generated under MAR, and compared the empirical power with that under MCAR with the same marginal distribution. Specifically, for n = 2, we generated R₂ with P(R₂ = 1|Y₁ ≥ 0, Z) = π₁ and P(R₂ = 1|Y₁ < 0, Z) = π₂, so that the marginal distribution is P(R₂ = 1|Z) = (π₁ + π₂)/2, since Y₁ had mean 0 and was unrelated to Z. When π₁ ≠ π₂, the missing data mechanism was MAR. Several values of (π₁, π₂) were considered: (1, 0.8) corresponds to an overall drop out of 10% at follow-up, (1, 0.6) and (0.9, 0.7) to 20%, and (1, 0.4) and (0.8, 0.6) to 30%.

For n = 5, we considered for simplicity the situation in which drop out occurred only at the first follow-up visit, with P(R₂ = 1|Y₁ ≥ 0, Z) = π₁, P(R₂ = 1|Y₁ < 0, Z) = π₂, and R₅ = R₄ = R₃ = R₂. The marginal distribution of R₂ was therefore again P(R₂ = 1|Z) = (π₁ + π₂)/2, which corresponds to a drop out proportion p_d = 1 − (π₁ + π₂)/2 at the first follow-up visit, so that p = (p_d, 0, 0, 0, 1 − p_d). The values (0.5, 0.5), (0.3, 0.7), (0.2, 0.8), and (0.1, 0.9) were considered for (π₁, π₂), all of which correspond to p_d = 0.5 at the first follow-up. The cases with π₁ = π₂, i.e. (0.9, 0.9), (0.8, 0.8), and (0.7, 0.7) for n = 2 and (0.5, 0.5) for n = 5, correspond to MCAR. The results are summarized in Tables 6 and 7 for n = 2 and 5, respectively.
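The MAR rule for n = 2 can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes a standard normal baseline outcome Y₁ and a 1:1 binary Z, generates only the follow-up indicator R₂ (no longitudinal outcomes or model fit), and the helper name `simulate_mar_followup` and the seed are hypothetical.

```python
import numpy as np


def simulate_mar_followup(m, pi1, pi2, rng):
    """Hypothetical helper: baseline Y1, predictor Z, and follow-up indicator R2
    under the MAR rule P(R2 = 1 | Y1 >= 0, Z) = pi1, P(R2 = 1 | Y1 < 0, Z) = pi2."""
    z = rng.binomial(1, 0.5, size=m)        # binary predictor, as in the n = 2 design
    y1 = rng.standard_normal(m)             # assumed N(0, 1) baseline, unrelated to Z
    p_obs = np.where(y1 >= 0, pi1, pi2)     # observation probability depends on observed Y1 only
    r2 = rng.random(m) < p_obs              # True = follow-up visit observed
    return z, y1, r2


rng = np.random.default_rng(2024)
z, y1, r2 = simulate_mar_followup(200_000, 1.0, 0.6, rng)
print(f"marginal drop out: {1 - r2.mean():.3f}")          # near 1 - (1 + 0.6)/2 = 0.2
print(f"by Z: {1 - r2[z == 0].mean():.3f}, {1 - r2[z == 1].mean():.3f}")
print(f"given Y1 < 0: {1 - r2[y1 < 0].mean():.3f}")       # near 0.4, so not MCAR
```

Drop out is the same in both Z groups but depends on the observed Y₁, which is exactly the MAR-with-fixed-marginal setting the second simulation study targets. For the n = 5 design, the same R₂ would simply be copied to visits 3 to 5.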

Table 6.

Empirical power from 2000 simulations under different (π₁, π₂) values when missing data are MAR (n = 2).

Marginal		p_d = 10%			p_d = 20%				p_d = 30%
ρ	m	Asym	(0.9,0.9)	(1,0.8)	Asym	(0.8,0.8)	(1,0.6)	(0.9,0.7)	Asym	(0.7,0.7)	(1,0.4)	(0.8,0.6)
0.25	100	0.219	0.226	0.222	0.206	0.222	0.201	0.211	0.192	0.198	0.199	0.193
	200	0.388	0.384	0.397	0.364	0.373	0.377	0.363	0.337	0.325	0.323	0.336
	300	0.536	0.535	0.533	0.505	0.486	0.499	0.501	0.470	0.467	0.478	0.458
	400	0.659	0.675	0.671	0.625	0.639	0.611	0.422	0.585	0.572	0.597	0.582
	500	0.754	0.749	0.760	0.721	0.734	0.727	0.715	0.682	0.688	0.680	0.672
0.50	100	0.302	0.298	0.290	0.280	0.293	0.269	0.273	0.256	0.258	0.254	0.239
	200	0.531	0.547	0.544	0.495	0.488	0.475	0.486	0.454	0.452	0.453	0.442
	300	0.704	0.708	0.712	0.664	0.659	0.657	0.682	0.618	0.625	0.613	0.623
	400	0.822	0.832	0.806	0.786	0.790	0.790	0.796	0.742	0.748	0.741	0.742
	500	0.897	0.901	0.895	0.868	0.871	0.877	0.870	0.831	0.828	0.822	0.837
0.75	100	0.526	0.520	0.538	0.485	0.483	0.490	0.487	0.440	0.442	0.436	0.451
	200	0.817	0.818	0.823	0.776	0.768	0.779	0.788	0.725	0.733	0.718	0.709
	300	0.939	0.929	0.947	0.914	0.906	0.910	0.919	0.880	0.881	0.869	0.877
	400	0.982	0.980	0.983	0.970	0.970	0.973	0.970	0.951	0.948	0.954	0.948
	500	0.995	0.995	0.994	0.990	0.989	0.987	0.993	0.982	0.985	0.979	0.980

MAR, missing at random.

Table 7.

Empirical power from 2000 simulations under different (π₁, π₂) values when missing data are MAR (n = 5) (all correspond to marginal distribution p = (0.5, 0, 0, 0, 0.5)).

ρ	m	Asym	(0.5, 0.5)	(0.3, 0.7)	(0.2, 0.8)	(0.1, 0.9)
0.25	100	0.082	0.083	0.090	0.083	0.094
	200	0.124	0.105	0.140	0.132	0.123
	300	0.165	0.170	0.155	0.163	0.171
	400	0.206	0.194	0.193	0.214	0.222
	500	0.246	0.234	0.247	0.257	0.256
0.50	100	0.099	0.100	0.116	0.109	0.099
	200	0.156	0.171	0.138	0.159	0.159
	300	0.212	0.212	0.210	0.206	0.216
	400	0.268	0.272	0.260	0.268	0.262
	500	0.323	0.317	0.343	0.335	0.326
0.75	100	0.149	0.150	0.136	0.153	0.144
	200	0.255	0.253	0.279	0.274	0.260
	300	0.356	0.342	0.367	0.349	0.361
	400	0.451	0.455	0.458	0.430	0.448
	500	0.538	0.548	0.538	0.514	0.539

MAR, missing at random.

In Tables 6 and 7, the empirical power under the different MAR scenarios with the same marginal drop out rate is similar to that under MCAR, and all empirical power estimates are close to the MLE-based power estimates, which use only the marginal distribution of the missing data.
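Whether an empirical power is "similar" to an analytic estimate can be judged against the Monte Carlo error of 2000 replicates. A minimal sketch, assuming each replicate's rejection is an independent Bernoulli trial; the helper names are hypothetical, and the example pair is the first row of Table 6 (ρ = 0.25, m = 100, Asym versus (π₁, π₂) = (1, 0.8)).

```python
import math


def mc_standard_error(power, reps=2000):
    """Binomial Monte Carlo standard error of an empirical power estimate."""
    return math.sqrt(power * (1 - power) / reps)


def mc_z(empirical, analytic, reps=2000):
    """Gap between empirical and analytic power in Monte Carlo SE units
    (SE evaluated at the analytic value)."""
    return (empirical - analytic) / mc_standard_error(analytic, reps)


print(round(mc_standard_error(0.5), 4))   # 0.0112, the largest possible SE with 2000 replicates
print(round(mc_z(0.222, 0.219), 2))       # 0.32
```

Differences of about two standard errors or less (roughly ±0.02 near a power of 0.5, and less near 0 or 1) are consistent with simulation noise alone.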

4 Examples

We apply the MLE-based method and the ASQ and ANS methods to the power analysis of two longitudinal studies in the elderly. Empirical power estimates from 2000 simulations are also obtained.

4.1 Example 1: Between group comparison of change before and after treatment in a clinical trial with two visits

In designing a randomized clinical trial to test the efficacy of CR in improving mobility in sedentary seniors,³⁷ eligible participants will be randomized in equal numbers to either CR or a health education control group. The primary mobility outcome is the change in gait velocity from baseline to the end of the intervention. The primary hypothesis is that subjects in the CR group will show greater improvement in gait velocity than those in the control group.

The linear mixed effects model (1) will be used to analyze the data. Here, the predictor Z, an indicator of the intervention group, is binary with P(Z_i = 1) = 0.5. The goal was to estimate the sample size needed to detect, with 85% power, a difference of 4 cm/s in the change of gait velocity between the intervention and control group, with an expected drop out rate of 20% after intervention.

Based on previous studies,³⁸ σ is assumed to be 24 cm/s and the intraclass correlation coefficient ρ to be 0.875. Using formula (6) of the MLE-based method, the required sample size is m = 400. The ASQ method gives 406, close to the correct value because ρ is large and the expected drop out is not severe. The ANS method gives 360, underestimating the correct sample size by 10%. Simulation studies with 2000 repetitions were performed to evaluate the empirical power at these sample sizes. The empirical powers at the sample sizes of 400, 406, and 360 from the MLE-based, ASQ, and ANS methods are 85.2%, 85.1%, and 81.2%, respectively. The MLE-based and ASQ sample size estimates are similar, and both yield empirical power close to the target level of 85%. As expected, the underestimated sample size from the ANS approach yields an empirical power below the target value.
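The arithmetic behind these three sample sizes can be sketched for this two-visit design. The sketch is ours, not the paper's formula (6): it assumes visits at t = 0 and 1, compound symmetry with variance σ² and intraclass correlation ρ, drop out at the second visit independent of Z, a two-sided test at α = 0.05 (the α level is not stated above), and rounding up to an even total for 1:1 allocation. The function name `sample_sizes_two_visits` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_sizes_two_visits(delta, sigma, rho, p_d, power, alpha=0.05, var_z=0.25):
    """Total sample size for detecting a Z-by-time effect `delta` with two visits
    (t = 0, 1) when a proportion p_d misses the second visit, under three
    variance approximations. var_z = Var(Z) = 0.25 for a 1:1 binary predictor."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Var(beta3_hat) = sigma^2 (1 - rho) * f / (m * var_z), with f depending on the method
    factors = {
        "MLE": (2 - (1 - rho) * p_d) / (1 - p_d),   # baseline-only subjects still add information
        "ASQ": 2 / (1 - p_d),                        # averages the time sum of squares
        "ANS": 2 / (1 - p_d / 2),                    # averages the number of subjects per visit
    }
    sizes = {}
    for name, f in factors.items():
        m = z ** 2 * sigma ** 2 * (1 - rho) * f / (var_z * delta ** 2)
        sizes[name] = 2 * math.ceil(m / 2)           # round up to an even total for 1:1 allocation
    return sizes


print(sample_sizes_two_visits(delta=4, sigma=24, rho=0.875, p_d=0.2, power=0.85))
# {'MLE': 400, 'ASQ': 406, 'ANS': 360}
```

Under these assumptions the sketch returns 400, 406, and 360, matching the MLE-based, ASQ, and ANS estimates above. The MLE and ASQ factors differ only by the term (1 − ρ)p_d, which is small here because ρ = 0.875.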

4.2 Example 2: Effect of a continuous predictor on the slope in a longitudinal study with five visits

In designing one of the projects in the renewal application of the EAS,³⁹ a long-term study of aging, it was assumed that 390 subjects with baseline measures of cerebral vascular reactivity would be available. The primary hypothesis is that this variable is associated with the rate of cognitive decline as measured by the Free and Cued Selective Reminding Test (FCSRT).⁴⁰ The study cohort will consist of a mix of active study participants and new enrollees. The outcome will be measured at baseline and at each annual follow-up visit during a 5-year study period. Given the study design and the expected attrition rate, 30%, 9%, 24%, 19%, and 18% of the sample are projected to have 1, 2, 3, 4, and 5 repeated FCSRT measures, respectively. The intraclass correlation ρ is 0.50, and the conditional standard deviation σ is 6.4.

The goal was to determine the power for detecting a difference of 0.35 points per year in the rate of cognitive decline in FCSRT corresponding to a 1 SD difference in the cerebral vascular reactivity measure, using a two-sided test at α = 0.05 with 390 subjects in total.

The power estimates are 0.83, 0.79, and 0.95 using the MLE-based, ASQ, and ANS methods, respectively. The empirical power estimate from 2000 simulations is 0.813, close to the MLE-based estimate. The ASQ method slightly underestimated power, while the ANS method overestimated power by 17%.
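The three power values can be sketched for this five-visit design. The sketch is our own reading of the methods, not the definitions in section 2: the MLE variance averages the Fisher information for (intercept, slope) over the projected monotone patterns and then inverts it, ASQ averages the time sum of squares over patterns, and ANS replaces m by the average number of subjects observed per visit. It further assumes visits at t = 0, …, 4 years, compound symmetry with variance σ² and intraclass correlation ρ, missingness independent of Z, a standardized Z (Var(Z) = 1), and a two-sided α = 0.05 test whose negligible opposite tail is ignored. Function names are hypothetical.

```python
from statistics import NormalDist


def pattern_information(n_obs, rho):
    """(a11, a12, a22): sigma^2 times the information about (intercept, slope)
    from one subject observed at t = 0, 1, ..., n_obs - 1 under compound symmetry."""
    s1 = sum(range(n_obs))                   # sum of t
    s2 = sum(t * t for t in range(n_obs))    # sum of t^2
    d = 1 + (n_obs - 1) * rho
    a11 = n_obs / d
    a12 = s1 / d
    a22 = (s2 - rho * s1 ** 2 / d) / (1 - rho)
    return a11, a12, a22


def slope_effect_power(beta3, sigma, rho, props, m, var_z=1.0, alpha=0.05):
    """Power for the Z-by-time effect under three variance approximations.
    props[k] is the proportion of subjects with k + 1 consecutive visits."""
    n = len(props)
    # MLE: average the 2x2 information over patterns, then take the slope element of its inverse.
    A11 = A12 = A22 = 0.0
    for k, p in enumerate(props):
        a11, a12, a22 = pattern_information(k + 1, rho)
        A11 += p * a11
        A12 += p * a12
        A22 += p * a22
    var_mle = sigma ** 2 * A11 / (A11 * A22 - A12 ** 2)
    # ASQ: complete-data formula with sum_j (t_j - tbar)^2 averaged over patterns.
    avg_ss = sum(p * (k + 1) * ((k + 1) ** 2 - 1) / 12 for k, p in enumerate(props))
    var_asq = sigma ** 2 * (1 - rho) / avg_ss
    # ANS: complete-data formula with m replaced by the average number of subjects per visit.
    full_ss = n * (n ** 2 - 1) / 12
    observed = [sum(props[j:]) for j in range(n)]   # proportion observed at visit j + 1
    var_ans = sigma ** 2 * (1 - rho) / full_ss / (sum(observed) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        name: NormalDist().cdf(abs(beta3) / (v / (m * var_z)) ** 0.5 - z_crit)
        for name, v in [("MLE", var_mle), ("ASQ", var_asq), ("ANS", var_ans)]
    }


res = slope_effect_power(beta3=0.35, sigma=6.4, rho=0.5,
                         props=[0.30, 0.09, 0.24, 0.19, 0.18], m=390)
print({k: round(v, 2) for k, v in res.items()})   # {'MLE': 0.83, 'ASQ': 0.79, 'ANS': 0.95}
```

With the design inputs above, the sketch gives about 0.83, 0.79, and 0.95, in line with the values reported for the three methods. With complete data (all subjects in the five-visit pattern) the three approximations coincide.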

5 Discussion

Missing data are common in longitudinal studies and therefore should be taken into account when assessing power for testing the effect of a predictor variable on the rate of change in the outcome. Despite the availability of methods for evaluating power with missing data, it has remained unclear how missing data actually affect power, and erroneous approaches are still often used in practice. We have shown that the ASQ method ignores a non-negative component of the Δ function, and thus results in an overestimated sample size or underestimated power, especially when the correlation between repeated measures is small and the proportion of missing data is high. For small to moderate degrees of missing data and large intraclass correlation, the ASQ is similar to the MLE-based approach. The ANS method, on the other hand, assumes that subjects with incomplete and complete data contribute to the power analysis equally, and thus can result in seriously overestimated power or underestimated sample size, especially when the degree of missing data is severe and the number of planned visits and the correlation among repeated measures are moderate or large. Extensive simulation studies have confirmed these conclusions. The power analysis method is based on a linear mixed effects model which assumes that the data are MAR, i.e. the missing data can depend on the observed outcome given the predictor. However, only the marginal distribution of the missing data, i.e. the distribution of missing data given the predictor, is needed for determining power. Our findings suggest that the ANS method can be greatly misleading and should not be used. The conservative ASQ method can be a good approximation when the correlation is large and the extent of missing data is moderate. Based on our simulation studies, the asymptotically correct MLE-based method performs well with limited sample sizes and is recommended especially with large amounts of missing data.

Footnotes

Funding

This study was funded by the National Institutes of Health [P01-AG03949 and R01-AG02511903].

Conflict of interest

None declared.

Appendix 1

Appendix 2

References

1. Diggle, Heagerty, Liang. Analysis of longitudinal data, 2nd ed. New York: Oxford University Press, 2002.

2. Donner, Klar. Design and analysis of cluster randomization trials in health research. London, England: Arnold, 2000.

3. Bloch. Sample size requirements and the cost of a randomized clinical trial with repeated measurements. Stat Med 1986; 5: 663–667.

4. Rochon. Sample size calculations for two-group repeated-measures experiments. Biometrics 1991; 47: 1383–1398.

5. Rochon. Application of GEE procedures for sample size calculations in repeated measures experiments. Stat Med 1998; 17: 1643–1658.

6. Overall, Doyle. Estimating sample sizes for repeated measurement design. Controlled Clin Trials 1994; 15: 100–123.

7. Overall, Shobaki, Anderson. Comparative evaluation of two models for estimating sample sizes for tests on trends across repeated measurements. Controlled Clin Trials 1998; 19: 188–197.

8. Lipsitz, Fitzmaurice. Sample size for repeated measures studies with binary responses. Stat Med 1994; 13: 1233–1239.

9. Muller, LaVange, Ramey. Power calculations for general linear multivariate models including repeated measures applications. J Am Stat Assoc 1992; 87: 1209–1226.

10. Schouten HJA. Planning group sizes in clinical trials with a continuous outcome and repeated measures. Stat Med 1999; 18: 255–264.

11. Kim, Williamson, Lyles. Sample size calculations for studies with correlated ordinal outcomes. Stat Med 2005; 24: 2977–2987.

12. Kowalski, Zhang. Power analyses for longitudinal trials and other clustered designs. Stat Med 2004; 23: 2799–2815.

13. Brooks, Cottenden, Fader. Sample sizes for designed studies with correlated binary data. The Statistician 2003; 52: 539–551.

14. Hendricks, Wassell, Collins. Power determination for geographically clustered data using generalized estimating equations. Stat Med 1996; 15: 1951–1960.

15. Patel, Rowe. Sample size for comparing linear growth curves. J Biopharm Stat 1999; 9: 339–350.

16. Liu, Liang. Sample size calculations for studies with correlated observations. Biometrics 1997; 53: 937–947.

17. Liu, Shih, Gehan. Sample size and power determination for clustered repeated measurements. Stat Med 2002; 21: 1787–1801.

18. Manatunga, Hudgens. Sample size estimation in cluster randomized studies with varying cluster size. Biometrical J 2001; 43: 75–86.

19. Taljaard, Donner, Klar. Accounting for expected attrition in the planning of community intervention trials. Stat Med 2007; 26: 2615–2628.

20. Hedeker, Gibbons, Waternaux. Sample size estimation for longitudinal designs with attrition: comparing time-related contrasts between two groups. J Educ Behav Stat 1999; 24: 70–93.

21. Luo, Chen. Sample size estimation for repeated measures analysis in randomized clinical trials with missing data. Int J Biostat 2008; 4(1): Article 9.

22. Sample size for comparison of changes in the presence of right censoring caused by death, withdrawal, and staggered entry. Controlled Clin Trials 1988; 9: 32–46.

23. Jung, Ahn. Sample size estimation for GEE method for comparing slopes in repeated measurements data. Stat Med 2003; 22: 1305–1315.

24. Ahn, Jung. Effect of dropouts on sample size estimates for test on trends across repeated measurements. J Biopharm Stat 2005; 15: 33–41.

25. Lefante. The power to detect differences in average rates of change in longitudinal studies. Stat Med 1990; 9: 437–446.

26. Dawson. Sample size calculations based on slopes and other summary statistics. Biometrics 1998; 54: 323–330.

27. Dawson, Lagakos. Size and power of two-sample tests of repeated measures data. Biometrics 1993; 49: 1022–1032.

28. Zhang, Kowalski. Power analyses for longitudinal study designs with missing data. Stat Med 2007; 26: 2958–2981.

29. Jung, Ahn. Sample size for a two-group comparison of repeated binary measurements using GEE. Stat Med 2005; 24: 2583–2596.

30. Mehrotra, Liu. Sample size determination for constrained longitudinal data analysis. Stat Med 2009; 28: 679–699.

31. Roy, Bhaumik, Aryal. Sample size determination for hierarchical longitudinal designs with differential attrition rates. Biometrics 2007; 63: 699–707.

32. Liang, Zeger. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22.

33. Laird, Ware. Random effects models for longitudinal data. Biometrics 1982; 38: 963–974.

34. Rubin. Inference and missing data. Biometrika 1976; 63: 581–592.

35. Verbeke, Molenberghs. Linear mixed models for longitudinal data. New York: Springer-Verlag, 2000.

36. Hsieh, Bloch, Larsen. A simple method of sample size calculation for linear and logistic regression. Stat Med 1998; 17: 1623–1634.

37. Verghese, Mahoney, Ambrose. Effect of cognitive remediation on gait in sedentary seniors. J Gerontol A Biol Sci Med Sci 2010; 65: 1338–1343.

38. Rolita, Holtzer, Wang. Homocysteine and mobility decline in older adults. J Am Geriatr Soc 2010; 58: 545–550.

39. Lipton, Katz, Kuslansky. Screening for dementia by telephone using the memory impairment screen. J Am Geriatr Soc 2003; 51: 1382–1390.

40. Buschke. Cued recall in amnesia. J Clin Exp Neuropsychol 1984; 6: 433–440.