Detecting and correcting for publication bias in meta-analysis – A truncated normal distribution approach

Abstract

Publication bias can significantly limit the validity of a meta-analysis when trying to draw conclusions about a research question from independent studies. Most research on detecting and correcting for publication bias in meta-analysis focuses on funnel plot-based methodologies or selection models. In this paper, we formulate publication bias as a truncated distribution problem and propose new parametric solutions. We develop methodologies for estimating the underlying overall effect size and the severity of publication bias. We distinguish two major situations, in which publication bias may be induced by (1) small effect size or (2) large p-value. We consider both fixed- and random-effects models, and derive estimators for the overall mean and the truncation proportion. These estimators are obtained using maximum likelihood estimation under the fixed-effects model and the method of moments under the random-effects model. We carried out extensive simulation studies to evaluate the performance of our methodology and to compare it with the non-parametric Trim and Fill method based on the funnel plot. We find that our methods based on the truncated normal distribution perform consistently well, both in detecting and in correcting publication bias under various situations.

Keywords

Maximum likelihood, meta-analysis, method of moments, publication bias, selection methods, Trim and Fill, truncated normal distribution

1 Introduction

Meta-analysis is a statistical procedure for combining results from independent studies to answer specific questions, with the hope of drawing reliable conclusions. Since it was first introduced by Glass,¹ the use of meta-analysis techniques has increased tremendously, especially in health and medical research. While combined studies certainly have higher statistical power than individual studies, one major criticism of meta-analysis is that not all relevant studies are published. Investigators tend to ignore studies with less significant results, and journals tend to reject them as well.^2–5 The result is called publication bias, also known as the “file drawer” problem because some studies are “tucked away in file drawer”.⁶ Synthesizing the results of a non-representative sample of studies leads to a biased conclusion, one that favors only significant outcomes. Hence, it is desirable to test whether any bias exists in the studies collected for a meta-analysis. If there is publication bias, it needs to be corrected so that a valid conclusion can be reached.

An earlier approach to testing publication bias is “Failsafe N” suggested by Rosenthal.⁶ Failsafe N is defined as the number of additional studies needed so that the significance of an overall test from combined studies can be reduced to non-significance. Rosenthal⁶ argued that if Failsafe N is large (for example, N > 19k, where k is number of studies in the meta-analysis), then results from meta-analysis can be considered reliable (not affected by publication bias), as it is unrealistic to have a large number of unpublished studies. Thus, small Failsafe N is an indication of the presence of publication bias.

Another group of methods, collectively called selection methods,^7–14 utilize a weight function to describe the probability of a specific study being selected in meta-analysis. The choice of weight function depends on the p-value or both the effect size and its standard error from individual study.¹⁵ Although the selection method approach appears to have more realistic assumptions of inclusion and exclusion of studies, it is rather complicated and is rarely used in actual application. The complexity of this approach comes from the arbitrariness in specifying the weight function, the requirement of a large number of studies, and the lack of readily available computer programs.

Alternatively, a funnel plot¹⁶ is a visual tool for detecting publication bias. It is a scatter plot of the effect size from each individual study versus its precision measure, such as standard error or sample size. Studies with small sample sizes tend to produce non-significant results and thus get omitted, resulting in an asymmetrical funnel plot. Thus, the asymmetry of a funnel plot may be an indication of the presence of publication bias, although there may be other reasons for an asymmetrical funnel plot.¹⁷ However, since the funnel plot is not a rigorous statistical test, visual detection of asymmetry is not enough to reach a firm conclusion, especially when the number of studies is small. Several formal statistical tests have been developed for the asymmetry of funnel plots. These include the non-parametric rank correlation test by Begg and Mazumdar¹⁸ and the regression-based test by Egger et al.¹⁷ However, simulation analyses have shown that the power of these methods is low unless the level of bias is severe.^18,19

To date, the most popular method is the Trim and Fill (TF) method developed by Duval and Tweedie.^20,21 The TF method is a funnel plot-based, iterative, non-parametric (rank-based) method. It detects publication bias by estimating the number of omitted/trimmed studies on one side of the funnel plot. Once these trimmed studies are filled, the funnel plot should become less asymmetrical, and the overall effect can be estimated using both filled and observed study effects. A major criticism of the TF method is that it cannot distinguish publication bias from heterogeneity.^22–24 Also, the TF method only considers publication bias caused by small effect sizes, and does not consider studies omitted because of non-significant results (large p-values).

In this paper, we propose a new parametric procedure to detect and correct for publication bias. As with all funnel plot-based procedures, we assume that some studies on one side of the funnel plot are omitted. However, instead of testing the symmetry of the funnel plot, we formulate the omitted studies as truncation of a distribution. Montes and Lotyczewski²⁵ considered a similar approach in 2003, but they did not consider variability from one study to another, and simply assumed that the study effects follow a truncated normal distribution. By ignoring the within-study variances, their method essentially assumed that the study variances were equal, which made the estimation much easier. Such an assumption, however, is unrealistic and not applicable in practical meta-analyses. We seek to improve upon their approach and the other meta-analysis methods. We also distinguish between the two truncation mechanisms (small effect size or large test p-value). Our primary interest is to estimate the underlying mean, which is the corrected estimate of the overall effect, free of publication bias. The estimated truncation point can be used to estimate the proportion of omitted studies, and thus serves as a measure of the severity of truncation and of publication bias. We consider both fixed- and random-effects models, as the asymmetry of a funnel plot may be due to heterogeneity of studies and not only to truncation.¹⁵ We use the maximum likelihood method for parameter estimation with a fixed-effects model and the method of moments with a random-effects model.

2 Modeling publication bias

Consider the meta-analysis that deals with n independent studies. The i-th study has an estimate x_i of a parameter of interest with a corresponding variance estimate $s_{i}^{2}$ of $σ_{i}^{2}, i = 1, \dots, n$ . The goal is to synthesize these n estimated effects to obtain a global mean μ. Here, μ can be log rates, log odds ratios, or any real quantity that is being tested under a one-sided hypothesis.

When assuming no between-study heterogeneity,²⁶ the traditional fixed-effects model can be used. The fixed-effects model assumes that all studies in the meta-analysis share a common true effect size, so the observed effect size varies from one study to the next only because of the random error inherent in each study. The fixed-effects model is specified as follows

x_{i} = μ + ε_{i}

(1)

where $ε_{i}$ is the random error from each individual study, with $var (ε_{i}) = σ_{i}^{2}$ and $E (ε_{i}) = 0$, and μ is the common true effect size.

In the above model, the within-study variance $σ_{i}^{2}$ is not observed, but is estimated by $s_{i}^{2}$. In the meta-analysis literature, $σ_{i}^{2}$ is usually assumed known and replaced by $s_{i}^{2}$. We follow this convention: whenever the unknown $σ_{i}^{2}$ appears in this paper, we replace it with $s_{i}^{2}$.

If these n studies constitute a representative sample from the population of all studies, the global mean can be obtained by a weighted average of the n estimates and we refer to it as the uncorrected mean

{\hat{μ}}_{uc} = \frac{\sum s_{i}^{- 2} x_{i}}{\sum s_{i}^{- 2}}

(2)

The uncorrected estimator (2) can be obtained using maximum likelihood method under certain assumptions, which we shall discuss in the next section.
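As a minimal sketch of equation (2), the inverse-variance weighted mean can be computed as follows. The function name `uncorrected_mean` and the three data points are hypothetical and serve only as an illustration.

```python
def uncorrected_mean(x, s):
    """Inverse-variance weighted mean of equation (2).

    x: study effect sizes, s: their standard errors.
    """
    weights = [1.0 / (si * si) for si in s]
    return sum(w * xi for w, xi in zip(weights, x)) / sum(weights)


# Hypothetical effect sizes and standard errors, for illustration only.
x_demo = [0.2, 0.5, 0.8]
s_demo = [0.1, 0.2, 0.4]
mu_uc = uncorrected_mean(x_demo, s_demo)
```

Studies with small standard errors dominate the estimate, which is why the omission of imprecise, non-significant studies shifts this mean.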

However, as noted in the Introduction section, it may not be valid to assume that the n studies are a representative sample of the population of all relevant studies. To develop our new methodology, we first note that most studies, if not all, assume an asymptotic normal distribution for the estimate x_i when obtaining its p-value and its confidence interval. Hence, we begin by assuming that

x_{i} \sim N (μ, σ_{i}^{2})

(3)

where μ is the global mean, the parameter of interest. The unknown $σ_{i}^{2}$ is replaced by its estimate $s_{i}^{2}$.

Now suppose that some studies are omitted, causing publication bias. When they are omitted because of small effect size, we observe a study only when its effect x_i is larger than some unknown value k. Hence, we observe { $x_{i} | x_{i} \geq k$ }, which results in a truncated normal distribution under our assumption. Similarly, when studies are omitted because of large p-values, the omitted studies are those with $(x_{i} / σ_{i}) < t$ for some unknown value t, where the test is carried out under the null model $H_{0} : μ = 0$. Again, we observe studies from a truncated normal distribution with { $x_{i} | x_{i} / σ_{i} \geq t$ }.

Note that our choice of an unknown cut-off point k (or t) is essentially the same as the “suppressed Bernoulli model” assumption in the TF method, in which “the suppression has taken place in such a way that it is the k₀ values of the Y_j with the most extreme left-most values that have been suppressed”.²¹

Extra variation among study effects (i.e. heterogeneity) can also cause asymmetry in a funnel plot. Random-effects models can effectively deal with heterogeneity in meta-analysis. To incorporate the heterogeneity among studies, we consider that the effects are distributed as

x_{i} = μ_{i} + ε_{i}

(4)

where μ_i is the unknown effect size for the i-th study, with $μ_{i} \sim N (μ, τ^{2})$, and $τ^{2}$ is the extra variation among the studies. Therefore, equation (3) becomes

x_{i} \sim AN (μ, τ^{2} + σ_{i}^{2})

(5)

under the random-effects model, where “AN” stands for “asymptotically normally distributed”. The uncorrected estimate for the overall mean is obtained as in equation (2) with $s_{i}^{- 2}$ replaced by $({\hat{τ}}^{2} + s_{i}^{2})^{- 1}$, where ${\hat{τ}}^{2}$ is the estimate of $τ^{2}$.

3 Parameter estimation

We derive likelihood functions for the two truncation mechanisms, from which we estimate the overall effect and the proportion of omitted studies.

3.1 Truncated by effect size

3.1.1 Estimation under fixed-effects model

If truncated by effect size, study i is not included in the meta-analysis when $x_{i} < k$ . Then under model (3), the probability of truncating i-th study is

p_{i} = P (x_{i} < k) = P (\frac{x_{i} - μ}{σ_{i}} < \frac{k - μ}{σ_{i}}) = Φ (\frac{k - μ}{σ_{i}})

(6)

for given k and μ, where $Φ (\cdot)$ is the cumulative distribution function of the standard normal distribution. Thus, the likelihood of including study i in the meta-analysis is

L_{i} (μ, k) = P (x_{i} | x_{i} > k) = \frac{P (x_{i}) P (x_{i} > k | x_{i})}{P (x_{i} > k)} = \frac{φ (\frac{x_{i} - μ}{σ_{i}})}{[1 - Φ (\frac{k - μ}{σ_{i}})]} I (x_{i} > k)

where $φ (\cdot)$ is the density function of the standard normal distribution. Hence, the likelihood of observing the n studies is

L (μ, k) = Π_{i = 1}^{n} L_{i} = Π_{i = 1}^{n} \frac{φ (\frac{x_{i} - μ}{σ_{i}})}{[1 - Φ (\frac{k - μ}{σ_{i}})]} I (x_{i} > k)

Therefore, maximum likelihood estimators of μ and k can be obtained by maximizing the log-likelihood (with σ_i replaced by s_i)

l (μ, k) = \sum_{i = 1}^{n} log φ (\frac{x_{i} - μ}{s_{i}}) - \sum_{i = 1}^{n} log [1 - Φ (\frac{k - μ}{s_{i}})]

(7)

with the constraint that

x_{i} > k, \forall i

The constraint on the log-likelihood function poses difficulties in the maximization procedure and the statistical inference, as it violates the regularity conditions for the maximum likelihood estimation theory.²⁷ Note that the first term in the right-hand side of equation (7) does not depend on k, while the second term increases as k increases. Therefore, the likelihood increases as k increases, but with the constraint $x_{i} > k, \forall i$ , the estimate of k thus should be

\hat{k} = {min}_{i} x_{i}

(8)

Appendix 1 proves that this estimator is asymptotically unbiased and consistent for the true k. Now, we obtain

{\hat{μ}}_{FE} = \arg\max_{μ} l (μ | \hat{k}) = \arg\max_{μ} \{\sum_{i = 1}^{n} log φ (\frac{x_{i} - μ}{s_{i}}) - \sum_{i = 1}^{n} log [1 - Φ (\frac{\hat{k} - μ}{s_{i}})]\}

(9)

as a function of $\hat{k}$. Given $k = \hat{k}$, we have

\frac{\partial l (μ | \hat{k})}{\partial μ} = \sum_{i = 1}^{n} \frac{x_{i} - μ}{s_{i}^{2}} - \sum_{i = 1}^{n} \frac{φ (\frac{\hat{k} - μ}{s_{i}}) \frac{1}{s_{i}}}{1 - Φ (\frac{\hat{k} - μ}{s_{i}})}

By letting $\partial l (μ | \hat{k}) / \partial μ = 0$ , we get the maximum likelihood estimator of μ in equation (9). Note that if $\hat{k} = - \infty$ , that is, no truncation at all, then the second term is 0, and we have the estimator of μ as $\hat{μ} = (\sum x_{i} s_{i}^{- 2}) / (\sum s_{i}^{- 2})$ , which is the uncorrected estimator of the overall effect size, given in equation (2). Thus, the second term is the penalty on the likelihood function caused by truncation. We use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization method to find the estimate of μ in equation (9).
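The estimating equation above can be solved numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it replaces BFGS with plain bisection on the score function, which is adequate here because the score is decreasing in μ for fixed $\hat{k}$. The names `hazard`, `score`, `fit_fixed_effects` and the six data points are hypothetical.

```python
import math


def hazard(z):
    """Standard normal hazard phi(z) / (1 - Phi(z))."""
    if z > 30.0:
        # Leading terms of the large-z expansion; avoids underflow in the ratio.
        return z + 1.0 / z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2.0)))


def score(mu, x, s, k):
    """Derivative of l(mu | k) with respect to mu under the fixed-effects model."""
    return sum((xi - mu) / si ** 2 - hazard((k - mu) / si) / si
               for xi, si in zip(x, s))


def fit_fixed_effects(x, s):
    """Return (mu_hat, k_hat) for truncation by effect size."""
    k_hat = min(x)                      # equation (8)
    hi = max(x)                         # the score is negative at max(x)
    step = 1.0
    lo = k_hat - step
    while score(lo, x, s, k_hat) <= 0.0:
        step *= 2.0                     # widen the bracket until the score is positive
        if step > 1e6:
            raise ValueError("no sign change found; check the data")
        lo = k_hat - step
    for _ in range(200):                # bisection on the score function
        mid = 0.5 * (lo + hi)
        if score(mid, x, s, k_hat) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), k_hat


# Hypothetical effect sizes and standard errors, for illustration only.
x_demo = [0.12, 0.35, 0.41, 0.58, 0.77, 0.93]
s_demo = [0.30, 0.25, 0.20, 0.35, 0.15, 0.40]
mu_hat, k_hat = fit_fixed_effects(x_demo, s_demo)
```

Because the penalty term is always positive, the corrected estimate lies below the uncorrected weighted mean of the same data.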

An estimated $\hat{k}$ in equation (8) can be used to construct an indicator of the amount of truncation, denoted as p. Since ${\hat{p}}_{i} = P (x_{i} < \hat{k})$ is the estimated probability of study i being truncated, the sum of the truncation probabilities over all n observed studies is the estimated number of studies being truncated. Denote the estimated number of omitted studies as ${\hat{n}}_{{\hat{μ}}_{FE}, \hat{k}}$

{\hat{n}}_{{\hat{μ}}_{FE}, \hat{k}} = \sum_{i = 1}^{n} {\hat{p}}_{i} = \sum_{i = 1}^{n} P (x_{i} < \hat{k}) = \sum_{i = 1}^{n} Φ (\frac{\hat{k} - {\hat{μ}}_{FE}}{s_{i}})

Then the estimated proportion of omitted studies can be defined as

{\hat{p}}_{{\hat{μ}}_{FE}, \hat{k}} = \frac{{\hat{n}}_{{\hat{μ}}_{FE}, \hat{k}}}{n}

(10)
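Equation (10) is a simple average of normal tail probabilities. A minimal sketch, assuming estimates of μ and k are already available (the function name `truncation_proportion` and the inputs are hypothetical):

```python
from statistics import NormalDist


def truncation_proportion(mu_hat, k_hat, s):
    """Estimated proportion of omitted studies, equation (10)."""
    std_normal = NormalDist()
    # Sum of the per-study truncation probabilities Phi((k_hat - mu_hat) / s_i).
    n_hat = sum(std_normal.cdf((k_hat - mu_hat) / si) for si in s)
    return n_hat / len(s)


# Hypothetical inputs: when k_hat equals mu_hat, every study has probability 0.5.
p_half = truncation_proportion(0.3, 0.3, [0.1, 0.2, 0.4])
# A cut-off far below the mean implies almost no truncation.
p_small = truncation_proportion(0.5, -5.0, [0.1, 0.2, 0.4])
```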

Traditional maximum likelihood theory is not applicable for statistical inference and the construction of confidence intervals for ${\hat{μ}}_{FE}$ and ${\hat{p}}_{{\hat{μ}}_{FE}, \hat{k}}$ because of the constraint on k, which places $\hat{k}$ on the boundary of its possible values. Instead, we use the bootstrap method, applied as follows. Given the effect sizes (x_i) and standard errors (s_i) of the individual studies, we draw B bootstrap samples (simple random sampling with replacement), and from the b-th bootstrap sample we obtain estimates of k, μ and p, denoted ${\hat{k}}^{b}, {\hat{μ}}^{b}$ and ${\hat{p}}^{b}$, for $b = 1, 2, \dots, B$. We then compute the variance (denoted ${\hat{V}}_{\hat{μ}}^{boot}$) of ${\hat{μ}}^{1}, {\hat{μ}}^{2}, \dots, {\hat{μ}}^{B}$, which is the bootstrap estimator of the variance of ${\hat{μ}}_{FE}$. Note that the calculation of ${\hat{μ}}^{b}$ depends on ${\hat{k}}^{b}$, which differs across bootstrap samples. The $1 - α$ confidence interval of μ is obtained as ${\hat{μ}}_{FE} \pm z_{1 - α / 2} \sqrt{{\hat{V}}_{\hat{μ}}^{boot}}$, where $z_{1 - α / 2}$ is the $1 - α / 2$ quantile of the standard normal distribution ( $z_{0.975} \approx 1.96$ ). The variance and the confidence interval for the proportion of truncation ${\hat{p}}_{{\hat{μ}}_{FE}, \hat{k}}$ can be obtained similarly, using ${\hat{p}}^{1}, {\hat{p}}^{2}, \dots, {\hat{p}}^{B}$.
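The bootstrap procedure can be sketched generically. In the illustration below, `estimator` is any function of the resampled (x, s) pairs; the simple weighted mean used in the demonstration is only a placeholder for the truncation-corrected estimator, which would recompute $\hat{k}$ inside each bootstrap sample. All names and data are hypothetical.

```python
import random
from statistics import NormalDist, variance


def bootstrap_ci(x, s, estimator, B=200, alpha=0.05, seed=0):
    """Normal-approximation bootstrap interval: estimate +/- z * bootstrap SE."""
    rng = random.Random(seed)
    n = len(x)
    point = estimator(x, s)
    replicates = []
    for _ in range(B):
        # Resample whole studies (x_i, s_i) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(estimator([x[i] for i in idx], [s[i] for i in idx]))
    se = variance(replicates) ** 0.5
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return point - z * se, point + z * se


def weighted_mean(x, s):
    """Placeholder estimator (uncorrected weighted mean), for illustration only."""
    w = [1.0 / (si * si) for si in s]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


# Hypothetical effect sizes and standard errors.
x_demo = [0.12, 0.35, 0.41, 0.58, 0.77, 0.93]
s_demo = [0.30, 0.25, 0.20, 0.35, 0.15, 0.40]
ci_low, ci_high = bootstrap_ci(x_demo, s_demo, weighted_mean)
```

Passing the estimator as a function keeps the resampling logic separate from the model, so the same routine can serve for μ and for the truncation proportion.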

3.1.2 Estimation under a random-effects model

Under a random-effects model with truncation of studies by effect size ( $x_{i} < k$ ), the probability of truncation (6) for the i-th study becomes

p_{i} = P (x_{i} < k) = P (\frac{x_{i} - μ}{\sqrt{τ^{2} + s_{i}^{2}}} < \frac{k - μ}{\sqrt{τ^{2} + s_{i}^{2}}}) = Φ (\frac{k - μ}{\sqrt{τ^{2} + s_{i}^{2}}})

(11)

Compared to the truncation probability under a fixed-effects model, we have an extra parameter $τ^{2}$. We could use the same estimation procedure as with a fixed-effects model, determining the truncation point k as in equation (8) and then using the BFGS optimization method to find the estimates of μ and τ by solving an analogue of equation (9) in which s_i is replaced by $\sqrt{{\hat{τ}}^{2} + s_{i}^{2}}$. However, this procedure may produce a large negative bias for μ, for the reasons described below. We therefore use the method of moments instead.

Note, for any $w_{i} > 0$ , it can be shown that

E [\sum_{i = 1}^{n} w_{i} (x_{i} - μ)] = \sum_{i = 1}^{n} w_{i} \sqrt{τ^{2} + s_{i}^{2}} f (k_{i})

where $f (k_{i}) = φ (k_{i}) / [1 - Φ (k_{i})]$ and $k_{i} = (k - μ) / \sqrt{τ^{2} + s_{i}^{2}}$. Thus, the overall mean μ from the random-effects model can be estimated as

{\hat{μ}}_{RE} = \frac{\sum_{i = 1}^{n} w_{i} [x_{i} - \sqrt{τ^{2} + s_{i}^{2}} f (k_{i})]}{\sum_{i = 1}^{n} w_{i}}

(12)

Note that k_i depends on both μ and $τ^{2}$ .

For given $τ^{2}$, note that when $w_{i} = 1 / (τ^{2} + s_{i}^{2})$ and k = −∞, equation (12) reduces to ${\hat{μ}}_{uc}$ of equation (2) with these weights, that is, the uncorrected estimator of the overall mean under a random-effects model. With $k < μ$, as μ decreases, both k_i and $f (k_{i})$ increase, which causes the updated estimator (12) to decrease. Hence, iterating the steps to convergence may be unstable and may produce an estimate of μ with a very large negative bias. This is why we do not use the maximum likelihood estimator with a random-effects model. Instead, we use equation (12) and update μ in only one or two steps. In other words, we estimate μ_RE by first using equation (2) to calculate k_i and then evaluating equation (12), updating it once or twice. This approach produces much better results, as indicated by the simulation study. As with the fixed-effects model, we can also estimate the proportion of truncation as in equation (10), with s_i and ${\hat{μ}}_{FE}$ replaced by $\sqrt{τ^{2} + s_{i}^{2}}$ and ${\hat{μ}}_{RE}$, respectively.
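The one- or two-step update can be sketched as follows for a given $τ^{2}$. This is an illustration only: the weights are taken as $w_{i} = 1 / (τ^{2} + s_{i}^{2})$, the cut-off is the smallest observed effect, and the function names, data, and the value of `tau2_demo` are hypothetical.

```python
import math


def hazard(z):
    """f(z) = phi(z) / (1 - Phi(z)) for the standard normal distribution."""
    if z > 30.0:
        return z + 1.0 / z              # large-z approximation, avoids underflow
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2.0)))


def uncorrected_re_mean(x, s, tau2):
    """Weighted mean with weights 1 / (tau2 + s_i^2)."""
    w = [1.0 / (tau2 + si * si) for si in s]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def moment_update(mu, x, s, tau2, k):
    """Evaluate equation (12) at the current value of mu."""
    num = den = 0.0
    for xi, si in zip(x, s):
        var = tau2 + si * si
        sd = math.sqrt(var)
        w = 1.0 / var
        num += w * (xi - sd * hazard((k - mu) / sd))
        den += w
    return num / den


def few_step_mean(x, s, tau2, steps=2):
    """Start from the uncorrected mean and apply equation (12) `steps` times."""
    k_hat = min(x)
    mu = uncorrected_re_mean(x, s, tau2)
    for _ in range(steps):
        mu = moment_update(mu, x, s, tau2, k_hat)
    return mu


# Hypothetical data and between-study variance, for illustration only.
x_demo = [0.12, 0.35, 0.41, 0.58, 0.77, 0.93]
s_demo = [0.30, 0.25, 0.20, 0.35, 0.15, 0.40]
tau2_demo = 0.05
mu0 = few_step_mean(x_demo, s_demo, tau2_demo, steps=0)
mu1 = few_step_mean(x_demo, s_demo, tau2_demo, steps=1)
mu2 = few_step_mean(x_demo, s_demo, tau2_demo, steps=2)
```

Each additional step lowers the estimate, which illustrates why the iteration is stopped early rather than run to convergence.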

In reality, $τ^{2}$ is unknown and must be estimated. Appendix 2 gives a detailed derivation of the estimator of $τ^{2}$. As k_i depends on $τ^{2}$, an iterative procedure is needed to find a stable estimate of $τ^{2}$. We again suggest a two-step approach. First, we estimate μ and $τ^{2}$ by ignoring truncation, using the uncorrected estimator for μ and the DerSimonian-Laird estimator²⁸ for $τ^{2}$. Then we update μ and $τ^{2}$ using equations (12) and (23) (in Appendix 2) once or twice. These one- or two-step strategies were suggested by DerSimonian and Kacker²⁹ as well.
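The initial, truncation-ignoring estimate of $τ^{2}$ can be sketched with the standard DerSimonian-Laird moment formula. The update based on equation (23) of Appendix 2 is not reproduced here; the function name and data are hypothetical.

```python
def dersimonian_laird(x, s):
    """DerSimonian-Laird moment estimate of tau^2 (requires at least two studies)."""
    w = [1.0 / (si * si) for si in s]
    sw = sum(w)
    mu = sum(wi * xi for wi, xi in zip(w, x)) / sw        # fixed-effects mean
    q = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x))  # Cochran's Q statistic
    c = sw - sum(wi * wi for wi in w) / sw
    # Truncate at zero, since a variance cannot be negative.
    return max(0.0, (q - (len(x) - 1)) / c)


# Hypothetical examples: two discordant studies, then two identical ones.
tau2_demo = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
tau2_zero = dersimonian_laird([0.5, 0.5], [0.1, 0.2])
```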

3.2 Truncated by p-value

When studies are truncated because of large p-values, we consider all studies with $x_{i} / s_{i} < t$ to be truncated under $H_{0} : μ = 0$. The probability of truncation of the i-th study is

p_{i} = P (\frac{x_{i}}{s_{i}} < t) = P (\frac{x_{i} - μ}{s_{i}} < t - \frac{μ}{s_{i}}) = Φ (t - \frac{μ}{s_{i}})

(13)

and the log-likelihood function is

l (μ, t) = \sum_{i = 1}^{n} log φ (\frac{x_{i} - μ}{s_{i}}) - \sum_{i = 1}^{n} log [1 - Φ (t - \frac{μ}{s_{i}})]

with the constraint that

x_{i} / s_{i} > t, \forall i

Similar to the situation when the truncation is by effect size, we estimate t by

\hat{t} = {min}_{i} \frac{x_{i}}{s_{i}}

(14)

and the estimate of μ is

{\hat{μ}}_{FE} = \arg\max_{μ} l (μ | \hat{t}) = \arg\max_{μ} \{\sum_{i = 1}^{n} log φ (\frac{x_{i} - μ}{s_{i}}) - \sum_{i = 1}^{n} log [1 - Φ (\hat{t} - \frac{μ}{s_{i}})]\}

(15)

under a fixed-effects model. The score function with μ is given by

\frac{\partial l (μ | \hat{t})}{\partial μ} = \sum_{i = 1}^{n} \frac{x_{i} - μ}{s_{i}^{2}} - \sum_{i = 1}^{n} \frac{φ (\hat{t} - \frac{μ}{s_{i}}) \frac{1}{s_{i}}}{1 - Φ (\hat{t} - \frac{μ}{s_{i}})}

The proportion of truncation p can be estimated through

{\hat{n}}_{{\hat{μ}}_{FE}, \hat{t}} = \sum_{i = 1}^{n} {\hat{p}}_{i} = \sum_{i = 1}^{n} P (\frac{x_{i}}{s_{i}} < \hat{t} | {\hat{μ}}_{FE}) = \sum_{i = 1}^{n} Φ (\hat{t} - \frac{{\hat{μ}}_{FE}}{s_{i}})

and we find

{\hat{p}}_{{\hat{μ}}_{FE}, \hat{t}} = \frac{{\hat{n}}_{{\hat{μ}}_{FE}, \hat{t}}}{n}

(16)

Confidence intervals for the estimates of μ and p can be obtained using the bootstrap procedure, in the same way as described in the previous subsection. Also, replacing k_i in equation (12) with $t_{i} = t - μ / \sqrt{s_{i}^{2} + τ^{2}}$ , and using the same one- or two-step iteration approach, we can find the estimate of μ from a random-effects model.
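For the fixed-effects case with truncation by p-value, equations (14) to (16) can be sketched together. As an assumption of this illustration, bisection on the score function stands in for a general-purpose optimizer; the function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def hazard(z):
    """Standard normal hazard phi(z) / (1 - Phi(z))."""
    if z > 30.0:
        return z + 1.0 / z              # large-z approximation, avoids underflow
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2.0)))


def score_pvalue(mu, x, s, t):
    """Score function for mu when studies with x_i / s_i < t are truncated."""
    return sum((xi - mu) / si ** 2 - hazard(t - mu / si) / si
               for xi, si in zip(x, s))


def fit_pvalue_truncation(x, s):
    """Return (mu_hat, t_hat, p_hat) under the fixed-effects model."""
    t_hat = min(xi / si for xi, si in zip(x, s))          # equation (14)
    hi = max(x)                                            # score is negative here
    step = 1.0
    lo = min(x) - step
    while score_pvalue(lo, x, s, t_hat) <= 0.0:
        step *= 2.0                                        # widen the bracket
        if step > 1e6:
            raise ValueError("no sign change found; check the data")
        lo = min(x) - step
    for _ in range(200):                                   # bisection on the score
        mid = 0.5 * (lo + hi)
        if score_pvalue(mid, x, s, t_hat) > 0.0:
            lo = mid
        else:
            hi = mid
    mu_hat = 0.5 * (lo + hi)                               # solves equation (15)
    std_normal = NormalDist()
    p_hat = sum(std_normal.cdf(t_hat - mu_hat / si) for si in s) / len(s)  # equation (16)
    return mu_hat, t_hat, p_hat


# Hypothetical effect sizes and standard errors, for illustration only.
x_demo = [0.12, 0.35, 0.41, 0.58, 0.77, 0.93]
s_demo = [0.30, 0.25, 0.20, 0.35, 0.15, 0.40]
mu_hat, t_hat, p_hat = fit_pvalue_truncation(x_demo, s_demo)
```

The only difference from truncation by effect size is that the cut-off on the effect scale, $\hat{t} s_{i}$, now varies with the precision of each study.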

4 Simulation studies

In this section, we investigate the empirical performance of our proposed method. We compare the performance of our method with that of the TF method in terms of the ability to detect and correct the estimates for publication bias.

4.1 Generation of the data for meta-analysis

In this section, we describe how we generate the data $(x_{i}, s_{i}^{2})$ that will be used to estimate the μ and p in meta-analysis. Our data generation and the induction of publication bias are similar to those described in Duval and Tweedie^20,21 and Peters et al.²³ As mentioned before, x_i is an estimate of a parameter of interest from study i based on continuous or binary responses. In a binary response situation, the x_i is the log OR, and the $s_{i}^{2}$ is a function of x_i.

4.1.1 Continuous responses

To conduct simulation studies, we first generate continuous data through the following steps.

Step c1. Generate standard error s_i from a $χ^{2} (1) / 4$ , restricted to $(0.01, 6)$ and effect size x_i from $N (μ, s_{i}^{2})$ for μ = 0 and 0.5.

Step c2. Repeat step c1 n times, with n = 20 to 100, by an increment of 10.

Step c3. Induce publication bias by excluding some studies.

Step c4. Calculate the estimates of the global mean and the proportion of truncation as discussed in the previous section.

Data from a random-effects model were generated by replacing $s_{i}^{2}$ with $s_{i}^{2} + τ^{2}$ in step c1, where $τ^{2}$ takes the values 0, 0.1, 0.17, and 0.32. These correspond to values of the I² heterogeneity measure of Higgins²⁵ of 0, 0.167, 0.372, and 0.659, respectively. We follow Brockwell and Gordon³⁰ and Kontopantelis et al.³¹ for the choice of $s_{i}^{2}$ and $τ^{2}$ . Then, using these simulated data, we estimate μ, p, and τ under both fixed-effects and random-effects models.
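Steps c1 and c2, including the random-effects variant, can be sketched as below. The sketch follows the text literally in drawing the standard error s_i from $χ^{2} (1) / 4$ restricted to (0.01, 6), using rejection sampling for the restriction; the function name and the seed are hypothetical.

```python
import math
import random


def simulate_continuous(n, mu, tau2=0.0, seed=0):
    """Generate n pairs (x_i, s_i) following steps c1 and c2."""
    rng = random.Random(seed)
    x, s = [], []
    while len(x) < n:
        # A squared standard normal draw is chi-square with 1 degree of freedom.
        si = rng.gauss(0.0, 1.0) ** 2 / 4.0
        if not 0.01 < si < 6.0:
            continue                     # rejection step for the restriction
        s.append(si)
        # Random-effects data: the variance s_i^2 is replaced by s_i^2 + tau^2.
        x.append(rng.gauss(mu, math.sqrt(si * si + tau2)))
    return x, s


x_sim, s_sim = simulate_continuous(n=50, mu=0.5, tau2=0.1, seed=1)
```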

4.1.2 Binary responses

Step b1. Generate number of events and determine event rates based on the tests of odds ratio for binary outcome data from control (r_c) and treatment (r_t) groups as in Peters et al.²³

Step b2. Generate total number of subjects (n) in each group from the exponential of a normal distribution $N (5, 0.3)$ , which has a median sample size of around 150 subjects in each group.

Step b3. Generate subjects with event from two binomial distributions, $B (n, r_{c})$ and $B (n, r_{t})$ , for the control and treatment group, respectively.

Step b4. Calculate log odds ratio and the corresponding standard error for each study, which are our simulated individual effect sizes x_i and standard errors s_i.

Step b5. Induce publication bias by excluding some studies.

Step b6. Calculate the estimates of the global mean of log odds ratios and the proportion of truncation.

Under a random-effects model, the following additional steps are needed before proceeding to Step b2. First, for a given τ, generate $l_{τ}$ from $N (0, τ^{2})$ . Then multiply the OR by a factor of $e^{l_{τ}}$ , so that the OR is at the same level under both random- and fixed-effects models. For $τ = 0, 0.1, 0.2, 0.3$ , the simulated heterogeneity is $0, 0.15, 0.42, 0.62$ , respectively.

In step b1, the event rate of the control group is allowed to vary for each sample study, generated from a uniform distribution with lower and upper limits of 0.3 and 0.7, respectively. The event rate for the treatment group is then determined according to the relation $OR = {r_{t} / (1 - r_{t})} / {r_{c} / (1 - r_{c})}$ , where OR is the pre-specified odds ratio and r_c and r_t are the event rates from the control and treatment groups, respectively. We consider OR = 1 or 3.
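Steps b1 to b4 can be sketched as follows. Several details are assumptions of this illustration rather than statements from the text: the 0.3 in $N (5, 0.3)$ is treated as a variance, binomial counts are drawn as sums of Bernoulli trials, and 0.5 is added to every cell if a zero cell occurs. The function name and the seed are hypothetical.

```python
import math
import random


def simulate_binary(n_studies, odds_ratio, tau=0.0, seed=0):
    """Generate log odds ratios x_i and their standard errors s_i."""
    rng = random.Random(seed)
    x, s = [], []
    for _ in range(n_studies):
        r_c = rng.uniform(0.3, 0.7)                      # control event rate (step b1)
        study_or = odds_ratio
        if tau > 0.0:
            study_or *= math.exp(rng.gauss(0.0, tau))    # random-effects factor e^{l_tau}
        odds_t = study_or * r_c / (1.0 - r_c)
        r_t = odds_t / (1.0 + odds_t)                    # treatment event rate
        # Group size from a log-normal distribution (step b2); 0.3 assumed to be a variance.
        n = max(2, round(math.exp(rng.gauss(5.0, math.sqrt(0.3)))))
        a = sum(rng.random() < r_t for _ in range(n))    # treatment events (step b3)
        c = sum(rng.random() < r_c for _ in range(n))    # control events
        b, d = n - a, n - c
        if 0 in (a, b, c, d):                            # assumed continuity correction
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        x.append(math.log((a * d) / (b * c)))            # log odds ratio (step b4)
        s.append(math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d))
    return x, s


x_bin, s_bin = simulate_binary(n_studies=200, odds_ratio=3.0, seed=1)
```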

In steps c3 and b5, the number of excluded studies is $n' = n * p$ , where n is the total number of studies and p is the proportion of truncation. We induce publication bias by excluding $n'$ studies in two ways: (1) by effect size (the $n'$ studies with the smallest x_i values are excluded), (2) by p-value (the $n'$ studies with the smallest $x_{i} / s_{i}$ values are excluded). The remaining $n - n'$ studies are our final simulated data.
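The two exclusion rules can be sketched as a single function. Rounding $n * p$ to the nearest integer is an assumption of this illustration, since the product need not be a whole number; the function name and data are hypothetical.

```python
def induce_bias(x, s, p, by="effect"):
    """Drop the n' = round(n * p) studies with the smallest x_i or x_i / s_i."""
    n = len(x)
    n_drop = round(n * p)               # assumed rounding rule
    if by == "effect":
        ranked = sorted(range(n), key=lambda i: x[i])
    elif by == "pvalue":
        ranked = sorted(range(n), key=lambda i: x[i] / s[i])
    else:
        raise ValueError("by must be 'effect' or 'pvalue'")
    keep = sorted(ranked[n_drop:])      # retained studies, in their original order
    return [x[i] for i in keep], [s[i] for i in keep]


# Hypothetical data, chosen so that the two mechanisms drop different studies.
x_demo = [0.5, -0.2, 0.1, 0.9, 0.3, -0.4, 0.7, -0.1, 0.2, 0.6]
s_demo = [0.5, 0.1, 0.05, 0.3, 0.3, 0.8, 0.35, 0.1, 0.4, 0.2]
x_eff, s_eff = induce_bias(x_demo, s_demo, p=0.2, by="effect")
x_pval, s_pval = induce_bias(x_demo, s_demo, p=0.2, by="pvalue")
```

In this example, truncation by effect size removes the two most negative effects, whereas truncation by p-value removes the two studies with the smallest standardized effects, which need not be the same studies.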

We also conducted a simulation to verify the general proof of the properties of the estimator of the truncation cut-off point. Data were generated as in steps c1 and c2, but in step c3, data were excluded based on a fixed cut-off point for the truncation. We used −0.5 and −0.4, which correspond to studies with OR < 0.61 or OR < 0.67, respectively, being excluded because their results were negative relative to the desired study objectives. We calculated the bias of the estimated cut-off point and examined its behavior.

We considered nine sample sizes for the number of studies in a meta-analysis $(20, 30, 40, \dots, 100)$ , and for the proportion of truncation p, we chose values ranging from 0 to 0.4 with an increment of 0.05. For each combination, we generated 500 sets of data to calculate the empirical mean effect size and the truncation proportion, and also the coverage of the nominal 95% confidence intervals (from B = 200 bootstrap samples for each estimate). We then plotted the bias of the means and the coverages against sample sizes by proportions of truncation to examine the performance of our method and the TF method. A positive bias indicates an overestimation of the true value. In our simulation, we use the R package meta³² to apply the TF method with the R₀ estimator for the number of trimmed studies, as recommended by Duval and Tweedie.²¹

4.2 Simulation results

We present simulation results using data generated from both random- and fixed-effects models, and compare various estimation strategies discussed in the previous section. With a fixed-effects model (τ = 0), we plot all six estimators for selected truncation proportions $p = 0, 0.15$ , and 0.3. For a random-effects model ( $τ = 0.15$ for binary outcome studies), we only report three estimators, as indicated in the following figures. Simulation results based on continuous outcomes are similar to those based on binary outcomes, and thus, we only present results on binary responses.

4.2.1 Truncation point

In Appendix 1, we proved that the estimator of the cut-off point given in equation (8) is asymptotically unbiased and consistent, with its mean squared error approaching zero, regardless of where the truncation happens. To verify the proof numerically, we plotted the bias of the estimate in Figure 1 for two cases in which studies are removed because their results are more negative than expected. In the simulation set-up, studies with OR < 0.61 or OR < 0.67 were truncated in the generated data, and we used our estimator to estimate the cut-off point and computed its bias. Figure 1 shows that the bias of the estimate decreases as n increases, approaching zero, as expected from the proof given in Appendix 1. Although the positive bias does not seem to approach zero quickly, it is relatively minor even in small samples, and we are satisfied with its consistency.

Figure 1.

Bias in estimation of the cut-off point for data truncation: tau is the between-study standard error. The dashed horizontal line is inserted to indicate a zero bias.

4.2.2 Overall effect size

Figure 2 illustrates the bias of the estimates with studies using binary outcomes. The uncorrected estimator has the largest bias in estimating the overall effect size, as expected. The bias is positive (overestimation), and increases substantially with increasing proportion of truncation. When there is no truncation (p = 0), the uncorrected estimator has a very small positive bias and the bias converges to zero as the number of studies increases, as expected.

Figure 2.

Bias in estimation of the global mean with binary responses: tau is the between-study standard error and P is the truncation proportion. The dashed horizontal line is inserted to indicate a zero bias: UC: uncorrected; ML: the maximum likelihood; TF: the Trim and Fill method; 2-step: the method of moments; RE: a random-effects model; FE: a fixed-effects model.

The TF estimator performs only slightly better than the uncorrected estimator, especially when the truncation proportion is large. In most situations, the TF estimator has a positive bias, but when there is no or small truncation proportion, the bias is negative. It also performs somewhat differently when the truncation is by p-value under a large underlying overall effect size (OR = 3), where the bias is much smaller than that of the uncorrected estimator, and decreases as sample size increases. The TF estimator performs similarly under both fixed and random-effects models.

On the other hand, our estimators produce results with the smallest bias for almost all situations. Under a fixed-effects model, our maximum likelihood estimator ( $ML_{FE}$ in Figure 2) produces a small negative bias, and the magnitude of the bias decreases as the number of studies increases, but the bias does not seem to be affected by the severity of truncation. Under a random-effects model, our two-step estimator has a small negative bias when the number of studies is small, but the bias increases from negative to positive as the number of studies increases. Under the random-effects model, as the between-study standard error and truncation proportion increase, the bias of the two-step estimator also increases. Nevertheless, it has the smallest bias compared to the uncorrected and the TF estimators.

Figure 3 compares the mean squared errors (MSE) in estimating the global mean with data generated from binary responses. There are no real differences in MSE when there is no truncation of studies. However, in the more likely situations with various levels of truncation, our methods consistently had smaller MSE than the TF or uncorrected estimators in almost all situations.

Figure 3.

MSE in estimation of the global mean with binary responses. One-step effect and two-step effect refer to the Method of moments updated once and twice, respectively, assuming truncation by small effect size where tau is the between-study standard error and P is the truncation proportion.

Overall, our estimators consistently perform much better than those of the TF and the uncorrected meta-analysis methods in estimating the global effect size.

4.2.3 Percentage of truncation

We now compare the performance of our estimator with the TF method in their ability to determine the severity of publication bias. The bias of the estimated percentages of truncation is presented in Figure 4. Our estimator is calculated as the average truncation probabilities for all included studies. The TF estimator is calculated as $p' = \frac{n_{0}}{n}$ , where n₀ is the number of filled studies and n is the number of studies used in the meta-analysis (including the filled studies).

Figure 4.

Bias in estimation of the truncation proportion with binary responses. See note for Figure 2.

As with the estimated global mean, the TF estimator produces the largest bias, regardless of whether we consider a fixed- or a random-effects model and whether the meta-analysis concerns continuous or binary responses. When there is no truncation, its bias is positive; when there is substantial truncation, the bias is negative and grows in magnitude as the truncation proportion increases. The bias also increases with the sample size when the underlying effect size is large (OR = 3). Again, the performance of our estimators is quite consistent in most situations. They have a slightly positive bias (overestimating the severity of truncation), but the bias decreases as the number of studies increases. When there is large between-study variation, the bias decreases and may become negative, but it is still much smaller in magnitude than that of the TF estimator. When data are from a fixed-effects model, the maximum likelihood estimator and the two-step estimator perform similarly, although the two-step estimator has a smaller bias. Figure 5 shows the MSE in estimating the truncation proportion. As in Figure 3, our methods consistently have smaller MSE than the TF method.

Figure 5.

MSE in estimating the truncation proportion with binary responses. See note for Figure 2. One-step and two-step effects refer to the method of moments updated once and twice, respectively, assuming truncation by small effect size.

4.2.4 Coverage of nominal 95% confidence interval for the global mean

Finally, we compare the coverage probabilities of the two estimation methods. The R package meta does not provide the standard error for the number of filled studies with the TF estimator, and so its 95% coverage probability for estimating the proportion of truncation is not presented here. For our estimators, the confidence interval is obtained using the bootstrap method with B = 200 bootstrap samples for each combination of the simulations. The results based on binary outcomes are plotted in Figure 6.
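The bootstrap interval can be sketched as follows. This is not the authors' code: it assumes a nonparametric bootstrap that resamples whole studies with replacement and a percentile interval, neither of which is specified above, and it uses the uncorrected inverse-variance mean as a stand-in for the estimator of interest. The data and all names are made up for illustration.

```python
import random


def inverse_variance_mean(effects, std_errors):
    """Uncorrected fixed-effects estimate: inverse-variance weighted mean."""
    weights = [1.0 / (s * s) for s in std_errors]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def bootstrap_percentile_ci(effects, std_errors, estimator,
                            n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling studies with replacement."""
    rng = random.Random(seed)
    n = len(effects)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(estimator([effects[i] for i in idx],
                                   [std_errors[i] for i in idx]))
    estimates.sort()
    alpha = 1.0 - level
    lower = estimates[int((alpha / 2.0) * (n_boot - 1))]
    upper = estimates[int(round((1.0 - alpha / 2.0) * (n_boot - 1)))]
    return lower, upper


# Hypothetical effect sizes and standard errors for eight studies.
effects = [0.10, 0.25, 0.18, 0.30, 0.05, 0.22, 0.15, 0.28]
std_errors = [0.08, 0.12, 0.10, 0.15, 0.09, 0.11, 0.07, 0.14]

ci_low, ci_high = bootstrap_percentile_ci(effects, std_errors, inverse_variance_mean)
```

Any estimator with the same `(effects, std_errors)` signature could be passed in place of `inverse_variance_mean`; B = 200 corresponds to the default `n_boot=200`.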

Figure 6.

Coverage of 95% CI for the global mean with binary responses. See note for Figure 2.

The coverage probability for the global mean in our approach is very close to the nominal 95% level in almost all situations, except when the truncation proportion and the between-study variation are large. Under a fixed-effects model, the maximum likelihood and two-step estimators perform very similarly. On the other hand, the coverage probability of the TF estimates falls far below the nominal 95% level, even when there is no truncation, and it worsens as the truncation proportion and the number of studies increase.

In summary, our proposed estimators outperform the TF method in detecting publication bias and in correcting the bias in estimating the effect size and the truncation proportion, whether the truncation is due to small effect size or large p-value. They also achieve better coverage probability. Although not shown (available upon request), our approach performs consistently better in estimating τ than either the TF method or the uncorrected approach. We also examined performance in relatively small samples. At n = 10 with p = 0.4, the meta-analysis would involve just six studies. As depicted in Figures 3 and 5, the bias, and therefore the MSE, is higher than when the sample size is larger, but this issue is common to all methods. The general pattern remains, and our methods outperform the others. Overall, we recommend the two-step method.

4.3 Comparing with selection methods

Note that both our proposed method and the TF method assume the “Suppressed Bernoulli Model” for the induction of publication bias. Compared with the selection-method approach, this assumption may seem unrealistic or too strong in real applications, as it can be viewed as using a weight of 1 for larger effect sizes (smaller p-values) and 0 for smaller effect sizes (larger p-values). However, our approach requires fewer studies than the selection methods and is easier to implement.
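The 0/1 weighting can be made concrete with a small simulation. The sketch below is illustrative only: the function names, the cutoff and the simulation settings are hypothetical, and it treats the effect-size case with a common standard deviation. Each study is published with probability equal to its weight, which under the step function means every study below the cutoff is suppressed.

```python
import random


def step_weight(effect, cutoff):
    """Publication probability under a 0/1 step function on effect size."""
    return 1.0 if effect >= cutoff else 0.0


def simulate_published(mu, sd, n_studies, cutoff, seed=1):
    """Draw study effects from N(mu, sd^2) and keep each with its weight."""
    rng = random.Random(seed)
    all_effects = [rng.gauss(mu, sd) for _ in range(n_studies)]
    # rng.random() lies in [0, 1), so weight 1 always keeps and weight 0 never does.
    published = [y for y in all_effects if rng.random() < step_weight(y, cutoff)]
    return all_effects, published


all_effects, published = simulate_published(mu=0.2, sd=0.1, n_studies=1000, cutoff=0.15)
mean_all = sum(all_effects) / len(all_effects)
mean_published = sum(published) / len(published)
```

Because only the smallest effects are removed, `mean_published` cannot fall below `mean_all`, which is the upward bias that the corrections aim to remove. A selection model would replace `step_weight` with a smoother weight function taking values between 0 and 1.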

To examine and compare the empirical performance of our approach against the selection method approach, we also conducted simulations in which randomly selected studies that had initially been excluded were added back into the meta-analysis. This effectively provides a sensitivity analysis for the truncation points in equations (8) and (14). We then applied our approach, the TF method, and the Dear–Begg⁷ selection method.³³ The results were similar to those shown here: our method still outperforms the TF method in terms of the estimation bias of the underlying overall effect size and the percentage of publication bias, and it also has smaller estimation bias for the overall effect size than the selection method, especially when truncation is severe. These results are omitted here but are available upon request.

5 Meta-analysis of passive smoking studies

We now apply our methods to assess and analyze possible publication bias and correct the estimation of overall effect using the data set from Hackshaw et al.³⁴

Hackshaw et al.³⁴ conducted a meta-analysis of 37 studies (4 cohort studies and 33 case-control studies), comparing the risk of lung cancer between two groups of women. One group consists of non-smoking women living with currently smoking spouses, and the other consists of non-smoking women living with spouses who have never smoked. The goal was to estimate the risk of lung cancer in non-smokers who were exposed to second-hand (passive) tobacco smoking. This data set, which has been re-analyzed by many other investigators since its publication,^35–37 has generated heated debate about a possible publication bias unaccounted for.

Figure 7 shows the funnel plot of the data set, with the relative risks plotted against their standard errors on a log scale. The funnel plot indicates obvious asymmetry. However, a visual inspection does not provide a definite conclusion. The original authors³⁴ concluded that there is no evidence of publication bias based on the fail-safe N approach. On the other hand, Copas and Shi³⁵ examined the relationship between the relative risks and their corresponding uncertainty measures and concluded that publication bias could not be ruled out.

Figure 7.

Funnel plot for the passive smoking meta-analysis.

Table 1 lists the estimates and 95% bootstrap confidence intervals for the overall effect and the proportion of truncation using various methods.

Table 1.

Parameter estimates and their 95% CI for the passive smoking data.

Model	Estimation method	Truncation by	μ	p	τ
Fixed effects	Uncorrected		0.186 (0.114, 0.258)	–	–
	Trim and Fill		0.156 (0.085, 0.227)	0.160	–
	Truncated normal	Effect size	0.168 (0.061, 0.275)	0.082 (0.051, 0.144)	–
		p-value	0.183 (0.084, 0.277)	0.007 (0.005, 0.124)	–
Random effects	Uncorrected		0.213 (0.122, 0.305)	–	0.129
	Trim and Fill		0.173 (0.078, 0.268)	0.160	0.161
	Truncated normal	Effect size	0.181 (0.081, 0.282)	0.089 (0.028, 0.151)	0.135
		p-value	0.184 (0.092, 0.275)	0.013 (−0.057, 0.082)	0.132

For the overall effect (i.e. relative risk) estimation, all methods produce significant results under both fixed- and random-effects models, with the uncorrected estimate being the largest and the TF estimate the smallest. The TF method fills seven studies, and thus its truncation proportion is $7 / (7 + 37) \times 100\% \approx 16\%$. Our method assuming truncation by effect size produces an overall effect size of 0.168 under a fixed-effects model and 0.181 under a random-effects model; when assuming truncation by p-value, the estimates are similar (0.183 and 0.184). In terms of the severity of publication bias, our method assuming truncation induced by effect size shows less publication bias than the TF method, as only about $37 \times 0.082 \approx 3$ studies are estimated to have been excluded. When assuming truncation due to a large p-value, it indicates essentially no publication bias, as it estimates that fewer than one study has been omitted.
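The arithmetic in this paragraph can be checked with a few lines of Python. The counts and estimates are taken from the text and Table 1; the variable names are hypothetical. The conversion to relative risks assumes that μ is reported on the log relative-risk scale, which the funnel-plot description suggests but which this section does not state explicitly.

```python
import math

N_OBSERVED = 37  # studies in the passive smoking meta-analysis
N_FILLED = 7     # studies filled by the TF method

# TF truncation proportion: filled studies over observed plus filled.
tf_proportion = N_FILLED / (N_OBSERVED + N_FILLED)

# Fixed-effects truncation proportions from Table 1.
p_effect_size = 0.082
p_p_value = 0.007

# Approximate number of excluded studies, following the calculation in the text.
excluded_effect_size = N_OBSERVED * p_effect_size
excluded_p_value = N_OBSERVED * p_p_value

# Fixed-effects estimates of mu from Table 1, assumed to be log relative risks.
log_rr = {
    "uncorrected": 0.186,
    "trim_and_fill": 0.156,
    "truncated_effect_size": 0.168,
    "truncated_p_value": 0.183,
}
relative_risk = {name: math.exp(value) for name, value in log_rr.items()}
```

Under that assumption, every entry of `relative_risk` exceeds 1, in line with the statement that all methods indicate an elevated risk.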

In estimating the between-study standard error under the random-effects model, all methods produce estimates ranging from 0.13 to 0.16. Our approach also gives bootstrap confidence intervals for τ of (0.018, 0.251) and (0.016, 0.247) for truncation by effect size and by p-value, respectively. The estimates of the truncation proportions are very similar to those under the fixed-effects model. The results of our method under the random-effects model suggest that there may be significant heterogeneity among the studies, and that the asymmetry of the funnel plot may have been caused by heterogeneity rather than by truncation of some studies leading to publication bias. Because the TF method cannot distinguish heterogeneity from publication bias as a source of asymmetry, it produces a large proportion of truncation (p = 0.16) even though it also produces a large between-study standard error ($\tau = 0.161$), and it over-corrects the estimate of the overall effect size.

6 Discussion

By modeling publication bias as truncation of a normal distribution, we have developed a new parametric approach to deal with publication bias in meta-analysis. Simulation study results indicate that our approach has the following desirable properties: it combines the detection of and correction for publication bias; it has a smaller bias; it can handle publication bias induced by the two main mechanisms, small effect size and large p-value; and it works well under various levels of severity of truncation. Furthermore, with a random-effects model, our approach can distinguish heterogeneity from publication bias.

Our model includes two primary parameters, the underlying overall effect size μ and the truncation point k or t. The estimated truncation point k or t can be used to estimate p for determining the percentage of truncation. This approach is different from methodologies based on a funnel plot, such as Begg and Mazumdar’s¹⁸ rank correlation test and Egger’s regression test,¹⁷ where the focus is on detecting the asymmetry of the funnel plot. These methodologies do not provide an estimate of the severity of publication bias, nor do they provide for correction of the bias in the overall effect size estimation. Gleser and Olkin³⁸ used the distribution of p-values from individual studies to estimate the number of unpublished studies, but they do not provide estimates of effect sizes for unpublished studies, and thus cannot provide a correction based on the number of unpublished studies alone.

Our approach is similar to the TF method in that it estimates the number of unpublished studies and corrects the estimate of the overall effect for publication bias. However, as shown in our simulation study and in previous simulation studies,²³ the performance of the TF method is less than satisfactory: it severely underestimates the number of unpublished studies and has a large bias in estimating the overall effect, especially when truncation is severe and/or heterogeneity among studies is large. In these situations, the TF method is only slightly better than the uncorrected estimator. Further, unlike selection model approaches, ours does not require an impractically large number of studies and does not rely on an arbitrary choice of weight functions.

A novel feature of our approach is that we distinguish between the two main causes of publication bias (a small effect size and a large p-value) and develop likelihood functions for the respective truncation mechanisms. Simulation results show that our approach works well under both truncation mechanisms. Funnel plot-based methods such as TF do not consider the difference between these two mechanisms and focus on effect size only. In real applications, truncation, and thus publication bias, may be a combined effect of both a small effect size and a large p-value, and it might therefore be desirable to develop a method that can handle both mechanisms at the same time. We provided methods for dealing with each mechanism separately, and researchers can determine which mechanism best fits the data. One could also apply both and assess the results accordingly.

Simulation results also indicate that our truncated normal distribution approach works consistently well whether there is no, mild or severe truncation. The bias in estimation for both the overall effect size and the proportion of truncation is rather similar regardless of the level of truncation. On the other hand, both the uncorrected and the TF estimators are severely affected by the level of truncation: the magnitude of bias increases with the severity of truncation.

It is well recognized that heterogeneity is another source of asymmetry in funnel plots.^15,17,23 The TF method cannot distinguish heterogeneity from publication bias. Thus, even when there is no truncation, it still fills some studies. This explains the negative bias in the estimated overall effect size and the positive bias in the estimated truncation proportion by the TF method in Figures 1 and 2 for $\tau = 0.15$ and P = 0. On the other hand, our approach with a random-effects model can separate heterogeneity from publication bias, provided that at least one of the two is not too high. The difficulty arises when there are both substantial heterogeneity and substantial publication bias (see, for example, $\tau = 0.15$ and P = 0.3 in Figures 1 and 2). The combined effect of the two can make identifying the real cause of the asymmetry rather difficult. Although our methods provide an improvement over the existing approaches in such a situation, further research is needed.

Finally, it is worth noting that both our method and the TF method assume that suppression happens with the left-most (most extreme) values, and both use the same normal distribution assumption in obtaining the underlying overall effect size. The only difference lies in the way of determining the cut-off point or the number of suppressed studies. The TF method uses a non-parametric (rank-based) approach, which ignores the distributional property of the effect sizes, even though this distributional property is utilized in the calculation of the overall effect size. Our method utilizes the same distributional property of the effect sizes both in determining the cut-off point and in calculating the overall effect size. A consequence of the lack of a distributional assumption in the TF method is that when the standard errors of the effect sizes are very close, or in the extreme case that all the standard errors are equal, the funnel plot will always be symmetric no matter how many studies are suppressed on the left. Thus, the TF method will not be able to detect any publication bias at all. On the other hand, the distribution of the remaining effect sizes will be skewed to the right, and thus our method can still detect the existence of publication bias. While the new strategy we developed was shown to improve on existing methods in all aspects we considered, further work is needed to address the limitations we observed, especially with meta-analyses using small samples.
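The equal-standard-error case can be illustrated with a simulation. The sketch below is not the authors' estimator; it is a simplified stand-in under stated assumptions: a fixed-effects model, truncation by effect size, known and equal standard errors, and the cutoff estimated by the smallest published effect. All names and settings are hypothetical. It fits μ by maximizing a left-truncated normal log-likelihood with a one-dimensional search and computes the sample skewness of the published effects.

```python
import math
import random


def truncated_loglik(mu, effects, std_errors, cutoff):
    """Log-likelihood of effects from N(mu, s_i^2) observed only when >= cutoff."""
    total = 0.0
    for y, s in zip(effects, std_errors):
        log_density = -0.5 * ((y - mu) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        # P(effect >= cutoff) = 0.5 * erfc(z / sqrt(2)) with z = (cutoff - mu) / s
        survival = 0.5 * math.erfc((cutoff - mu) / (s * math.sqrt(2.0)))
        total += log_density - math.log(max(survival, 1e-300))
    return total


def golden_section_max(f, lo, hi, tol=1e-7):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if f(left) > f(right):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up settings: equal standard errors, studies below the cutoff suppressed.
TRUE_MU, COMMON_SE, TRUE_CUTOFF = 0.2, 0.1, 0.15
rng = random.Random(42)
draws = [rng.gauss(TRUE_MU, COMMON_SE) for _ in range(2000)]
published = [y for y in draws if y >= TRUE_CUTOFF]
std_errors = [COMMON_SE] * len(published)

cutoff_hat = min(published)  # assumption: smallest published effect estimates the cutoff
naive_mean = sum(published) / len(published)
mu_hat = golden_section_max(
    lambda mu: truncated_loglik(mu, published, std_errors, cutoff_hat),
    cutoff_hat - 5.0 * COMMON_SE,
    max(published),
)
skewness = sample_skewness(published)
```

With equal standard errors, a funnel plot of `published` carries no information about asymmetry, but the positive `skewness` and the gap between `naive_mean` and `mu_hat` still reflect the truncation. The large number of simulated studies is chosen only to make the illustration stable; it says nothing about small-sample behavior.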

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded in part by a grant from the Natural Sciences and Engineering Research Council of Canada.

References

1. Glass. Primary, secondary, and meta-analysis of research. Edu Res 1976; 5: 3–8.

2. Dickersin, Chan, Chalmers, et al. Publication bias and clinical trials. Controll Clin Trials 1987; 8: 343–353.

3. Greenwald. Consequences of prejudice against the null hypothesis. Psychol Bull 1975; 82: 1–20.

4. Coursol, Wagner. Effect of positive findings on submission and acceptance rates: A note on meta-analysis bias. Prof Psychol Res Pract 1986; 17: 136–137.

5. Sommer. The file drawer effect and publication rates in menstrual cycle research. Psychol Women Q 1987; 11: 233–242.

6. Rosenthal. The “file-drawer problem” and tolerance for null results. Psychol Bull 1979; 86: 638–641.

7. Dear KBG, Begg. An approach for assessing publication bias prior to performing a meta-analysis. Stat Sci 1992; 7: 237–245.

8. Copas. What works? Selectivity models and meta-analysis. J Royal Stat Soc Ser A 1999; 162: 95–109.

9. Copas, Jackson. A bound for publication bias based on the fraction of unpublished studies. Biometrics 2004; 60: 146–153.

10. Hedges. Estimation of effect size under nonrandom sampling: The effects of censoring studies yielding statistically insignificant mean differences. J Edu Stat 1984; 9: 61–85.

11. Hedges. Modeling publication selection effects in meta-analysis. Stat Sci 1992; 7: 246–255.

12. Iyengar, Greenhouse. Selection models and the file drawer problem. Stat Sci 1988; 3: 109–135.

13. Copas, Shi. A sensitivity analysis for publication bias in systematic reviews. Stat Meth Med Res 2001; 10: 251–265.

14. Copas. A likelihood-based sensitivity analysis for publication bias in meta-analysis. Journal of the Royal Statistical Society, Series C 2013; 62: 47–66.

15. Rothstein, Sutton, Borenstein. Publication bias in meta-analysis: Prevention, assessment and adjustments. Chichester, UK: John Wiley & Sons Inc, 2005.

16. Light, Pillemer. Summing up: The science of reviewing research. Cambridge: Harvard University Press, 1984.

17. Egger, Smith, Schneider, et al. Bias in meta-analysis detected by a simple graphical test. Br Med J 1997; 315: 629–634.

18. Begg, Mazumdar. Operating characteristics of a rank correlation test for publication bias. Biometrics 1994; 50: 1088–1101.

19. Sterne JAC, Gavaghan, Egger. Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000; 53: 1119–1129.

20. Duval, Tweedie. A non-parametric “trim and fill” method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association 2000; 95: 89–98.

21. Duval, Tweedie. Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000; 56: 455–463.

22. Terrin, Schmid, Lau, et al. Adjusting for publication bias in the presence of heterogeneity. Stat Med 2003; 22: 2113–2126.

23. Peters, Sutton, Jones, et al. Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity. Stat Med 2007; 26: 4544–4562.

24. Peters, Sutton, Jones, Abrams. Assessing publication bias in meta-analyses in the presence of between-study heterogeneity. Journal of the Royal Statistical Society 2010; 173: 575–591.

25. Montes, Lotyczewski. Correcting publication bias in meta-analysis: A truncation approach. J Modern Appl Stat Meth 2003; 2: 433–442.

26. Higgins JPT, Thompson. Quantifying heterogeneity in a meta-analysis. Stat Med 2002; 327: 557–560.

27. Lehmann, Casella. Theory of point estimation, 2nd ed. New York: Springer-Verlag, Inc, 1988.

28. DerSimonian, Laird. Meta-analysis in clinical trials. Controll Clin Trials 1986; 7: 177–188.

29. DerSimonian, Kacker. Random-effects model for meta-analysis in clinical trials: An update. Contem Clin Trials 2007; 28: 105–114.

30. Brockwell, Gordon. A comparison of statistical methods for meta-analysis. Stat Med 2001; 20: 825–840.

31. Kontopantelis, Springate, Reeves. A re-analysis of the Cochrane library data: The dangers of unobserved heterogeneity in meta-analyses. PLoS One 2013; 8: 1–14.

32. Schwarzer. Meta: Meta-analysis with R. R package version 2.1-1 2012; 7: 40–45.

33. Rufibach. Selection models with monotone weight functions in meta-analysis. Biometrical Journal 2011; 53: 689–704.

34. Hackshaw, Law, Wald. The accumulated evidence on lung cancer and environmental tobacco smoke. Br Med J 1997; 315: 980–988.

35. Copas, Shi. Reanalysis of epidemiological evidence on lung cancer and passive smoking. Br Med J 2000; 320: 417–418.

36. Henmi, Copas, Eguchi. Confidence intervals and p-values for meta-analysis with publication bias. Biometrics 2007; 63: 475–482.

37. Givens, Smith, Tweedie. Publication bias in meta-analysis: A Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Stat Sci 1997; 12: 221–250.

38. Gleser, Olkin. Models for estimating the number of unpublished studies. Stat Med 1996; 15: 2493–2507.