Why Replication Probabilities Depend on Prior Probability Distributions

Abstract

Killeen (2005) recommended replacing the p values of significance tests with replication probabilities, which he determined as follows: Assume d_x, the estimate of the effect size δ in study x, is positive and its sampling error, e_x = d_x − δ, is normally distributed with mean 0 and variance σ², that is, N(0, σ²). It follows that the effect size in a replication, d_y = (δ + e_y), can be rewritten as (d_x − e_x + e_y), where e_y is the replication sampling error. Consequently, d_y is distributed as N(d_x, 2σ²), because d_x is a constant and e_x and e_y are independently distributed as N(0, σ²). The replication probability (p_rep), the probability that d_y > 0, can now be found by integrating the distribution of d_y (see Killeen's Equation 1):

p_{rep} = \int_{0}^{\infty} N(d_{x}, 2σ^{2})    (1)
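Because the integrand is a normal density, Equation 1 reduces to an upper-tail area that can be evaluated directly. The following is a minimal sketch of that calculation, not Killeen's own code; the function name `killeen_p_rep` and its parameter names are illustrative assumptions.

```python
from statistics import NormalDist


def killeen_p_rep(d_x: float, sigma: float) -> float:
    """Return P(d_y > 0) when d_y is taken to be N(d_x, 2 * sigma**2).

    Hypothetical helper: d_x is the observed effect size and sigma is the
    standard deviation of its sampling error, as defined in the text.
    """
    # Standard deviation of the replication effect size under Equation 1.
    sd_replication = (2.0 ** 0.5) * sigma
    # Area of N(d_x, 2 * sigma^2) above zero.
    return 1.0 - NormalDist(mu=d_x, sigma=sd_replication).cdf(0.0)
```

Only d_x and σ enter the function, which makes concrete the observation that the calculation never refers to δ.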

No reference to δ appears in the calculation; nevertheless, a distribution has implicitly been attributed to δ. Consider the replication probability based on n identical replications yielding effect sizes d_y1, d_y2, d_y3, …, d_yn with errors e_y1, e_y2, e_y3, …, e_yn, each independently distributed as N(0, σ²). The combined effect size is the mean of the d_ys (d̄_y), which is distributed as N(δ, σ²/n). An analogous argument leads to the conclusion that

d̄_{y} = (δ + ē_{y}) = (d_{x} - e_{x} + Σ_{i} e_{yi} / n)

is distributed as N(d_x, (1 + 1/n)σ²). Now as n increases, N(d_x, (1 + 1/n)σ²) tends to N(d_x, σ²), and as a result, the uncertainty regarding d̄_y is practically the same as the uncertainty regarding δ for very large n. Killeen's argument implies that following an empirical study yielding an effect size of d_x, δ is distributed approximately as N(d_x, σ²), which is what Fisher (1956/1973) called a fiducial probability distribution.
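The variance in this step can be spelled out as a short worked equation. It assumes, as Killeen's argument does, that d_x is treated as a constant and that e_x and the n replication errors are independent N(0, σ²) variables.

```latex
\begin{aligned}
\operatorname{Var}(\bar{d}_y)
  &= \operatorname{Var}(d_x - e_x + \bar{e}_y) \\
  &= \operatorname{Var}(e_x) + \operatorname{Var}(\bar{e}_y) \\
  &= \sigma^2 + \frac{\sigma^2}{n}
   = \left(1 + \frac{1}{n}\right)\sigma^2
   \;\longrightarrow\; \sigma^2 \quad (n \to \infty).
\end{aligned}
```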

WHAT IS WRONG WITH FIDUCIAL PROBABILITIES?

In any attempt to compute replication probabilities, information concerning the population effect size distribution needs to be taken into account. Imagine a researcher who tested only true null hypotheses (e.g., randomized control trials testing the existence of supposed supernatural phenomena that did not exist). In this case, given standard assumptions and discounting effect sizes of 0, the replication probabilities would equal .5. Admittedly, this example is an idealization, but studies differ in what is known about the effects they test, and replication probabilities should reflect this. Indeed, any simulation of replication probabilities has to incorporate a distribution for δ. Killeen's approach ignores such distributions and as a result seems appropriate only when there is no relevant prior information.
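The true-null example can be checked with a small simulation. The sketch below is illustrative only: the function name, the default sample size, and the seed are assumptions, and δ is fixed at 0 to match the idealized researcher described above.

```python
import random


def simulate_null_replication(n_pairs: int = 100_000,
                              sigma: float = 1.0,
                              seed: int = 0) -> float:
    """Estimate P(d_y > 0 | d_x > 0) when every tested effect has delta = 0."""
    rng = random.Random(seed)
    delta = 0.0  # assumption: only true null hypotheses are tested
    positive_originals = 0
    positive_replications = 0
    for _ in range(n_pairs):
        d_x = rng.gauss(delta, sigma)  # original study estimate
        d_y = rng.gauss(delta, sigma)  # independent replication estimate
        if d_x > 0:
            positive_originals += 1
            if d_y > 0:
                positive_replications += 1
    # Share of positive original results whose replication is also positive.
    return positive_replications / positive_originals
```

Under these assumptions the estimate should sit close to .5 however large the positive d_x values happen to be, because d_y is generated without reference to d_x.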

Fisher's fiducial argument was similarly based on taking prior information to be irrelevant, but it has not stood the test of time (Seidenfeld, 1992; Zabell, 1992). The problem with fiducial inference is that the probability of δ depends on how the prior ignorance of δ is characterized. Seidenfeld associated the problem with representing ignorance of a parameter by a uniform distribution, and Lindley (1958) argued that in the one-parameter case, the fiducial argument is either inconsistent or equivalent to a Bayesian argument with a uniform prior. However, when a uniform prior is assumed, ignorance of δ and ignorance of any one-to-one function of δ result in different prior probability distributions despite the ignorance being the same in each case. In addition, unbounded uniform priors are not proper probability distributions, as their integrals over all possible values do not equal 1. Indeed, the posterior distribution N(d _x, σ²) cannot be obtained using any genuine prior probability distribution (O'Hagan, 1994, p. 77). O'Hagan recommended that the use of improper priors be restricted to situations in which all “reasonable” priors would result in posterior distributions that did not differ greatly from each other (p. 111).

Returning to Killeen's argument, it seems to result in the following paradox. Before d_x is observed, there can be information concerning the likely values of δ and d_y, yet after d_x is observed, such prior information is irrelevant because δ is distributed as N(d_x, σ²) and d_y is distributed as N(d_x, 2σ²). In the case of d_y, this follows because, as stated earlier, once d_x is known, it is assumed that d_y = d_x − e_x + e_y, where d_x is a constant and both error terms are independently distributed as N(0, σ²). However, the previously mentioned example concerning the investigator of supernatural phenomena contradicts this characterization of the uncertainty after d_x has been observed. Here the prior distribution of δ is δ = 0 with p = 1, and if d_x is found to be c, then e_x also equals c with a probability of 1. Thus, e_x is not distributed as N(0, σ²), as Killeen supposed. The point is that once d_x is known, the uncertainty regarding e_x should be given by its probability distribution conditioned on the observed value of d_x. This conditioning process incorporates the prior distribution of δ into the calculation of both the distribution of replication probabilities and the posterior probability distribution of δ.
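One way to see the effect of conditioning is to assume, purely for illustration, a proper normal prior for δ. The normal prior, the function name `p_rep_normal_prior`, and its parameters are assumptions introduced here and are not part of the argument above. With δ distributed a priori as N(prior_mean, prior_sd²), the standard conjugate update gives a normal posterior for δ, and d_y = δ + e_y is then normal with the posterior mean and with variance equal to the posterior variance plus σ².

```python
from statistics import NormalDist


def p_rep_normal_prior(d_x: float, sigma: float,
                       prior_mean: float, prior_sd: float) -> float:
    """Return P(d_y > 0 | d_x) under an assumed N(prior_mean, prior_sd**2) prior.

    Hypothetical helper: prior_sd must be positive, so the point-mass prior
    in the text is approached only as a limit (prior_sd close to 0).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / sigma ** 2
    # Conjugate normal update for delta after observing d_x.
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * d_x)
    # Predictive distribution of d_y = delta + e_y, with e_y ~ N(0, sigma^2).
    pred_sd = (post_var + sigma ** 2) ** 0.5
    return 1.0 - NormalDist(mu=post_mean, sigma=pred_sd).cdf(0.0)
```

In this sketch a very diffuse prior (large `prior_sd`) approaches the value from Killeen's Equation 1, whereas a prior concentrated near 0 pulls the result toward .5 whatever d_x is observed.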

CONCLUSIONS

As the adage has it, “when something looks too good to be true, it probably is.” If it were possible to compute replication probabilities ignoring everything but the data, then these probabilities would be “objective” and could form the basis of an alternative to significance tests. Unfortunately, this is not possible. To find a replication probability from d_x, one needs to know both the probability law generating d_x and the prior probability distributions of the law's parameters. A problem exists because complete ignorance of a parameter does not seem equivalent to any prior, but this is just an example of the more general principles that probability models only approximate people's uncertainties (Macdonald, 2002) and that statistical inferences are not entirely based on logic (Macdonald, 2004). Seidenfeld (1992) summarized the case against fiducial probabilities by quoting Savage (1961, p. 578), who said that to make the Bayesian omelet, you have to break the Bayesian eggs.

References

Fisher, R.A. (1973). Statistical methods and scientific inference (3rd ed.). New York: Hafner. (Original work published 1956)

Killeen, P.R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353.

Lindley, D.V. (1958). Fiducial distributions and Bayes theorem. Journal of the Royal Statistical Society, B, 20, 102–110.

Macdonald, R.R. (2002). The incompleteness of probability models and the resultant implications for theories of statistical inference. Understanding Statistics, 1, 167–189.

Macdonald, R.R. (2004). Statistical inference and Aristotle's Rhetoric. British Journal of Mathematical and Statistical Psychology, 57, 193–203.

O'Hagan (1994). Kendall's advanced theory of statistics: Vol. 2. Bayesian inference. London: Edward Arnold.

Savage, L.J. (1961). The foundations of statistics reconsidered. In Neyman (Ed.), Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 575–585). Berkeley: University of California Press.

Seidenfeld (1992). R. A. Fisher's fiducial argument and Bayes theorem. Statistical Science, 7, 358–368.

Zabell, S.L. (1992). R. A. Fisher and the fiducial argument. Statistical Science, 7, 369–387.