Relational and Arelational Confidence Intervals

Abstract

In their recent article, Fidler, Thomason, Cumming, Finch, and Leeman (2004) noted that although researchers often use confidence intervals (CIs) in presenting data, they do not refer to CIs for drawing inferences. As the title of the article indicates, Fidler et al. consider this lack of reference to CIs in inference to be undesirable, even thoughtless. We offer an alternative interpretation. Unlike conventional tests, CIs, as commonly used, do not provide a direct comparison of groups or conditions. Consequently, researchers are acting thoughtfully by using CIs as descriptive indices rather than inferential ones.

CIs are typically drawn around sample means. These CIs are arelational in that each provides information about only a single group or condition. There are many advantages to arelational CIs: They provide a rough guide to variability in data, a coarse view of the replicability of patterns, and a quick check of the heterogeneity of variance. Arelational CIs, however, do not reflect between-groups information and cannot be used for direct comparisons.

A number of researchers have advocated the use of alternative CIs that do represent group or condition differences. For example, Tryon (2001) discussed how CIs can be constructed such that the presence or absence of overlap reflects hypothesis-test outcomes. Masson and Loftus (2003) discussed CIs for main effects and interaction contrasts. Tukey (1977; see also Velleman & Hoaglin, 1981) recommended Tukey HSD-derived CIs for comparing means and medians in one-way designs. Thompson (2002) recommended reporting and drawing CIs around meta-analytic measures such as Cohen's d or the proportion of variance explained. We term CIs in this class relational, to describe their function. The advantage of relational CIs is that they may provide directly interpretable information about group or condition differences. The most salient disadvantage is the lack of a standard convention for constructing and describing these intervals.

To illustrate both types of CIs, we adapt here a well-known example from Hays (1994, p. 570, Table 13.21.2). The data are presented in Table 1, and condition means are plotted in the left panel of Figure 1. The dependent variable was the completion time for jigsaw puzzles. The independent variables were the overall shape of the puzzle (round vs. square) and the color scheme (monochromatic vs. colored). Each participant completed four puzzles, one in each condition. A repeated measures analysis of variance (ANOVA) reveals significant main effects of color scheme, F(1, 11) = 13.9, p < .05, Cohen's d = 1.08, and shape, F(1, 11) = 7.5, p < .05, Cohen's d = 0.79, but not a significant interaction, F(1, 11) ≈ 0, p ≈ 1, Cohen's d = 0. In Figure 1, for each bar in the plot we have drawn three different CIs as follows: Error Bar A is the 95% CI for the condition mean as calculated in Table 1. This CI includes variability across participants. When this between-participants variability is a nuisance, it may be removed by subtracting from each score the participant's mean (Loftus & Masson, 1994; Masson & Loftus, 2003). Error Bar B is the 95% CI after this normalization; it does not reflect the overall variability across participants. Error Bar C is the 95% normalized CI from a pooled variance estimate.¹ The disadvantage of Error Bar C is that heterogeneity of variance across groups is not graphically displayed. These three error bars are arelational; they do not reflect any particular comparison. Relational CIs can be constructed for specific contrasts, as shown in the right panel of Figure 1. This graph displays Cohen's d effect size for both main effects and the interaction.²

Fig. 1.

Example of arelational and relational confidence intervals (CIs) adapted from Hays (1994). The left panel shows puzzle completion times with arelational CIs. For each condition, the error bars, from left to right, are as follows: Error Bar A graphs the CI from variance within the condition, Error Bar B graphs the CI after participant effects are removed, and Error Bar C graphs the CI from pooled within-participants variance (see footnote 1). The right panel shows relational CIs around Cohen's d effect-size measures for planned contrasts.

TABLE 1

Puzzle Completion Time as a Function of Puzzle Shape and Color Scheme


	Monochromatic		Colored
Participant	Round	Square	Round	Square	Participant's mean

1	41	40	41	37	39.75
2	57	56	56	53	55.50
3	52	53	53	50	52.00
4	49	47	47	47	47.50
5	47	48	48	47	47.50
6	37	34	35	36	35.50
7	47	50	47	46	47.50
8	41	40	38	40	39.75
9	48	47	49	45	47.25
10	37	35	36	35	35.75
11	32	31	31	33	31.75
12	47	42	42	42	43.25
Condition mean	44.6	43.6	43.6	42.6
Condition SE	2.05	2.27	2.21	1.85
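
As a rough illustration of how Error Bars A and B relate to the Table 1 data, the following sketch recomputes the condition means and standard errors, then repeats the calculation after subtracting each participant's mean. It is not the authors' code. The names `DATA`, `T_CRIT`, and `standard_errors` are hypothetical, and using the 11-df critical value of t (2.201) for both bars is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Table 1 rows, one per participant:
# (monochromatic round, monochromatic square, colored round, colored square)
DATA = [
    (41, 40, 41, 37), (57, 56, 56, 53), (52, 53, 53, 50), (49, 47, 47, 47),
    (47, 48, 48, 47), (37, 34, 35, 36), (47, 50, 47, 46), (41, 40, 38, 40),
    (48, 47, 49, 45), (37, 35, 36, 35), (32, 31, 31, 33), (47, 42, 42, 42),
]
T_CRIT = 2.201  # two-sided 95% critical value of t with 11 df (assumed for both bars)


def standard_errors(rows):
    """Standard error of the mean for each condition (column)."""
    n = len(rows)
    return [stdev(col) / sqrt(n) for col in zip(*rows)]


# Error Bar A: ordinary CI from the raw scores in each condition.
cond_means = [mean(col) for col in zip(*DATA)]
se_raw = standard_errors(DATA)

# Error Bar B: remove participant effects by subtracting each
# participant's mean, then recompute the standard errors.
normalized = [tuple(y - mean(row) for y in row) for row in DATA]
se_norm = standard_errors(normalized)

for m, a, b in zip(cond_means, se_raw, se_norm):
    print(f"mean {m:5.1f}   A: +/- {T_CRIT * a:4.2f}   B: +/- {T_CRIT * b:4.2f}")
```

The raw standard errors reproduce the "Condition SE" row of Table 1. The normalized ones are much smaller because most of the variability in these data lies between participants.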

Relational error bars are often burdensome to implement and describe. For some purposes, such as theory testing, they do not provide more information than the ANOVAs. Researchers who do not wish to draw attention away from their substantive points may be better served by referring to formal tests than by referring to relational CIs. An additional problem with relational CIs is that there has not been sufficient statistical development of them. Current development, while growing, applies to a limited number of designs (Fidler & Thompson, 2001; Masson & Loftus, 2003). There are few discussions of the coverage probabilities and biases of relational CIs (cf. Algina & Keselman, 2003). The unique advantage of relational error bars, however, is that they provide a graphic index of the replicability of effect sizes, which is not available in either arelational error bars or formal tests.

We agree with Fidler et al. that (a) CIs provide valuable information about variability in the data, and (b) there must be continuing discourse over their proper use and meaning. On balance, we prefer the parsimony of arelational CIs: They provide a rough guide to the variability in data and a quick check of the heterogeneity of variance. Because arelational CIs may not be sufficient for comparative purposes, statistical tests are often necessary. In such cases, the statistical test provides more precise information about the comparison, and should be the main focus in discussion. Researchers who rely on relational CIs need to carefully document their construction, as well as provide some justification for beliefs about their coverage probabilities.

Footnotes

Acknowledgements

We thank Fiona Fidler and Bruce Thompson for helpful comments. This research is supported by National Science Foundation Grant SES-0095919 to J. Rouder, D. Sun, and P. Speckman.

1

Following , we calculated variance estimates in the construction of Error Bar C as follows: $M S_{S \times A B} = \frac{S S_{S \times A} + S S_{S \times B} + S S_{S \times A \times B}}{d f_{S \times A} + d f_{S \times B} + d f_{S \times A \times B}},$ where SS refers to sum of squares, df refers to the degrees of freedom, A and B refer to fixed factors (shape and color in the example), and S refers to the random participant factor.
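
The pooled mean square above can be sketched for the Table 1 data as follows. This is an illustration rather than the authors' code. It relies on a shortcut that holds only when every factor has two levels: each interaction-with-participant sum of squares equals the sum of squared deviations of a per-participant contrast score. The names `DATA`, `ss_with_participants`, and `f_ratio` are hypothetical.

```python
from math import sqrt
from statistics import mean

# Table 1 rows, one per participant:
# (monochromatic round, monochromatic square, colored round, colored square)
DATA = [
    (41, 40, 41, 37), (57, 56, 56, 53), (52, 53, 53, 50), (49, 47, 47, 47),
    (47, 48, 48, 47), (37, 34, 35, 36), (47, 50, 47, 46), (41, 40, 38, 40),
    (48, 47, 49, 45), (37, 35, 36, 35), (32, 31, 31, 33), (47, 42, 42, 42),
]
n = len(DATA)

# Per-participant contrast scores (A = shape, B = color, as in the footnote).
shape = [((mr + cr) - (ms + cs)) / 2 for mr, ms, cr, cs in DATA]
color = [((mr + ms) - (cr + cs)) / 2 for mr, ms, cr, cs in DATA]
inter = [((mr - ms) - (cr - cs)) / 2 for mr, ms, cr, cs in DATA]


def ss_with_participants(scores):
    """Sum of squares for the effect-by-participant interaction (2-level case)."""
    m = mean(scores)
    return sum((s - m) ** 2 for s in scores)


def f_ratio(scores):
    """F(1, n - 1) for the effect, tested against its own interaction with participants."""
    return n * mean(scores) ** 2 / (ss_with_participants(scores) / (n - 1))


ss = {
    "SxA": ss_with_participants(shape),
    "SxB": ss_with_participants(color),
    "SxAxB": ss_with_participants(inter),
}
df_pooled = 3 * (n - 1)                   # each term contributes n - 1 = 11 df
ms_pooled = sum(ss.values()) / df_pooled  # pooled mean square from the formula above
se_bar_c = sqrt(ms_pooled / n)            # standard error underlying Error Bar C

print(ss, round(ms_pooled, 3), round(se_bar_c, 3))
print(round(f_ratio(color), 1), round(f_ratio(shape), 1), round(f_ratio(inter), 1))
```

As a sanity check on the decomposition, the F ratios computed this way round to the values reported in the text for the two main effects and the interaction.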

2

We followed Cumming and Finch's (2001) recommendations for iterating bounds with the noncentral t distribution. For the main effect of color, the appropriate estimate of residual variance is the mean squares for the color-by-participant interaction. Likewise, the appropriate estimates for the main effect of shape and the shape-by-color interaction are the mean squares for the shape-by-participant interaction and the three-way interaction, respectively. We tested our construction for equal-tail 95% CIs through Monte Carlo simulation for both main effects and found a small bias toward overcoverage (coverage of .958). Details of the construction and simulation may be found on the Web at .
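
The idea of iterating bounds with the noncentral t distribution can be sketched as follows. This is a minimal standard-library illustration, not the construction the footnote refers to. It assumes that d for a within-participant contrast equals t divided by the square root of n (which agrees with the d values reported in the text), takes t as the positive square root of the reported F, and evaluates the noncentral t distribution function by numerical integration. The function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def nct_cdf(t, df, ncp, steps=600):
    """P(T <= t) for noncentral t, by Simpson integration over the chi-distributed scale."""
    upper = math.sqrt(df) + 10.0  # the chi density is negligible beyond this point
    h = upper / steps
    norm_const = math.exp((1.0 - df / 2.0) * math.log(2.0) - math.lgamma(df / 2.0))
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        density = norm_const * x ** (df - 1) * math.exp(-0.5 * x * x)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)  # Simpson weights
        total += weight * norm_cdf(t * x / math.sqrt(df) - ncp) * density
    return total * h / 3.0


def solve_ncp(t_obs, df, target, iters=50):
    """Bisection for the noncentrality at which t_obs falls at the `target` quantile."""
    lo, hi = t_obs - 20.0, t_obs + 20.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if nct_cdf(t_obs, df, mid) > target:
            lo = mid  # the CDF decreases as ncp grows, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def d_interval(f_value, n, level=0.95):
    """Equal-tail CI for d from an F(1, n - 1) statistic; returns (lower, d, upper)."""
    t_obs = math.sqrt(f_value)
    tail = (1.0 - level) / 2.0
    lower = solve_ncp(t_obs, n - 1, 1.0 - tail) / math.sqrt(n)
    upper = solve_ncp(t_obs, n - 1, tail) / math.sqrt(n)
    return lower, t_obs / math.sqrt(n), upper


for label, f_value in [("color", 13.9), ("shape", 7.5), ("interaction", 0.0)]:
    lower, d, upper = d_interval(f_value, n=12)
    print(f"{label:12s} d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the noncentral t distribution is asymmetric for nonzero effects, the resulting intervals are not centered on d. Only the interaction, with t = 0, yields a symmetric interval.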

References

Algina, & Keselman, H.J. (2003). Approximate confidence intervals for effect sizes. Educational and Psychological Measurement, 63, 537–553. doi:10.1177/0013164403256358

Cumming, & Finch (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.

Fidler, Thomason, Cumming, Finch, & Leeman (2004). Editors can lead researchers to confidence intervals, but can't make them think: Statistical reform lessons from medicine. Psychological Science, 15, 119–126. doi:10.1111/j.0963-7214.2004.01502008.x

Fidler, & Thompson (2001). Computing correct confidence intervals for ANOVA fixed- and random-effects effect sizes. Educational and Psychological Measurement, 61, 575–604. doi:10.1177/00131640121971383

Hays, W.L. (1994). Statistics (5th ed.). Ft. Worth, TX: Harcourt Brace.

Loftus, G.R., & Masson, M.E.J. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1, 476–490.

Masson, M.E.J., & Loftus, G.R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology, 57, 203–220. PMID: 14596478

Thompson (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.

Tryon, W.W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis significance tests. Psychological Methods, 6, 371–386. doi:10.1037//1082-989X.6.4.371

Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Velleman, P.F., & Hoaglin, D.C. (1981). Applications, basics, and computing of exploratory data analysis. Boston: Duxbury Press.