The weighted sum of binomial proportions and the interaction effect are two important cases of the linear combination of binomial proportions. Existing confidence intervals for these two parameters are approximate. We apply the -function method to a given approximate interval and obtain an exact interval. The process is repeated multiple times until the final-improved interval (exact) cannot be shortened. In particular, for the weighted sum of two proportions, we derive two final-improved intervals based on the (approximate) adjusted score and fiducial intervals. After comparing several currently used intervals, we recommend these two final-improved intervals for practice. For the weighted sum of three proportions and the interaction effect, the final-improved interval based on the adjusted score interval should be used. Three real datasets are used to detail how the approximate intervals are improved.
Consider independent binomials for . The linear combination of binomial proportions,
for given constants ’s, is an important parameter of interest. For example, the weighted sum of proportions and the interaction effect are often used in biomedicine, and they are two special cases of .
The weighted sum of proportions requires for positive ’s and is mainly applied to stratified data. For example, assume that the cure rate of a drug depends on gender, and that we know and , the male and female proportions in the population. Then the overall cure rate is the weighted sum of the male and female cure rates and , with weights and . The interaction effect requires for or . It mainly studies the interaction between two factors. Here are two simple examples for the two cases.
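For orientation, the setting described above can be written out as follows. This is only a sketch in placeholder notation ($X_i$, $n_i$, $p_i$, $c_i$, $w_i$, $\theta$ are our own symbols and may differ from the original), and both the constraint that the weights sum to one and the specific form of the interaction contrast are assumptions based on the usual definitions.

```latex
% Independent binomials and a general linear combination (assumed notation)
X_i \sim \mathrm{Bin}(n_i, p_i), \quad i = 1, \dots, k, \qquad
\theta = \sum_{i=1}^{k} c_i \, p_i .

% Weighted sum of proportions: positive weights, assumed to sum to one
c_i = w_i > 0, \quad \sum_{i=1}^{k} w_i = 1
\quad \Longrightarrow \quad \theta \in [0, 1] .

% Interaction effect in a 2 x 2 design (k = 4, coefficients +1 or -1)
\theta = (p_{11} - p_{10}) - (p_{01} - p_{00})
\quad \Longrightarrow \quad \theta \in [-2, 2] .
```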
Innes et al.1 tested the tumorigenicity of Avadex (a fungicide) by continuous oral administration to both males (M) and females (F) of hybrid mice Y. Table 1 contains the number of hybrid mice that develop the tumor in each sex group. The goal is to estimate the tumorigenicity of Avadex using the weighted sum of two proportions. The parameter of interest is the probability of the tumorigenicity of Avadex
where and are the probabilities of the tumorigenicity of Avadex for males and females, respectively, and the weights of and are and , respectively.
Table 1. The tumorigenicity of Avadex to males and females of hybrid mice Y.

| Sex | With tumors | No tumors | Sample size |
|-----|-------------|-----------|-------------|
| M   | 4           | 14        | 18          |
| F   | 1           | 14        | 15          |
Bonett and Price2 extracted a two-factor factorial design from Howell3 (1997, p. 404), where the factor A, age, is a two-level blocking factor (young and old) and the factor B, instruction, is also a two-level treatment factor (intentional and counting). The subjects in the study are classified into two categories by the number of words they recall: (i) less than or equal to six; and (ii) larger than six. The frequencies for four factor-level combinations are reported in Table 2. The parameter of interest is the interaction effect between age and instruction and is given in (17).
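Table 2. The effects of age and instruction on the status of the number of words recalled.

| Age   | Instruction | Frequencies |    | Sample size |
|-------|-------------|-------------|----|-------------|
| Young | Intentional | 2           | 8  | 10          |
|       | Counting    | 1           | 9  | 10          |
| Old   | Intentional | 5           | 5  | 10          |
|       | Counting    | 0           | 10 | 10          |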
There have been efforts to derive confidence intervals for the linear combination of proportions . Price and Bonett4 proposed the adjusted Wald interval that is simple to compute. Tebbs and Roths5 obtained a multivariate extension of the interval by Beal6 that focuses on . Zou et al.7 derived an interval based on the Wilson interval,8 and they claimed it is better than the adjusted Wald interval as the former has a coverage probability closer to the nominal level and a shorter interval length. Martín Andrés et al.9 proposed the score interval and several adjusted Wald-type approximate intervals. They recommended the score interval based on the numerical comparisons on coverage probability and interval length. Martín Andrés et al.10 further concluded that the score interval is the best in general, but the Wald3 interval (a variant of the Wald interval) is the best when the sample sizes are very small. Krishnamoorthy et al.11 proposed the closed-form fiducial confidence interval, and they claimed that the interval is even better than the score interval.
There are also efforts on constructing confidence intervals for the weighted sum of proportions. A stratified Wilson interval was given by Yan and Su.12 This interval is easy to compute and the confidence level of the interval is justified by extensive simulations. Decrouez and Robinson13 summarized seven confidence intervals for the weighted sum of two proportions that were originated from the intervals for the difference in two proportions. They recommended using the adjusted score interval for small samples, unless a simple calculation is important, in which case they advocated the Jeffreys–Perks interval.
The aforementioned intervals, however, are all approximate, and their confidence coefficients are less than the nominal level by significant amounts (see Tables 4, 5, and 8). To the best of our knowledge, the research on exact intervals for , including the weighted sum and the interaction effect, is limited. Wang14 recently proposed the -function method, which can improve any given approximate interval to an exact one and uniformly shorten any given exact interval. Here, an interval is exact if it has an infimum coverage probability (ICP, also called the confidence coefficient) over the entire parameter space greater than or equal to the nominal level , see Casella and Berger.15 Any confidence interval without this property is approximate. Lehmann and Romano16 (pp. 423–424) provided the definitions of the pointwise and uniformly asymptotically level confidence intervals. The goal of this article is to derive exact intervals by applying the general -function method to two groups of intervals. The first group is for the weighted sum of two proportions and consists of six intervals described by Krishnamoorthy et al.11 and Decrouez and Robinson;13 the second group is for the weighted sum of three proportions and the interaction effect and contains five intervals proposed by Price and Bonett,4 Zou et al.,7 Martín Andrés et al.,9 and Krishnamoorthy et al.11 Then we choose the optimal exact interval from each group.
In Section 2, we describe five approximate intervals for and the -function method for improving any given confidence interval. In Sections 3 and 4, the method is used to derive optimal exact intervals for the weighted sum of two and three binomial proportions, respectively. Section 5 focuses on the optimal exact interval for the interaction effect. Discussions are given in Section 6.
Preliminaries
Suppose we observe a random vector , where ’s are independent and each follows for . Let be the maximum likelihood estimator (MLE) of and let be the probability mass function (PMF) of . The parameter and sample spaces are
and
respectively. The linear function of binomial proportions, introduced in (1), is the parameter of interest. Let and . Then, belongs to a fixed interval and has the MLE
Five approximate intervals for
We describe five approximate intervals for . These intervals are used to infer the weighted sum of proportions and the interaction effect in the next three sections, and they are to be improved by the -function method.
The first interval for is the adjusted Wald interval proposed by Price and Bonett4:
where and is the upper -th percentile of the standard normal distribution.
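As a concrete illustration, the sketch below codes the adjusted Wald interval in the form commonly attributed to Price and Bonett, in which 2/k successes and 2/k failures are added to each of the k groups. That form, the function name `adjusted_wald`, and the weights (0.375, 0.625) are assumptions: the weights are not recoverable from the text here and were chosen only because, with the Example 1 data for hybrid mice Y, they reproduce the adjusted Wald limits reported in Table 3.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald(x, n, c, level=0.95):
    """Adjusted Wald interval for sum(c_i * p_i) (assumed Price-Bonett form)."""
    k = len(x)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # upper alpha/2 normal percentile
    # Shrunken proportions: add 2/k successes and 2/k failures per group.
    p_adj = [(xi + 2 / k) / (ni + 4 / k) for xi, ni in zip(x, n)]
    center = sum(ci * pi for ci, pi in zip(c, p_adj))
    var = sum(ci ** 2 * pi * (1 - pi) / (ni + 4 / k)
              for ci, pi, ni in zip(c, p_adj, n))
    half = z * sqrt(var)
    return center - half, center + half


# Hybrid mice Y from Table 1: 4 of 18 males and 1 of 15 females with tumors.
# The weights below are an assumption, not taken from the text.
lo, hi = adjusted_wald(x=(4, 1), n=(18, 15), c=(0.375, 0.625))
print(round(lo, 4), round(hi, 4))
```

Under these assumed weights the limits agree, to four decimals, with the initial adjusted Wald interval for hybrid mice Y in Table 3. Note that the interval is not truncated to the parameter range, which is consistent with the upper limit above 1 that appears in Table 6.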
The second interval is the modified Wilson interval proposed by Zou et al.7 Its lower and upper confidence limits are
respectively, where
The third interval , proposed by Martín Andrés et al.,9 is obtained by inverting a family of score tests, each of which tests the hypotheses:
for a fixed value . Let be the log-likelihood function in terms of , and let be the restricted MLE of under for . Introduce
Then, the score test statistic is
where
The lower and upper confidence limits of the score interval are the solutions of as an equation in . Equivalently, the two confidence limits are solved by
The fourth interval is the adjusted score interval, which is a variant of . A special case of this interval for was discussed by Decrouez and Robinson.13 Here, we extend the interval to any . The confidence limits of the interval are obtained by solving the following equation in :
Equations (6) and (7) are nearly identical except for an extra constant factor in the right-hand side of (7), which is due to Miettinen and Nurminen.17
The fifth interval is the fiducial interval proposed by Krishnamoorthy et al.11:
with
where and . Here, denotes the upper -th percentile of the Beta distribution with two positive shape parameters and .
The -function method
For a given initial interval for , , Wang14 proposed the -function method which improves of any level to an exact interval of level , denoted by ; if itself is exact, then is a subset of . To derive , introduce a test statistic for in (5) through
where a small value of supports . The -function based on is
Let
Then, the improved interval for at the sample point is given by The process of (9), (10), and (11) is the -function method to improve in Wang.14 He proved two facts: (i) is exact for any interval , that is, the ICP of interval satisfies
(ii) is contained in if is exact.
If we treat the one-time improved interval as the initial interval and apply the -function method to it, then the two-time improved interval is generated. Therefore, is also exact and is a subset of . We repeat this process for times until . Then, is the final-improved interval by the -function method, denoted by . Wang14 proved that is a exact interval and is a subset of . Our numerical calculations show that a finite exists. In practice, if the lengths of and are close enough for some , say their difference is <0.0001, then we stop the construction process and define .
Exact intervals for the weighted sum of two proportions
In this section, we consider in equation (1). Here, two independent and are observed. The parameter and sample spaces are
respectively. The parameter of interest is , where , and . Here, we use a new notation to replace the general in (1), that is, is the weighted sum of two proportions. Now, has a range of . The MLE of is .
Following the -function method, for any fixed , consider the hypotheses:
The null hypothesis , in terms of , is equal to , where
For a given interval for , define a test statistic as in (9). Then, the -function follows (10), i.e.
This function looks complicated, but we give the details of calculation in Section 3.2. Find and . Then, the one-time improved interval (exact) is
We repeat the improving process multiple times and obtain the final-improved interval , which is also exact and contained in .
Improving six approximate intervals to exact intervals
We apply the above approach to improve the following five approximate intervals by Decrouez and Robinson13 and the fiducial interval by Krishnamoorthy et al.11 to exact intervals, and further derive the corresponding final-improved intervals.
The first interval for is a special case of the adjusted Wald interval in (3) when . The second interval is the Jeffreys–Perks interval, which is obtained by solving the following inequality in
where
and
The third interval needs the likelihood-ratio test statistic
where is the restricted MLE of under and . The interval is
The intervals , , and are the special cases of the score interval in (6), the adjusted score interval in (7), and the fiducial interval in (8), respectively, when . Decrouez and Robinson13 recommended for small sample sizes. Krishnamoorthy et al.11 claimed that is better than .
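Table 3. Eighteen 95% confidence intervals for : , , for , and their lengths at for and at for .

Hybrid mice Y

| Interval         | Initial lower | Initial upper | Initial length | One-time lower | One-time upper | One-time length | Final lower | Final upper | Final length |
|------------------|---------------|---------------|----------------|----------------|----------------|-----------------|-------------|-------------|--------------|
| Adjusted Wald    | 0.0480 | 0.2866 | 0.2386 | 0.0520 | 0.2923 | 0.2403 | 0.0528 | 0.2915 | 0.2387  |
| Jeffreys–Perks   | 0.0547 | 0.2771 | 0.2224 | 0.0436 | 0.2914 | 0.2478 | 0.0436 | 0.2888 | 0.2452  |
| Likelihood ratio | 0.0462 | 0.2658 | 0.2196 | 0.0500 | 0.2811 | 0.2311 | 0.0534 | 0.2777 | 0.2243* |
| Score            | 0.0544 | 0.2856 | 0.2312 | 0.0528 | 0.2938 | 0.2410 | 0.0536 | 0.2915 | 0.2379  |
| Adjusted score   | 0.0537 | 0.2888 | 0.2351 | 0.0529 | 0.2943 | 0.2414 | 0.0536 | 0.2917 | 0.2381  |
| Fiducial         | 0.0600 | 0.2782 | 0.2182 | 0.0555 | 0.2872 | 0.2317 | 0.0555 | 0.2872 | 0.2317  |

Hybrid mice X

| Interval         | Initial lower | Initial upper | Initial length | One-time lower | One-time upper | One-time length | Final lower | Final upper | Final length |
|------------------|---------------|---------------|----------------|----------------|----------------|-----------------|-------------|-------------|--------------|
| Adjusted Wald    | 0.0756 | 0.3410 | 0.2654 | 0.0814 | 0.3619 | 0.2805 | 0.0847 | 0.3590 | 0.2743  |
| Jeffreys–Perks   | 0.0794 | 0.3347 | 0.2553 | 0.0666 | 0.3606 | 0.2940 | 0.0656 | 0.3513 | 0.2857  |
| Likelihood ratio | 0.0714 | 0.3274 | 0.2560 | 0.0681 | 0.3425 | 0.2744 | 0.0775 | 0.3431 | 0.2656  |
| Score            | 0.0809 | 0.3401 | 0.2592 | 0.0807 | 0.3446 | 0.2639 | 0.0840 | 0.3451 | 0.2611* |
| Adjusted score   | 0.0799 | 0.3432 | 0.2633 | 0.0819 | 0.3486 | 0.2667 | 0.0861 | 0.3482 | 0.2621  |
| Fiducial         | 0.0863 | 0.3360 | 0.2497 | 0.0817 | 0.3471 | 0.2654 | 0.0829 | 0.3459 | 0.2630  |

Note: The shortest exact interval at a sample point is marked by an asterisk.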
A real-data analysis: Example 1 (continued)
The goal is to estimate the overall proportion of tumorigenicity of Avadex, , which is also given in (2). Assume the ratio of male to female is , then and .
For hybrid mice Y, as shown in Table 1, the dataset is . Table 3 contains 18 intervals at the observed point . Among the 12 exact intervals, the final-improved likelihood-ratio interval is the shortest.
Innes et al.1 were also interested in the tumorigenicity of Avadex of hybrid mice X. The observations are and the associated intervals are given in Table 3 as well. In this case, the final-improved score interval is the shortest.
Next, we give the details of calculating the final-improved interval, for example, the above .
Step 1: Following (14) compute the likelihood-ratio interval for all sample points in . Here, there are 304 sample points. For example, .
Step 2: We compute in this step. Let the test statistic as in (9). Following (10), the -function at is
where and are given in (13). Then, we find the smallest and largest solutions of and obtain . Two specific steps of computing for any and finding the smallest and largest solutions of are as follows:
Step 2-1: Computing at a fixed involves finding a global supremum of the summation in the right-hand side of the previous equation when runs over set . First, we use the grid search method to search for a maximum point, say , that maximizes the summation on an evenly distributed subset of . Then, we find the local maximum of the summation within a small neighborhood of by the function “optimize” in R. If the subset is dense enough in , then the local maximum is equal to the target value . For example, when , , runs over this range with a step of length 0.0005, and .
Step 2-2: The range of is . To find the smallest root, , of , we start the search from 0 to 1 in a lattice manner and find the local root by the function “uniroot” in R. For example, if using the step length of 0.0005, we compute , where is the multiples of 0.0005, and find for but . Then, we apply the function “uniroot” on interval and obtain a root . To be accurate on the fourth decimal place, we always round down to . Similarly, we start the search from 1 to 0 and find the largest root, , of .
Step 3: Repeat Step 2 to compute for all sample points in .
Step 4: Repeat Step 3 on and obtain for all ’s. Repeat this process for times, where is the smallest integer such that . Then, we have . In the current example, since . Thus, , which shows improvements over both the lower and upper limits of . It is worth noting that when constructing a from the grid search is conducted only within rather than , the whole range of . This is because is a subset of , which greatly simplifies the search for the smallest and largest roots when . The R code for this calculation is given in the Supplemental Material. In addition, varies but is less than 20 for all intervals calculated in this article.
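The two numerical sub-steps in Step 2 can be sketched as follows. This is a minimal Python illustration, not the authors' R code: `sup_on_null` and `smallest_root` are hypothetical names, a golden-section search stands in for R's `optimize`, and bisection stands in for `uniroot`. The argument `region` is the set of sample points entering the summation, which the caller must supply from the test statistic in (9); the null set is taken to be the line segment on which the weighted sum equals the hypothesized value, as an assumption.

```python
from math import comb, floor


def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def sup_on_null(region, n, w, theta0, step=0.0005):
    """Step 2-1: sup of P(X in region) over {w1*p1 + w2*p2 = theta0}."""
    lo = max(0.0, (theta0 - w[1]) / w[0])  # admissible range of p1
    hi = min(1.0, theta0 / w[0])

    def g(p1):
        p2 = min(1.0, max(0.0, (theta0 - w[0] * p1) / w[1]))
        return sum(binom_pmf(x1, n[0], p1) * binom_pmf(x2, n[1], p2)
                   for x1, x2 in region)

    # Coarse search on an evenly spaced grid.
    m = max(1, int(round((hi - lo) / step)))
    grid = [lo + (hi - lo) * j / m for j in range(m + 1)]
    best = max(grid, key=g)
    # Local refinement around the best grid point (golden-section search).
    a, b = max(lo, best - step), min(hi, best + step)
    r = (5 ** 0.5 - 1) / 2
    for _ in range(60):
        c, d = b - r * (b - a), a + r * (b - a)
        if g(c) < g(d):
            a = c
        else:
            b = d
    return max(g(best), g((a + b) / 2))


def smallest_root(f, alpha=0.05, step=0.0005, tol=1e-7):
    """Step 2-2: scan a lattice on [0, 1] for the first point with f > alpha,
    bisect on that bracket, and round down to four decimals."""
    if f(0.0) > alpha:
        return 0.0
    m = int(round(1 / step))
    prev = 0.0
    for j in range(1, m + 1):
        cur = j * step
        if f(cur) > alpha:
            a, b = prev, cur  # f(a) <= alpha < f(b)
            while b - a > tol:
                mid = (a + b) / 2
                if f(mid) > alpha:
                    b = mid
                else:
                    a = mid
            return floor(a * 1e4) / 1e4
        prev = cur
    return None  # no lattice point exceeds alpha
```

The largest root can be found the same way by scanning from 1 down to 0. As the text notes, the result is only as reliable as the grid is dense, so checking the ICP of the resulting interval remains the practical safeguard.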
In order to observe the changes of intervals at the fixed sample point in the improvement process, we give the lower and upper limits of six given initial intervals, the one-time improved intervals, and the final improved intervals, respectively, as shown in Figure 1. It is evident that all final improved exact intervals are subsets of the corresponding one-time improved exact interval. Therefore, the -function method continuously shortens exact intervals and the final improved intervals depend on the initial intervals.
Six initial 95% approximate intervals and their one-time and final improved exact intervals at in Example 1. The dot points are the lower and upper limits of intervals, and the dashed lines represent the lower and upper limits of the one-time improved exact interval.
Figure 2 exhibits three level-0.05 rejection regions of the likelihood-ratio test, the one-time improved test and the final improved test, respectively, for the hypotheses in (12) when and . The rejection region of the final improved test is the union of that of the one-time improved test and six triangle points. In particular, the observed sample point in Example 1 belongs to the former region but not the latter region. So, the final improved test is uniformly more powerful than the one-time improved test, and this conclusion is consistent with the fact that the final improved interval is contained in the one-time improved interval .
Three level-0.05 rejection regions for testing in Example 1 when : (a) the likelihood ratio test (the dots); (b) the one-time improved test (the dots); and (c) the final improved test (the dots plus the triangles).
Comparing the six approximate intervals and their exact improvements
We evaluate the performance of a confidence interval using the ICP for reliability and the total interval length (TIL) for precision. For a given interval of , its coverage probability function is
Then, the ICP of interval is the infimum of over the entire parameter space . The TIL of interval is
which is well-defined due to the finite sample space and the finite range of . The TIL is a much simpler measurement for precision than the expected length of since the TIL is a single value but the expected length is a function of the parameter vector and comparing two single values is much easier than comparing two functions. A good exact interval should have an ICP no less than and a small TIL. In particular, for two exact intervals and , is preferred if .
In Section 3.1, we obtain 18 confidence intervals for , including six original (approximate) intervals , six one-time improved (exact) intervals , and six final-improved (exact) intervals for . Wang14 ensured that the shortest exact interval here should be equal to one of the ’s.
When using the -function method to improve an interval, numerical calculation errors may occur in the interval construction process, especially when implementing (10) and (11). Equation (10) computes a global supremum of probability over and equation (11) solves the smallest and largest roots of the -function larger than . No existing software is able to find these values quickly and always correctly because software typically provides a local supremum and an arbitrary root. Therefore, calculating the ICP of improved intervals is an effective way to validate the computation of the intervals. If the calculated ICP is not smaller than , it is safe to say that the improved interval is correct. Otherwise, there exist some errors in the computation.
We can compute the coverage probability of any at any given point in using (15) without any errors. However, since the coverage probability function is not continuous, the ICP requires extensive computations: (i) choose 140,000 pairs in the parameter space . Among them, 40,000 pairs are selected in a way that each is a multiple of 0.005, and 100,000 pairs are randomly selected from . (ii) Compute the precise coverage probabilities at these pairs using (15) and the minimum of these coverage probabilities is the computed ICP.
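The two criteria and the ICP computation just described can be sketched as below for two binomials. All function names are hypothetical, the lattice here has 201 × 201 points rather than exactly 40,000, and the plain (unadjusted) Wald interval in the usage lines is only a stand-in for whichever interval is being checked; it is not one of the intervals studied in this article.

```python
from math import comb, sqrt
import random


def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def coverage(interval, n, w, p):
    """Exact coverage probability at p: sum of PMFs over covering sample points."""
    theta = w[0] * p[0] + w[1] * p[1]
    total = 0.0
    for x1 in range(n[0] + 1):
        for x2 in range(n[1] + 1):
            lo, hi = interval((x1, x2))
            if lo <= theta <= hi:
                total += binom_pmf(x1, n[0], p[0]) * binom_pmf(x2, n[1], p[1])
    return total


def total_interval_length(interval, n):
    """TIL: sum of interval lengths over the whole (finite) sample space."""
    return sum(hi - lo
               for x1 in range(n[0] + 1)
               for x2 in range(n[1] + 1)
               for lo, hi in [interval((x1, x2))])


def approx_icp(interval, n, w, grid_step=0.005, n_random=100_000, seed=0):
    """Minimum coverage over a lattice plus random points (a numerical ICP)."""
    rng = random.Random(seed)
    m = round(1 / grid_step)
    points = [(i / m, j / m) for i in range(m + 1) for j in range(m + 1)]
    points += [(rng.random(), rng.random()) for _ in range(n_random)]
    return min(coverage(interval, n, w, p) for p in points)


# Usage with small illustrative settings (assumed, not from the article).
n, w = (5, 4), (0.5, 0.5)


def plain_wald(x, z=1.96):
    ph = (x[0] / n[0], x[1] / n[1])
    est = w[0] * ph[0] + w[1] * ph[1]
    half = z * sqrt(sum(wi ** 2 * pi * (1 - pi) / ni
                        for wi, pi, ni in zip(w, ph, n)))
    return est - half, est + half


print(total_interval_length(plain_wald, n))
print(approx_icp(plain_wald, n, w, grid_step=0.05, n_random=200))
```

Because the minimum is taken over finitely many points, the computed value can only overestimate the true infimum, which is why a dense lattice combined with random points is used.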
Table 4 contains the TIL and ICP of the eighteen 95% intervals discussed so far for four cases of . Decrouez and Robinson13 claimed the adjusted score interval is the best for small samples. However, we find the ICP of is worse than those of and . Interestingly, also has a smaller TIL than . So, is better than because the former has a larger ICP but a smaller TIL. Nevertheless, none of these approximate intervals reach the nominal level 0.95, while all the improved intervals and have an ICP at 0.95. We find that or is the shortest among the exact intervals. If the computation time is a concern, one may use the one-time improved interval or .
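Table 4. The TILs and ICPs for the eighteen 95% confidence intervals for : (adjusted-Wald), (Jeffreys–Perks), (likelihood ratio), (score), (adjusted score), (fiducial), and the improved intervals, and , for , when varies.

| Case | Interval         | Initial TIL | Initial ICP | One-time TIL | One-time ICP | Final TIL | Final ICP |
|------|------------------|-------------|-------------|--------------|--------------|-----------|-----------|
| 1    | Adjusted Wald    | 35.358   | 0.8991 | 37.654   | 0.9500 | 36.981   | 0.9500 |
|      | Jeffreys–Perks   | 33.628*  | 0.7837 | 38.637   | 0.9500 | 37.936   | 0.9500 |
|      | Likelihood ratio | 34.055   | 0.8327 | 38.983   | 0.9500 | 37.901   | 0.9500 |
|      | Score            | 34.676   | 0.8415 | 36.878   | 0.9500 | 36.465   | 0.9500 |
|      | Adjusted score   | 35.651   | 0.8504 | 36.838   | 0.9500 | 36.440   | 0.9500 |
|      | Fiducial         | 33.773   | 0.8949 | 36.688*  | 0.9500 | 36.360*  | 0.9500 |
| 2    | Adjusted Wald    | 80.607   | 0.9099 | 85.672   | 0.9500 | 84.446   | 0.9500 |
|      | Jeffreys–Perks   | 78.119   | 0.7851 | 86.869   | 0.9500 | 86.391   | 0.9500 |
|      | Likelihood ratio | 78.607   | 0.8270 | 89.021   | 0.9500 | 88.710   | 0.9500 |
|      | Score            | 79.514   | 0.8514 | 83.621   | 0.9500 | 83.522   | 0.9500 |
|      | Adjusted score   | 80.766   | 0.8514 | 82.901*  | 0.9500 | 82.292   | 0.9500 |
|      | Fiducial         | 77.947*  | 0.8983 | 83.140   | 0.9500 | 82.080*  | 0.9500 |
| 3    | Adjusted Wald    | 85.507   | 0.9040 | 90.925   | 0.9500 | 89.404   | 0.9500 |
|      | Jeffreys–Perks   | 82.977   | 0.8255 | 91.754   | 0.9500 | 90.523   | 0.9500 |
|      | Likelihood ratio | 83.194   | 0.8308 | 95.043   | 0.9500 | 90.309   | 0.9500 |
|      | Score            | 84.209   | 0.8543 | 88.000   | 0.9500 | 86.745   | 0.9500 |
|      | Adjusted score   | 85.485   | 0.8555 | 88.008   | 0.9500 | 86.718*  | 0.9500 |
|      | Fiducial         | 82.512*  | 0.9001 | 87.639*  | 0.9500 | 87.128   | 0.9500 |
| 4    | Adjusted Wald    | 198.166  | 0.9185 | 208.426  | 0.9500 | 205.953  | 0.9500 |
|      | Jeffreys–Perks   | 194.545  | 0.8603 | 210.374  | 0.9500 | 208.780  | 0.9500 |
|      | Likelihood ratio | 196.353  | 0.8332 | 211.765  | 0.9500 | 207.241  | 0.9500 |
|      | Score            | 197.420  | 0.8603 | 205.013  | 0.9500 | 202.656  | 0.9500 |
|      | Adjusted score   | 199.080  | 0.8740 | 205.081  | 0.9500 | 202.220  | 0.9500 |
|      | Fiducial         | 194.180* | 0.9092 | 201.909* | 0.9500 | 200.383* | 0.9500 |

Note: TIL: total interval length; ICP: infimum coverage probability. Cases are listed in their original order. The shortest interval is marked by an asterisk.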
To gain insight into the overall changes in confidence intervals, we present the TILs of six given initial intervals, the one-time improved intervals, and the final improved intervals in Figure 3. Notably, the TILs of all final improved exact intervals are shorter than those of the corresponding one-time improved exact intervals. This observation underscores the effectiveness of the -function method in continuously shortening exact intervals. Again, different initial intervals lead to different final improved intervals. Here, the final improved adjusted score interval is the shortest.
The total interval lengths (TILs) of eighteen 95% confidence intervals for when .
Exact intervals for the weighted sum of three proportions
We now consider the weighted sum of proportions for . Three independent ’s for are observed. The parameter and sample spaces are
respectively. The parameter of interest is , for some and , and the range of is still .
For any fixed , consider the hypotheses:
The null hypothesis, in terms of , is equal to where
and
Similar to Section 3, for a given interval for , introduce the test statistic and -function below, following (9) and (10):
and
where . Then, the one-time improved confidence interval for at is where and are the infimum and supremum of set , respectively. Furthermore, we derive the final-improved interval , a subset of .
Improving five approximate intervals to exact intervals
We use the above procedure to improve the following five approximate intervals to exact intervals and then uniformly shorten these exact intervals.
The five approximate intervals, for , are the special cases of the adjusted Wald interval in (3), the modified Wilson interval in (4), the score interval in (6), the adjusted score interval in (7), and the fiducial interval in (8), respectively, for . After applying the -function method to these approximate intervals, we obtain the one-time improved intervals and the final-improved intervals for .
Interval comparison and a real-data analysis
The comparison among intervals is still conducted using the ICP in (15) for reliability and the TIL in (16) for precision.
The TIL and ICP of the five approximate 95% intervals and 10 exact improved intervals discussed in the previous section are presented in Table 5. Zou et al. claimed that the modified Wilson interval is shorter than the adjusted Wald interval . This is true in the table, but we also find has a much smaller ICP than . In other words, the dominance of over on TIL arises because its ICP, 0.7530, is much less than both the nominal level 0.95 and 0.8687, the ICP of . In fact, if we only compare the TIL, would be the best interval among the five approximate intervals in Table 5. Again, this is because has the smallest ICPs (0.7530 and 0.5994). It is not appropriate to conclude that is the shortest interval without checking the ICP.
Any comparison, including interval comparison, must have an objective baseline. The above discussion shows that the nominal level does not serve this purpose. The major problem is that an interval of level may have an ICP value anywhere in . An example is the well-known Wald interval for a single proportion that has a zero ICP for any sample size, see Agresti.18 However, it is also a common practice to compare intervals which have the same nominal level . In particular, this occurs frequently in comparison of approximate intervals. Such practices are easy to implement but often lead to some misleading conclusions, including that is the shortest.
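The remark about the Wald interval for a single proportion can be checked numerically. The sketch below (the function name `wald_coverage` is hypothetical) computes the exact coverage probability of the standard 95% Wald interval at a success probability very close to zero. At x = 0 the Wald interval degenerates to the single point 0 and cannot cover any positive p, so the coverage is at most the probability of observing at least one success.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_coverage(n, p, level=0.95):
    """Exact coverage probability of the Wald interval for one proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    total = 0.0
    for x in range(n + 1):
        ph = x / n
        half = z * sqrt(ph * (1 - ph) / n)
        if ph - half <= p <= ph + half:
            total += comb(n, x) * p ** x * (1 - p) ** (n - x)
    return total


# Coverage near the boundary p -> 0 for several sample sizes.
for n in (10, 100, 1000):
    print(n, wald_coverage(n, 1e-6))
```

Letting p shrink further drives the coverage toward zero for every fixed n, which is why the nominal level alone says little about an approximate interval.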
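Table 5. The TILs and ICPs of the fifteen 95% confidence intervals for : (adjusted Wald), (modified Wilson), (score), (adjusted score), (fiducial), and the improved intervals, and , for , when and .

| Case | Interval        | Initial TIL | Initial ICP | One-time TIL | One-time ICP | Final TIL | Final ICP |
|------|-----------------|-------------|-------------|--------------|--------------|-----------|-----------|
| 1    | Adjusted Wald   | 82.112   | 0.8687 | 89.751   | 0.9500 | 89.395   | 0.9500 |
|      | Modified Wilson | 71.666*  | 0.7530 | 89.881   | 0.9500 | 89.542   | 0.9500 |
|      | Score           | 81.704   | 0.8610 | 88.110   | 0.9500 | 87.355   | 0.9500 |
|      | Adjusted score  | 87.602   | 0.8890 | 87.510*  | 0.9500 | 87.079*  | 0.9500 |
|      | Fiducial        | 77.963   | 0.8897 | 87.553   | 0.9500 | 87.191   | 0.9500 |
| 2    | Adjusted Wald   | 314.321  | 0.8670 | 339.917  | 0.9500 | 337.927  | 0.9500 |
|      | Modified Wilson | 266.203* | 0.5994 | 347.929  | 0.9500 | 347.389  | 0.9500 |
|      | Score           | 311.322  | 0.9134 | 332.468  | 0.9500 | 331.180  | 0.9500 |
|      | Adjusted score  | 323.119  | 0.9206 | 331.619* | 0.9500 | 330.376* | 0.9500 |
|      | Fiducial        | 302.854  | 0.8995 | 332.046  | 0.9500 | 330.384  | 0.9500 |

Note: TIL: total interval length; ICP: infimum coverage probability. Cases are listed in their original order. The shortest interval is marked by an asterisk.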
A fair baseline for interval comparison is to require an ICP not less than the nominal level , that is, we choose the interval with the shortest TIL as the best among exact intervals. Using the TIL rather than the expected length makes such a choice possible because the TIL is a single value, but the expected length is a function over the parameter space and the interval with the smallest expected length typically does not exist. Table 5 shows that is the shortest. So, when we recommend the final-improved adjusted score interval or the one-time improved interval if the computation time for is a big concern. Martín Andrés et al. recommended the score interval and the Wald3 interval, a variant of the Wald interval. The final-improved interval for the Wald3 interval has a large TIL, and we do not include the numerical results of this interval.
Li et al.19 conducted a study on the efficacy of thymosin in the treatment of bronchogenic carcinoma patients receiving radiotherapy in three groups . The sample sizes and the numbers of survivors for the three groups are . Let , and be the survival rates for the three groups, respectively. Our goal is to estimate the overall survival rate of patients under the assumption that the ratio of three groups is . Then, the parameter of interest is for , and . Fifteen intervals and their lengths at the observed point are listed in Table 6. The final-improved score interval is the shortest.
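Table 6. Fifteen 95% confidence intervals for : , , for , and their lengths at when .

| Interval        | Initial lower | Initial upper | Initial length | One-time lower | One-time upper | One-time length | Final lower | Final upper | Final length |
|-----------------|---------------|---------------|----------------|----------------|----------------|-----------------|-------------|-------------|--------------|
| Adjusted Wald   | 0.8156 | 1.0088 | 0.1932 | 0.8022 | 0.9988 | 0.1966  | 0.8113 | 0.9986 | 0.1873  |
| Modified Wilson | 0.7961 | 0.9952 | 0.1991 | 0.7557 | 0.9990 | 0.2433  | 0.7558 | 0.9990 | 0.2432  |
| Score           | 0.8363 | 0.9952 | 0.1589 | 0.8318 | 0.9986 | 0.1668* | 0.8318 | 0.9986 | 0.1668* |
| Adjusted score  | 0.8278 | 0.9954 | 0.1676 | 0.8295 | 0.9986 | 0.1691  | 0.8295 | 0.9986 | 0.1691  |
| Fiducial        | 0.8264 | 0.9971 | 0.1707 | 0.8293 | 0.9988 | 0.1695  | 0.8295 | 0.9986 | 0.1691  |

Note: The shortest exact interval at a sample point is marked by an asterisk.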
Exact intervals for the interaction effect
We assess the interaction effect in a two-factor factorial design of two factors A and B, where each factor assumes two levels: 1 and 0. Suppose four independent binomials are observed from the factor-level combinations: , as shown in Table 7.
Table 7. The frequency table for a 2² factorial design.
B
()
1
0
A
1
()
()
0
()
()
The parameter of interest is the interaction effect, measured by with a range of . Similar to the analysis of variance, when is not equal to zero, then there exists an interaction effect between factors A and B.
Denote the parameter and sample spaces by
respectively. For any fixed , consider the hypotheses:
The null hypothesis is rewritten as
where
As described in Section 2, for any given interval for , we follow the process of (9) to (11) and derive the one-time and final-improved intervals and , respectively.
We improve the five approximate intervals discussed in Section 2.1 for the interaction effect , that is, the adjusted Wald interval , the modified Wilson interval , the score interval , the adjusted score interval , and the fiducial interval . These intervals are given in (3), (4), and (6) to (8), respectively, using , , and . Also, their exact improvements and for are derived.
The TIL and ICP of these 95% intervals are displayed in Table 8 for two cases of small sample sizes. The final-improved adjusted score interval is shorter than the others.
Example 2 (continued). The dataset in Table 2 yields and in the setting of Table 7. Here, is the number of persons who recalled more than six words for a factor-level combination and is the success probability of the binomial . The parameter of interest is the interaction effect,
At the observed , the five one-time improved intervals are , , , , and , and their lengths are 0.9863, 1.0290, 0.9624, 0.9567, and 0.9789, respectively. Here, is the shortest. All intervals include zero. Hence, there is no significant interaction effect. Due to the computation time, we do not compute the final-improved interval here.
Similar to the two-way ANOVA, when there is no interaction, we estimate the two main effects: (age) and (instruction). At the observed , the five one-time improved intervals for and are listed in Table 9. All intervals for contain zero. So, age does not affect the number of words recalled. However, all intervals for exclude zero. Instruction affects the number of words recalled and the probability of recalling seven words or more for the level “intentional” is less than that of the level “counting.” Here, is the shortest interval.
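Table 8. The TILs and ICPs of fifteen 95% confidence intervals for : (adjusted-Wald), (modified Wilson), (score), (adjusted score), (fiducial), and the improved intervals and for , when and .

| Case | Interval        | Initial TIL | Initial ICP | One-time TIL | One-time ICP | Final TIL | Final ICP |
|------|-----------------|-------------|-------------|--------------|--------------|-----------|-----------|
| 1    | Adjusted Wald   | 414.263  | 0.8306 | 472.433  | 0.9500 | 470.471   | 0.9500 |
|      | Modified Wilson | 379.213  | 0.8286 | 480.655  | 0.9500 | 479.193   | 0.9500 |
|      | Score           | 422.280  | 0.8985 | 471.912  | 0.9500 | 466.705   | 0.9500 |
|      | Adjusted score  | 460.983  | 0.9222 | 470.440  | 0.9500 | 465.609*  | 0.9500 |
|      | Fiducial        | 392.309  | 0.8587 | 470.426  | 0.9500 | 466.812   | 0.9500 |
| 2    | Adjusted Wald   | 1481.475 | 0.8331 | 1683.047 | 0.9500 | 1655.935  | 0.9500 |
|      | Modified Wilson | 1346.452 | 0.7681 | 1638.409 | 0.9500 | 1636.757  | 0.9500 |
|      | Score           | 1483.880 | 0.9125 | 1637.221 | 0.9500 | 1612.980  | 0.9500 |
|      | Adjusted score  | 1563.320 | 0.9269 | 1629.838 | 0.9500 | 1603.388* | 0.9500 |
|      | Fiducial        | 1409.003 | 0.9020 | 1610.231 | 0.9500 | 1606.992  | 0.9500 |

Note: TIL: total interval length; ICP: infimum coverage probability. Cases are listed in their original order. The shortest exact interval is marked by an asterisk.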
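Table 9. Five one-time improved 95% confidence intervals for and : and for , and their lengths at when .

| One-time improved interval | Age: lower | Age: upper | Age: length | Instruction: lower | Instruction: upper | Instruction: length |
|----------------------------|------------|------------|-------------|--------------------|--------------------|---------------------|
| Adjusted Wald   | -0.1435 | 0.3368 | 0.4803* | -0.5231 | -0.0397 | 0.4834  |
| Modified Wilson | -0.1924 | 0.3394 | 0.5318  | -0.5509 | -0.0156 | 0.5353  |
| Score           | -0.1688 | 0.3221 | 0.4909  | -0.5396 | -0.0574 | 0.4822  |
| Adjusted score  | -0.1688 | 0.3221 | 0.4909  | -0.5386 | -0.0571 | 0.4815  |
| Fiducial        | -0.1722 | 0.3087 | 0.4809  | -0.5223 | -0.0472 | 0.4751* |

Note: The shortest exact interval at a sample point is marked by an asterisk.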
Discussion
The weighted sum of proportions and the interaction effect are two important cases of the general linear combination of proportions. How to estimate them with a guaranteed confidence? In this article, we first propose exact intervals to answer this question. We recommend the final-improved fiducial interval or adjusted score interval for the weighted sum of two proportions ; the final-improved adjusted score intervals and for the weighted sum of three proportions and the interaction effect , respectively. When the computation time is a concern, we use the one-time improved intervals.
Many approximate intervals exist for the weighted sum of proportions and the interaction effect. However, as discussed in Section 4.2, simply selecting a seemingly short approximate interval, for example, , may result in a substantial loss of reliability. Thus, a short interval should not be selected in practice without checking its ICP, and exact intervals are needed.
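The check described above can be sketched numerically. The following is a minimal illustration, not the procedure used in this article: it takes the plain Wald interval for a weighted sum of two proportions as a simple stand-in for an approximate interval, reads TIL as the sum of interval lengths over all sample points, and approximates the ICP by the minimum coverage over a grid of parameter values. The function names, the weights 0.5 and 0.5, the sample sizes, and the grid size are all hypothetical choices. Because the grid is finite, the reported minimum is only an upper bound on the true infimum.

```python
import math
from itertools import product


def binom_pmf(n, p):
    """Return [P(X = x) for x = 0..n] with X ~ Binomial(n, p)."""
    return [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def wald_interval(x1, x2, n1, n2, w1, w2, z=1.959964):
    """Plain Wald interval for w1*p1 + w2*p2 (illustrative stand-in only)."""
    p1, p2 = x1 / n1, x2 / n2
    est = w1 * p1 + w2 * p2
    se = math.sqrt(w1**2 * p1 * (1 - p1) / n1 + w2**2 * p2 * (1 - p2) / n2)
    return est - z * se, est + z * se


def total_interval_length(interval, n1, n2, w1, w2):
    """Sum of interval lengths over every sample point (x1, x2)."""
    total = 0.0
    for x1, x2 in product(range(n1 + 1), range(n2 + 1)):
        low, up = interval(x1, x2, n1, n2, w1, w2)
        total += up - low
    return total


def grid_icp(interval, n1, n2, w1, w2, grid=51):
    """Minimum coverage probability over a grid of (p1, p2) values.

    This only approximates the infimum: the true ICP can be smaller.
    """
    points = [k / (grid - 1) for k in range(grid)]
    pmf1 = [binom_pmf(n1, p) for p in points]
    pmf2 = [binom_pmf(n2, p) for p in points]
    # The interval depends only on the data, so compute it once per sample point.
    bounds = {
        (x1, x2): interval(x1, x2, n1, n2, w1, w2)
        for x1, x2 in product(range(n1 + 1), range(n2 + 1))
    }
    worst = 1.0
    for (p1, f1), (p2, f2) in product(zip(points, pmf1), zip(points, pmf2)):
        theta = w1 * p1 + w2 * p2
        coverage = sum(
            f1[x1] * f2[x2]
            for (x1, x2), (low, up) in bounds.items()
            if low <= theta <= up
        )
        worst = min(worst, coverage)
    return worst


# Hypothetical setting: n1 = n2 = 10 with equal weights.
til = total_interval_length(wald_interval, 10, 10, 0.5, 0.5)
icp = grid_icp(wald_interval, 10, 10, 0.5, 0.5)
print(f"TIL = {til:.3f}, grid ICP = {icp:.4f}")
```

Comparing two intervals on TIL alone is meaningful only when both have a grid ICP at or above the nominal level; an interval whose grid ICP falls below the nominal level is not exact, however short it is.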
Exact intervals have a long-standing reputation for being difficult to derive. With the appearance of the -function method, this is no longer a problem, at least from a mathematical point of view, because any interval can be improved to an exact interval by following the process of (9) to (11). Furthermore, as computing power advances, the numerical implementation of exact intervals becomes more feasible.
One drawback of the final-improved intervals is their computing time. The major obstacle is the calculation of in (10), as we still do not have an efficient program that finds a global supremum quickly and precisely, especially for a multivariate function. As seen in Sections 3 to 5, we need to find the supremum of unary, binary, and ternary functions, respectively. In fact, this is a general optimization problem without a satisfactory solution yet. A simple replacement for the final-improved interval is the one-time improved interval, which can be computed in practice, as seen in Example 2, and also provides reliable inferences.
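To make the computational difficulty concrete, the sketch below shows one common heuristic for approximating the supremum of a function on the unit cube: evaluate it on a coarse grid, then repeatedly zoom in around the best point found. It is an assumption-laden illustration and not the program used in this article; the name `refine_sup`, the grid size, and the number of rounds are hypothetical. The cost grows as the number of grid points raised to the dimension, and the search can miss the global supremum of a function with several local maxima, which is exactly the obstacle described above.

```python
from itertools import product


def refine_sup(f, dim, points=11, rounds=6):
    """Approximate sup of f over [0, 1]^dim by zooming a grid (heuristic)."""
    lo, hi = [0.0] * dim, [1.0] * dim
    best_val, best_arg = float("-inf"), None
    for _ in range(rounds):
        # Equally spaced grid on the current box, one axis per coordinate.
        axes = [
            [l + (h - l) * k / (points - 1) for k in range(points)]
            for l, h in zip(lo, hi)
        ]
        for cand in product(*axes):
            val = f(*cand)
            if val > best_val:
                best_val, best_arg = val, cand
        # Shrink the box to one grid step on each side of the best point,
        # clipped so that it stays inside [0, 1]^dim.
        steps = [(h - l) / (points - 1) for l, h in zip(lo, hi)]
        lo = [max(0.0, b - s) for b, s in zip(best_arg, steps)]
        hi = [min(1.0, b + s) for b, s in zip(best_arg, steps)]
    return best_val, best_arg


def toy_objective(p1, p2):
    """Hypothetical smooth objective with its maximum at (0.3, 0.7)."""
    return -((p1 - 0.3) ** 2) - (p2 - 0.7) ** 2


value, argmax = refine_sup(toy_objective, dim=2)
print(value, argmax)
```

With 11 points per axis, each round costs 11 evaluations for a unary function, 121 for a binary function, and 1331 for a ternary function, which illustrates why the ternary case is the most expensive.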
Supplemental Material
Supplemental material, sj-pdf-1-smm-10.1177_09622802241229200 for Exact interval estimation for the linear combination of binomial proportions by Shuiyun Lu, Weizhen Wang and Tianfa Xie in Statistical Methods in Medical Research
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Lu and Wang’s research is partially supported by the Beijing Natural Science Foundation (No. 1222002) and Xie’s research is partially supported by the National Natural Science Foundation of China (Nos. 11971045, 12071457).
ORCID iD
Weizhen Wang
Supplemental material
Supplemental material for this article is available online.
References
1. Innes JRM, Ulland BM, Valerio MG, et al. Bioassay of pesticides and industrial chemicals for tumorigenicity in mice: a preliminary note. J Natl Cancer Inst 1969; 42: 1101–1114.
2. Bonett DG, Price RM. Statistical inference for a linear function of medians: confidence intervals, hypothesis testing, and sample size requirements. Psychol Methods 2002; 7: 370–383.
3. Howell DC. Statistical methods for psychology. Belmont: Duxbury Press, 1997.
4. Price RM, Bonett DG. An improved confidence interval for a linear function of binomial proportions. Comput Stat Data Anal 2004; 45: 449–456.
5. Tebbs JM, Roths SA. New large-sample confidence intervals for a linear combination of binomial proportions. J Statist Plann Inference 2008; 138: 1884–1893.
6. Beal SL. Asymptotic confidence intervals for the difference between two binomial parameters for use with small samples. Biometrics 1987; 43: 941–950.
7. Zou GY, Huang W, Zhang X. A note on confidence interval estimation for a linear function of binomial proportions. Comput Stat Data Anal 2009; 53: 1080–1085.
8. Wilson EB. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc 1927; 22: 209–212.
9. Martín Andrés A, Álvarez Hernández M, Herranz Tejedor I. Inferences about a linear combination of proportions. Stat Meth Med Res 2011; 20: 369–387.
10. Martín Andrés A, Herranz Tejedor I, Álvarez Hernández M. The optimal method to make inferences about a linear combination of proportions. J Stat Comput Simul 2012; 82: 123–135.
11. Krishnamoorthy K, Lee M, Zhang D. Closed-form fiducial confidence intervals for some functions of independent binomial parameters with comparisons. Stat Meth Med Res 2017; 26: 43–63.
12. Yan X, Su XG. Stratified Wilson and Newcombe confidence intervals for multiple binomial proportions. Stat Biopharm Res 2010; 2: 329–335.
13. Decrouez G, Robinson AP. Confidence intervals for the weighted sum of two independent binomial proportions. Aust N Z J Stat 2012; 54: 281–299.
14. Wang WZ. On construction of optimal exact confidence intervals. Statistica Sinica 2023; 33: 2739–2762.
Miettinen O, Nurminen M. Comparative analysis of two rates. Stat Med 1985; 4: 213–226.
18. Agresti A. Categorical Data Analysis. 3rd ed. New York: John Wiley & Sons, 2013.
19. Li SH, Simon RM, Gart JJ. Small sample properties of the Mantel-Haenszel test. Biometrika 1979; 66: 181–183.