Hypothesis Testing on the Variances for Two Normal Populations Under Equality Restriction on the Means

Abstract

This article deals with the problem of testing the variances of two normal populations under the assumption that the mean parameters are equal. We propose an approximate likelihood ratio test and a test based on bootstrap observations. Two computational approach tests are suggested using the maximum likelihood estimators of the model parameters. Further, several generalized test methods are discussed to test the variances. Through a detailed Monte Carlo simulation study, the performances of the proposed test procedures are compared numerically in terms of their sizes and powers. Finally, we present real-life data sets to demonstrate the potential applicability of the suggested test procedures.

AMS Subject Classification: 62F03, 62F05, 62F10, 65C05

Keywords

Bootstrap samples; plug-in estimators; power function; simulation study; size values

1. Introduction

Suppose two independent random samples $\underset{\sim}{X} = (X_{1}, X_{2}, \dots, X_{m})$ and $\underset{\sim}{Y} = (Y_{1}, Y_{2}, \dots, Y_{n})$ are available from the normal populations $N (μ, σ_{1}^{2})$ and $N (μ, σ_{2}^{2})$, respectively. The primary focus of this study is testing hypotheses about the variance parameters $σ_{1}^{2}$ and $σ_{2}^{2}$, denoted by λ₁ and λ₂, respectively. One may also be interested in testing hypotheses on functions of the scale parameters, such as the standard deviation, the precision or powers of the variances, that is, $λ_{i}^{\pm k}$ (i = 1, 2; k > 0). However, since the map $λ \mapsto λ^{\pm k}$ is strictly monotone on (0, ∞), a hypothesis such as $λ_{1}^{\pm k} = λ_{0}^{\pm k}$ holds if and only if $λ_{1} = λ_{0}$, so testing such transformations is equivalent to testing the variances themselves. We therefore restrict our attention to testing hypotheses about the variances of two normal populations whose means are equal. Furthermore, the model is symmetric in the two populations; hence, we only discuss tests about the variance of the first population, λ₁. The corresponding procedures for λ₂ are obtained by swapping the roles of the two populations and their sample observations. Mathematically, the hypothesis testing problem is to test

H_{0} : λ_{1} = λ_{0} \quad \text{against} \quad H_{a} : λ_{1} \neq λ_{0},

(1.1)

where λ₀ is a specified known value of λ₁. Under this model set-up, researchers have mainly been interested in inference on the common mean μ, because of its wide range of applications in meta-analysis, inter-block designs, medicine, agriculture, economics and related fields. Graybill and Deal^[1] were probably the first to study the estimation of the common mean μ; they proposed a combined estimator of μ that weights the two sample means by the sample sizes divided by the sample variances. This estimator has a smaller variance than each of the individual sample means whenever both sample sizes exceed 10. Later, many researchers obtained further classical as well as decision-theoretic results. For a detailed review of point estimation of the common mean of two independent normal distributions, we refer to Kubokawa,^[2] Pal and Sinha,^[3] Pal et al.,^[4] Tripathy and Kumar,^[5] Rukhin^[6] and the references cited therein. From a Bayesian perspective, the common mean estimation problem has been investigated by Kelleher^[7] and Mitra and Sinha.^[8]

Fairweather^[9] studied the problem of interval estimation of the common mean μ. The author used a linear combination of Student’s t-statistics as the pivot variable and determined an exact confidence interval for parameter μ. Jordan and Krishnamoorthy^[10] utilized the idea of Fairweather^[9] and derived confidence intervals for μ using a linear combination of independent F-statistics. Cohen and Sackrowitz^[11] derived several test procedures for parameter μ, such as t-tests, Fisher’s combined test, likelihood ratio test (LRT) and tests based on the Bayesian approach. For some noteworthy contributions and interesting papers concerning hypothesis testing and interval estimation on the common mean of two or more normal populations, readers may look into the articles of Krishnamoorthy and Lu,^[12] Chang and Pal,^[13] Li and Williamson,^[14] Malekzadeh and Kharrati-Kopaei^[15] and the references cited therein.

The common mean problem for two or more normal populations has received considerable attention in the literature on statistical inference. However, some researchers have also studied inference on the scale parameters (namely, the variances) or functions of them. In the one-population case, Strawderman^[16] proposed a class of minimax estimators and a class of generalized Bayes estimators for estimating positive powers of the variance. Maruyama and Strawderman^[17] derived a class of minimax generalized Bayes estimators for the variance of a single normal population under entropy and quadratic loss. Using the linear exponential (Linex) loss function, Zou et al.^[18] proposed an improved Stein-type estimator of the standard error for a single normal population and further extended their results to any power of the standard deviation. Along these lines, Bobotas and Kourouklis^[19] investigated the estimation of the ratio of normal variances and of the precision parameters when samples are available from two independent normal distributions with unknown means, and derived a class of smooth and simple estimators of the ratio under the quadratic and entropy loss functions. From a decision-theoretic perspective, Tripathy et al.^[20] investigated the estimation of the common standard deviation, whereas Patra et al.^[21] studied the estimation of the common variance of multiple normal populations.

In this article, we revisit the model previously studied by Jena et al.^[22] and focus on testing hypotheses about the variances. In regression problems, the error variances or the standard errors play an important role; hence, it is important to study them. Jena et al.^[22] considered point and interval estimation of powers of the variances of two normal populations under an equality restriction on the means. They considered various interval estimators, including some generalized confidence intervals based on certain existing estimators of the common mean μ. In the present article, the plug-in estimators of λ_i (i = 1, 2), constructed from the combined estimators of μ, are used to derive generalized test variables and the corresponding generalized tests. In addition, we present a few more test procedures based on a computational approach and compare their performance with that of the generalized tests. Our simulation study reveals that the generalized test procedures do not perform as well as in the interval estimation problem, where the generalized intervals are second best, after the highest posterior density interval.

The rest of the article is organized as follows. Section 2 reviews some fundamental point estimators of λ₁, including the maximum likelihood estimator (MLE), the unbiased estimator and several plug-in estimators based on existing estimators of the common mean μ. Section 3 is devoted to testing hypotheses on λ₁. Specifically, Subsection 3.1 presents the asymptotic LRT, and Subsection 3.2 proposes a bootstrap-based LRT. Subsection 3.3 discusses two computational approach tests (CATs) that use the MLEs of the model parameters. In Subsection 3.4, we use the plug-in estimators to propose several tests based on the generalized p value method. In Subsection 3.5, a comprehensive simulation study assesses the suggested tests in terms of their sizes and powers. Section 4 demonstrates the practical relevance of the proposed methods using real-life data, and Section 5 concludes with a summary of the key observations.

Remark 1.1. If only one population $\underset{\sim}{X} = (X_{1}, X_{2}, \dots, X_{m}) \sim N (μ, λ_{1})$ is available, a test of a hypothesis on λ₁ is based on a statistic $G (S_{x}^{2})$, a function of $S_{x}^{2} = \sum_{i = 1}^{m} {(X_{i} - \bar{X})}^{2}$. However, in the presence of another population, say $\underset{\sim}{Y} = (Y_{1}, Y_{2}, \dots, Y_{n}) \sim N (μ, λ_{2})$, one has several competing estimators of μ, namely ${\hat{μ}}_{G D}, {\hat{μ}}_{K S}, {\hat{μ}}_{M K}, {\hat{μ}}_{T K}, {\hat{μ}}_{B C 1}, {\hat{μ}}_{B C 2}$ and ${\hat{μ}}_{G M}$.^[5] Further, these competing estimators perform better than the individual sample means in terms of risk under certain conditions on the sample sizes.^{[1, 5]} This motivates us to use these combined estimators of μ, which carry information from both populations, to construct tests of hypotheses on λ₁, and to see how these test procedures perform compared with the test based only on $G (S_{x}^{2})$ and with other likelihood-based tests.

2. A Review on Point Estimation of the Variances Under Common Mean Model Set-up

This section provides a review of point estimation methods for the variances of two normal populations under the assumption that their means are equal.

Various point estimators of powers of the variances, such as the MLEs and several plug-in estimators, have been derived by Jena et al.^[22] by incorporating information from both populations through well-established estimators of μ. The corresponding estimators of λ₁ follow directly by setting k = 1; for completeness, we briefly recall them here without the details. Consider independent random samples $\underset{\sim}{X} = (X_{1}, X_{2}, \dots, X_{m})$ and $\underset{\sim}{Y} = (Y_{1}, Y_{2}, \dots, Y_{n})$ drawn from the normal populations N(μ, λ₁) and N(μ, λ₂), respectively. For this model, the minimal sufficient statistic is $(\bar{X}, \bar{Y}, S_{x}^{2}, S_{y}^{2})$, where

\bar{X} = \frac{1}{m} \sum_{i = 1}^{m} X_{i}, \bar{Y} = \frac{1}{n} \sum_{j = 1}^{n} Y_{j}, S_{x}^{2} = \sum_{i = 1}^{m} {(X_{i} - \bar{X})}^{2}, S_{y}^{2} = \sum_{j = 1}^{n} {(Y_{j} - \bar{Y})}^{2} .

(2.1)

Closed-form expressions for the MLEs of parameters μ, λ₁ and λ₂ are not available.^[4] Consequently, we obtained the MLEs numerically by solving a non-linear system of three equations in three unknowns, as given in Jena et al.^[22] The resulting solutions, denoted as ${\hat{μ}}_{M L}, {\hat{λ}}_{1 M L}$ and ${\hat{λ}}_{2 M L}$ , are the MLEs of the associated parameters μ, λ₁ and λ₂, respectively.
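For readers who wish to reproduce the MLEs, a minimal Python sketch is given below; it is an illustration, not the R code used in this article. For fixed μ, the likelihood is maximized by $λ_{1}(μ) = \{S_{x}^{2} + m (\bar{X} - μ)^{2}\} / m$ and $λ_{2}(μ) = \{S_{y}^{2} + n (\bar{Y} - μ)^{2}\} / n$, so ${\hat{μ}}_{M L}$ maximizes the profile log-likelihood $-\frac{m}{2} \log λ_{1}(μ) - \frac{n}{2} \log λ_{2}(μ)$. Its derivative $m (\bar{X} - μ) / λ_{1}(μ) + n (\bar{Y} - μ) / λ_{2}(μ)$ is positive at $\min(\bar{X}, \bar{Y})$ and negative at $\max(\bar{X}, \bar{Y})$, so the search is confined to that interval. The grid search with golden-section refinement, the grid size and the function names are assumptions.

```python
import math


def suff_stats(sample):
    """Sample size, mean and sum of squared deviations, as in (2.1)."""
    k = len(sample)
    mean = sum(sample) / k
    return k, mean, sum((v - mean) ** 2 for v in sample)


def argmax_on_interval(f, lo, hi, grid=200, iters=80):
    """Maximize f on [lo, hi]: coarse grid search, then golden-section refinement."""
    if hi - lo < 1e-12:
        return lo
    step = (hi - lo) / grid
    k = max(range(grid + 1), key=lambda i: f(lo + i * step))
    a, b = max(lo, lo + (k - 1) * step), min(hi, lo + (k + 1) * step)
    g = (math.sqrt(5) - 1) / 2                     # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):                            # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                                      # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def mle_common_mean(x, y):
    """Numerical MLEs (mu, lambda1, lambda2) for samples from N(mu, lambda1) and N(mu, lambda2)."""
    m, xbar, sx2 = suff_stats(x)
    n, ybar, sy2 = suff_stats(y)
    lam1 = lambda mu: (sx2 + m * (xbar - mu) ** 2) / m   # maximizer in lambda1 for fixed mu
    lam2 = lambda mu: (sy2 + n * (ybar - mu) ** 2) / n   # maximizer in lambda2 for fixed mu
    profile = lambda mu: -0.5 * m * math.log(lam1(mu)) - 0.5 * n * math.log(lam2(mu))
    mu_hat = argmax_on_interval(profile, min(xbar, ybar), max(xbar, ybar))
    return mu_hat, lam1(mu_hat), lam2(mu_hat)
```

Because the profile likelihood may have more than one stationary point in the interval, the coarse grid is evaluated before the local refinement; `mle_common_mean(x, y)` then returns $({\hat{μ}}_{M L}, {\hat{λ}}_{1 M L}, {\hat{λ}}_{2 M L})$ for two lists of observations x and y.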

Further, $\bar{X} \sim N (μ, λ_{1} / m), \bar{Y} \sim N (μ, λ_{2} / n), S_{x}^{2} / λ_{1} \sim χ_{(m - 1)}^{2}$ (a chi-square distribution with m – 1 degrees of freedom) and $S_{y}^{2} / λ_{2} \sim χ_{(n - 1)}^{2}$ (a chi-square distribution with n − 1 degrees of freedom). Using this knowledge, the unbiased estimator ${\hat{λ}}_{1 U}$ of λ₁ is defined as:

{\hat{λ}}_{1 U} = \frac{S_{x}^{2}}{m - 1} = \frac{1}{m - 1} \sum_{i = 1}^{m} {(X_{i} - \bar{X})}^{2} .

(2.2)

Obtaining a uniformly minimum variance unbiased estimator of λ₁ is challenging because the minimal sufficient statistic $(\bar{X}, \bar{Y}, S_{x}^{2}, S_{y}^{2})$ is not complete. Note, however, that the unbiased estimator ${\hat{λ}}_{1 U}$ in Equation (2.2) depends on $\bar{X}$, an unbiased estimator of μ based on the first sample only. Replacing $\bar{X}$ in ${\hat{λ}}_{1 U}$ by the combined estimator ${\hat{μ}}_{G D}$ of Graybill and Deal,^[1] we obtain the plug-in estimator of λ₁:

{\hat{λ}}_{1 G D} = \frac{1}{m - 1} \sum_{i = 1}^{m} {(X_{i} - {\hat{μ}}_{G D})}^{2} .

(2.3)

Similarly, using the other combined estimators ${\hat{μ}}_{K S}, {\hat{μ}}_{M K}, {\hat{μ}}_{T K}, {\hat{μ}}_{B C 1}, {\hat{μ}}_{B C 2}$ and ${\hat{μ}}_{G M}$, given in Tripathy and Kumar,^[5] we can construct plug-in estimators of λ₁, denoted by ${\hat{λ}}_{1 K S}, {\hat{λ}}_{1 M K}, {\hat{λ}}_{1 T K}, {\hat{λ}}_{1 B C 1}, {\hat{λ}}_{1 B C 2}$ and ${\hat{λ}}_{1 G M}$, respectively. We refer to Jena et al.^[22] for the explicit expressions of these plug-in estimators.
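The estimators (2.2) and (2.3) need only the sample values. The Python sketch below, with hypothetical function names, computes ${\hat{λ}}_{1 U}$, the Graybill and Deal estimator ${\hat{μ}}_{G D}$ (the sample means weighted by the sample size divided by the unbiased sample variance) and the plug-in estimator ${\hat{λ}}_{1 G D}$; the other plug-in estimators would follow the same pattern with a different combined estimator of μ.

```python
def summary(sample):
    """Sample size, mean and sum of squared deviations, as in (2.1)."""
    k = len(sample)
    mean = sum(sample) / k
    return k, mean, sum((v - mean) ** 2 for v in sample)


def variance_estimators(x, y):
    """Return (lambda1_U, mu_GD, lambda1_GD) as in (2.2) and (2.3)."""
    m, xbar, sx2 = summary(x)
    n, ybar, sy2 = summary(y)
    lam1_u = sx2 / (m - 1)                    # unbiased estimator (2.2)
    lam2_u = sy2 / (n - 1)
    # Graybill and Deal: sample means weighted by sample size / unbiased sample variance
    w1, w2 = m / lam1_u, n / lam2_u
    mu_gd = (w1 * xbar + w2 * ybar) / (w1 + w2)
    lam1_gd = sum((v - mu_gd) ** 2 for v in x) / (m - 1)   # plug-in estimator (2.3)
    return lam1_u, mu_gd, lam1_gd
```

Because $\sum_{i = 1}^{m} {(x_{i} - μ)}^{2} = s_{x}^{2} + m (\bar{x} - μ)^{2}$ for any μ, the plug-in estimate ${\hat{λ}}_{1 G D}$ is never smaller than ${\hat{λ}}_{1 U}$.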

The primary focus of this study is to construct several testing procedures for testing the hypothesis H₀ against the alternative H_a. The above point estimators will be utilized in constructing certain generalized test procedures (see Subsection 3.4). In the subsequent section, we discuss various test procedures, including these generalized tests.

3. Test Methodologies for Testing the Variances

This section discusses several procedures for testing hypotheses about the variance of the first population, λ₁. Specifically, we derive the conventional LRT, a parametric bootstrap version of the LRT, two computational approach tests and several tests based on the generalized p value approach.

3.1. The Likelihood Ratio Test

Let the observed values of $\underset{\sim}{X} = (X_{1}, X_{2}, \dots, X_{m})$ and $\underset{\sim}{Y} = (Y_{1}, Y_{2}, \dots, Y_{n})$ be denoted by $\underset{\sim}{x} = (x_{1}, x_{2}, \dots, x_{m})$ and $\underset{\sim}{y} = (y_{1}, y_{2}, \dots, y_{n})$, respectively. Based on these observed samples, the likelihood function $L \{(μ, λ_{1}, λ_{2}) ∣ (\underset{\sim}{x}, \underset{\sim}{y})\}$ is

L \{(μ, λ_{1}, λ_{2}) | (\underset{\sim}{x}, \underset{\sim}{y})\} \propto λ_{1}^{- m / 2} λ_{2}^{- n / 2} \exp \{- \frac{1}{2 λ_{1}} \sum_{i = 1}^{m} {(x_{i} - μ)}^{2} - \frac{1}{2 λ_{2}} \sum_{j = 1}^{n} {(y_{j} - μ)}^{2}\}

(3.1)

To construct the LRT statistic, we need to maximize the likelihood function L over the whole parameter space $Ω = \{(μ, λ_{1}, λ_{2}) : - \infty < μ < \infty, λ_{1} > 0, λ_{2} > 0\}$ as well as over the restricted parameter space Ω₀ ⊂ Ω defined by the constraint λ₁ = λ₀. The LRT statistic is then

Λ = \frac{\sup_{Ω_{0}} L}{\sup_{Ω} L}

The exact distribution of Λ is intractable, because the MLEs have no closed-form expressions either under H₀ or over the full parameter space Ω. We therefore use Λ^* = −2 log Λ as the test variable; since H₀ fixes one parameter, Λ^* is asymptotically distributed as a $χ_{(1)}^{2}$ random variable under H₀ (Wilks' theorem). The LRT rejects H₀ when $Λ^{*} > χ_{(1), α}^{2}$, the upper α quantile of the $χ_{(1)}^{2}$ distribution, and does not reject H₀ otherwise. We denote this test by $ℚ_{L T}$; its power function is

Υ_{L T} = P (Λ^{*} > χ_{(1), α}^{2}) .

(3.2)
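A minimal Python sketch of $ℚ_{L T}$ is given below; it is illustrative and not the authors' implementation. Both suprema reduce to a one-dimensional search over μ, because for fixed μ the variances that H₀ leaves free have closed-form maximizers. The search uses a grid followed by golden-section refinement (an assumed choice), the constant $-\frac{m + n}{2} \log (2π)$ is dropped since it cancels in Λ^*, and the cut-off $χ_{(1), α}^{2}$ is obtained as the squared standard normal $1 - α/2$ quantile, because a $χ_{(1)}^{2}$ variable is the square of a standard normal one. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def suff_stats(sample):
    """Sample size, mean and sum of squared deviations (see (2.1))."""
    k = len(sample)
    mean = sum(sample) / k
    return k, mean, sum((v - mean) ** 2 for v in sample)


def argmax_on_interval(f, lo, hi, grid=200, iters=80):
    """Maximize f on [lo, hi]: coarse grid search, then golden-section refinement."""
    if hi - lo < 1e-12:
        return lo
    step = (hi - lo) / grid
    k = max(range(grid + 1), key=lambda i: f(lo + i * step))
    a, b = max(lo, lo + (k - 1) * step), min(hi, lo + (k + 1) * step)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def max_loglik(x, y, lam0=None):
    """Maximized log-likelihood (3.1) without the -(m+n)/2*log(2*pi) constant.

    lam0=None maximizes over Omega; otherwise lambda1 is fixed at lam0 (Omega_0).
    Returns (max log-likelihood, mu_hat, lambda2_hat).
    """
    m, xbar, sx2 = suff_stats(x)
    n, ybar, sy2 = suff_stats(y)
    s1 = lambda mu: sx2 + m * (xbar - mu) ** 2   # sum of (x_i - mu)^2
    s2 = lambda mu: sy2 + n * (ybar - mu) ** 2   # sum of (y_j - mu)^2
    if lam0 is None:   # lambda1 and lambda2 profiled out
        prof = lambda mu: -0.5 * (m * math.log(s1(mu) / m) + n * math.log(s2(mu) / n) + m + n)
    else:              # lambda1 fixed at lam0, lambda2 profiled out
        prof = lambda mu: (-0.5 * m * math.log(lam0) - s1(mu) / (2 * lam0)
                           - 0.5 * n * (math.log(s2(mu) / n) + 1))
    mu_hat = argmax_on_interval(prof, min(xbar, ybar), max(xbar, ybar))
    return prof(mu_hat), mu_hat, s2(mu_hat) / n


def lrt(x, y, lam0, alpha=0.05):
    """Return (Lambda*, reject?) for H0: lambda1 = lam0 against a two-sided alternative."""
    stat = max(0.0, 2.0 * (max_loglik(x, y)[0] - max_loglik(x, y, lam0)[0]))
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # upper-alpha quantile of chi-square(1)
    return stat, stat > crit
```

Setting λ₀ equal to the unrestricted MLE of λ₁ gives Λ^* ≈ 0, which is a convenient sanity check of the two maximizations.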

3.2. The Parametric Bootstrap Likelihood Ratio Test

Since the LRT relies on an asymptotic chi-square approximation for its cut-off point, it may give unreliable results for small sample sizes, which are common in real-life situations. To handle this problem, Chang et al.^[23] suggested the parametric bootstrap LRT (PBLRT), in which the null distribution of the test variable Λ^* is approximated by recomputing Λ^* on many bootstrap samples generated under H₀, so that the cut-off point is obtained from these bootstrap values rather than from the chi-square approximation. The steps of the PBLRT procedure are as follows:

Step-1: Based on the sample observations (X₁, X₂, …, X_m) and (Y₁, Y₂, …, Y_n) from the populations N(μ, λ₁) and N(μ, λ₂), compute the value of the test variable Λ^*.

Step-2: Assume that the null hypothesis H₀ is true, and compute the restricted MLEs of μ and λ₂, denoted by ${\hat{μ}}_{R M L}$ and ${\hat{λ}}_{2 R M L}$, respectively. Generate bootstrap samples $(X_{1}^{*}, X_{2}^{*}, \dots, X_{m}^{*})$ from $N ({\hat{μ}}_{R M L}, λ_{0})$ and $(Y_{1}^{*}, Y_{2}^{*}, \dots, Y_{n}^{*})$ from $N ({\hat{μ}}_{R M L}, {\hat{λ}}_{2 R M L})$, respectively. Compute Λ^* from these bootstrap samples and denote it by $Λ_{0}^{*}$.

Step-3: Repeat the bootstrap part of Step-2 D times to obtain $Λ_{01}^{*}, Λ_{02}^{*}, \dots, Λ_{0 D}^{*}$, and sort these values in increasing order as $Λ_{0 (1)}^{*} \leq Λ_{0 (2)}^{*} \leq \dots \leq Λ_{0 (D)}^{*}$.

Step-4: Define the upper cut-off value as $Λ_{U}^{*} = Λ_{0 ((1 - α) D)}^{*}$. Reject H₀ if $Λ^{*} \geq Λ_{U}^{*}$; otherwise, do not reject H₀. We denote this test by $ℚ_{P B}$; its power function is

Υ_{P B} = P (Λ^{*} \geq Λ_{U}^{*}) .

(3.3)
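The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the maximized log-likelihoods are found by a grid search over μ refined by golden-section search, bootstrap samples are drawn with `random.Random.gauss`, and (1 − α)D is rounded to the nearest integer when it is not one. All function names are hypothetical.

```python
import math
import random


def suff_stats(sample):
    """Sample size, mean and sum of squared deviations (see (2.1))."""
    k = len(sample)
    mean = sum(sample) / k
    return k, mean, sum((v - mean) ** 2 for v in sample)


def argmax_on_interval(f, lo, hi, grid=200, iters=80):
    """Maximize f on [lo, hi]: coarse grid search, then golden-section refinement."""
    if hi - lo < 1e-12:
        return lo
    step = (hi - lo) / grid
    k = max(range(grid + 1), key=lambda i: f(lo + i * step))
    a, b = max(lo, lo + (k - 1) * step), min(hi, lo + (k + 1) * step)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def max_loglik(x, y, lam0=None):
    """(max log-likelihood, mu_hat, lambda2_hat) over Omega (lam0=None) or Omega_0."""
    m, xbar, sx2 = suff_stats(x)
    n, ybar, sy2 = suff_stats(y)
    s1 = lambda mu: sx2 + m * (xbar - mu) ** 2
    s2 = lambda mu: sy2 + n * (ybar - mu) ** 2
    if lam0 is None:
        prof = lambda mu: -0.5 * (m * math.log(s1(mu) / m) + n * math.log(s2(mu) / n) + m + n)
    else:
        prof = lambda mu: (-0.5 * m * math.log(lam0) - s1(mu) / (2 * lam0)
                           - 0.5 * n * (math.log(s2(mu) / n) + 1))
    mu_hat = argmax_on_interval(prof, min(xbar, ybar), max(xbar, ybar))
    return prof(mu_hat), mu_hat, s2(mu_hat) / n


def lrt_stat(x, y, lam0):
    """Lambda* = -2 log Lambda, from the two maximized log-likelihoods."""
    return max(0.0, 2.0 * (max_loglik(x, y)[0] - max_loglik(x, y, lam0)[0]))


def pblrt(x, y, lam0, alpha=0.05, D=1000, rng=None):
    """Parametric bootstrap LRT: returns (Lambda*, cut-off Lambda*_U, reject?)."""
    rng = rng or random.Random()
    stat = lrt_stat(x, y, lam0)                       # Step-1
    _, mu_r, lam2_r = max_loglik(x, y, lam0)          # Step-2: restricted MLEs under H0
    boot = []
    for _ in range(D):                                # Steps 2-3: D null bootstrap samples
        xs = [rng.gauss(mu_r, math.sqrt(lam0)) for _ in x]
        ys = [rng.gauss(mu_r, math.sqrt(lam2_r)) for _ in y]
        boot.append(lrt_stat(xs, ys, lam0))
    boot.sort()
    k = min(D, max(1, round((1 - alpha) * D)))        # Step-4: order-statistic index (rounded)
    cutoff = boot[k - 1]
    return stat, cutoff, stat >= cutoff
```

In a size or power study, `pblrt` would be called once per simulated data set, which is why D is kept as a parameter; the default D = 1000 is arbitrary.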

3.3. The Computational Approach Test

Pal et al.^[24] proposed a general approach to hypothesis testing known as the CAT. The CAT is a parametric bootstrap procedure that uses the MLEs and does not require knowledge of the sampling distribution of the test variable. Here, we apply the CAT to test H₀ against H_a. The steps of the CAT procedure are as follows:

Step-1: Based on the sample observations (X₁, X₂, …, X_m) and (Y₁, Y₂, …, Y_n) from the populations N(μ, λ₁) and N(μ, λ₂), compute the MLEs ${\hat{μ}}_{M L}, {\hat{λ}}_{1 M L}$ and ${\hat{λ}}_{2 M L}$ numerically. Denote the MLE of λ₁ by δ^*, that is, $δ^{*} = {\hat{λ}}_{1 M L}$.

Step-2: Assuming that H₀ holds, obtain the restricted MLEs ${\hat{μ}}_{R M L}$ and ${\hat{λ}}_{2 R M L}$ of the parameters μ and λ₂, respectively.

Step-3: Generate bootstrap samples $(X_{1}^{*}, X_{2}^{*}, \dots, X_{m}^{*})$ from $N ({\hat{μ}}_{R M L}, λ_{0})$ and $(Y_{1}^{*}, Y_{2}^{*}, \dots, Y_{n}^{*})$ from $N ({\hat{μ}}_{R M L}, {\hat{λ}}_{2 R M L})$, and repeat this E times, where E is sufficiently large. For each replicate, compute the MLE of λ₁ and denote it by $δ_{0}^{*}$. This yields $δ_{01}^{*}, δ_{02}^{*}, \dots, δ_{0 E}^{*}$, which are sorted in increasing order as $δ_{0 (1)}^{*} \leq δ_{0 (2)}^{*} \leq \dots \leq δ_{0 (E)}^{*}$.

Step-4: Let $δ_{L}^{*} = δ_{0 (\frac{α}{2} E)}^{*}$ and $δ_{U}^{*} = δ_{0 ((1 - \frac{α}{2}) E)}^{*}$ be the lower and upper cut-off points, respectively. The CAT ($ℚ_{C T}$) rejects H₀ when $δ^{*} \leq δ_{L}^{*}$ or $δ^{*} \geq δ_{U}^{*}$, and does not reject H₀ otherwise. Its power is

Υ_{C T} = P (\{δ^{*} \leq δ_{L}^{*}\} \cup \{δ^{*} \geq δ_{U}^{*}\}) .

(3.4)
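The CAT steps translate into the following Python sketch, an illustration with hypothetical names rather than the authors' code. The unrestricted MLE of λ₁ and the restricted MLEs under H₀ are obtained from one-dimensional profile likelihoods in μ, maximized by a grid search with golden-section refinement; the order statistics $δ_{0 (\frac{α}{2} E)}^{*}$ and $δ_{0 ((1 - \frac{α}{2}) E)}^{*}$ use indices rounded to the nearest integer, which is an assumption when αE/2 is not an integer.

```python
import math
import random


def suff_stats(sample):
    """Sample size, mean and sum of squared deviations (see (2.1))."""
    k = len(sample)
    mean = sum(sample) / k
    return k, mean, sum((v - mean) ** 2 for v in sample)


def argmax_on_interval(f, lo, hi, grid=200, iters=80):
    """Maximize f on [lo, hi]: coarse grid search, then golden-section refinement."""
    if hi - lo < 1e-12:
        return lo
    step = (hi - lo) / grid
    k = max(range(grid + 1), key=lambda i: f(lo + i * step))
    a, b = max(lo, lo + (k - 1) * step), min(hi, lo + (k + 1) * step)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def mle_lambda1(x, y):
    """Unrestricted MLE of lambda1 (the CAT statistic delta*), via the profile likelihood in mu."""
    m, xbar, sx2 = suff_stats(x)
    n, ybar, sy2 = suff_stats(y)
    s1 = lambda mu: sx2 + m * (xbar - mu) ** 2
    s2 = lambda mu: sy2 + n * (ybar - mu) ** 2
    prof = lambda mu: -0.5 * m * math.log(s1(mu)) - 0.5 * n * math.log(s2(mu))
    return s1(argmax_on_interval(prof, min(xbar, ybar), max(xbar, ybar))) / m


def restricted_mle(x, y, lam0):
    """Restricted MLEs (mu, lambda2) under H0: lambda1 = lam0."""
    m, xbar, sx2 = suff_stats(x)
    n, ybar, sy2 = suff_stats(y)
    s1 = lambda mu: sx2 + m * (xbar - mu) ** 2
    s2 = lambda mu: sy2 + n * (ybar - mu) ** 2
    prof = lambda mu: -s1(mu) / (2 * lam0) - 0.5 * n * math.log(s2(mu))
    mu = argmax_on_interval(prof, min(xbar, ybar), max(xbar, ybar))
    return mu, s2(mu) / n


def order_stat(sorted_vals, q):
    """The (qE)-th smallest value (1-based), with qE rounded to the nearest integer."""
    e = len(sorted_vals)
    return sorted_vals[min(e, max(1, round(q * e))) - 1]


def cat_test(x, y, lam0, alpha=0.05, E=1000, rng=None):
    """Two-sided CAT for H0: lambda1 = lam0; returns (delta*, (lower, upper), reject?)."""
    rng = rng or random.Random()
    delta = mle_lambda1(x, y)                                   # Step-1
    mu_r, lam2_r = restricted_mle(x, y, lam0)                   # Step-2
    boot = sorted(                                              # Step-3
        mle_lambda1([rng.gauss(mu_r, math.sqrt(lam0)) for _ in x],
                    [rng.gauss(mu_r, math.sqrt(lam2_r)) for _ in y])
        for _ in range(E))
    lower, upper = order_stat(boot, alpha / 2), order_stat(boot, 1 - alpha / 2)  # Step-4
    return delta, (lower, upper), delta <= lower or delta >= upper
```

Unlike the PBLRT, only the restricted MLEs from the original data and the unrestricted MLE of λ₁ from each bootstrap sample are needed, so each replicate involves a single numerical maximization.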

We also consider a modified version of the CAT, called the modified CAT, in which the two-sided problem in Equation (1.1) is reformulated as the one-sided problem

H_{0}^{*} : g (λ_{1}) = {(λ_{1} - λ_{0})}^{2} = 0 \quad \text{against} \quad H_{a}^{*} : g (λ_{1}) > 0.

(3.5)

The modified CAT (denoted by $ℚ_{M T}$) for the right-tailed problem of testing $H_{0}^{*} : g (λ_{1}) = 0$ against $H_{a}^{*} : g (λ_{1}) > 0$ proceeds as follows:

Bootstrap samples are generated as described in Step-3 of the CAT, and the quantity $θ_{0}^{*} = {(δ_{0}^{*} - λ_{0})}^{2}$ is computed for each of E replicates, yielding the values $θ_{01}^{*}, θ_{02}^{*}, \dots, θ_{0 E}^{*}$. These values are then arranged in ascending order as $θ_{0 (1)}^{*} \leq θ_{0 (2)}^{*} \leq \dots \leq θ_{0 (E)}^{*}$.

Compute the statistic $θ^{*} = {(δ^{*} - λ_{0})}^{2}$. Reject $H_{0}^{*}$ if $θ^{*} > θ_{0 ((1 - α) E)}^{*}$; otherwise, do not reject $H_{0}^{*}$. The power function of $ℚ_{M T}$ is

Υ_{M T} = P (θ^{*} > θ_{0 ((1 - α) E)}^{*}) .

(3.6)
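A Python sketch of the modified CAT follows, as an illustration with hypothetical names. It draws null bootstrap samples from the restricted MLEs, computed from a profile likelihood in μ by a grid search with golden-section refinement, and compares $θ^{*} = {(δ^{*} - λ_{0})}^{2}$ with the single upper order statistic $θ_{0 ((1 - α) E)}^{*}$; rounding (1 − α)E to the nearest integer is an assumption.

```python
import math
import random


def suff_stats(sample):
    """Sample size, mean and sum of squared deviations (see (2.1))."""
    k = len(sample)
    mean = sum(sample) / k
    return k, mean, sum((v - mean) ** 2 for v in sample)


def argmax_on_interval(f, lo, hi, grid=200, iters=80):
    """Maximize f on [lo, hi]: coarse grid search, then golden-section refinement."""
    if hi - lo < 1e-12:
        return lo
    step = (hi - lo) / grid
    k = max(range(grid + 1), key=lambda i: f(lo + i * step))
    a, b = max(lo, lo + (k - 1) * step), min(hi, lo + (k + 1) * step)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def mle_lambda1(x, y):
    """Unrestricted MLE of lambda1 via the profile likelihood in mu."""
    m, xbar, sx2 = suff_stats(x)
    n, ybar, sy2 = suff_stats(y)
    s1 = lambda mu: sx2 + m * (xbar - mu) ** 2
    s2 = lambda mu: sy2 + n * (ybar - mu) ** 2
    prof = lambda mu: -0.5 * m * math.log(s1(mu)) - 0.5 * n * math.log(s2(mu))
    return s1(argmax_on_interval(prof, min(xbar, ybar), max(xbar, ybar))) / m


def restricted_mle(x, y, lam0):
    """Restricted MLEs (mu, lambda2) under H0: lambda1 = lam0."""
    m, xbar, sx2 = suff_stats(x)
    n, ybar, sy2 = suff_stats(y)
    s1 = lambda mu: sx2 + m * (xbar - mu) ** 2
    s2 = lambda mu: sy2 + n * (ybar - mu) ** 2
    prof = lambda mu: -s1(mu) / (2 * lam0) - 0.5 * n * math.log(s2(mu))
    mu = argmax_on_interval(prof, min(xbar, ybar), max(xbar, ybar))
    return mu, s2(mu) / n


def modified_cat_test(x, y, lam0, alpha=0.05, E=1000, rng=None):
    """Right-tailed modified CAT based on theta = (delta - lam0)^2; returns (theta*, cut-off, reject?)."""
    rng = rng or random.Random()
    theta = (mle_lambda1(x, y) - lam0) ** 2                     # observed theta*
    mu_r, lam2_r = restricted_mle(x, y, lam0)
    boot = sorted(                                              # theta*_0 values under H0
        (mle_lambda1([rng.gauss(mu_r, math.sqrt(lam0)) for _ in x],
                     [rng.gauss(mu_r, math.sqrt(lam2_r)) for _ in y]) - lam0) ** 2
        for _ in range(E))
    cutoff = boot[min(E, max(1, round((1 - alpha) * E))) - 1]   # theta*_{0((1 - alpha)E)}
    return theta, cutoff, theta > cutoff
```

Squaring the deviation turns the two-sided rule into a right-tailed one, so only one cut-off has to be estimated from the E bootstrap values instead of two.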

3.4. The Generalized p Value Tests

The generalized p value approach was introduced by Tsui and Weerahandi^[25] and has since been widely used to construct tests in the presence of nuisance parameters; see, for example, Khatun et al.^[26] These ideas have also been extended within the framework of generalized fiducial inference by Hannig^[27] and Hannig et al.^[28] The following definition and remark are used to construct the tests in this subsection.

Consider a random variable Y whose distribution depends on parameters (β, κ), where β is the parameter of primary interest and κ is a nuisance parameter. Suppose we wish to test the hypothesis

H_{0} : β \leq β_{0} \quad \text{against} \quad H_{a} : β > β_{0},

(3.7)

where β₀ is a known constant.

Definition 3.1. A variable Q(Y; y, β, κ) is said to be a generalized test variable (GTV) for testing the hypothesis in Equation (3.7) if it satisfies the following three conditions.

(i) For a fixed y, the distribution of Q(Y; y, β, κ) is free of the nuisance parameter κ.

(ii) The observed value of Q(Y; y, β, κ), that is, its value at Y = y, is free of any unknown parameters.

(iii) For fixed y and κ, the distribution of Q(Y; y, β, κ) is either stochastically increasing or stochastically decreasing in β.

Remark 3.1. Let q = Q(y; y, β, κ) denote the observed value of Q(Y; y, β, κ). When Q(Y; y, β, κ) is stochastically increasing in β, the generalized p value is

\sup_{H_{0}} P \{Q (Y; y, β, κ) \geq q\} = P \{Q (Y; y, β_{0}, κ) \geq q\},

(3.8)

and when Q(Y; y, β, κ) is stochastically decreasing in β, the generalized p value is

\sup_{H_{0}} P \{Q (Y; y, β, κ) \leq q\} = P \{Q (Y; y, β_{0}, κ) \leq q\} .

(3.9)

We now use the generalized p value approach to test hypotheses about the variance parameter λ₁. Based on the unbiased estimator ${\hat{λ}}_{1 U}$, the generalized pivot variable (GPV) and the corresponding GTV for testing $H_{0} : λ_{1} = λ_{0}$ versus $H_{a} : λ_{1} \neq λ_{0}$ are

G_{U} = (m - 1) \times \frac{{\tilde{λ}}_{1 U}}{W_{1}} \quad \text{and} \quad ℚ_{U} = G_{U} - λ_{1},

respectively, where ${\tilde{λ}}_{1 U}$ denotes the observed value of the unbiased estimator ${\hat{λ}}_{1 U}$ and $W_{1} = S_{x}^{2} / λ_{1} \sim χ_{(m - 1)}^{2}$. For a given value of $(\bar{x}, \bar{y}, s_{x}^{2}, s_{y}^{2})$, the distribution of $G_{U}$ is free of unknown parameters. Moreover, at the observed data, where $W_{1} = s_{x}^{2} / λ_{1}$ and $(m - 1) {\tilde{λ}}_{1 U} = s_{x}^{2}$, we have $G_{U} = λ_{1}$, so the observed value of $ℚ_{U}$ is zero. Hence $ℚ_{U}$ meets all the requirements of a GTV, and the generalized p value for testing H₀ against H_a is

2 \times \min \{P (G_{U} \geq λ_{0}), P (G_{U} \leq λ_{0})\} .

(3.10)
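Because $G_{U}$ depends on the data only through $s_{x}^{2}$, the generalized p value (3.10) can be approximated by simulating $W_{1}$. A minimal Python sketch follows; the Monte Carlo size K and the function name are assumptions. Since $G_{U} \geq λ_{0}$ exactly when $W_{1} \leq s_{x}^{2} / λ_{0}$, the two probabilities could also be computed from the $χ_{(m - 1)}^{2}$ distribution function; simulation is shown because it carries over to generalized pivots without a closed form.

```python
import random


def gen_pvalue_u(x, lam0, K=10000, rng=None):
    """Monte Carlo generalized p value (3.10) for H0: lambda1 = lam0 based on G_U."""
    rng = rng or random.Random()
    m = len(x)
    xbar = sum(x) / m
    sx2 = sum((v - xbar) ** 2 for v in x)      # s_x^2 = (m - 1) * observed lambda1_U
    # G_U = s_x^2 / W1 with W1 ~ chi-square(m - 1) = Gamma(shape (m - 1)/2, scale 2)
    draws = [sx2 / rng.gammavariate((m - 1) / 2, 2.0) for _ in range(K)]
    p_ge = sum(g >= lam0 for g in draws) / K   # estimate of P(G_U >= lam0)
    p_le = sum(g <= lam0 for g in draws) / K   # estimate of P(G_U <= lam0)
    return min(1.0, 2 * min(p_ge, p_le))
```

The Monte Carlo error of the estimated p value shrinks at the usual rate $1 / \sqrt{K}$, so K should be large when the p value is close to α.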

Note that, for testing hypotheses about λ₁, the generalized test based on the unbiased estimator ${\hat{λ}}_{1 U}$ uses the chi-square distribution, although it is different from the usual chi-square test. Next, we use the plug-in estimators of λ₁ to obtain GTVs for testing H₀ against H_a. First, consider the plug-in estimator ${\hat{λ}}_{1 G D}$, which uses the Graybill and Deal^[1] estimator ${\hat{μ}}_{G D}$. Based on ${\hat{λ}}_{1 G D}$, we propose the GPV for λ₁:

G_{G D} = \frac{(m - 1) (\frac{s_{x}^{2}}{W_{1}}) {\tilde{λ}}_{1 G D}}{[s_{x}^{2} + m W {(\frac{n (n - 1) s_{x}^{2}}{m (m - 1) s_{y}^{2} + n (n - 1) s_{x}^{2}})}^{2}]},

where $W = χ_{(1)}^{2} (\frac{s_{x}^{2}}{m W_{1}} + \frac{s_{y}^{2}}{n W_{2}})$, with $χ_{(1)}^{2}$ a chi-square random variable with one degree of freedom independent of $W_{1}$ and $W_{2}$, $W_{2} = S_{y}^{2} / λ_{2} \sim χ_{(n - 1)}^{2}$, and ${\tilde{λ}}_{1 G D}$ denotes the observed value of ${\hat{λ}}_{1 G D}$. For a fixed $(\bar{x}, \bar{y}, s_{x}^{2}, s_{y}^{2})$, the distribution of $G_{G D}$ is free of unknown parameters, and its observed value equals λ₁, the parameter of interest. Using this GPV, we propose the GTV $ℚ_{G D} = G_{G D} - λ_{1}$, and the corresponding generalized p value is

2 \times \min \{P (G_{G D} \geq λ_{0}), P (G_{G D} \leq λ_{0})\} .

(3.11)
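The distribution of $G_{G D}$ has no simple closed form, so the generalized p value (3.11) can be approximated by Monte Carlo: draw $W_{1}$, $W_{2}$ and an independent $χ_{(1)}^{2}$ variable, form $G_{G D}$, and count how often it falls above and below λ₀. The sketch below is an illustrative Python version under these assumptions (hypothetical names, K Monte Carlo draws); the weight `w` is the Graybill and Deal weight on $\bar{y}$ that appears in the denominator of $G_{G D}$.

```python
import random


def gen_pvalue_gd(x, y, lam0, K=10000, rng=None):
    """Monte Carlo generalized p value (3.11) for H0: lambda1 = lam0 based on G_GD."""
    rng = rng or random.Random()
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    sx2 = sum((v - xbar) ** 2 for v in x)
    sy2 = sum((v - ybar) ** 2 for v in y)
    # Graybill and Deal weight on ybar, written with sums of squares as in G_GD
    w = n * (n - 1) * sx2 / (m * (m - 1) * sy2 + n * (n - 1) * sx2)
    mu_gd = (1 - w) * xbar + w * ybar
    lam1_gd = sum((v - mu_gd) ** 2 for v in x) / (m - 1)    # observed plug-in estimate (2.3)
    n_ge = n_le = 0
    for _ in range(K):
        w1 = rng.gammavariate((m - 1) / 2, 2.0)             # W1 ~ chi-square(m - 1)
        w2 = rng.gammavariate((n - 1) / 2, 2.0)             # W2 ~ chi-square(n - 1)
        big_w = rng.gauss(0.0, 1.0) ** 2 * (sx2 / (m * w1) + sy2 / (n * w2))  # W
        g = (m - 1) * (sx2 / w1) * lam1_gd / (sx2 + m * big_w * w ** 2)       # G_GD
        n_ge += g >= lam0
        n_le += g <= lam0
    return min(1.0, 2 * min(n_ge, n_le) / K)
```

The same loop gives the other generalized tests once `mu_gd` and the denominator of the pivot are replaced by the corresponding combined estimator and its pivotal form.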

In a similar manner, we use the other plug-in estimators of λ₁, namely ${\hat{λ}}_{1 K S}, {\hat{λ}}_{1 M K}, {\hat{λ}}_{1 T K}, {\hat{λ}}_{1 B C 1}, {\hat{λ}}_{1 B C 2}$ and ${\hat{λ}}_{1 G M}$, to construct the generalized tests $ℚ_{K S}, ℚ_{M K}, ℚ_{T K}, ℚ_{B C 1}, ℚ_{B C 2}$ and $ℚ_{G M}$, respectively, and compute their sizes and powers for numerical comparison.

Remark 3.2. The simulation results reveal that all the generalized tests based on combined estimators, which use information from both populations, perform similarly in terms of size and power. Further, the Graybill and Deal^[1] estimator ${\hat{μ}}_{G D}$ is the most popular and commonly used estimator of the common mean. Thus, for brevity, among the tests based on combined estimators we report only $ℚ_{G D}$ in the tables, together with the other proposed tests. The conclusions for the remaining generalized tests are nevertheless discussed in Subsection 3.5.

3.5. Simulation Study

Apart from the generalized p value tests, none of the above test methods has a closed-form expression, and even for the generalized tests the exact distributions of the generalized test variables are hard to obtain. As a result, an analytic evaluation of their sizes or power functions is practically infeasible. Therefore, in this subsection, we assess and compare the suggested tests in terms of their sizes and powers numerically through Monte Carlo simulation.

To compute the sizes and powers of the suggested tests, we generated 10,000 pairs of random samples of sizes m and n from N(μ, λ₁) and N(μ, λ₂), respectively. The simulation study was programmed in R (version 4.3.1). The inner loop was replicated 10,000 times for the PBLRT, the CATs and the generalized test procedures. Since all the tests are location invariant and depend on the variances only through $ρ = \sqrt{λ_{2}} / \sqrt{λ_{1}}$, the size of each test is evaluated by setting μ = 0 and λ₁ = 1 and varying $ρ = \sqrt{λ_{2}}$ from 0.25 to 4.0. The sizes and powers are obtained at the nominal level α = 0.05; with 10,000 replications, the simulation standard error of an estimated size near 0.05 is about $\sqrt{0.05 \times 0.95 / 10{,}000} \approx 0.002$.
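A minimal sketch of the outer size loop is given below, written in Python rather than the R code used for the reported results. It takes the test as a function returning a p value, draws null samples with μ = 0, λ₁ = λ₀ = 1 and $\sqrt{λ_{2}} = ρ$, and also reports the Monte Carlo standard error $\sqrt{\hat{α} (1 - \hat{α}) / N}$ of the estimated size $\hat{α}$. The function names and the use of $ℚ_{U}$ as the plugged-in test are illustrative assumptions.

```python
import math
import random


def gen_pvalue_u(x, lam0, K, rng):
    """Monte Carlo generalized p value (3.10) of Q_U (uses only the first sample)."""
    m = len(x)
    xbar = sum(x) / m
    sx2 = sum((v - xbar) ** 2 for v in x)
    draws = [sx2 / rng.gammavariate((m - 1) / 2, 2.0) for _ in range(K)]   # G_U = s_x^2 / W1
    p_ge = sum(g >= lam0 for g in draws) / K
    p_le = sum(g <= lam0 for g in draws) / K
    return min(1.0, 2 * min(p_ge, p_le))


def estimate_size(pvalue_fn, m, n, rho, alpha=0.05, N=10000, seed=0):
    """Estimated size under H0 (mu = 0, lambda1 = lambda0 = 1, sqrt(lambda2) = rho).

    pvalue_fn(x, y, lam0, rng) must return a p value; returns (size, Monte Carlo standard error).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(N):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(0.0, rho) for _ in range(n)]
        rejections += pvalue_fn(x, y, 1.0, rng) < alpha
    size = rejections / N
    return size, math.sqrt(size * (1 - size) / N)
```

For example, `estimate_size(lambda x, y, lam0, rng: gen_pvalue_u(x, lam0, 1000, rng), 5, 5, 0.25)` estimates the size of $ℚ_{U}$ for (m, n) = (5, 5) and ρ = 0.25; another test only needs a different `pvalue_fn`, and powers follow by drawing x with variance λ₁ ≠ λ₀.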

Table 1.

Size Values of All the Test Methods of λ1 for Sample Sizes (m, n) = (5, 5), (10, 10), (15, 15), (30, 30), (5, 10), (15, 25), (10, 5), (25, 15) with α = 0.05.

ρ	(m, n)	$ℚ_{L T}$	$ℚ_{P B}$	$ℚ_{C T}$	$ℚ_{M T}$	$ℚ_{U}$	$ℚ_{G D}$
0.25	(5, 5)	0.0589	0.0550	0.0545	0.0480	0.0489	0.0475
0.25	(10, 10)	0.0527	0.0480	0.0490	0.0495	0.0487	0.0499
0.25	(15, 15)	0.0505	0.0530	0.0485	0.0450	0.0445	0.0468
0.25	(30, 30)	0.0493	0.0500	0.0495	0.0500	0.0503	0.0495
0.25	(5, 10)	0.0570	0.0660	0.0570	0.0535	0.0496	0.0489
0.25	(15, 25)	0.0510	0.0480	0.0410	0.0425	0.0491	0.0498
0.25	(10, 5)	0.0567	0.0520	0.0595	0.0465	0.0483	0.0474
0.25	(25, 15)	0.0518	0.0520	0.0510	0.0505	0.0458	0.0465
1.00	(5, 5)	0.0871	0.0440	0.0430	0.0420	0.0498	0.0457
1.00	(10, 10)	0.0649	0.0460	0.0515	0.0495	0.0497	0.0504
1.00	(15, 15)	0.0599	0.0560	0.0515	0.0490	0.0510	0.0508
1.00	(30, 30)	0.0550	0.0560	0.0520	0.0540	0.0493	0.0487
1.00	(5, 10)	0.0811	0.0420	0.0520	0.0560	0.0512	0.0491
1.00	(15, 25)	0.0588	0.0570	0.0505	0.0570	0.0475	0.0499
1.00	(10, 5)	0.0708	0.0490	0.0525	0.0515	0.0526	0.0498
1.00	(25, 15)	0.0564	0.0530	0.0450	0.0435	0.0524	0.0526
2.50	(5, 5)	0.1105	0.0490	0.0545	0.0495	0.0503	0.0511
2.50	(10, 10)	0.0772	0.0530	0.0430	0.0460	0.0493	0.0502
2.50	(15, 15)	0.0617	0.0420	0.0480	0.0445	0.0524	0.0530
2.50	(30, 30)	0.0553	0.0510	0.0465	0.0440	0.0499	0.0501
2.50	(5, 10)	0.0969	0.0480	0.0545	0.0625	0.0486	0.0479
2.50	(15, 25)	0.0625	0.0450	0.0505	0.0470	0.0510	0.0512
2.50	(10, 5)	0.0883	0.0500	0.0580	0.0535	0.0526	0.0550
2.50	(25, 15)	0.0596	0.0500	0.0445	0.0440	0.0501	0.0504
4.00	(5, 5)	0.1125	0.0550	0.0495	0.0520	0.0499	0.0508
4.00	(10, 10)	0.0723	0.0570	0.0490	0.0480	0.0491	0.0496
4.00	(15, 15)	0.0690	0.0480	0.0465	0.0455	0.0478	0.0474
4.00	(30, 30)	0.0539	0.0530	0.0475	0.0435	0.0467	0.0467
4.00	(5, 10)	0.1028	0.0450	0.0500	0.0495	0.0524	0.0533
4.00	(15, 25)	0.0662	0.0670	0.0610	0.0560	0.0483	0.0494
4.00	(10, 5)	0.0832	0.0570	0.0535	0.5210	0.0492	0.0507
4.00	(25, 15)	0.0516	0.0430	0.0455	0.0515	0.0530	0.0530

An extensive simulation study was carried out for a range of sample size combinations and parameter values, but for concise presentation, only the sizes and powers for selected configurations are reported. First, we determined the sizes of all the test methods under the null hypothesis H₀; the simulated results are reported in Table 1. In Table 1, for each choice of ρ (first column), there are eight values, corresponding to the eight sample size combinations. Next, the powers of selected tests (those whose sizes lie within 20 per cent of the nominal level) are computed for several sample size combinations and tabulated in Table 2. In Table 2, for each value of λ₁, the eight values are the powers for the eight sample size combinations. In both tables, the rows within each block follow the order of the sample sizes given in the table caption. The key findings from Tables 1 and 2 are summarized below:

For testing hypotheses about λ₁, all the tests except the LRT, that is, $ℚ_{P B}, ℚ_{C T}, ℚ_{M T}$ and all the generalized tests, attain sizes within 20 per cent of the nominal level α = 0.05. The LRT is markedly liberal, with sizes as large as 0.1125 (ρ = 4.00, (m, n) = (5, 5)), and is therefore excluded from the power comparison.

The tests obtained using the generalized p value approach are quite stable, in the sense that their sizes stay consistently close to the nominal level α = 0.05. The sizes of the PBLRT, the CAT and the modified CAT deviate from the nominal level more often.

Among the selected tests (those whose sizes lie within 20 per cent of α = 0.05), power increases with the sample sizes. Furthermore, for fixed sample sizes, the power of these tests increases as λ₁ increases beyond the null value λ₀ = 1.

In terms of power, a clear-cut ranking among all the generalized test procedures, namely $ℚ_{U}, ℚ_{G D}, ℚ_{K S}, ℚ_{M K}, ℚ_{T K}, ℚ_{B C 1}, ℚ_{B C 2}$ and $ℚ_{G M}$ , is not possible; that is, these tests exhibit similar power behaviour, with no single method uniformly outperforming the others.

In terms of power, the modified CAT ($ℚ_{M T}$) is found to outperform all the other tests and is therefore recommended. For most parameter values and sample sizes, the generalized tests come second, followed by the CAT ($ℚ_{C T}$) and finally the PBLRT ($ℚ_{P B}$).

Table 2.

Power Comparison of Some Selected Tests of λ₁ for Sample Sizes (m, n) = (5, 5), (10, 10), (15, 15), (30, 30), (5, 10), (15, 25), (10, 5), (25, 15) with α = 0.05.

λ₁	(m, n)	$ℚ_{P B}$	$ℚ_{C T}$	$ℚ_{M T}$	$ℚ_{U}$	$ℚ_{G D}$
1.10	(5, 5)	0.0360	0.0450	0.0680	0.0565	0.0522
	(10, 10)	0.0510	0.0580	0.0730	0.0596	0.0588
	(15, 15)	0.0540	0.0685	0.0805	0.0630	0.0638
	(30, 30)	0.0640	0.0825	0.0980	0.0684	0.0686
	(5, 10)	0.0570	0.0675	0.0705	0.0573	0.0540
	(15, 25)	0.0450	0.0600	0.0885	0.0653	0.0653
	(10, 5)	0.0520	0.0605	0.0775	0.0614	0.0582
	(25, 15)	0.0580	0.0765	0.0925	0.0727	0.0746
1.50	(5, 5)	0.0790	0.1345	0.1870	0.1250	0.1147
	(10, 10)	0.1150	0.1895	0.2530	0.1773	0.1787
	(15, 15)	0.1730	0.2420	0.3080	0.2348	0.2385
	(30, 30)	0.3440	0.4190	0.4795	0.4008	0.4024
	(5, 10)	0.0800	0.1400	0.1885	0.1250	0.1251
	(15, 25)	0.1860	0.2450	0.3085	0.2459	0.2507
	(10, 5)	0.1380	0.1730	0.2445	0.1872	0.1717
	(25, 15)	0.2780	0.3560	0.4095	0.3430	0.3429
3.00	(5, 5)	0.3150	0.4535	0.5305	0.4483	0.4394
	(10, 10)	0.6480	0.7280	0.7890	0.7048	0.7124
	(15, 15)	0.8190	0.8485	0.8830	0.8474	0.8528
	(30, 30)	0.9280	0.9790	0.9835	0.9841	0.9841
	(5, 10)	0.3700	0.4910	0.5730	0.4471	0.4777
	(15, 25)	0.8250	0.8730	0.9030	0.8501	0.8588
	(10, 5)	0.6280	0.6795	0.7600	0.7080	0.6992
	(25, 15)	0.9530	0.9835	0.9865	0.9626	0.9637
4.00	(5, 5)	0.5010	0.6105	0.6845	0.6007	0.6054
	(10, 10)	0.8150	0.8625	0.9005	0.8546	0.8649
	(15, 15)	0.9310	0.9635	0.9755	0.9516	0.9561
	(30, 30)	1.0000	1.0000	1.0000	0.9989	0.9992
	(5, 10)	0.5220	0.6540	0.7240	0.5917	0.6314
	(15, 25)	0.9480	0.9600	0.9715	0.9503	0.9558
	(10, 5)	0.7970	0.8670	0.9055	0.8572	0.8542
	(25, 15)	0.9930	0.9955	0.9970	0.9952	0.9954
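As a quick mechanical check of the power comparison, the entries of Table 2 can be scanned row by row. The Python sketch below (variable names are ours) transcribes the table, counts the (λ₁, (m, n)) cells in which $ℚ_{M T}$ is at least as powerful as every other tabulated test, and verifies that each test's power increases with λ₁ for every sample-size pair. On the transcribed values, $ℚ_{M T}$ is (weakly) best in 31 of the 32 cells, the single exception being λ₁ = 3.00 with (m, n) = (30, 30), where $ℚ_{U}$ and $ℚ_{G D}$ reach 0.9841 against 0.9835.

```python
# Sample-size pairs in the row order of Table 2, and the tabulated tests.
SIZES = [(5, 5), (10, 10), (15, 15), (30, 30), (5, 10), (15, 25), (10, 5), (25, 15)]
TESTS = ["PB", "CT", "MT", "U", "GD"]

# POWER[lam1][i] holds the Table 2 powers for SIZES[i], ordered as TESTS.
POWER = {
    1.10: [[0.0360, 0.0450, 0.0680, 0.0565, 0.0522],
           [0.0510, 0.0580, 0.0730, 0.0596, 0.0588],
           [0.0540, 0.0685, 0.0805, 0.0630, 0.0638],
           [0.0640, 0.0825, 0.0980, 0.0684, 0.0686],
           [0.0570, 0.0675, 0.0705, 0.0573, 0.0540],
           [0.0450, 0.0600, 0.0885, 0.0653, 0.0653],
           [0.0520, 0.0605, 0.0775, 0.0614, 0.0582],
           [0.0580, 0.0765, 0.0925, 0.0727, 0.0746]],
    1.50: [[0.0790, 0.1345, 0.1870, 0.1250, 0.1147],
           [0.1150, 0.1895, 0.2530, 0.1773, 0.1787],
           [0.1730, 0.2420, 0.3080, 0.2348, 0.2385],
           [0.3440, 0.4190, 0.4795, 0.4008, 0.4024],
           [0.0800, 0.1400, 0.1885, 0.1250, 0.1251],
           [0.1860, 0.2450, 0.3085, 0.2459, 0.2507],
           [0.1380, 0.1730, 0.2445, 0.1872, 0.1717],
           [0.2780, 0.3560, 0.4095, 0.3430, 0.3429]],
    3.00: [[0.3150, 0.4535, 0.5305, 0.4483, 0.4394],
           [0.6480, 0.7280, 0.7890, 0.7048, 0.7124],
           [0.8190, 0.8485, 0.8830, 0.8474, 0.8528],
           [0.9280, 0.9790, 0.9835, 0.9841, 0.9841],
           [0.3700, 0.4910, 0.5730, 0.4471, 0.4777],
           [0.8250, 0.8730, 0.9030, 0.8501, 0.8588],
           [0.6280, 0.6795, 0.7600, 0.7080, 0.6992],
           [0.9530, 0.9835, 0.9865, 0.9626, 0.9637]],
    4.00: [[0.5010, 0.6105, 0.6845, 0.6007, 0.6054],
           [0.8150, 0.8625, 0.9005, 0.8546, 0.8649],
           [0.9310, 0.9635, 0.9755, 0.9516, 0.9561],
           [1.0000, 1.0000, 1.0000, 0.9989, 0.9992],
           [0.5220, 0.6540, 0.7240, 0.5917, 0.6314],
           [0.9480, 0.9600, 0.9715, 0.9503, 0.9558],
           [0.7970, 0.8670, 0.9055, 0.8572, 0.8542],
           [0.9930, 0.9955, 0.9970, 0.9952, 0.9954]],
}


def mt_best_cells(power):
    """Count cells where MT is at least as powerful as every other test; list the rest."""
    j = TESTS.index("MT")
    wins, exceptions = 0, []
    for lam1, rows in power.items():
        for size, row in zip(SIZES, rows):
            if row[j] >= max(row):
                wins += 1
            else:
                exceptions.append((lam1, size))
    return wins, exceptions


def increasing_in_lambda(power):
    """True if every test's power strictly increases with lam1 for every (m, n)."""
    lams = sorted(power)
    return all(power[a][i][j] < power[b][i][j]
               for a, b in zip(lams, lams[1:])
               for i in range(len(SIZES))
               for j in range(len(TESTS)))


if __name__ == "__main__":
    wins, exceptions = mt_best_cells(POWER)
    print(wins, exceptions)  # 31 [(3.0, (30, 30))]
    print(increasing_in_lambda(POWER))  # True
```

This check only covers the five tests and four λ₁ values reproduced in Table 2; it says nothing about the remaining procedures or parameter values.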

Table 3.

The p Values of all the Test Procedures for Testing λ1 with α = 0.05.

Test→	$ℚ_{L T}$	$ℚ_{P B}$	$ℚ_{C T}$	$ℚ_{M T}$	$ℚ_{U}$	$ℚ_{G D}$
λ1	.8444	.8493	.6920	.8290	.6132	.7280

4. Real-life Applications

In this section, two real-life applications are considered, and the corresponding data sets are used to illustrate the proposed test methodologies.

Example 4.1: In this example, a real data set of measurements of the acceleration due to gravity is analysed to demonstrate the practical usefulness of the suggested test methods. The data are taken from Cressie,^[29] who reports the measurements of the acceleration due to gravity at Washington, D.C., made by Heyl and Cook,^[30] expressed as deviations from 9.8 m/s². In that study, eight distinct series of observations were examined; they are presented in Table 1 of Cressie.^[29] To illustrate the proposed test procedures, two of these series are taken from that table and reproduced below:

Data-1: 105, 83, 76, 75, 51, 76, 93, 75, 62.

Data-2: 84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78.

By the Shapiro–Wilk test, normality is not rejected for either data set at the α = 0.05 level, the corresponding p values being .7309 and .2702, respectively. For the F-test, since Data-1 and Data-2 contain m = 9 and n = 13 observations, the ratio of the sample variances (Data-1 to Data-2) is compared with $F_{8, 12, α/2}$; it exceeds this critical value, so we conclude that the two population variances differ.
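The two preliminary checks on the gravity data, the variance-ratio F statistic above and the Welch t-test used in the next paragraph, can be reproduced with the following standard-library Python sketch. The Student t tail probability is obtained by Simpson-rule integration of the t density; this is an implementation choice made for the sketch, not necessarily how the authors' R code computes it. The sketch gives F ≈ 22.05 on (8, 12) degrees of freedom, far above the tabulated upper 2.5 per cent point of F(8, 12), about 3.51, and a Welch p value of about 0.58.

```python
import math

# Example 4.1 series, as reproduced in the text.
DATA1 = [105, 83, 76, 75, 51, 76, 93, 75, 62]
DATA2 = [84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78]


def mean_var(x):
    """Sample mean and unbiased sample variance."""
    n = len(x)
    xbar = sum(x) / n
    return xbar, sum((v - xbar) ** 2 for v in x) / (n - 1)


def variance_ratio(x, y):
    """F = s_x^2 / s_y^2 together with its (m - 1, n - 1) degrees of freedom."""
    return mean_var(x)[1] / mean_var(y)[1], (len(x) - 1, len(y) - 1)


def t_pdf(u, df):
    """Student t density; df may be non-integer (as Welch's df usually is)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value 1 - 2 * integral_0^|t| pdf, by composite Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * total * h / 3.0


def welch_test(x, y):
    """Welch's t statistic, Welch-Satterthwaite df and two-sided p value."""
    (mx, vx), (my, vy) = mean_var(x), mean_var(y)
    a, b = vx / len(x), vy / len(y)
    t = (mx - my) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a * a / (len(x) - 1) + b * b / (len(y) - 1))
    return t, df, t_two_sided_p(t, df)


if __name__ == "__main__":
    F, dfs = variance_ratio(DATA1, DATA2)
    print(f"F = {F:.3f} on {dfs} df")
    t, df, p = welch_test(DATA1, DATA2)
    print(f"Welch t = {t:.4f}, df = {df:.3f}, p = {p:.4f}")
```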

Next, the equality of the means is checked by Welch’s t-test at the α = 0.05 significance level; the hypothesis of equal means is not rejected (p value 0.5829). Using these two data sets, we test the hypothesis (i) H₀ : λ₁ = 209.4191 versus $H_{a} : λ_{1} \neq 209.4191$, and the resulting p values are reported in Table 3. Since the parameters are completely unknown, they are estimated by their MLEs, and the null value λ₀ is deliberately chosen to differ from the MLE of λ₁. None of the test procedures rejects the null hypothesis.
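The MLEs referred to above solve the likelihood equations of the common-mean model, $μ = (m\bar{x}/λ_{1} + n\bar{y}/λ_{2})/(m/λ_{1} + n/λ_{2})$, $λ_{1} = m^{-1}\sum_{i}(x_{i} - μ)^{2}$ and $λ_{2} = n^{-1}\sum_{j}(y_{j} - μ)^{2}$, which suggests a simple fixed-point iteration. The Python sketch below implements that iteration from summary statistics, together with an asymptotic likelihood ratio test of H₀: λ₁ = λ₀ referred to the χ²₁ distribution. The iteration scheme, the pooled-mean starting value and the χ²₁ calibration are our assumptions; the likelihood may in principle have more than one stationary point, and the sketch simply returns the one it reaches. For the gravity data it gives λ̂₁ ≈ 229.42, so the null value 209.4191 lies 20 units below the MLE, and an LRT p value of about 0.844, close to the .8444 reported for $ℚ_{L T}$ in Table 3.

```python
import math

# Example 4.1 series, as reproduced in the text.
DATA1 = [105, 83, 76, 75, 51, 76, 93, 75, 62]
DATA2 = [84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78]


def summaries(x):
    """(sample size, sample mean, sum of squared deviations about the mean)."""
    n = len(x)
    xbar = sum(x) / n
    return n, xbar, sum((v - xbar) ** 2 for v in x)


def common_mean_mle(s1, s2, lam1_fixed=None, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration on the common-mean likelihood equations.

    s1, s2 are (size, mean, sum of squares) tuples. With lam1_fixed=None this
    returns the unrestricted MLE (mu, lam1, lam2); otherwise lam1 is held at
    lam1_fixed, giving the restricted MLE under H0: lam1 = lam1_fixed.
    """
    (m, xbar, ssx), (n, ybar, ssy) = s1, s2

    def variances(mu):
        lam1 = lam1_fixed if lam1_fixed is not None else (ssx + m * (xbar - mu) ** 2) / m
        lam2 = (ssy + n * (ybar - mu) ** 2) / n
        return lam1, lam2

    mu = (m * xbar + n * ybar) / (m + n)  # start from the pooled mean
    for _ in range(max_iter):
        lam1, lam2 = variances(mu)
        w1, w2 = m / lam1, n / lam2
        mu_new = (w1 * xbar + w2 * ybar) / (w1 + w2)  # precision-weighted mean
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return (mu, *variances(mu))


def loglik(s1, s2, mu, lam1, lam2):
    """Normal log-likelihood from summary statistics, up to an additive constant."""
    (m, xbar, ssx), (n, ybar, ssy) = s1, s2
    a = ssx + m * (xbar - mu) ** 2
    b = ssy + n * (ybar - mu) ** 2
    return -0.5 * (m * math.log(lam1) + a / lam1 + n * math.log(lam2) + b / lam2)


def lrt_pvalue(s1, s2, lam0):
    """Asymptotic LRT of H0: lam1 = lam0 against lam1 != lam0.
    W = 2 * (unrestricted max loglik - restricted max loglik) is referred to
    chi-square(1), whose upper tail is P(chi2_1 > W) = erfc(sqrt(W / 2))."""
    w = 2.0 * (loglik(s1, s2, *common_mean_mle(s1, s2))
               - loglik(s1, s2, *common_mean_mle(s1, s2, lam1_fixed=lam0)))
    w = max(w, 0.0)  # guard against tiny negative rounding error
    return w, math.erfc(math.sqrt(w / 2.0))


if __name__ == "__main__":
    s1, s2 = summaries(DATA1), summaries(DATA2)
    mu_hat, lam1_hat, lam2_hat = common_mean_mle(s1, s2)
    print(f"MLEs: mu = {mu_hat:.4f}, lambda1 = {lam1_hat:.4f}, lambda2 = {lam2_hat:.4f}")
    w, p = lrt_pvalue(s1, s2, 209.4191)
    print(f"LRT: W = {w:.4f}, p = {p:.4f}")
```

Because the functions work from summary statistics, they also apply to Example 4.2: taking the reported $s_{x}^{2}$ and $s_{y}^{2}$ as sums of squared deviations, `common_mean_mle((12, 109.75, 228.2282), (14, 109.5, 35.6564))` gives λ̂₁ ≈ 19.07, 5 units above the null value 14.0693 used there.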

Example 4.2: In this example, summary data from Eberhardt et al.^[31] are used to illustrate the practical applicability of the suggested test methods. The data pertain to measurements of selenium concentration (in ng/g) in non-fat milk powder obtained by four different methods, and were previously analysed by Krishnamoorthy and Lu^[12] within the framework of the ‘common mean’ model.

Table 4.

The p Values of all the Test Procedures for Testing λ1 with α = 0.05.

Test→	$ℚ_{L T}$	$ℚ_{P B}$	$ℚ_{C T}$	$ℚ_{M T}$	$ℚ_{U}$	$ℚ_{G D}$
λ1	.4328	.4425	.3560	.3908	.2664	.3548

For the present analysis, two of the groups are selected, with summary statistics $m = 12, n = 14, \bar{x} = 109.75, \bar{y} = 109.5, s_{x}^{2} = 228.2282$ and $s_{y}^{2} = 35.6564$, where $s_{x}^{2}$ and $s_{y}^{2}$ denote the sums of squared deviations about the respective sample means. Using these statistics, we test the hypothesis (i) H₀ : λ₁ = 14.0693 versus $H_{a} : λ_{1} \neq 14.0693$, and the corresponding p values are presented in Table 4. As before, the parameters are completely unknown and are estimated by their MLEs, and the null value λ₀ is chosen to differ from the MLE of λ₁. None of the test procedures rejects the null hypothesis.

5. Concluding Remarks

In this article, we studied hypothesis testing for the variances of two normal populations under the constraint of a common mean. The main purpose was to propose test procedures that borrow information from the other population through estimators of the common mean. Several test procedures are proposed: the LRT, the PBLRT, the CAT along with its modified version, and tests based on the generalized p value framework. Note that, with the exception of the generalized test $ℚ_{U}$, all the tests use information from the second population when testing the variance of the first population. A Monte Carlo simulation study was conducted to compare the performances of the proposed test procedures numerically.

Interestingly, all the generalized tests, including $ℚ_{U}$, which uses information from the first population only, attain the nominal size. In terms of power, however, they are comparable, and none of them is uniformly best. Moreover, the simulation study reveals that the modified CAT ($ℚ_{M T}$) performs best among all the proposed tests for testing hypotheses about λ₁. Finally, we considered two real-life examples and applied our test methodologies to the corresponding data sets.

The theoretical and computational results regarding the performances of all the test procedures presented in this study remain valid for testing hypotheses on the standard deviation, precision and other powers of the variances under the common mean set-up for two normal populations.

Footnotes

Acknowledgements

The authors would like to express gratitude to the reviewers and the editor-in-chief for their insightful comments that led to substantial improvement of the manuscript. The first author (Pravash Jena) gratefully acknowledges the support of the Odisha State Higher Education Council (OSHEC), Odisha, India, under the Mukhyamantri Research and Innovation Program [MRI/24EM/MT/89]. The second author (Manas Ranjan Tripathy) gratefully recognizes the financial assistance provided by the Science and Engineering Research Board (SERB), Department of Science and Technology [CRG/2023/002586].

The Link for R-Codes

https://github.com/manasmath/Testing-Variances-for-Normal-Populations-with-Common-Mean

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Manas Ranjan Tripathy

References

1. Graybill and Deal RB. Combining unbiased estimators. Biometrics 1959; 15(4): 543–550.

2. Kubokawa. Estimation of a common mean of two normal distributions. Tsukuba J Math 1987; 11(1): 157–175.

3. Pal and Sinha BK. Estimation of a common mean of several normal populations. Far East J Math Sci 1996; 4: 97–110.

4. Pal, Lin, Chang, et al. A revisit to the common mean problem: comparing the maximum likelihood estimator with the Graybill–Deal estimator. Comput Stat Data Anal 2007; 51(12): 5673–5681.

5. Tripathy and Kumar. Estimating a common mean of two normal populations. J Stat Theory Appl 2010; 9(2): 197–215.

6. Rukhin AL. Estimation of the common mean from heterogeneous normal observations with unknown variances. J R Stat Soc Ser B Stat Methodol 2017; 79(5): 1601–1618.

7. Kelleher. A Bayes method for estimating the common mean of two normal populations. Commun Stat Theory Methods 1996; 25(9): 2141–2157.

8. Mitra and Sinha BK. On some aspects of estimation of a common mean of two independent normal populations. J Stat Plan Inference 2007; 137(1): 184–193.

9. Fairweather WR. A method of obtaining an exact confidence interval for the common mean of several normal populations. J R Stat Soc Ser C Appl Stat 1972; 21(3): 229–233.

10. Jordan and Krishnamoorthy. Exact confidence intervals for the common mean of several normal populations. Biometrics 1996; 52(1): 77–86.

11. Cohen and Sackrowitz. Testing hypotheses about the common mean of normal distributions. J Stat Plan Inference 1984; 9(2): 207–227.

12. Krishnamoorthy and Lu. Inferences on the common mean of several normal populations based on the generalized variable method. Biometrics 2003; 59(2): 237–247.

13. Chang and Pal. Testing on the common mean of several normal distributions. Comput Stat Data Anal 2008; 53(2): 321–333.

14. and Williamson PP. Testing on the common mean of normal distributions using Bayesian methods. J Stat Comput Simul 2014; 84(6): 1363–1380.

15. Malekzadeh and Kharrati-Kopaei. Inferences on the common mean of several normal populations under heteroscedasticity. Comput Stat 2018; 33(3): 1367–1384.

16. Strawderman WE. Minimax estimation of powers of the variance of a normal population under squared error loss. Ann Stat 1974; 2(1): 190–198.

17. Maruyama and Strawderman WE. A new class of minimax generalized Bayes estimators of a normal variance. J Stat Plan Inference 2006; 136(11): 3822–3836.

18. Zou, Zeng, Wan, et al. Stein-type improved estimation of standard error under asymmetric LINEX loss function. Statistics 2009; 43(2): 121–129.

19. Bobotas and Kourouklis. On the estimation of a normal precision and a normal variance ratio. Stat Methodol 2010; 7(4): 445–463.

20. Tripathy, Kumar, and Pal. Estimating common standard deviation of two normal populations with ordered means. Stat Methods Appl 2013; 22(3): 305–318.

21. Patra, Kayal, and Kumar. Minimax estimation of the common variance and precision of two normal populations with ordered restricted means. Stat Pap 2019; 62: 209–233.

22. Jena, Tripathy, and Kumar. Point and interval estimation of powers of scale parameters for two normal populations with a common mean. Stat Pap 2023; 64(5): 1775–1804.

23. Chang, Pal, and Lin JJ. A note on comparing several Poisson means. Commun Stat Simul Comput 2010; 39(8): 1605–1627.

24. Pal, Lim, and Ling CH. A computational approach to statistical inferences. J Appl Probab Stat 2007; 2(1): 13–35.

25. Tsui and Weerahandi. Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. J Am Stat Assoc 1989; 84(406): 602–607.

26. Khatun, Tripathy, and Pal. Hypothesis testing and interval estimation for quantiles of two normal populations with a common mean. Commun Stat Theory Methods 2022; 51(16): 5692–5713.

27. Hannig. On generalized fiducial inference. Stat Sin 2009; 19(2): 491–544.

28. Hannig, Iyer, Lai, et al. Generalized fiducial inference: a review and new results. J Am Stat Assoc 2016; 111(515): 1346–1361.

29. Cressie. Jackknifing in the presence of inhomogeneity. Technometrics 1997; 39(1): 45–51.

30. Heyl and Cook GS. The value of gravity at Washington. J Res U S Bur Stand 1936; 17: 805–839.

31. Eberhardt, Reeve, and Spiegelman CH. A minimax approach to combining means, with practical examples. Chemom Intell Lab Syst 1989; 5(2): 129–148.