The asymptotic solution to the problem of comparing the means of two heteroscedastic populations, based on two random samples from the populations, hinges on the pivot underpinning the construction of the confidence interval and the test statistic being asymptotically standard normal, which is known to happen if the two samples are independent and the ratio of the sample sizes converges to a finite positive number. This restriction on the asymptotic behavior of the ratio of the sample sizes carries the risk of rendering the asymptotic justification of the finite sample approximation invalid. It turns out that neither the restriction on the asymptotic behavior of the ratio of the sample sizes nor the assumption of cross sample independence is necessary for the pivotal convergence in question to take place. If the joint distribution of the standardized sample means converges to a spherically symmetric distribution, then that distribution must be bivariate standard normal (which can happen without the assumption of cross sample independence), and the aforesaid pivotal convergence holds.
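The objective of this note is to critically examine the asymptotic solution to the problem of comparing the means of two populations with finite, unequal variances. Let $X_1, \ldots, X_m$ be a random sample from the first population with mean $\mu_1$ and variance $\sigma_1^2$, and $Y_1, \ldots, Y_n$ a random sample from the second population with mean $\mu_2$ and variance $\sigma_2^2$. When all the parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ are unknown, under the assumption of independence of the two samples, the traditional asymptotic $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is
$$\bar{X}_m - \bar{Y}_n \pm z_{\alpha/2}\sqrt{\frac{S_{1,m}^2}{m} + \frac{S_{2,n}^2}{n}}, \tag{1.1}$$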
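where, for $k \ge 2$, $\bar{X}_k = k^{-1}\sum_{i=1}^{k} X_i$ and $S_{1,k}^2 = (k-1)^{-1}\sum_{i=1}^{k} (X_i - \bar{X}_k)^2$ (with $\bar{Y}_k$ and $S_{2,k}^2$ defined analogously from the $Y_i$'s), and $z_\alpha$ denotes the $100(1-\alpha)$th percentile of the standard normal distribution. The test statistic for testing the null hypothesis $H_0\colon \mu_1 = \mu_2$ is
$$\frac{\bar{X}_m - \bar{Y}_n}{\sqrt{S_{1,m}^2/m + S_{2,n}^2/n}}, \tag{1.2}$$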
which is calibrated on the standard normal scale for the calculation of the observed level of significance and the rejection regions at various levels of significance.
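As a minimal sketch of how (1.1) and (1.2) are computed in practice, the hypothetical helper `two_sample_z` below (our own illustration, not code from any of the cited texts) returns the interval, the statistic, and the two-sided observed level of significance using only the Python standard library. It assumes the divisor $k-1$ in the sample variances, which is immaterial asymptotically.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_z(x, y, alpha=0.05):
    """Asymptotic CI (1.1) for mu1 - mu2 and statistic (1.2) for H0: mu1 = mu2."""
    m, n = len(x), len(y)
    diff = mean(x) - mean(y)
    # statistics.variance uses the divisor k - 1
    se = math.sqrt(variance(x) / m + variance(y) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    ci = (diff - z * se, diff + z * se)
    stat = diff / se  # calibrated on the standard normal scale
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))  # two-sided observed level
    return ci, stat, p_value


ci, stat, p = two_sample_z([1, 2, 3, 4, 5], [2, 2, 2, 2, 3])
print(ci, stat, p)
```

The same standard error appears in both the interval and the statistic, which is why both rest on the asymptotic standard normality of a single pivot.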
The simple idea underlying this widely used method is frequently presented in undergraduate (e.g., Wackerly, Mendenhall, and Scheaffer [2008]) and beginning graduate (e.g., Casella and Berger [2002]) textbooks on mathematical statistics. As Casella and Berger (2002, p. 492) explain, the idea is to obtain a point estimator $W$ of the parameter of interest $\theta$ with variance $V$ such that $(W - \theta)/\sqrt{V}$ is asymptotically standard normal and, if the calculation of $V$ involves an unknown parameter other than $\theta$, a consistent estimator $\hat{V}$ of $V$, so that $(W - \theta)/\sqrt{\hat{V}}$ is also asymptotically standard normal by Slutsky's theorem (Mukhopadhyay [2000, Theorem 5.3.3]); $(W - \theta)/\sqrt{\hat{V}}$ can then be used as a pivot for $\theta$, and the tools for inference on the normal mean can be applied. However, Mukhopadhyay (2000) seems to be the only one to explicitly consider the two sample problem from this point of view. Assuming that the two samples are independent and the ratio $m/n$ converges to a finite positive number, he asserts (p. 544) the asymptotic standard normality of the pivot
$$T_{m,n} = \frac{\bar{X}_m - \bar{Y}_n - (\mu_1 - \mu_2)}{\sqrt{S_{1,m}^2/m + S_{2,n}^2/n}} \tag{1.3}$$
underpinning the formulas of the confidence interval in (1.1) and the test statistic in (1.2).
While asymptotically studying a statistical problem using two samples, many authors (see, among others, DasGupta [2008], Pyke and Shorack [1968], Ramdas, Trillos and Cuturi [2017], and van der Vaart and Wellner [1996]) require that the ratio of the sample sizes converge to a finite positive number as $m, n \to \infty$, though this requirement risks rendering the asymptotic justification of the finite sample approximation invalid. For example, designs in which one sample size is of a larger order of magnitude than the other are fairly common; at such rates the ratio of the sample sizes tends to $0$ or $\infty$, and the finite sample approximation involved in using the pivot $T_{m,n}$ to draw inference on $\mu_1 - \mu_2$ in such a design may not have a rigorous asymptotic justification.
To investigate whether we can derive the asymptotic standard normality of $T_{m,n}$ without the restriction that the ratio of the sample sizes converges to a finite positive number, we formulate the question in terms of weak convergence. For a separable metric space $S$, let $\mathcal{B}(S)$ denote the Borel $\sigma$-algebra of $S$ and $\mathcal{P}(S)$ the set of probability measures on $\mathcal{B}(S)$. Endowed with the topology of weak convergence, $\mathcal{P}(S)$ is metrizable as a separable metric space (Parthasarathy [1967, Theorem II.6.2]). Since all the random elements under consideration are Borel measurable, convergence in distribution is equivalent to weak convergence of the induced probability measures (van der Vaart and Wellner [1996, p. 18]). With
$$P_{m,n} = \text{the probability measure on } \mathcal{B}(\mathbb{R}) \text{ induced by } T_{m,n}, \tag{1.4}$$
the asymptotic standard normality of $T_{m,n}$ is equivalent to the convergence of the double sequence $\{P_{m,n}\}$ to $\Phi$ in $\mathcal{P}(\mathbb{R})$, where $\Phi$ denotes the standard normal measure on $\mathcal{B}(\mathbb{R})$. Faced with the question of convergence of a double sequence, we examine both the iterated and double limits of $\{P_{m,n}\}$.
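Before proceeding further, let us spell out without any ambiguity what the assumption of independence of the two samples means in this context. Hereinafter, iid will abbreviate independent and identically distributed. We are making the following three assumptions:
$$X_1, X_2, \ldots \text{ are iid with mean } \mu_1 \text{ and variance } \sigma_1^2 \in (0, \infty); \tag{1.5}$$
$$Y_1, Y_2, \ldots \text{ are iid with mean } \mu_2 \text{ and variance } \sigma_2^2 \in (0, \infty); \tag{1.6}$$
$$\sigma(X_i : i \ge 1) \text{ and } \sigma(Y_i : i \ge 1) \text{ are independent.} \tag{1.7}$$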
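The assumption in (1.7) is often stated as the $X_i$'s being independent of the $Y_i$'s. Note that the triplet of assumptions (1.5), (1.6), and (1.7) is equivalent to the pair of assumptions:
$$(X_i, Y_i),\ i \ge 1, \text{ are iid, with } E X_1 = \mu_1,\ E Y_1 = \mu_2,\ \operatorname{Var} X_1 = \sigma_1^2 \in (0,\infty),\ \operatorname{Var} Y_1 = \sigma_2^2 \in (0,\infty); \tag{1.8}$$
$$X_1 \text{ and } Y_1 \text{ are independent.} \tag{1.9}$$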
We are not aware of any result on the iterated limits of $\{P_{m,n}\}$. Proposition 1 observes that both iterated limits of $\{P_{m,n}\}$ equal $\Phi$ under only (1.5) and (1.6), that is, without (1.7). As far as the double limit of $\{P_{m,n}\}$ is concerned, the only published result we are aware of is Mukhopadhyay (2000) cited above. However, by requiring that $m/n$ converge to a finite positive number as $m, n \to \infty$, Mukhopadhyay (2000) does not obtain the double limit of $\{P_{m,n}\}$.
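We obtain the double limit of $\{P_{m,n}\}$ by using the fact that a double sequence is a net; see Appendix B for the definition of a net and related details. Let $\mathbb{N}$ denote the set of natural numbers; then $\mathbb{N} \times \mathbb{N}$ is a directed set under the partial ordering $\le$ defined by the condition that $(m, n) \le (m', n')$ if and only if $m \le m'$ and $n \le n'$. A double sequence $\{x_{m,n}\}$ taking values in a metric space $S$ converges to $s \in S$ as $m, n \to \infty$ if and only if the corresponding net $\{x_{m,n} : (m,n) \in \mathbb{N} \times \mathbb{N}\}$ converges to $s$. Thus, our objective reduces to obtaining
$$\lim_{(m,n) \in \mathbb{N} \times \mathbb{N}} P_{m,n} = \Phi \text{ in } \mathcal{P}(\mathbb{R}). \tag{1.10}$$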
Proposition 2 shows that (1.5) and (1.6) are not sufficient for (1.10). Proposition 3 shows that (1.10) is implied by the convergence of the joint distribution of the standardized sample means to a spherically symmetric distribution, which implies that the limiting spherically symmetric distribution is bivariate standard normal (see Remark 1). Corollary 1 and Remarks 3 and 4 investigate the question of necessity of the convergence of the joint distribution of the standardized sample means to the bivariate standard normal distribution for (1.10), to which Proposition 4 furnishes a partial answer, and Remark 6 outlines the setup wherein a complete answer is obtainable. Detailed statements of these results and their proofs constitute Section 2. The technical results that we draw upon for the proofs of our results are assembled in eight lemmas spread over three Appendices, A, B, and C. Since none of these lemmas contains any original result of significance, we skip their proofs, and refer the reader to the Appendix of Majumdar and Majumdar (2017b).
Results and Proofs
In what follows, unless otherwise specified, we will assume that the index $i$ runs from $1$ to $\infty$ and that $\{X_i\}$ and $\{Y_i\}$ are as in (1.5) and (1.6).
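Proposition 1. With $P_{m,n}$ as in (1.4),
$$\lim_{m \to \infty} \lim_{n \to \infty} P_{m,n} = \lim_{n \to \infty} \lim_{m \to \infty} P_{m,n} = \Phi.$$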
For subsequent use, let us introduce the following notations. For , let denote and denote . For , let denote the centered normal measure with variance , so that and is the point mass at . For , let denote the bivariate normal distribution with means , variances , and correlation coefficient . When (respectively, ), the support of the bivariate normal distribution is the one dimensional subspace represented by the straight line (respectively, ); obviously, in either of these cases, does not have a density with respect to the two dimensional Lebesgue measure. Since we do not make any use of the Lebesgue density of any member of the family of bivariate normal distributions, we need not accord any special treatment to either of these two cases. In this context, it may be noted that we conceptualize a bivariate normal distribution as an element such that every linear functional on (endowed with ) induces a normal distribution on the line. Thus, is the product measure , the bivariate standard normal distribution. Let denote the order preserving and cofinal map given by . Further, let denote the order preserving and cofinal map that maps to its th coordinate . For , let
Proposition 2. Let be iid, with being the common distribution. Assume . Then (1.5) and (1.6) hold, but (1.10) does not.
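Definition 1. For $d \in \mathbb{N}$, let $\mathcal{I}_d$ denote the set of isometries on $\mathbb{R}^d$ (Axler [2015, Definition 7.37]). An element $Q \in \mathcal{P}(\mathbb{R}^d)$ is called spherically symmetric if the Pettis (that is, coordinatewise) integral of the identity function with respect to $Q$ is the zero vector $0 \in \mathbb{R}^d$, and, for every $L \in \mathcal{I}_d$, $Q \circ L^{-1} = Q$, where $Q \circ L^{-1}$ is the measure induced by $L$, that is, $Q \circ L^{-1}(B) = Q(L^{-1}(B))$ for any $B \in \mathcal{B}(\mathbb{R}^d)$. Let $\mathcal{S}_d$ denote the set of spherically symmetric elements in $\mathcal{P}(\mathbb{R}^d)$.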
Proposition 3. For , let denote the standardized sample mean
and the measure induced by . Then (1.10) is implied by
Remark 1. The sufficient condition for (1.10) stated in (2.2) is equivalent to
Since , (2.3) implies (2.2). To show the converse, it suffices to assume (2.2) and show , equivalently, for every unit vector . Since given any two unit vectors there exists an isometry mapping one to the other, by the spherical symmetry of ,
Since, by the continuous mapping theorem for nets (Lemma C3) and (2.2),
it suffices to show that .
For , let denote the measure induced by , which, by (1.5) or (1.6), converges to by the Central Limit Theorem (CLT, hereinafter; Dudley [1989, Theorem 9.5.6]), that is,
Since is order preserving and cofinal, is a subnet of ; since convergence in is metrizable and any subnet of a convergent net has the same limit (Lemma B1),
Since , the converse follows from (2.4).
Remark 2. Note that (1.7) implies
since the product of two weakly convergent nets of probability measures converges to the product of the limits (Lemma C4), (2.5) implies (2.3) via (2.4). Thus, the folklore sufficient conditions (1.5), (1.6), and (1.7), equivalently, (1.8) and (1.9), do imply (2.3), and consequently (1.10), without requiring the ratio of the sample sizes to converge to a finite positive number.
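As Remark 2 notes, under (1.5), (1.6), and (1.7) no condition on the ratio of the sample sizes is needed. The Monte Carlo sketch below illustrates this in a setup we chose for illustration only: independent samples from Uniform(0, 1) and Uniform(0, 3) populations (unequal variances) in a design with $n = m^2$, along which $m/n \to 0$. The helpers `sample_var`, `pivot`, and `simulate_independent` are hypothetical names, and the seed and sample sizes are arbitrary.

```python
import math
import random


def sample_var(v):
    """Sample variance with divisor k - 1."""
    k = len(v)
    vbar = sum(v) / k
    return sum((t - vbar) ** 2 for t in v) / (k - 1)


def pivot(x, y, mu1, mu2):
    """The two-sample pivot T_{m,n} of (1.3)."""
    se = math.sqrt(sample_var(x) / len(x) + sample_var(y) / len(y))
    return (sum(x) / len(x) - sum(y) / len(y) - (mu1 - mu2)) / se


def simulate_independent(m, n, reps=2000, seed=2017):
    """Draws of T_{m,n} from independent uniform samples (an assumed setup)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x = [rng.uniform(0.0, 1.0) for _ in range(m)]  # mu1 = 0.5, sigma1^2 = 1/12
        y = [rng.uniform(0.0, 3.0) for _ in range(n)]  # mu2 = 1.5, sigma2^2 = 3/4
        draws.append(pivot(x, y, 0.5, 1.5))
    return draws


t = simulate_independent(m=20, n=400)  # n = m**2, so m/n = 0.05
t_mean = sum(t) / len(t)
t_var = sum((v - t_mean) ** 2 for v in t) / (len(t) - 1)
reject = sum(abs(v) > 1.96 for v in t) / len(t)
print(round(t_mean, 3), round(t_var, 3), round(reject, 3))
```

The empirical mean, variance, and rejection rate at nominal level 0.05 should be close to 0, 1, and 0.05; the exact values depend on the seed.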
Remark 3. Does (1.10) imply (2.3)? Although we have not been able to construct a counterexample in which (1.10) holds, above and beyond (1.5) and (1.6), but (2.3) does not, because of Corollary 1 we do not believe that the answer is affirmative.
Corollary 1. Recall the definition of from (1.3) and define
let denote the measure induced by on . Then (2.3) implies
Remark 4. If (1.10) were to imply (2.3), by Corollary 1, (1.10) would have to imply (2.6) as well. We see no reason why (1.10), under only (1.5) and (1.6), would imply (2.6) (or vice-versa). However, Proposition 4 does connect (1.10), (2.3), and (2.6).
Proposition 4. If
exists, and (1.10) and (2.6) hold, then (2.3) holds.
Remark 5. By (2.4) and Proposition 9.3.4 of Dudley (1989), is uniformly tight. Consequently, , being contained in , is uniformly tight as well. By Tychonoff's theorem and Bonferroni's inequality,
Can we establish Proposition 4 without assuming (2.7), substituting it by the conclusion drawn in (2.8)? We do not think so, though a counterexample eludes us.
Lemma 1 is a vital cog in the wheel of our investigation of whether (1.10) implies (2.3).
Lemma 1. With denoting the measure induced by (the non-studentized two-sample pivot)
on , (1.10) is equivalent to
Remark 6. Proposition 4 strengthens our belief that (1.10) does not imply (2.3), but we are, as mentioned above, unable to construct a counterexample. One of the major obstacles to constructing such a counterexample is the fact that it is simply impossible to get a handle on the asymptotic distribution of either or unless we are willing to assume some specific dependence structure (including independence) for the sequence . If we assume, above and beyond (1.5) and (1.6), that
then Theorem 1 of Majumdar and Majumdar (2017a) shows that (1.10) implies (2.3) (which renders Proposition 4 moot), by showing that the convergence of the Cesaro means of the sequence of cross-sample correlation coefficients to is a sufficient condition for (2.3) that turns out to be necessary for (2.9), equivalently, by Lemma 1, (1.10).
The assumption in (2.10), being the assumption in (1.8) with the identically distributed requirement removed, is weaker than (1.8). The aforesaid convergence of Cesaro means assumption is substantially weaker than (1.9). It is easy to see that if we combine the convergence of Cesaro means assumption with (1.5), (1.6), and (2.10), the resulting collection is weaker than the pair of assumptions in (1.8) and (1.9). All we have to do is to consider a pair of dependent but uncorrelated random variables and a sequence of iid copies of the resulting random vector. By Theorem 1 of Majumdar and Majumdar (2017a), (2.3) can hold without (1.7).
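Following the recipe in the last paragraph, the sketch below takes, as an assumed example of our own, $Z$ standard normal and the pair $(Z, (Z^2 - 1)/\sqrt{2})$. Its coordinates have mean 0 and variance 1 and are uncorrelated (because $E Z^3 = 0$), so the cross-sample correlation coefficients are all 0 and their Cesàro means trivially converge to 0; yet the second coordinate is a function of the first, so the samples are not independent. In line with the claim that (2.3), and hence (1.10), can hold without (1.7), the simulated pivot should again look approximately standard normal. All function names are hypothetical.

```python
import math
import random


def sample_var(v):
    k = len(v)
    vbar = sum(v) / k
    return sum((t - vbar) ** 2 for t in v) / (k - 1)


def pivot(x, y, mu1, mu2):
    se = math.sqrt(sample_var(x) / len(x) + sample_var(y) / len(y))
    return (sum(x) / len(x) - sum(y) / len(y) - (mu1 - mu2)) / se


def dependent_uncorrelated_pairs(k, rng):
    """k iid copies of (Z, (Z**2 - 1)/sqrt(2)): uncorrelated, but Y is a function of X."""
    pairs = []
    for _ in range(k):
        z = rng.gauss(0.0, 1.0)
        pairs.append((z, (z * z - 1.0) / math.sqrt(2.0)))
    return pairs


def simulate_dependent(m, n, reps=2000, seed=1710):
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        pairs = dependent_uncorrelated_pairs(max(m, n), rng)
        x = [p[0] for p in pairs[:m]]  # X_1, ..., X_m
        y = [p[1] for p in pairs[:n]]  # Y_1, ..., Y_n share the first min(m, n) Z's
        draws.append(pivot(x, y, 0.0, 0.0))
    return draws


t = simulate_dependent(m=100, n=400)
t_mean = sum(t) / len(t)
t_var = sum((v - t_mean) ** 2 for v in t) / (len(t) - 1)
print(round(t_mean, 3), round(t_var, 3))
```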
We now present the proofs of the results stated previously.
Proof of Proposition 1. The key to the proof is the algebraic representation
where
Now, let us fix and let . Since converges (in probability) to (Lemma C1), converges to and converges to . By the CLT and Slutsky's theorem, the first term in RHS(2.11) converges in distribution (and, by Theorem 4.2.9 of Fabian and Hannan [1985], in probability) to . Since the second term in RHS(2.11) converges in distribution to , another application of Slutsky's theorem leads to the conclusion that, for fixed , as
Since converges in probability to (Lemma C1), by the CLT and Slutsky's theorem, converges, as , in distribution to .
The same argument, with and interchanged, shows that if we fix and let , converges in distribution to , which, as , converges in distribution to . □
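One way to see why the joint law of the standardized sample means governs the pivot is the following identity, which we assume for illustration (it is of the same kind as, but not necessarily identical in form to, (2.11)). With the hypothetical notation $Z_{m,n} = \big(\sqrt{m}(\bar X_m - \mu_1)/\sigma_1,\ \sqrt{n}(\bar Y_n - \mu_2)/\sigma_2\big)$, $d_{m,n} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$, its estimate $\hat d_{m,n} = \sqrt{S_{1,m}^2/m + S_{2,n}^2/n}$, and the unit vector $u_{m,n} = \big(\sigma_1/(\sqrt{m}\,d_{m,n}),\ -\sigma_2/(\sqrt{n}\,d_{m,n})\big)$ in the fourth quadrant, one has $T_{m,n} = \langle u_{m,n}, Z_{m,n}\rangle \, d_{m,n}/\hat d_{m,n}$. Since $d_{m,n}/\hat d_{m,n} \to 1$ in probability, the pivot behaves like a projection of $Z_{m,n}$; if the limit law of $Z_{m,n}$ is $\Phi \otimes \Phi$, every such projection is standard normal, whatever direction $u_{m,n}$ takes. The sketch checks the identity numerically and evaluates $\operatorname{Var}\langle u, Z\rangle = 1 + 2\rho u_1 u_2$ for a bivariate normal $Z$ with unit variances and correlation $\rho$; $\rho = 1$ with $u = (1/\sqrt{2}, -1/\sqrt{2})$ gives variance 0, a degenerate limit of the kind that can arise when (1.7) fails.

```python
import math
import random


def sample_var(v):
    k = len(v)
    vbar = sum(v) / k
    return sum((t - vbar) ** 2 for t in v) / (k - 1)


def pivot_as_projection(x, y, mu1, mu2, s1, s2):
    """Return T_{m,n}, the right-hand side <u, Z> * d / d_hat, and the unit vector u."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    z = (math.sqrt(m) * (xbar - mu1) / s1,   # standardized sample means
         math.sqrt(n) * (ybar - mu2) / s2)
    d = math.sqrt(s1 ** 2 / m + s2 ** 2 / n)  # true standard error of xbar - ybar
    d_hat = math.sqrt(sample_var(x) / m + sample_var(y) / n)  # its estimate
    u = (s1 / (math.sqrt(m) * d), -s2 / (math.sqrt(n) * d))  # unit vector, 4th quadrant
    t = (xbar - ybar - (mu1 - mu2)) / d_hat
    rhs = (u[0] * z[0] + u[1] * z[1]) * d / d_hat
    return t, rhs, u


def projection_variance(u, rho):
    """Variance of <u, Z> for Z bivariate normal with unit variances and correlation rho."""
    return u[0] ** 2 + u[1] ** 2 + 2.0 * rho * u[0] * u[1]


rng = random.Random(0)
x = [rng.gauss(1.0, 2.0) for _ in range(15)]
y = [rng.gauss(-1.0, 0.5) for _ in range(60)]
t, rhs, u = pivot_as_projection(x, y, 1.0, -1.0, 2.0, 0.5)
print(t, rhs, u[0] ** 2 + u[1] ** 2)
print(projection_variance(u, 0.0), projection_variance((1 / math.sqrt(2), -1 / math.sqrt(2)), 1.0))
```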
Proof of Proposition 2. Clearly, is an iid collection of standard Normal random variables, showing that (1.5) and (1.6) hold, with and . The measure induced by on is for every , implying that the measure induced by is for every . Since converges in probability to as (Lemma C1) and , by Slutsky's theorem the subnet of converges to , implying, by Lemma B1 and the assumption , that (1.10) does not hold. □
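A concrete instance of the phenomenon behind Proposition 2, assumed here for illustration and not necessarily the construction used in the proof above, takes $Y_i = X_i$ with $X_i$ iid standard normal. Then (1.5) and (1.6) hold with $\mu_1 = \mu_2 = 0$, but along the diagonal subnet $k \mapsto (k, k)$ the numerator of $T_{k,k}$ vanishes identically, so every $T_{k,k}$ has the point mass at 0 as its law and this subnet cannot converge to $\Phi$. The helper names are hypothetical.

```python
import math
import random


def sample_var(v):
    k = len(v)
    vbar = sum(v) / k
    return sum((t - vbar) ** 2 for t in v) / (k - 1)


def pivot(x, y, mu1=0.0, mu2=0.0):
    se = math.sqrt(sample_var(x) / len(x) + sample_var(y) / len(y))
    return (sum(x) / len(x) - sum(y) / len(y) - (mu1 - mu2)) / se


rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(500)]
x, y = z, z  # hypothetical perfectly dependent samples: Y_i = X_i

# Along the diagonal subnet (k, k) the pivot is identically 0 ...
diagonal = [pivot(x[:k], y[:k]) for k in range(2, 501)]
# ... whereas off the diagonal (here m = 2, n = 500) it is not degenerate.
off_diagonal = pivot(x[:2], y[:500])
print(max(abs(t) for t in diagonal), off_diagonal)
```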
Proof of Proposition 3. By Lemma B2, to show (1.10) it suffices to show that given an arbitrary subnet of , there exists a further subnet such that . For , (2.12) implies
Let denote the one-point compactification of (Dudley [1989, Theorem 2.8.1]). Since every net taking values in a compact set has a convergent subnet (Lemma B3), every subnet of has a further subnet such that
Since convergence in probability on Euclidean spaces is metrizable (Dudley [1989, Theorem 9.2.2]) and is order preserving and cofinal, by Lemma C1 and Lemma B1, converges in probability to ; consequently, by Lemma B1, (2.14), and (2.15), in probability
Note that depends on the subnet through ; implies and , implies and , and, in general
By (2.2) and Lemma B1, converges to . Since by (2.11), where , by Slutsky's theorem for nets (Lemma C2) and the continuous mapping theorem for nets (Lemma C3), (2.16) implies
Since given any two unit vectors there exists an isometry mapping one to the other, by (2.17) and the spherical symmetry of , RHS(2.18) does not depend on the subnet . By Lemma B2, exists for some . Since an iterated limit exists and equals the double limit if the latter and the inner limit of the former exists (Lemma A1), by (2.13),; since by Proposition 1, (1.10) follows. □
Remark 7. DasGupta (2008, p. 403) considers the Behrens-Fisher problem of comparing the means of two independent heteroscedastic normal populations and the two sample t-statistic that uses the pooled variance
$$S_p^2 = \frac{(m-1) S_{1,m}^2 + (n-1) S_{2,n}^2}{m + n - 2}$$
for studentization, that is
$$t_{m,n} = \frac{\bar{X}_m - \bar{Y}_n - (\mu_1 - \mu_2)}{S_p \sqrt{1/m + 1/n}},$$
as a potential pivot. He observes that if the ratio $m/n$ converges to $1$, that is, the design is asymptotically balanced, then the asymptotic distribution of $t_{m,n}$ is standard normal. As outlined below, that observation is a consequence of (1.10), showing that neither the normality of the populations nor their cross-sample independence is necessary for it.
Note that , where
Given an arbitrary subnet of , there exists a further subnet such that (2.15) holds. Since is order preserving and cofinal, and convergence in probability on Euclidean spaces is metrizable, by Lemma C1 and Lemma B1, in probability
consequently, from Lemma C1 and Lemma B1 again, in probability
Let denote the distribution induced by . Since , by Slutsky's theorem for nets, (1.10) implies
For asymptotically balanced designs, the scaling constant in the preceding display equals $1$ for every subnet, and DasGupta's observation follows. However, $t_{m,n}$ is an inferior choice for a pivot compared to $T_{m,n}$, as asymptotic standard normality of $T_{m,n}$ holds for all designs by (1.10), whereas that of $t_{m,n}$ holds only for asymptotically balanced designs.
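To quantify how the pooled statistic fails in unbalanced designs, a standard computation (ours, assuming independent samples and $m/(m+n) \to \lambda \in (0,1)$) gives the limiting variance $\tau^2(\lambda) = \big((1-\lambda)\sigma_1^2 + \lambda\sigma_2^2\big)/\big(\lambda\sigma_1^2 + (1-\lambda)\sigma_2^2\big)$ of $t_{m,n}$, since $S_p^2 \to \lambda\sigma_1^2 + (1-\lambda)\sigma_2^2$. It equals 1 exactly when $\lambda = 1/2$ or $\sigma_1^2 = \sigma_2^2$. The sketch below, with hypothetical helper names and normal populations chosen for illustration, compares $\tau^2(0.2)$ with a Monte Carlo estimate.

```python
import math
import random


def pooled_t_limit_variance(lam, s1, s2):
    """Asymptotic variance of the pooled t-statistic when m/(m+n) -> lam in (0, 1),
    assuming independent samples with variances s1**2 and s2**2."""
    num = (1 - lam) * s1 ** 2 + lam * s2 ** 2
    den = lam * s1 ** 2 + (1 - lam) * s2 ** 2
    return num / den


def pooled_t(x, y, delta=0.0):
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    ssx = sum((v - xbar) ** 2 for v in x)
    ssy = sum((v - ybar) ** 2 for v in y)
    sp2 = (ssx + ssy) / (m + n - 2)  # pooled variance S_p^2
    return (xbar - ybar - delta) / math.sqrt(sp2 * (1 / m + 1 / n))


rng = random.Random(7)
m, n, s1, s2 = 50, 200, 1.0, 2.0  # unbalanced: m / (m + n) = 0.2
draws = [pooled_t([rng.gauss(0, s1) for _ in range(m)],
                  [rng.gauss(0, s2) for _ in range(n)]) for _ in range(2000)]
mc_var = sum(t * t for t in draws) / len(draws)  # the true mean of t is 0 here
print(round(pooled_t_limit_variance(0.2, s1, s2), 4), round(mc_var, 4))
```

With the larger sample drawn from the more variable population, $S_p^2$ overestimates the relevant variance and $t_{m,n}$ is too concentrated, so a standard normal calibration would be conservative.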
Proof of Corollary 1. The proof of Proposition 3 applies verbatim once we replace in that proof by and note that . □
Proof of Proposition 4. As observed in Remark 1, , that is, (2.3) holds, if and only if for every unit vector .
Corresponding to every , there exist two unit vectors: in the first quadrant and in the fourth quadrant. Since is invariant under the map , it suffices to show that for every ,
Given , let , where is interpreted to be . We will show that for every , there exists a directed set and an order preserving and cofinal such that the subnet of converges to , after using this fact to establish (2.19).
Let be the subnet of that converges to . That, as in (2.16), implies, in probability (and hence, in distribution)
By Lemma B1 and (2.7),
using and from the proofs of Proposition 3, and Corollary 1, Lemma C2, and Lemma C3,
whence (2.19) follows from (2.6) and (1.10) by Lemma B1.
If , let and define , where is the integer part function on . Clearly, is order preserving. To show that is cofinal, given choose such that ; since , . Since , the convergence of to follows.
If , let and define . Clearly, is order preserving. To show that is cofinal, given choose such that ; trivially, , implying . Since , the convergence of to follows.
If , let and define , where
is nondecreasing in , implying that is order preserving. To show that is cofinal, given choose to equal , where
since , is positive and is well-defined. Also, implies the quadratic (in ) is convex, so that every greater than , the bigger root of the quadratic, satisfies the inequality
equivalently, the inequality
Since , , implying . Finally, from the definition of ,
implying
since and , . □
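The case analysis above constructs order preserving and cofinal maps into $\mathbb{N} \times \mathbb{N}$. As a simpler illustration of such maps, of our own choosing and not those used in the proof, the hypothetical map `phi` below sends $k$ to $(k^2, k)$, $(k, k^2)$, or $(k, \max(1, \lfloor ck \rfloor))$, so that $n/m \to c$ along the image for a target $c = 0$, $c = \infty$, or $c \in (0, \infty)$, respectively. The sketch checks order preservation on an initial segment (consecutive comparisons suffice, by transitivity) and finds, for a given $(m_0, n_0)$, the smallest $k$ witnessing cofinality.

```python
import math


def phi(k, c):
    """Hypothetical order preserving, cofinal map N -> N x N along which n/m -> c."""
    if c == 0:
        return (k * k, k)
    if math.isinf(c):
        return (k, k * k)
    return (k, max(1, math.floor(c * k)))


def order_preserving_on(c, K=1000):
    """Check phi(k) <= phi(k + 1) in the product order for k = 1, ..., K - 1."""
    pts = [phi(k, c) for k in range(1, K + 1)]
    return all(a[0] <= b[0] and a[1] <= b[1] for a, b in zip(pts, pts[1:]))


def cofinal_witness(m0, n0, c):
    """Smallest k with (m0, n0) <= phi(k, c); it exists because phi is cofinal."""
    k = 1
    while True:
        m, n = phi(k, c)
        if m >= m0 and n >= n0:
            return k
        k += 1


for c in (0, 0.5, 2.0, math.inf):
    m, n = phi(10 ** 4, c)
    print(c, order_preserving_on(c), cofinal_witness(37, 91, c), n / m)
```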
Proof of Lemma 1. For , let
so that
By Lemmas C2 and C3, it suffices to show that converges in distribution to . Since convergence in distribution is metrizable, by Lemma B2 it suffices to show that given an arbitrary subnet of , there exists a further subnet such that converges in distribution to 1. Recall from (2.15) the existence of a subnet that converges to . Since converges (in distribution) to and converges in probability (and hence, in distribution) to , the assertion follows from (2.19) (for , use the last representation) and Lemmas C2 and C3. □
Remark 8. To what extent can the findings of this note be extended to the $k$-sample problem? Obviously, there is no unique parameter of interest in the $k$-sample problem that can be interpreted as the natural extension of $\mu_1 - \mu_2$, and consequently, there is no natural extension of the pivot $T_{m,n}$ to a $k$-sample pivot. That said, Majumdar and Majumdar (2017a) conjecture in Remark 6 that if the assumption
which is an extension of (2.10), holds, then the assumption of entry wise convergence of the Cesaro means of the sequence of dispersion matrices of to the identity matrix is necessary and sufficient for the convergence of the joint distribution of the standardized sample means from random samples to the –variate standard normal distribution. If that conjecture is correct and we are to prove it by extending the tools we have developed for the two sample problem, then we have to formulate appropriate extensions of the findings of this note.
The only approach toward that goal that we can think of involves considering the entire collection of contrasts. We do not yet know whether this approach is feasible, but it is definitely worth investigating.
Double and iterated limits
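Given a metric space $(S, d)$, an $S$-valued double sequence is defined to be a function $x \colon \mathbb{N} \times \mathbb{N} \to S$. We write $x(m, n)$ as $x_{m,n}$; recall that $\{x_{m,n}\}$ converges to $s \in S$ as $m, n \to \infty$ if, for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $m \ge N$ and $n \ge N$ imply $d(x_{m,n}, s) < \epsilon$.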
Now suppose that for every fixed value of $m$, $\lim_{n \to \infty} x_{m,n} = y_m$ exists. Then, $\{y_m\}$ is an $S$-valued sequence. If $\lim_{m \to \infty} y_m = y$ exists, then $y$ is an iterated limit of the double sequence $\{x_{m,n}\}$ and we write $y = \lim_{m \to \infty} \lim_{n \to \infty} x_{m,n}$. As illustrated by the examples in Section 8.20 of Apostol (1974), the existence of one iterated limit does not imply the existence of the other one, the existence of both iterated limits does not imply their equality, and the equality of the two iterated limits does not imply the existence of the double limit. However, an iterated limit exists and equals the double limit if the latter and the inner limit of the former exist.
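The three phenomena can be seen in the following standard real-valued examples, chosen by us for illustration (they need not be the ones in Apostol's Section 8.20): $(-1)^m/n$ has $\lim_n = 0$ for each $m$ but no $\lim_m$ for any fixed $n$; $m/(m+n)$ has iterated limits $0$ and $1$ depending on the order; and $mn/(m^2+n^2)$ has both iterated limits equal to $0$ but equals $1/2$ on the diagonal, so it has no double limit. The exact-arithmetic check below uses a large index as a stand-in for passage to the limit, a heuristic rather than a proof.

```python
from fractions import Fraction


def f1(m, n):
    return Fraction((-1) ** m, n)  # lim_n exists (= 0) for each m; lim_m does not for any n


def f2(m, n):
    return Fraction(m, m + n)  # lim_m lim_n = 0 but lim_n lim_m = 1


def f3(m, n):
    return Fraction(m * n, m * m + n * n)  # both iterated limits are 0, no double limit


BIG = 10 ** 12  # a large index standing in for "-> infinity"
print(f1(BIG, 5), f1(BIG + 1, 5))  # oscillates in m at fixed n
print(float(f2(10, BIG)), float(f2(BIG, 10)))  # near 0 versus near 1
print(f3(BIG, BIG), float(f3(10, BIG)))  # 1/2 on the diagonal, near 0 off it
```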
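Lemma A1. If the double limit of $\{x_{m,n}\}$, as $m, n \to \infty$, exists and is equal to $s$, then existence of $\lim_{n \to \infty} x_{m,n}$ for each fixed $m$ implies $\lim_{m \to \infty} \lim_{n \to \infty} x_{m,n} = s$.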
Nets and subnets
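A set $D$ endowed with a reflexive, antisymmetric, and transitive binary relation $\le$ is called a partially ordered set. The pair $(D, \le)$ is called a directed set if, for each $\alpha, \beta \in D$, there exists $\gamma \in D$ such that $\alpha \le \gamma$ and $\beta \le \gamma$.
Given a metric space $(S, d)$ and a directed set $D$, an $S$-valued net is defined to be a function $x \colon D \to S$; we write the net as $\{x_\alpha : \alpha \in D\}$. Recall that the net $\{x_\alpha : \alpha \in D\}$ converges to $s \in S$ if, for every $\epsilon > 0$, there exists $\alpha_0 \in D$ such that $\alpha_0 \le \alpha$ implies $d(x_\alpha, s) < \epsilon$. It is worth recalling here that an $S$-valued sequence is a particular $S$-valued net.
Let $D$ and $E$ be directed sets. Let $\phi \colon E \to D$ be order preserving, that is, $\beta_1 \le \beta_2 \Rightarrow \phi(\beta_1) \le \phi(\beta_2)$, and cofinal, that is, for each $\alpha \in D$, there exists $\beta \in E$ such that $\alpha \le \phi(\beta)$. Then the composite function $x \circ \phi \colon E \to S$, where $(x \circ \phi)(\beta) = x_{\phi(\beta)}$, defines a net in $S$, is called a subnet of $\{x_\alpha : \alpha \in D\}$, and is written as $\{x_{\phi(\beta)} : \beta \in E\}$.
Lemma B1. Let $D$ be a directed set and $\{x_\alpha : \alpha \in D\}$ a net taking values in $S$ that converges to $s \in S$. Then every subnet of $\{x_\alpha : \alpha \in D\}$ converges to $s$.
Lemma B2. Let $D$ be a directed set and $\{x_\alpha : \alpha \in D\}$ a net taking values in $S$. Then $\{x_\alpha : \alpha \in D\}$ converges to $s \in S$ if and only if every subnet of $\{x_\alpha : \alpha \in D\}$ has a further subnet that converges to $s$.
Lemma B3. A subset $K$ of $S$ is compact if and only if every net in $K$ has a subnet that converges to a point of $K$.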
Miscellaneous results from probability
We have used Lemmas C1, C2, C3, and C4 of this subsection in the note.
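Lemma C1. [Consistency of sample standard deviation] Under (1.5), as $k \to \infty$, $S_{1,k}$ converges almost surely (and hence, in probability) to $\sigma_1$; likewise, under (1.6), $S_{2,k}$ converges almost surely (and hence, in probability) to $\sigma_2$.
Lemma C2. [Slutsky's theorem for nets] Let $(S_1, d_1)$ and $(S_2, d_2)$ be metric spaces and $D$ a directed set. Let $\{U_\alpha : \alpha \in D\}$ and $\{V_\alpha : \alpha \in D\}$ be nets of random elements taking values in $S_1$ and $S_2$, respectively, such that $U_\alpha \to U$ weakly and $V_\alpha \to c$ weakly, where $U$ is a separable random element, that is, there exists a Borel measurable separable subset $S_0$ of $S_1$ such that $P(U \in S_0) = 1$, and $c \in S_2$ is a constant. Then, as a net of random elements taking values in $S_1 \times S_2$, $(U_\alpha, V_\alpha) \to (U, c)$ weakly.
Lemma C3. [Continuous mapping theorem for nets] Let $\{U_\alpha : \alpha \in D\}$ be a net of random elements taking values in $S_1$ such that $U_\alpha \to U$ in distribution. Let $g \colon S_1 \to S_2$ be a continuous function. Then $g(U_\alpha) \to g(U)$ in distribution in $S_2$.
Lemma C4. Let $S_1$ and $S_2$ be separable metric spaces such that $\{\mu_\alpha : \alpha \in D\} \subset \mathcal{P}(S_1)$ and $\{\nu_\alpha : \alpha \in D\} \subset \mathcal{P}(S_2)$ are two nets converging weakly to $\mu$ and $\nu$, respectively. Then the net of product measures $\{\mu_\alpha \otimes \nu_\alpha : \alpha \in D\}$ converges weakly to $\mu \otimes \nu$.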
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
DasGupta, A. (2008). Asymptotic theory of statistics and probability. New York, NY: Springer.
Dudley, R. M. (1989). Real analysis and probability (1st ed.). Pacific Grove, CA: Wadsworth and Brooks/Cole.
Fabian, V., & Hannan, J. (1985). Introduction to probability and mathematical statistics. New York, NY: Wiley.
Majumdar, R., & Majumdar, S. (2017a). Necessary and sufficient condition for asymptotic normality of standardized sample means. Retrieved from https://arxiv.org/abs/1710.07275
Majumdar, R., & Majumdar, S. (2017b). On asymptotic standard normality of the two sample pivot. Retrieved from https://arxiv.org/abs/1710.08051
Mukhopadhyay, N. (2000). Probability and statistical inference. New York, NY: Marcel Dekker.
Parthasarathy, K. R. (1967). Probability measures on metric spaces. New York, NY: Academic Press.
Pyke, R., & Shorack, G. R. (1968). Weak convergence of a two-sample empirical process and a new approach to Chernoff-Savage theorems. The Annals of Mathematical Statistics, 39(3), 755–771.
Ramdas, A., Trillos, N. C., & Cuturi, M. (2017). On Wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2), 47.
van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes. New York, NY: Springer.
Wackerly, D. D., Mendenhall, W., & Scheaffer, R. L. (2008). Mathematical statistics with applications. Belmont, CA: Brooks/Cole.