8. Hypothesis Testing

4. Tests in the Two-Sample Normal Model

In this section, we will study hypothesis tests in the two-sample normal model and in the bivariate normal model. This section parallels the section on Estimation in the Two-Sample Normal Model in the chapter on Interval Estimation.

The Two-Sample Normal Model

Suppose that \(\bs{X} = (X_1, X_2, \ldots, X_m)\) is a random sample of size \(m\) from the normal distribution with mean \(\mu\) and standard deviation \(\sigma\), and that \(\bs{Y} = (Y_1, Y_2, \ldots, Y_n)\) is a random sample of size \(n\) from the normal distribution with mean \(\nu\) and standard deviation \(\tau\). Moreover, suppose that the samples \(\bs{X}\) and \(\bs{Y}\) are independent.

This type of situation arises frequently when the random variables represent a measurement of interest for the objects of the population, and the samples correspond to two different treatments. For example, we might be interested in the blood pressure of a certain population of patients. The \(\bs{X}\) vector records the blood pressures of a control sample, while the \(\bs{Y}\) vector records the blood pressures of the sample receiving a new drug. Similarly, we might be interested in the yield of an acre of corn. The \(\bs{X}\) vector records the yields of a sample receiving one type of fertilizer, while the \(\bs{Y}\) vector records the yields of a sample receiving a different type of fertilizer.

Usually our interest is in a comparison of the parameters (either the mean or variance) for the two sampling distributions. In this section we will construct tests for the difference of the means and the ratio of the variances. As with previous estimation problems we have studied, the procedures vary depending on what parameters are known or unknown. Also as before, key elements in the construction of the tests are the sample means and sample variances and the special properties of these statistics when the sampling distribution is normal.

We will use the following notation for the sample mean and sample variance of a generic sample \(\bs{U} = (U_1, U_2, \ldots, U_k)\): \[ M(\bs{U}) = \frac{1}{k} \sum_{i=1}^k U_i, \quad S^2(\bs{U}) = \frac{1}{k - 1} \sum_{i=1}^k [U_i - M(\bs{U})]^2 \]
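In R, for example, these are just the built-in mean and var functions (note that var already uses the \(k - 1\) divisor); the sample values here are illustrative:

```r
u <- c(2.3, 1.9, 2.8, 2.1)       # a generic sample (illustrative values)
c(M = mean(u), S2 = var(u))      # sample mean and sample variance (divisor k - 1)
```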

Tests of the Difference in the Means with Known Standard Deviations

Our first discussion concerns tests for the difference in the means \(\nu - \mu\) under the assumption that the standard deviations \(\sigma\) and \(\tau\) are known. This is often, but not always, an unrealistic assumption. In some statistical problems, the variances are stable, and are at least approximately known, while the means may be different because of different treatments. Also this is a good place to start because the analysis is fairly easy.

For a conjectured difference of the means \( \delta \in \R \), define the test statistic \[ Z = \frac{[M(\bs{Y}) - M(\bs{X})] - \delta}{\sqrt{\sigma^2 / m + \tau^2 / n}} \]

  • If \( \nu - \mu = \delta \) then \( Z \) has the standard normal distribution.
  • If \( \nu - \mu \ne \delta \) then \(Z\) has the normal distribution with mean \([(\nu - \mu) - \delta] \big/ {\sqrt{\sigma^2 / m + \tau^2 / n}}\) and variance 1.

From properties of normal samples, \( M(\bs{X}) \) has a normal distribution with mean \( \mu \) and variance \( \sigma^2 / m \) and similarly \( M(\bs{Y}) \) has a normal distribution with mean \( \nu \) and variance \( \tau^2 / n \). Since the samples are independent, \( M(\bs{X}) \) and \( M(\bs{Y}) \) are independent, so \( M(\bs{Y}) - M(\bs{X}) \) has a normal distribution with mean \( \nu - \mu \) and variance \( \sigma^2 / m + \tau^2 / n \). The final result then follows since \( Z \) is a linear function of \( M(\bs{Y}) - M(\bs{X}) \).

Of course (b) actually subsumes (a), but we separate them because the two cases play an important role in the hypothesis tests. In part (b), the non-zero mean can be viewed as a non-centrality parameter.

As usual, for \(p \in (0, 1)\), let \(z(p)\) denote the quantile of order \(p\) for the standard normal distribution. For selected values of \(p\), \(z(p)\) can be obtained from the quantile app or from most statistical software packages. Recall also by symmetry that \(z(1 - p) = -z(p)\).
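For example, in R the standard normal quantiles are given by qnorm:

```r
qnorm(0.95)    # z(0.95), approximately 1.645
qnorm(0.975)   # z(0.975), approximately 1.960
qnorm(0.05)    # z(0.05) = -z(0.95) by symmetry
```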

For every \( \alpha \in (0, 1) \), the following tests have significance level \(\alpha\):

  • Reject \(H_0: \nu - \mu = \delta\) versus \(H_1: \nu - \mu \ne \delta\) if and only if \(Z \lt -z(1 - \alpha / 2)\) or \(Z \gt z(1 - \alpha / 2)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \) or \( M(\bs{Y}) - M(\bs{X}) \lt \delta - z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \).
  • Reject \(H_0: \nu - \mu \ge \delta\) versus \(H_1: \nu - \mu \lt \delta\) if and only if \(Z \lt -z(1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \lt \delta - z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \).
  • Reject \(H_0: \nu - \mu \le \delta\) versus \(H_1: \nu - \mu \gt \delta\) if and only if \(Z \gt z( 1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \).

This follows the same logic that we have seen before. In part (a), \( H_0 \) is a simple hypothesis, and under this hypothesis \( Z \) has the standard normal distribution. Thus, if \( H_0 \) is true then the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( H_0 \) specifies a range of values of \( \nu - \mu \), and under \( H_0 \), \( Z \) has a nonstandard normal distribution, by the result above. But the largest type 1 error probability is \( \alpha \) and occurs when \( \nu - \mu = \delta \). The decision rules in terms of \( M(\bs{Y}) - M(\bs{X}) \) are equivalent to those in terms of \( Z \) by simple algebra.

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\delta\) is in the corresponding \(1 - \alpha\) level confidence interval.

  • \( [M(\bs{Y}) - M(\bs{X})] - z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \le \delta \le [M(\bs{Y}) - M(\bs{X})] + z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \)
  • \( \delta \le [M(\bs{Y}) - M(\bs{X})] + z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \)
  • \( \delta \ge [M(\bs{Y}) - M(\bs{X})] - z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \)

These results follow from the tests above. In each case, we start with the inequality that corresponds to not rejecting the null hypothesis and solve for \( \delta \).
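As an illustration, here is a minimal sketch in R of the two-sided test in part (a); all of the numbers (sample sizes, sample means, \(\sigma\), \(\tau\), \(\delta\)) are hypothetical:

```r
m <- 30; n <- 40                 # hypothetical sample sizes
sigma <- 2; tau <- 3             # known standard deviations (assumed)
mx <- 10.1; my <- 11.4           # observed sample means M(X), M(Y)
delta <- 0                       # conjectured difference of the means
alpha <- 0.05

se <- sqrt(sigma^2 / m + tau^2 / n)
z  <- ((my - mx) - delta) / se
abs(z) > qnorm(1 - alpha / 2)    # TRUE here: reject H0 at level alpha
```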

Tests of the Difference of the Means with Unknown Standard Deviations

Next we will construct tests for the difference in the means \(\nu - \mu\) under the more realistic assumption that the standard deviations \(\sigma\) and \(\tau\) are unknown. In this case, it is more difficult to find a suitable test statistic, but we can do the analysis in the special case that the standard deviations are the same. Thus, we will assume that \(\sigma = \tau\), and the common value \(\sigma\) is unknown. This assumption is reasonable if there is an inherent variability in the measurement variables that does not change even when different treatments are applied to the objects in the population.

Recall that the pooled estimate of the common variance \(\sigma^2\) is the weighted average of the sample variances, with the degrees of freedom as the weight factors: \[ S^2(\bs{X}, \bs{Y}) = \frac{(m - 1) S^2(\bs{X}) + (n - 1) S^2(\bs{Y})}{m + n - 2} \] The statistic \( S^2(\bs{X}, \bs{Y}) \) is an unbiased and consistent estimator of the common variance \( \sigma^2 \).

For a conjectured \( \delta \in \R \) define the test statistic \[ T = \frac{[M(\bs{Y}) - M(\bs{X})] - \delta}{S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n}} \]

  • If \( \nu - \mu = \delta \) then \( T \) has the \(t\) distribution with \(m + n - 2\) degrees of freedom.
  • If \( \nu - \mu \ne \delta \) then \( T \) has a non-central \( t \) distribution with \( m + n - 2 \) degrees of freedom and non-centrality parameter \[ \frac{(\nu - \mu) - \delta}{\sigma \sqrt{1/m + 1 /n}} \]

Part (b) actually subsumes part (a), since the ordinary \( t \) distribution is a special case of the non-central \( t \) distribution, with non-centrality parameter 0. With some basic algebra, we can write \( T \) in the form \[ T = \frac{Z + a}{\sqrt{V \big/ (m + n - 2)}}\] where \( Z \) is the standard score of \( M(\bs{Y}) - M(\bs{X}) \), \( a \) is the non-centrality parameter given in the theorem, and \( V = \frac{m + n - 2}{\sigma^2} S^2(\bs{X}, \bs{Y}) \). So \( Z \) has the standard normal distribution, \( V \) has the chi-square distribution with \( m + n - 2 \) degrees of freedom, and \( Z \) and \( V \) are independent. Thus by definition, \( T \) has the non-central \( t \) distribution with \( m + n - 2 \) degrees of freedom and non-centrality parameter \( a \).

As usual, for \(k \gt 0\) and \(p \in (0, 1)\), let \(t_k(p)\) denote the quantile of order \(p\) for the \(t\) distribution with \(k\) degrees of freedom. For selected values of \(k\) and \(p\), values of \(t_k(p)\) can be computed from the quantile app , or from most statistical software packages. Recall also that, by symmetry, \(t_k(1 - p) = -t_k(p)\).

The following tests have significance level \(\alpha\):

  • Reject \(H_0: \nu - \mu = \delta\) versus \(H_1: \nu - \mu \ne \delta\) if and only if \(T \lt -t_{m + n - 2}(1 - \alpha / 2)\) or \(T \gt t_{m + n - 2}(1 - \alpha / 2)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \) or \( M(\bs{Y}) - M(\bs{X}) \lt \delta - t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • Reject \(H_0: \nu - \mu \ge \delta\) versus \(H_1: \nu - \mu \lt \delta\) if and only if \(T \le -t_{m+n-2}(1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \lt \delta - t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • Reject \(H_0: \nu - \mu \le \delta\) versus \(H_1: \nu - \mu \gt \delta\) if and only if \(T \ge t_{m+n-2}(1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)

This follows the same logic that we have seen before. In part (a), \( H_0 \) is a simple hypothesis, and under this hypothesis \( T \) has the \( t \) distribution with \( m + n - 2 \) degrees of freedom. Thus, if \( H_0 \) is true then the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( H_0 \) specifies a range of values of \( \nu - \mu \), and under \( H_0 \), \( T \) has a non-central \( t \) distribution, by the result above. But the largest type 1 error probability is \( \alpha \) and occurs when \( \nu - \mu = \delta \). The decision rules in terms of \( M(\bs{Y}) - M(\bs{X}) \) are equivalent to those in terms of \( T \) by simple algebra.

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\delta\) is in the corresponding \(1 - \alpha\) level confidence interval.

  • \( [M(\bs{Y}) - M(\bs{X})] - t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \le \delta \le [M(\bs{Y}) - M(\bs{X})] + t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • \( \delta \le [M(\bs{Y}) - M(\bs{X})] + t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • \( \delta \ge [M(\bs{Y}) - M(\bs{X})] - t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
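In R, the pooled two-sample \(t\) test is available via t.test with var.equal = TRUE. A sketch with simulated (hypothetical) samples:

```r
set.seed(1)
x <- rnorm(25, mean = 10, sd = 2)   # hypothetical sample X
y <- rnorm(30, mean = 11, sd = 2)   # hypothetical sample Y
m <- length(x); n <- length(y)

# Pooled estimate of the common variance, as defined above
sp2 <- ((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2)

# Two-sided pooled t-test of H0: nu - mu = 0, with a 90% confidence interval
t.test(y, x, mu = 0, var.equal = TRUE, conf.level = 0.90)
```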

Tests of the Ratio of the Variances

Next we will construct tests for the ratio of the distribution variances \(\tau^2 / \sigma^2\). The basic assumption is that the variances \(\sigma^2\) and \(\tau^2\), and of course the means \(\mu\) and \(\nu\), are unknown.

For a conjectured \( \rho \in (0, \infty) \), define the test statistic \[ F = \frac{S^2(\bs{X})}{S^2(\bs{Y})} \rho \]

  • If \( \tau^2 / \sigma^2 = \rho \) then \( F \) has the \(F\) distribution with \(m - 1\) degrees of freedom in the numerator and \(n - 1\) degrees of freedom in the denominator.
  • If \( \tau^2 / \sigma^2 \ne \rho \) then \( F \) has a scaled \( F \) distribution with \( m - 1 \) degrees of freedom in the numerator, \( n - 1 \) degrees of freedom in the denominator, and scale factor \( \rho \frac{\sigma^2}{\tau^2} \).

Part (b) actually subsumes part (a) when \( \rho = \tau^2 / \sigma^2 \), so we will just prove (b). Note that \[ F = \left(\frac{S^2(\bs{X}) \big/ \sigma^2}{S^2(\bs{Y}) \big/ \tau^2}\right) \rho \frac{\sigma^2}{\tau^2} \] But \( (m - 1) S^2(\bs{X}) \big/ \sigma^2 \) has the chi-square distribution with \( m - 1 \) degrees of freedom, \( (n - 1) S^2(\bs{Y}) \big/ \tau^2 \) has the chi-square distribution with \( n - 1 \) degrees of freedom, and the variables are independent. Hence the ratio of the chi-square variables, each divided by its degrees of freedom, has the \( F \) distribution with \( m - 1 \) degrees of freedom in the numerator and \( n - 1 \) degrees of freedom in the denominator.

The following tests have significance level \( \alpha \):

  • Reject \(H_0: \tau^2 / \sigma^2 = \rho\) versus \(H_1: \tau^2 / \sigma^2 \ne \rho\) if and only if \(F \gt f_{m-1, n-1}(1 - \alpha / 2)\) or \(F \lt f_{m-1, n-1}(\alpha / 2 )\).
  • Reject \(H_0: \tau^2 / \sigma^2 \le \rho\) versus \(H_1: \tau^2 / \sigma^2 \gt \rho\) if and only if \(F \lt f_{m-1, n-1}(\alpha)\).
  • Reject \(H_0: \tau^2 / \sigma^2 \ge \rho\) versus \(H_1: \tau^2 / \sigma^2 \lt \rho\) if and only if \(F \gt f_{m-1, n-1}(1 - \alpha)\).

The proof is the usual argument. In part (a), \( H_0 \) is a simple hypothesis, and under this hypothesis \( F \) has the \( F \) distribution with \( m - 1 \) degrees of freedom in the numerator and \( n - 1 \) degrees of freedom in the denominator. Thus, if \( H_0 \) is true then the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( H_0 \) specifies a range of values of \( \tau^2 / \sigma^2 \), and under \( H_0 \), \( F \) has a scaled \( F \) distribution, by the result above. But the largest type 1 error probability is \( \alpha \) and occurs when \( \tau^2 / \sigma^2 = \rho \).

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\rho\) is in the corresponding \(1 - \alpha\) level confidence interval.

  • \( \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(\alpha / 2) \le \rho \le \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(1 - \alpha / 2) \)
  • \( \rho \ge \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(\alpha) \)
  • \( \rho \le \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(1 - \alpha) \)

These results follow from the tests above. In each case, we start with the inequality that corresponds to not rejecting the null hypothesis and solve for \( \rho \).
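A sketch in R of the two-sided variance test in part (a), with hypothetical samples; the built-in var.test covers the case \(\rho = 1\):

```r
set.seed(2)
x <- rnorm(20, sd = 2)    # hypothetical sample X
y <- rnorm(25, sd = 3)    # hypothetical sample Y
m <- length(x); n <- length(y)
rho <- 1; alpha <- 0.10

f_stat <- (var(x) / var(y)) * rho
reject <- f_stat < qf(alpha / 2, m - 1, n - 1) |
          f_stat > qf(1 - alpha / 2, m - 1, n - 1)

var.test(x, y)   # built-in two-sided F test of equal variances (rho = 1)
```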

Tests in the Bivariate Normal Model

In this subsection, we consider a model that is superficially similar to the two-sample normal model, but is actually much simpler. Suppose that \[ ((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)) \] is a random sample of size \(n\) from the bivariate normal distribution of \((X, Y)\) with \(\E(X) = \mu\), \(\E(Y) = \nu\), \(\var(X) = \sigma^2\), \(\var(Y) = \tau^2\), and \(\cov(X, Y) = \delta\).

Thus, instead of a pair of samples, we have a sample of pairs. The fundamental difference is that in this model, variables \( X \) and \( Y \) are measured on the same objects in a sample drawn from the population, while in the previous model, variables \( X \) and \( Y \) are measured on two distinct samples drawn from the population. The bivariate model arises, for example, in before and after experiments, in which a measurement of interest is recorded for a sample of \(n\) objects from the population, both before and after a treatment. For example, we could record the blood pressure of a sample of \(n\) patients, before and after the administration of a certain drug.

We will use our usual notation for the sample means and variances of \(\bs{X} = (X_1, X_2, \ldots, X_n)\) and \(\bs{Y} = (Y_1, Y_2, \ldots, Y_n)\), as in the definition above. Recall also that the sample covariance of \( (\bs{X}, \bs{Y}) \) is \[ S(\bs{X}, \bs{Y}) = \frac{1}{n - 1} \sum_{i=1}^n [X_i - M(\bs{X})][Y_i - M(\bs{Y})] \] (not to be confused with the pooled estimate of the standard deviation defined earlier).

The sequence of differences \(\bs{Y} - \bs{X} = (Y_1 - X_1, Y_2 - X_2, \ldots, Y_n - X_n)\) is a random sample of size \(n\) from the distribution of \(Y - X\). The sampling distribution is normal with

  • \(\E(Y - X) = \nu - \mu\)
  • \(\var(Y - X) = \sigma^2 + \tau^2 - 2 \, \delta\)

The sample mean and variance of the sample of differences are

  • \(M(\bs{Y} - \bs{X}) = M(\bs{Y}) - M(\bs{X})\)
  • \(S^2(\bs{Y} - \bs{X}) = S^2(\bs{X}) + S^2(\bs{Y}) - 2 \, S(\bs{X}, \bs{Y})\)

The sample of differences \(\bs{Y} - \bs{X}\) fits the normal model for a single variable. The section on tests in the normal model could be used to perform tests for the distribution mean \(\nu - \mu \) and the distribution variance \(\sigma^2 + \tau^2 - 2 \delta\).
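In R this amounts to a one-sample analysis of the differences, or equivalently t.test with paired = TRUE; the data below are hypothetical:

```r
set.seed(3)
x <- rnorm(12, mean = 120, sd = 10)     # hypothetical "before" measurements
y <- x + rnorm(12, mean = -4, sd = 5)   # hypothetical "after" measurements

t.test(y - x)                  # one-sample t-test on the differences
t.test(y, x, paired = TRUE)    # the same test via the paired option
```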

Computational Exercises

A new drug is being developed to reduce a certain blood chemical. A sample of 36 patients are given a placebo while a sample of 49 patients are given the drug. The statistics (in mg) are \(m_1 = 87\), \(s_1 = 4\), \(m_2 = 63\), \(s_2 = 6\). Test the following at the 10% significance level:

  • \(H_0: \sigma_1 = \sigma_2\) versus \(H_1: \sigma_1 \ne \sigma_2\).
  • \(H_0: \mu_1 \le \mu_2\) versus \(H_1: \mu_1 \gt \mu_2\) (assuming that \(\sigma_1 = \sigma_2\)).
  • Based on (b), is the drug effective?

Answer:
  • Test statistic 0.4, critical values 0.585, 1.667. Reject \(H_0\).
  • Test statistic 20.8, critical value 1.292. Reject \(H_0\).
  • Probably; there is strong evidence that the mean level is lower for the drug sample.

A company claims that an herbal supplement improves intelligence. A sample of 25 persons are given a standard IQ test before and after taking the supplement. The before and after statistics are \(m_1 = 105\), \(s_1 = 13\), \(m_2 = 110\), \(s_2 = 17\), \(s_{1, \, 2} = 190\). At the 10% significance level, do you believe the company's claim?

Test statistic 2.8, critical value 1.3184. Reject \(H_0\).

In Fisher's iris data, consider the petal length variable for the samples of Versicolor and Virginica irises. Test the following at the 10% significance level:

  • \(H_0: \sigma_1 = \sigma_2\) versus \(H_1: \sigma_1 \ne \sigma_2\).
  • \(H_0: \mu_1 \ge \mu_2\) versus \(H_1: \mu_1 \lt \mu_2\) (assuming that \(\sigma_1 = \sigma_2\)).

Answer:
  • Test statistic 1.1, critical values 0.6227, 1.6072. Fail to reject \(H_0\).
  • Test statistic \(-11.4\), critical value \(-1.6602\). Reject \(H_0\).

A plant has two machines that produce a circular rod whose diameter (in cm) is critical. A sample of 100 rods from the first machine has mean 10.3 and standard deviation 1.2. A sample of 100 rods from the second machine has mean 9.8 and standard deviation 1.6. Test the following hypotheses at the 10% level.

  • \(H_0: \sigma_1 = \sigma_2\) versus \(H_1: \sigma_1 \ne \sigma_2\).
  • \(H_0: \mu_1 = \mu_2\) versus \(H_1: \mu_1 \ne \mu_2\) (assuming that \(\sigma_1 = \sigma_2\)).

Answer:
  • Test statistic 0.56, critical values 0.7175, 1.3942. Reject \(H_0\).
  • Test statistic \(-4.97\), critical values \(\pm 1.645\). Reject \(H_0\).

An Introduction to Bayesian Thinking

Chapter 5 Hypothesis Testing with Normal Populations

In Section 3.5, we described how Bayes factors can be used for hypothesis testing. Now we will use Bayes factors to compare normal means, i.e., test whether the mean of a population is zero or compare the means of two groups from normally distributed populations. We divide this mission into three cases: known variance for a single population, unknown variance for a single population using paired data, and unknown variance using two independent groups.

Also note that some of the examples in this section use an updated version of the bayes_inference function. If your local output is different from what is seen in this chapter, or the provided code fails to run for you, please make sure that you have the most recent version of the package.

5.1 Bayes Factors for Testing a Normal Mean: variance known

Now we show how to obtain Bayes factors for testing hypotheses about a normal mean, where the variance is known. To start, let's consider a random sample of observations from a normal population with mean \(\mu\) and pre-specified variance \(\sigma^2\). We consider testing whether the population mean \(\mu\) is equal to \(m_0\) or not.

Therefore, we can formulate the data and hypotheses as below:

Data \[Y_1, \cdots, Y_n \mathrel{\mathop{\sim}\limits^{\rm iid}}\textsf{Normal}(\mu, \sigma^2)\]

  • \(H_1: \mu = m_0\)
  • \(H_2: \mu \neq m_0\)

We also need to specify priors for \(\mu\) under both hypotheses. Under \(H_1\), we assume that \(\mu\) is exactly \(m_0\), so this occurs with probability 1 under \(H_1\). Under \(H_2\), \(\mu\) is unspecified, so we describe our prior uncertainty with the conjugate normal distribution centered at \(m_0\) and with variance \(\sigma^2/\mathbf{n_0}\). Because the prior is centered at the hypothesized value \(m_0\), the mean is equally likely to be larger or smaller than \(m_0\). The hyperparameter \(n_0\) controls the precision of the prior as before.

In mathematical terms, the priors are:

  • \(H_1: \mu = m_0 \text{ with probability 1}\)
  • \(H_2: \mu \sim \textsf{Normal}(m_0, \sigma^2/\mathbf{n_0})\)

Bayes Factor

Now the Bayes factor for comparing \(H_1\) to \(H_2\) is the ratio of the distribution of the data under the assumption that \(\mu = m_0\) to the distribution of the data under \(H_2\).

\[\begin{aligned} \textit{BF}[H_1 : H_2] &= \frac{p(\text{data}\mid \mu = m_0, \sigma^2 )} {\int p(\text{data}\mid \mu, \sigma^2) p(\mu \mid m_0, \mathbf{n_0}, \sigma^2)\, d \mu} \\ \textit{BF}[H_1 : H_2] &=\left(\frac{n + \mathbf{n_0}}{\mathbf{n_0}} \right)^{1/2} \exp\left\{-\frac 1 2 \frac{n }{n + \mathbf{n_0}} Z^2 \right\} \\ Z &= \frac{(\bar{Y} - m_0)}{\sigma/\sqrt{n}} \end{aligned}\]

The term in the denominator requires integration to account for the uncertainty in \(\mu\) under \(H_2\). It can be shown that the Bayes factor is a function of the observed sample size \(n\), the prior sample size \(n_0\), and a \(Z\) score.

Let’s explore how the hyperparameter \(n_0\) influences the Bayes factor in Equation (5.1). For illustration we will use a sample size of 100. Recall that for estimation, we interpreted \(n_0\) as a prior sample size and considered the limiting case where \(n_0\) goes to zero as a non-informative or reference prior.

\[\begin{equation} \textsf{BF}[H_1 : H_2] = \left(\frac{n + \mathbf{n_0}}{\mathbf{n_0}}\right)^{1/2} \exp\left\{-\frac{1}{2} \frac{n }{n + \mathbf{n_0}} Z^2 \right\} \tag{5.1} \end{equation}\]

Figure 5.1 shows the Bayes factor for comparing \(H_1\) to \(H_2\) on the y-axis as \(n_0\) changes on the x-axis. The different lines correspond to different values of the \(Z\) score or how many standard errors \(\bar{y}\) is from the hypothesized mean. As expected, larger values of the \(Z\) score favor \(H_2\) .


Figure 5.1: Vague prior for mu: n=100

But as \(n_0\) becomes smaller and approaches 0, the first term in the Bayes factor goes to infinity, while the exponential term involving the data goes to a constant and is ignored. In the limit as \(n_0 \rightarrow 0\) under this noninformative prior, the Bayes factor paradoxically ends up favoring \(H_1\) regardless of the value of \(\bar{y}\) .
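To make this limit concrete, here is a minimal sketch in R of Equation (5.1); the values of the \(Z\) score and of \(n_0\) are illustrative:

```r
# Bayes factor BF[H1:H2] from Equation (5.1)
bf_h1_h2 <- function(z, n, n0) {
  sqrt((n + n0) / n0) * exp(-0.5 * (n / (n + n0)) * z^2)
}

# As n0 -> 0, sqrt((n + n0)/n0) blows up while the exponential term tends to
# exp(-z^2/2), so the Bayes factor favors H1 no matter how large z is.
sapply(c(1, 0.1, 0.01, 0.001), function(n0) bf_h1_h2(z = 3, n = 100, n0 = n0))
```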

The takeaway from this is that we cannot use improper priors with \(n_0 = 0\), if we are going to test our hypothesis that \(\mu = m_0\). Similarly, vague priors that use a small value of \(n_0\) are not recommended due to the sensitivity of the results to the choice of an arbitrarily small value of \(n_0\).

This problem, in which the Bayes factor favors the null model \(H_1\) even when the data are far away from the value under the null, arises with vague priors and is known as Bartlett's paradox or the Jeffreys-Lindley paradox.

Now, one way to understand the effect of the prior is through the standardized effect size

\[\delta = \frac{\mu - m_0}{\sigma}.\] The prior on the standardized effect size under \(H_2\) is

\[\delta \mid H_2 \sim \textsf{Normal}(0, \frac{1}{\mathbf{n_0}})\]

This allows us to think about a standardized effect independent of the units of the problem. One default choice is the unit information prior, where the prior sample size \(n_0\) is 1, leading to a standard normal prior for the standardized effect size. This is depicted with the blue normal density in Figure 5.2. It suggests that we expect the mean to be within \(\pm 1.96\) standard deviations of the hypothesized mean with probability 0.95. (Note that we can say this only in a Bayesian setting.)

In many fields we expect that the effect will be small relative to \(\sigma\). If we do not expect to see large effects, then we may want to use a more informative prior on the effect size, such as the density in orange with \(n_0 = 4\). Under this prior, the mean is expected to be within \(\pm 1/\sqrt{n_0}\), or 0.5 standard deviations, of the hypothesized mean.


Figure 5.2: Prior on standard effect size

Example 1.1 To illustrate, we give an example from parapsychological research. The case involved a test of a subject's claim to affect a series of randomly generated 0's and 1's by means of extra sensory perception (ESP). The random sequence of 0's and 1's is generated by a machine with the probability of generating a 1 being 0.5. The subject claims that his ESP would make the sample mean differ significantly from 0.5.

Therefore, we are testing \(H_1: \mu = 0.5\) versus \(H_2: \mu \neq 0.5\). Let's use a prior that reflects that we do not expect a large effect, which leads to the following choice of \(n_0\): if we want the effect \(\mu - m_0\) to be within \(\pm 0.03\) with 95% prior probability, then the standardized effect lies in \((-0.03/\sigma, 0.03/\sigma)\), which gives \(n_0 = (1.96\sigma/0.03)^2 = 32.7^2\).

Figure 5.3 shows our informative prior in blue, while the unit information prior is in orange. On this scale, the unit information prior is almost uniform over the range of interest.


Figure 5.3: Prior effect in the extra sensory perception test

A very large data set with over 104 million trials was collected to test this hypothesis, so we use a normal distribution to approximate the distribution of the sample mean.

  • Sample size: \(n = 1.0449 \times 10^8\)
  • Sample mean: \(\bar{y} = 0.500177\) , standard deviation \(\sigma = 0.5\)
  • \(Z\) -score: 3.61

Now using our prior and the data, the Bayes factor for \(H_1\) to \(H_2\) is 0.46, implying evidence against the hypothesis \(H_1\) that \(\mu = 0.5\).

  • Informative \(\textit{BF}[H_1:H_2] = 0.46\)
  • \(\textit{BF}[H_2:H_1] = 1/\textit{BF}[H_1:H_2] = 2.19\)

This can be inverted to provide the evidence in favor of \(H_2\). The evidence suggests that the hypothesis that the machine operates with a probability that is not 0.5 is 2.19 times more likely than the hypothesis that the probability is 0.5. Based on the interpretation of Bayes factors from Table 3.5, this is in the range of "not worth more than a bare mention".
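As a check on these numbers, the sketch below applies Equation (5.1) to the ESP data, reusing the bf_h1_h2 function from the sketch above; \(n_0 = 32.7^2\) is the informative prior sample size derived in this example:

```r
n  <- 1.0449e8    # number of trials
z  <- 3.61        # Z score for ybar = 0.500177
n0 <- 32.7^2      # informative prior sample size

bf12 <- bf_h1_h2(z, n, n0)   # about 0.46: evidence against H1
1 / bf12                     # about 2.2: weak evidence in favor of H2
```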

To recap, we present expressions for calculating Bayes factors for a normal model with a specified variance. We show that the improper reference priors for \(\mu\) when \(n_0 = 0\) , or vague priors where \(n_0\) is arbitrarily small, lead to Bayes factors that favor the null hypothesis regardless of the data, and thus should not be used for hypothesis testing.

Bayes factors with normal priors can be sensitive to the choice of \(n_0\). While the default value of \(n_0 = 1\) is reasonable in many cases, it may be too non-informative if one expects smaller effects. Wherever possible, think about how large an effect you expect and use that information to help select \(n_0\).

The ESP example suggests weak evidence favoring the hypothesis that the machine generates random 0's and 1's with a probability that is different from 0.5. Note that ESP is not the only explanation: a deviation from 0.5 can also occur if the random number generator is biased. Bias in a stream of pseudorandom numbers has huge implications for the numerous fields that depend on simulation. If the context had been about detecting a small bias in random numbers, what prior would you use, and how would it change the outcome? You can experiment with this in R or other software packages that generate random Bernoulli trials.

Next, we will look at Bayes factors in normal models with unknown variances using the Cauchy prior so that results are less sensitive to the choice of \(n_0\) .

5.2 Comparing Two Paired Means using Bayes Factors

We previously learned that we can use a paired t-test to compare means from two paired samples. In this section, we will show how Bayes factors can be expressed as a function of the t-statistic for comparing the means, and provide posterior probabilities of the hypotheses that the means are equal or different.

Example 5.1 Trace metals in drinking water affect the flavor, and unusually high concentrations can pose a health hazard. Ten pairs of data were taken measuring the zinc concentration in bottom and surface water at ten randomly sampled locations, as listed in Table 5.1 .

Water samples collected at the same location, on the surface and the bottom, cannot be assumed to be independent of each other. However, it may be reasonable to assume that the differences in the concentration at the bottom and the surface in randomly sampled locations are independent of each other.

Table 5.1: Zinc in drinking water
location bottom surface difference
1 0.430 0.415 0.015
2 0.266 0.238 0.028
3 0.567 0.390 0.177
4 0.531 0.410 0.121
5 0.707 0.605 0.102
6 0.716 0.609 0.107
7 0.651 0.632 0.019
8 0.589 0.523 0.066
9 0.469 0.411 0.058
10 0.723 0.612 0.111

To start modeling, we will treat the ten differences as a random sample from a normal population where the parameter of interest is the difference between the average zinc concentration at the bottom and the average zinc concentration at the surface, that is, the mean difference \(\mu\).

In mathematical terms, we have

  • Random sample of \(n= 10\) differences \(Y_1, \ldots, Y_n\)
  • Normal population with mean \(\mu \equiv \mu_B - \mu_S\)

In this case, we have no information about the variability in the data, and we will treat the variance, \(\sigma^2\) , as unknown.

The hypothesis that the mean concentrations at the surface and the bottom are the same is equivalent to saying \(\mu = 0\). The second hypothesis is that there is a difference between the mean bottom and surface concentrations, or equivalently that the mean difference \(\mu \neq 0\).

In other words, we are going to compare the following hypotheses:

  • \(H_1: \mu_B = \mu_S \Leftrightarrow \mu = 0\)
  • \(H_2: \mu_B \neq \mu_S \Leftrightarrow \mu \neq 0\)

The Bayes factor is the ratio between the distributions of the data under each hypothesis, which does not depend on any unknown parameters.

\[\textit{BF}[H_1 : H_2] = \frac{p(\text{data}\mid H_1)} {p(\text{data}\mid H_2)}\]

To obtain the Bayes factor, we need to use integration over the prior distributions under each hypothesis to obtain those distributions of the data.

\[\textit{BF}[H_1 : H_2] = \frac{\int p(\text{data}\mid \mu = 0, \sigma^2)\, p(\sigma^2 \mid H_1)\, d\sigma^2}{\iint p(\text{data}\mid \mu, \sigma^2)\, p(\mu \mid \sigma^2)\, p(\sigma^2 \mid H_2)\, d\mu \, d\sigma^2}\]

This requires specifying the following priors:

  • \(\mu \mid \sigma^2, H_2 \sim \textsf{Normal}(0, \sigma^2/n_0)\)
  • \(p(\sigma^2) \propto 1/\sigma^2\) for both \(H_1\) and \(H_2\)

Under the hypothesis \(H_1\), \(\mu\) is exactly zero. For \(\mu\) under \(H_2\), we start with the same conjugate normal prior that we used in Section 5.1 for testing the normal mean with known variance. Since \(\sigma^2\) is now unknown, we model \(\mu \mid \sigma^2\) instead of \(\mu\) itself and place a prior on \(\sigma^2\).

The \(\sigma^2\) appears in both the numerator and denominator of the Bayes factor. For the default or reference case, we use the Jeffreys prior (a.k.a. reference prior) on \(\sigma^2\). As long as we have more than two observations, this (improper) prior will lead to a proper posterior.

After integration and rearranging, one can derive a simple expression for the Bayes factor:

\[\textit{BF}[H_1 : H_2] = \left(\frac{n + n_0}{n_0} \right)^{1/2} \left( \frac{ t^2 \frac{n_0}{n + n_0} + \nu } { t^2 + \nu} \right)^{\frac{\nu + 1}{2}}\]

This is a function of the t-statistic

\[t = \frac{|\bar{Y}|}{s/\sqrt{n}},\]

where \(s\) is the sample standard deviation and the degrees of freedom \(\nu = n-1\) (sample size minus one).
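Since this Bayes factor is in closed form, it is easy to evaluate. The sketch below applies it to the zinc differences from Table 5.1 with the unit information prior \(n_0 = 1\); note that this is the normal-prior Bayes factor, not the Cauchy (JZS) version used later in this section, which requires numerical integration and yields a somewhat different value:

```r
d  <- c(0.015, 0.028, 0.177, 0.121, 0.102, 0.107,
        0.019, 0.066, 0.058, 0.111)              # differences from Table 5.1
n  <- length(d)
nu <- n - 1
t  <- abs(mean(d)) / (sd(d) / sqrt(n))           # about 4.86

n0 <- 1                                           # unit information prior
bf12 <- sqrt((n + n0) / n0) *
  ((t^2 * n0 / (n + n0) + nu) / (t^2 + nu))^((nu + 1) / 2)
1 / bf12   # BF[H2:H1], roughly 65: strong evidence for a nonzero mean difference
```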

As we saw in the case of Bayes factors with known variance, we cannot use an improper prior on \(\mu\), because when \(n_0 \to 0\), \(\textit{BF}[H_1:H_2] \to \infty\), favoring \(H_1\) regardless of the magnitude of the t-statistic. Arbitrarily small, vague choices of \(n_0\) likewise lead to arbitrarily large Bayes factors in favor of \(H_1\), another example of the Bartlett or Jeffreys-Lindley paradox.

Sir Harold Jeffreys discovered another paradox when testing with the conjugate normal prior, known as the information paradox. His thought experiment assumed that the sample size \(n\) and the prior sample size \(n_0\) are fixed. He then considered what would happen to the Bayes factor as the sample mean moved further and further away from the hypothesized mean, measured in terms of standard errors with the t-statistic, i.e., \(|t| \to \infty\). As the t-statistic, or the information about the mean, moved further and further from zero, the Bayes factor goes to a constant depending on \(n, n_0\) rather than providing overwhelming support for \(H_2\).

The bounded Bayes factor is

\[\textit{BF}[H_1 : H_2] \to \left( \frac{n_0}{n_0 + n} \right)^{\frac{n - 1}{2}}\]

Jeffreys wanted a prior with \(\textit{BF}[H_1 : H_2] \to 0\) (or equivalently, \(\textit{BF}[H_2 : H_1] \to \infty\)) as the information from the t-statistic grows, indicating that the sample mean is far from the hypothesized mean and the evidence should favor \(H_2\).

Jeffreys showed that no normal prior on \(\mu\) could resolve this information paradox, in which the information in the t-statistic favors \(H_2\) but the Bayes factor does not.

But a Cauchy prior on \(\mu\) would resolve it. With this prior, \(\textit{BF}[H_2 : H_1]\) goes to infinity as the sample mean moves further away from the hypothesized mean. Recall that the Cauchy prior is written as \(\textsf{C}(0, r^2 \sigma^2)\). While Jeffreys used a default of \(r = 1\), smaller values of \(r\) can be used if smaller effects are expected.

The combination of the Jeffreys prior on \(\sigma^2\) and this Cauchy prior on \(\mu\) under \(H_2\) is sometimes referred to as the Jeffreys-Zellner-Siow prior.

However, there is no closed-form expression for the Bayes factor under the Cauchy prior. To obtain the Bayes factor, we must use numerical integration or simulation methods.

We will use the bayes_inference function to test whether the mean difference is zero in Example 5.1 (zinc), using the JZS (Jeffreys-Zellner-Siow) prior.

[bayes_inference output omitted]

With equal prior probabilities on the two hypotheses, the Bayes factor is the posterior odds. From the output, we see that the hypothesis \(H_2\), that the mean difference is different from 0, is almost 51 times more likely than the hypothesis \(H_1\) that the average concentration is the same at the surface and the bottom.

To sum up, we have used the Cauchy prior as a default prior for testing hypotheses about a normal mean when the variance is unknown. This does require numerical integration, but it is available in the bayes_inference function. If you expect that the effect sizes will be small, smaller values of \(r\) are recommended.

It is often important to quantify the magnitude of the difference in addition to testing. The Cauchy Prior provides a default prior for both testing and inference; it avoids problems that arise with choosing a value of \(n_0\) (prior sample size) in both cases. In the next section, we will illustrate using the Cauchy prior for comparing two means from independent normal samples.

5.3 Comparing Independent Means: Hypothesis Testing

In the previous section, we described Bayes factors for testing whether the mean difference of paired samples was zero. In this section, we will consider a slightly different problem – we have two independent samples, and we would like to test the hypothesis that the means are different or equal.

Example 5.2 We illustrate the testing of independent groups with data from a 2004 survey of birth records from North Carolina, which are available in the accompanying R package.

The variable of interest is the weight gain of mothers during pregnancy. We have two groups defined by a categorical variable with two levels, younger mom and older mom.

Question of interest : Do the data provide convincing evidence of a difference between the average weight gain of older moms and the average weight gain of younger moms?

We will view the data as a random sample from two populations, older and younger moms. The two groups are modeled as:

\[\begin{equation} \begin{aligned} Y_{O,i} & \mathrel{\mathop{\sim}\limits^{\rm iid}} \textsf{N}(\mu + \alpha/2, \sigma^2) \\ Y_{Y,i} & \mathrel{\mathop{\sim}\limits^{\rm iid}} \textsf{N}(\mu - \alpha/2, \sigma^2) \end{aligned} \tag{5.2} \end{equation}\]

The model for weight gain for older moms uses the subscript \(O\), and it assumes that the observations are independent and identically distributed, with mean \(\mu+\alpha/2\) and variance \(\sigma^2\).

For the younger women, the observations with the subscript \(Y\) are independent and identically distributed with a mean \(\mu-\alpha/2\) and variance \(\sigma^2\) .

Using this representation of the means in the two groups, the difference in means simplifies to \(\alpha\) – the parameter of interest.

\[(\mu + \alpha/2) - (\mu - \alpha/2) = \alpha\]

You may ask, “Why don’t we set the average weight gain of older women to \(\mu+\alpha\) , and the average weight gain of younger women to \(\mu\) ?” We need the parameter \(\alpha\) to be present in both \(Y_{O,i}\) (the group of older women) and \(Y_{Y,i}\) (the group of younger women).

We have the following competing hypotheses:

  • \(H_1: \alpha = 0 \Leftrightarrow\) The means are not different.
  • \(H_2: \alpha \neq 0 \Leftrightarrow\) The means are different.

In this representation, \(\mu\) represents the overall weight gain for all women. (Does the model in Equation (5.2) make more sense now?) To test the hypothesis, we need to specify prior distributions for \(\alpha\) under \(H_2\) (c.f. \(\alpha = 0\) under \(H_1\) ) and priors for \(\mu,\sigma^2\) under both hypotheses.

Recall that the Bayes factor is the ratio of the distribution of the data under the two hypotheses.

\[\begin{aligned} \textit{BF}[H_1 : H_2] &= \frac{p(\text{data}\mid H_1)} {p(\text{data}\mid H_2)} \\ &= \frac{\iint p(\text{data}\mid \alpha = 0,\mu, \sigma^2 )p(\mu, \sigma^2 \mid H_1) \, d\mu \,d\sigma^2} {\int \iint p(\text{data}\mid \alpha, \mu, \sigma^2) p(\alpha \mid \sigma^2) p(\mu, \sigma^2 \mid H_2) \, d \mu \, d\sigma^2 \, d \alpha} \end{aligned}\]

As before, we need to average over the uncertainty in the parameters to obtain the unconditional distribution of the data. And, as in the test about a single mean, we cannot use improper or non-informative priors on \(\alpha\) for testing.

Under \(H_2\) , we use the Cauchy prior for \(\alpha\) , or equivalently, the Cauchy prior on the standardized effect \(\delta\) with the scale of \(r\) :

\[\delta = \alpha/\sigma \sim \textsf{C}(0, r^2)\]

Now, under both \(H_1\) and \(H_2\), we use the Jeffreys reference prior on \(\mu\) and \(\sigma^2\):

\[p(\mu, \sigma^2) \propto 1/\sigma^2\]

While this is an improper prior on \(\mu\), it does not suffer from the Bartlett or Jeffreys-Lindley paradox, because \(\mu\) is a common parameter in the models under \(H_1\) and \(H_2\). This is another example of the Jeffreys-Zellner-Siow prior.

As in the single mean case, we will need numerical algorithms to obtain the Bayes factor. The following output illustrates the test, using the bayes_inference function.

[bayes_inference output omitted]

We see that the Bayes factor for \(H_1\) to \(H_2\) is about 5.7, with positive support for \(H_1\) that there is no difference in average weight gain between younger and older women. Using equal prior probabilities, the probability that there is a difference in average weight gain between the two groups is about 0.15 given the data. Based on the interpretation of Bayes factors from Table 3.5 , this is in the range of “positive” (between 3 and 20).
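With equal prior probabilities, the posterior probabilities follow directly from the Bayes factor; a quick sketch in R using the value reported above:

```r
bf12 <- 5.7           # BF[H1:H2] from the output above
bf12 / (1 + bf12)     # P(H1 | data), about 0.85
1 / (1 + bf12)        # P(H2 | data), about 0.15
```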

To recap, we have illustrated testing hypotheses about population means with two independent samples, using a Cauchy prior on the difference in the means. One assumption that we have made is that the variances are equal in both groups. The case where the variances are unequal is referred to as the Behrens-Fisher problem, and it is beyond the scope of this course. In the next section, we will look at another example to put everything together with testing and discuss summarizing results.

5.4 Inference after Testing

In this section, we will work through another example for comparing two means using both hypothesis tests and interval estimates, with an informative prior. We will also illustrate how to adjust the credible interval after testing.

Example 5.3 We will use the North Carolina survey data to examine the relationship between infant birth weight and whether the mother smoked during pregnancy. The response variable is the birth weight of the baby in pounds. A categorical variable provides the status of the mother as a smoker or non-smoker.

We would like to answer two questions:

Is there a difference in average birth weight between the two groups?

If there is a difference, how large is the effect?

As before, we need to specify models for the data and priors. We treat the data as a random sample for the two populations, smokers and non-smokers.

The birth weights of babies born to non-smokers, designated by a subgroup \(N\) , are assumed to be independent and identically distributed from a normal distribution with mean \(\mu + \alpha/2\) , as in Section 5.3 .

\[Y_{N,i} \mathrel{\mathop{\sim}\limits^{\rm iid}}\textsf{Normal}(\mu + \alpha/2, \sigma^2)\]

While the birth weights of the babies born to smokers, designated by the subgroup \(S\) , are also assumed to have a normal distribution, but with mean \(\mu - \alpha/2\) .

\[Y_{S,i} \mathrel{\mathop{\sim}\limits^{\rm iid}}\textsf{Normal}(\mu - \alpha/2, \sigma^2)\]

The difference in the average birth weights is the parameter \(\alpha\), because

\[(\mu + \alpha/2) - (\mu - \alpha/2) = \alpha.\]

The hypotheses that we will test are \(H_1: \alpha = 0\) versus \(H_2: \alpha \ne 0\) .

We will still use the Jeffreys-Zellner-Siow Cauchy prior. However, since we may not expect the standardized effect size to be as large, we will use a scale of \(r = 0.5\) rather than 1.

Therefore, under \(H_2\) , we have \[\delta = \alpha/\sigma \sim \textsf{C}(0, r^2), \text{ with } r = 0.5.\]

Under both \(H_1\) and \(H_2\) , we will use the reference priors on \(\mu\) and \(\sigma^2\) :

\[\begin{aligned} p(\mu) &\propto 1 \\ p(\sigma^2) &\propto 1/\sigma^2 \end{aligned}\]

The input to the bayes_inference function is similar, but now we will specify that \(r = 0.5\).

[bayes_inference output omitted]

We see that the Bayes factor is 1.44, which weakly favors there being a difference in average birth weights for babies whose mothers are smokers versus mothers who did not smoke. Converting this to a probability, we find that there is about a 60% chance that the average birth weights are different.

While looking at evidence of there being a difference is useful, Bayes factors and posterior probabilities do not convey any information about the magnitude of the effect. Reporting a credible interval or the complete posterior distribution is more relevant for quantifying the magnitude of the effect.

Using the same function, we can also generate samples from the posterior distribution under \(H_2\).

The 2.5 and 97.5 percentiles for the difference in the means provide a 95% credible interval of 0.023 to 0.57 pounds for the difference in average birth weight. The MCMC output shows not only summaries about the difference in the mean \(\alpha\), but also about the other parameters in the model.

In particular, the Cauchy prior arises by placing a gamma prior on \(n_0\) in the conjugate normal prior. The output therefore provides quantiles for \(n_0\) after updating with the current data.

The row labeled effect size is the standardized effect size \(\delta\) , indicating that the effects are indeed small relative to the noise in the data.


Figure 5.4: Estimates of effect under H2

Figure 5.4 shows the posterior density for the difference in means, with the 95% credible interval indicated by the shaded area. Under \(H_2\) , there is a 95% chance that the average birth weight of babies born to non-smokers is 0.023 to 0.57 pounds higher than that of babies born to smokers.

The previous statement assumes that \(H_2\) is true and is a conditional probability statement. In mathematical terms, the statement is equivalent to

\[P(0.023 < \alpha < 0.57 \mid \text{data}, H_2) = 0.95\]

However, we still have quite a bit of uncertainty based on the current data, because given the data, the probability of \(H_2\) being true is 0.59.

\[P(H_2 \mid \text{data}) = 0.59\]

Using the law of total probability, we can compute the probability that \(\alpha\) is between 0.023 and 0.57 as below:

\[\begin{aligned} & P(0.023 < \alpha < 0.57 \mid \text{data}) \\ = & P(0.023 < \alpha < 0.57 \mid \text{data}, H_1)P(H_1 \mid \text{data}) + P(0.023 < \alpha < 0.57 \mid \text{data}, H_2)P(H_2 \mid \text{data}) \\ = & I( 0 \text{ in CI }) P(H_1 \mid \text{data}) + 0.95 \times P(H_2 \mid \text{data}) \\ = & 0 \times 0.41 + 0.95 \times 0.59 = 0.5605 \end{aligned}\]

Finally, we get that the probability that \(\alpha\) is in the interval, given the data, averaging over both hypotheses, is roughly 0.56. The unconditional statement is: the average birth weight of babies born to nonsmokers is 0.023 to 0.57 pounds higher than that of babies born to smokers, with probability 0.56. This adjustment reflects the posterior uncertainty about how likely \(H_2\) is.
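A one-line check of this arithmetic in R:

```r
0 * 0.41 + 0.95 * 0.59   # = 0.5605
```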

To recap, we have illustrated testing, followed by reporting credible intervals, and using a Cauchy prior distribution that assumed smaller standardized effects. After testing, it is common to report credible intervals conditional on \(H_2\) . We also have shown how to adjust the probability of the interval to reflect our posterior uncertainty about \(H_2\) . In the next chapter, we will turn to regression models to incorporate continuous explanatory variables.

9.3 Distribution Needed for Hypothesis Testing

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. We perform tests of a population mean using a normal distribution or a Student's t-distribution. (Remember, use a Student's t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually \(n\) is large).

Assumptions

When you perform a hypothesis test of a single population mean μ using a Student's t -distribution (often called a t -test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed . You use the sample standard deviation to approximate the population standard deviation. Note that if the sample size is sufficiently large, a t -test will work even if the population is not approximately normally distributed.

When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z -test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.

When you perform a hypothesis test of a single population proportion \(p\), you take a simple random sample from the population. You must meet the conditions for a binomial distribution, which are the following: there are a certain number \(n\) of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success \(p\). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities \(np\) and \(nq\) must both be greater than five (\(np > 5\) and \(nq > 5\)). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\). Remember that \(q = 1 - p\).
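A minimal sketch in R of this one-proportion \(z\) test; the null value and counts here are hypothetical:

```r
p0 <- 0.5; x <- 290; n <- 540            # hypothetical null value and data
stopifnot(n * p0 > 5, n * (1 - p0) > 5)  # conditions for the normal approximation
phat <- x / n
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
2 * pnorm(-abs(z))                       # two-sided p-value
```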



9.2: Comparing Two Independent Population Means (Hypothesis test)


  • The two independent samples are simple random samples from two distinct populations.
  • if the sample sizes are small, the distributions are important (should be normal)
  • if the sample sizes are large, the distributions are not important (need not be normal)

The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch \(t\)-test. The degrees of freedom formula was developed by Aspin-Welch.

The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples. In order to account for the variation, we take the difference of the sample means, \(\bar{X}_{1} - \bar{X}_{2}\), and divide by the standard error in order to standardize the difference. The result is a t-score test statistic.

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error , of the difference in sample means , \(\bar{X}_{1} - \bar{X}_{2}\).

The standard error is:

\[\sqrt{\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}}\]

The test statistic ( t -score) is calculated as follows:

\[\dfrac{(\bar{x}_{1}-\bar{x}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}}}\]

  • \(s_{1}\) and \(s_{2}\), the sample standard deviations, are estimates of \(\sigma_{1}\) and \(\sigma_{2}\), respectively.
  • \(\sigma_{1}\) and \(\sigma_{2}\) are the unknown population standard deviations.
  • \(\bar{x}_{1}\) and \(\bar{x}_{2}\) are the sample means. \(\mu_{1}\) and \(\mu_{2}\) are the population means.

The number of degrees of freedom (\(df\)) requires a somewhat complicated calculation. However, a computer or calculator calculates it easily. The \(df\) are not always a whole number. The test statistic calculated previously is approximated by the Student's t -distribution with \(df\) as follows:

Degrees of freedom

\[df = \dfrac{\left(\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}{\left(\dfrac{1}{n_{1}-1}\right)\left(\dfrac{(s_{1})^{2}}{n_{1}}\right)^{2} + \left(\dfrac{1}{n_{2}-1}\right)\left(\dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}\]

We can also use a conservative estimate of the degrees of freedom by taking \(df\) to be the smaller of \(n_{1}-1\) and \(n_{2}-1\).

When both sample sizes \(n_{1}\) and \(n_{2}\) are five or larger, the Student's t approximation is very good. Notice that the sample variances \((s_{1})^{2}\) and \((s_{2})^{2}\) are not pooled. (If the question comes up, do not pool the variances.)

It is not necessary to compute the degrees of freedom by hand. A calculator or computer easily computes it.
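A sketch in R of the degrees-of-freedom formula and the test statistic, checked against the example that follows (girls: \(n_1 = 9\), \(\bar{x}_1 = 2\), \(s_1 = 0.866\); boys: \(n_2 = 16\), \(\bar{x}_2 = 3.2\), \(s_2 = 1\)):

```r
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1; v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

welch_df(0.866, 9, 1, 16)                   # about 18.85 degrees of freedom
(2 - 3.2) / sqrt(0.866^2 / 9 + 1^2 / 16)    # t-score, about -3.14
```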

Example \(\PageIndex{1}\): Independent groups

The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in Table \(\PageIndex{1}\). Each population has a normal distribution.

Table \(\PageIndex{1}\): sample size, average number of hours playing sports per day, and sample standard deviation
Girls 9 2 0.866
Boys 16 3.2 1.00

Is there a difference in the mean amount of time boys and girls aged seven to 11 play sports each day? Test at the 5% level of significance.

The population standard deviations are not known. Let g be the subscript for girls and b be the subscript for boys. Then, \(\mu_{g}\) is the population mean for girls and \(\mu_{b}\) is the population mean for boys. This is a test of two independent groups, two population means.

Random variable: \(\bar{X}_{g} - \bar{X}_{b} =\) difference in the sample mean amount of time girls and boys play sports each day.

  • \(H_{0}: \mu_{g} = \mu_{b}\)  
  • \(H_{0}: \mu_{g} - \mu_{b} = 0\)
  • \(H_{a}: \mu_{g} \neq \mu_{b}\)  
  • \(H_{a}: \mu_{g} - \mu_{b} \neq 0\)

The words "the same" tell you \(H_{0}\) has an "=". Since there are no other words to indicate \(H_{a}\), assume it says "is different." This is a two-tailed test.

Distribution for the test: Use \(t_{df}\) where \(df\) is calculated using the \(df\) formula for independent groups, two population means. Using a calculator, \(df\) is approximately 18.8462. Do not pool the variances.

Calculate the p -value using a Student's t -distribution: \(p\text{-value} = 0.0054\)

[Figure: distribution of the difference in the average amount of time girls and boys play sports, centered at 0, with the regions below \(-1.2\) and above \(1.2\) shaded to represent the two-sided \(p\)-value.]

\[s_{g} = 0.866\]

\[s_{b} = 1\]

\[\bar{x}_{g} - \bar{x}_{b} = 2 - 3.2 = -1.2\]

Half the \(p\text{-value}\) is below –1.2 and half is above 1.2.

Make a decision: Since \(\alpha > p\text{-value}\), reject \(H_{0}\). This means you reject \(\mu_{g} = \mu_{b}\). The means are different.

Press STAT . Arrow over to TESTS and press 4:2-SampTTest . Arrow over to Stats and press ENTER . Arrow down and enter 2 for the first sample mean, 0.866 for Sx1, 9 for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2. Arrow down to μ1: and arrow to does not equal μ2. Press ENTER . Arrow down to Pooled: and No . Press ENTER . Arrow down to Calculate and press ENTER . The \(p\text{-value}\) is \(p = 0.0054\), the \(df\) is approximately 18.8462, and the test statistic is -3.14. Do the procedure again but instead of Calculate do Draw.

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to conclude that the mean number of hours that girls and boys aged seven to 11 play sports per day is different (mean number of hours boys aged seven to 11 play sports per day is greater than the mean number of hours played by girls OR the mean number of hours girls aged seven to 11 play sports per day is greater than the mean number of hours played by boys).
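As a quick check of this example (our own sketch, not the text's calculator steps), scipy can run the same unpooled test directly from the summary statistics:

```python
# Verifying Example 1 from its summary statistics (illustrative sketch).
from scipy import stats

res = stats.ttest_ind_from_stats(
    mean1=2.0, std1=0.866, nobs1=9,    # girls
    mean2=3.2, std2=1.00, nobs2=16,    # boys
    equal_var=False)                   # do not pool the variances
print(res.statistic, res.pvalue)       # ~ -3.14 and ~0.0054, as in the text
```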

Exercise \(\PageIndex{1}\)

Two samples are shown in Table \(\PageIndex{2}\). Both have normal distributions. The means for the two populations are thought to be the same. Is there a difference in the means? Test at the 5% level of significance.

Table \(\PageIndex{2}\)
  Sample Size Sample Mean Sample Standard Deviation
Population A 25 5 1
Population B 16 4.7 1.2

The \(p\text{-value}\) is \(0.4125\), which is much higher than 0.05, so we decline to reject the null hypothesis. There is not sufficient evidence to conclude that the means of the two populations are not the same.

When the sum of the sample sizes is larger than 30 \((n_{1} + n_{2} > 30)\), you can use the normal distribution to approximate the Student's \(t\).

Example \(\PageIndex{2}\)

A study is done by a community group in two neighboring colleges to determine which one graduates students with more math classes. College A samples 11 graduates. Their average is four math classes with a standard deviation of 1.5 math classes. College B samples nine graduates. Their average is 3.5 math classes with a standard deviation of one math class. The community group believes that a student who graduates from college A has taken more math classes, on the average. Both populations have a normal distribution. Test at a 1% significance level. Answer the following questions.

  • Is this a test of two means or two proportions?
  • Are the population standard deviations known or unknown?
  • Which distribution do you use to perform the test?
  • What is the random variable?
  • What are the null and alternate hypotheses? Write the null and alternate hypotheses in words and in symbols.
  • Is this test right-, left-, or two-tailed?
  • What is the \(p\text{-value}\)?
  • Do you reject or not reject the null hypothesis?
  • Student's t
  • \(\bar{X}_{A} - \bar{X}_{B}\)
  • \(H_{0}: \mu_{A} \leq \mu_{B}\) and \(H_{a}: \mu_{A} > \mu_{B}\)

Figure \(\PageIndex{2}\).

  • It is a right-tailed test.
  • Do not reject.
  • At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude that a student who graduates from college A has taken more math classes, on the average, than a student who graduates from college B.

Exercise \(\PageIndex{2}\)

A study is done to determine if Company A retains its workers longer than Company B. Company A samples 15 workers, and their average time with the company is five years with a standard deviation of 1.2. Company B samples 20 workers, and their average time with the company is 4.5 years with a standard deviation of 0.8. The populations are normally distributed.

  • Are the population standard deviations known?
  • Conduct an appropriate hypothesis test. At the 5% significance level, what is your conclusion?
  • They are unknown.
  • The \(p\text{-value} = 0.0878\). At the 5% level of significance, there is insufficient evidence to conclude that the workers of Company A stay longer with the company.

Example \(\PageIndex{3}\)

A professor at a large community college wanted to determine whether there is a difference in the means of final exam scores between students who took his statistics course online and the students who took his face-to-face statistics class. He believed that the mean of the final exam scores for the online class would be lower than that of the face-to-face class. Was the professor correct? The randomly selected 30 final exam scores from each group are listed in Table \(\PageIndex{3}\) and Table \(\PageIndex{4}\).

Table \(\PageIndex{3}\): Online Class
67.6 41.2 85.3 55.9 82.4 91.2 73.5 94.1 64.7 64.7
70.6 38.2 61.8 88.2 70.6 58.8 91.2 73.5 82.4 35.5
94.1 88.2 64.7 55.9 88.2 97.1 85.3 61.8 79.4 79.4
Table \(\PageIndex{4}\): Face-to-face Class
77.9 95.3 81.2 74.1 98.8 88.2 85.9 92.9 87.1 88.2
69.4 57.6 69.4 67.1 97.6 85.9 88.2 91.8 78.8 71.8
98.8 61.2 92.9 90.6 97.6 100 95.3 83.5 92.9 89.4

Is the mean of the Final Exam scores of the online class lower than the mean of the Final Exam scores of the face-to-face class? Test at a 5% significance level. Answer the following questions:

  • Are the population standard deviations known or unknown?
  • What are the null and alternative hypotheses? Write the null and alternative hypotheses in words and in symbols.
  • Is this test right, left, or two tailed?
  • At the ___ level of significance, from the sample data, there ______ (is/is not) sufficient evidence to conclude that ______.

(See the conclusion in Example, and write yours in a similar fashion)

Be careful not to mix up the information for Group 1 and Group 2!

  • Student's \(t\)
  • \(\bar{X}_{1} - \bar{X}_{2}\)
  • \(H_{0}: \mu_{1} = \mu_{2}\) Null hypothesis: the means of the final exam scores are equal for the online and face-to-face statistics classes.
  • \(H_{a}: \mu_{1} < \mu_{2}\) Alternative hypothesis: the mean of the final exam scores of the online class is less than the mean of the final exam scores of the face-to-face class.
  • left-tailed

Figure \(\PageIndex{3}\): normal curve with the region to the left of the observed test statistic shaded; the shaded area represents the \(p\text{-value} = 0.0011\).

  • Reject the null hypothesis

At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that the mean of the final exam scores for the online class is less than the mean of final exam scores of the face-to-face class.

First put the data for each group into two lists (such as L1 and L2). Press STAT. Arrow over to TESTS and press 4:2SampTTest. Make sure Data is highlighted and press ENTER. Arrow down and enter L1 for the first list and L2 for the second list. Arrow down to \(\mu_{1}\): and arrow to \(< \mu_{2}\) (the test is left-tailed). Press ENTER. Arrow down to Pooled: No. Press ENTER. Arrow down to Calculate and press ENTER.

Cohen's Standards for Small, Medium, and Large Effect Sizes

Note: Cohen's standards are covered in relatively few elementary statistics classes, so this topic may be optional.

Cohen's \(d\) is a measure of effect size based on the differences between two means. Cohen’s \(d\), named for United States statistician Jacob Cohen, measures the relative strength of the differences between the means of two populations based on sample data. The calculated value of effect size is then compared to Cohen’s standards of small, medium, and large effect sizes.

Table \(\PageIndex{5}\): Cohen's Standard Effect Sizes
Size of effect \(d\)
Small 0.2
Medium 0.5
Large 0.8

Cohen's \(d\) is the measure of the difference between two means divided by the pooled standard deviation: \(d = \dfrac{\bar{x}_{1}-\bar{x}_{2}}{s_{\text{pooled}}}\) where \(s_{\text{pooled}} = \sqrt{\dfrac{(n_{1}-1)s^{2}_{1} + (n_{2}-1)s^{2}_{2}}{n_{1}+n_{2}-2}}\)
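A short sketch of this calculation (the function name is ours), checked against Example \(\PageIndex{4}\) below:

```python
# Cohen's d from summary statistics (illustrative sketch, not from the text).
import math

def cohens_d(x1, s1, n1, x2, s2, n2):
    """Difference of means divided by the pooled standard deviation."""
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                         / (n1 + n2 - 2))
    return (x1 - x2) / s_pooled

# College A vs. College B statistics from the colleges example:
print(round(cohens_d(4, 1.5, 11, 3.5, 1, 9), 3))   # 0.384
```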

Example \(\PageIndex{4}\)

Calculate Cohen’s \(d\) for Example \(\PageIndex{2}\) (the two colleges). Is the size of the effect small, medium, or large? Explain what the size of the effect means for this problem.

\(\bar{x}_{1} = 4,\ s_{1} = 1.5,\ n_{1} = 11\)

\(\bar{x}_{2} = 3.5,\ s_{2} = 1,\ n_{2} = 9\)

\(d = 0.384\)

The effect is small because 0.384 is between Cohen’s value of 0.2 for small effect size and 0.5 for medium effect size. The size of the difference of the means for the two colleges is small, indicating that there is not a significant difference between them.

Example \(\PageIndex{5}\)

Calculate Cohen’s \(d\) for Example \(\PageIndex{3}\) (the final exam scores). Is the size of the effect small, medium or large? Explain what the size of the effect means for this problem.

\(d = 0.834\); Large, because 0.834 is greater than Cohen’s 0.8 for a large effect size. The size of the differences between the means of the Final Exam scores of online students and students in a face-to-face class is large, indicating a significant difference.

Example \(\PageIndex{6}\)

Weighted alpha is a measure of risk-adjusted performance of stocks over a period of a year. A high positive weighted alpha signifies a stock whose price has risen, while a small positive weighted alpha indicates an unchanged stock price during the time period. Weighted alpha is used to identify companies with strong upward or downward trends. The weighted alphas for the top 30 stocks of banks in the northeast and in the west, as identified by Nasdaq on May 24, 2013, are listed in the two tables below.

Northeast
94.2 75.2 69.6 52.0 48.0 41.9 36.4 33.4 31.5 27.6
77.3 71.9 67.5 50.6 46.2 38.4 35.2 33.0 28.7 26.5
76.3 71.7 56.3 48.7 43.2 37.6 33.7 31.8 28.5 26.0
West
126.0 70.6 65.2 51.4 45.5 37.0 33.0 29.6 23.7 22.6
116.1 70.6 58.2 51.2 43.2 36.0 31.4 28.7 23.5 21.6
78.2 68.2 55.6 50.3 39.0 34.1 31.0 25.3 23.4 21.5

Is there a difference in the weighted alpha of the top 30 stocks of banks in the northeast and in the west? Test at a 5% significance level. Answer the following questions:

  • Calculate Cohen’s d and interpret it.
  • Student’s-t
  • \(H_{0}: \mu_{1} = \mu_{2}\) Null hypothesis: the means of the weighted alphas are equal.
  • \(H_{a}: \mu_{1} \neq \mu_{2}\) Alternative hypothesis : the means of the weighted alphas are not equal.
  • \(p\text{-value} = 0.8787\)
  • Do not reject the null hypothesis

Figure \(\PageIndex{4}\): normal curve with both tails shaded; each tail represents \(\tfrac{1}{2}\,p\text{-value} = 0.4394\).

  • \(d = 0.040\), Very small, because 0.040 is less than Cohen’s value of 0.2 for small effect size. The size of the difference of the means of the weighted alphas for the two regions of banks is small indicating that there is not a significant difference between their trends in stocks.
  • Data from Graduating Engineer + Computer Careers. Available online at www.graduatingengineer.com
  • Data from Microsoft Bookshelf .
  • Data from the United States Senate website, available online at www.Senate.gov (accessed June 17, 2013).
  • “List of current United States Senators by Age.” Wikipedia. Available online at en.Wikipedia.org/wiki/List_of...enators_by_age (accessed June 17, 2013).
  • “Sectoring by Industry Groups.” Nasdaq. Available online at www.nasdaq.com/markets/barcha...&base=industry (accessed June 17, 2013).
  • “Strip Clubs: Where Prostitution and Trafficking Happen.” Prostitution Research and Education, 2013. Available online at www.prostitutionresearch.com/ProsViolPosttrauStress.html (accessed June 17, 2013).
  • “World Series History.” Baseball-Almanac, 2013. Available online at http://www.baseball-almanac.com/ws/wsmenu.shtml (accessed June 17, 2013).

Chapter Review

Two population means from independent samples where the population standard deviations are not known:

  • Random Variable: \(\bar{X}_{1} - \bar{X}_{2} =\) the difference of the sample means
  • Distribution: Student's t-distribution with the degrees of freedom given below (variances not pooled)

Formula Review

Standard error: \[SE = \sqrt{\dfrac{(s_{1}^{2})}{n_{1}} + \dfrac{(s_{2}^{2})}{n_{2}}}\]

Test statistic ( t -score): \[t = \dfrac{(\bar{x}_{1}-\bar{x}_{2}) - (\mu_{1}-\mu_{2})}{\sqrt{\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}}}\]

Degrees of freedom:

\[df = \dfrac{\left(\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}{\left(\dfrac{1}{n_{1} - 1}\right)\left(\dfrac{(s_{1})^{2}}{n_{1}}\right)^{2} + \left(\dfrac{1}{n_{2} - 1}\right)\left(\dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}\]

  • \(s_{1}\) and \(s_{2}\) are the sample standard deviations, and \(n_{1}\) and \(n_{2}\) are the sample sizes.
  • \(\bar{x}_{1}\) and \(\bar{x}_{2}\) are the sample means.

Or use the conservative estimate: take \(df\) to be the smaller of \(n_{1}-1\) and \(n_{2}-1\).

Cohen’s \(d\) is the measure of effect size:

\[d = \dfrac{\bar{x}_{1} - \bar{x}_{2}}{s_{\text{pooled}}}\]

\[s_{\text{pooled}} = \sqrt{\dfrac{(n_{1} - 1)s^{2}_{1} + (n_{2} - 1)s^{2}_{2}}{n_{1} + n_{2} - 2}}\]

  • The domain of the random variable (RV) is not necessarily a numerical set; the domain may be expressed in words; for example, if \(X =\) hair color, then the domain is {black, blond, gray, green, orange}.
  • We can tell what specific value x of the random variable \(X\) takes only after performing the experiment.



Data analysis: hypothesis testing


4.1 The normal distribution

Here, you will look at the concept of the normal distribution and the bell-shaped curve. The peak of the bell represents the most probable occurrences, while other possible occurrences are distributed symmetrically around it, creating a downward-sloping curve on either side of the peak.

Cartoon: a bell-shaped curve with x-axis ‘How high the hill is’ and y-axis ‘Number of hills’; the peak is labelled ‘Average hill’ and the lower right tail is labelled ‘Big hill’.

In order to test hypotheses, you need to calculate the test statistic and compare it with the corresponding value in the bell curve. This is done using the concept of the ‘normal distribution’.

A normal distribution is a probability distribution that is symmetric about the mean, indicating that data near the mean are more likely to occur than data far from it. In graph form, a normal distribution appears as a bell curve. The values on the x-axis of the normal distribution graph represent z-scores. The test statistic used to test this set of hypotheses is the z-score . A z-score measures how far the observation (the sample mean) is from the 0 value of the bell curve (the population mean); in statistics, this distance is measured in standard deviations. Therefore, when the z-score is equal to 2, the observation is 2 standard deviations away from the value 0 on the normal distribution curve.

Figure: the normal distribution, a symmetrical bell-shaped curve whose peak sits at 0 on the x-axis.
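As a concrete illustration of the z-score idea (the numbers here are our own, not from the course):

```python
# z-score of a sample mean: how many standard errors the observed mean
# lies from the hypothesised population mean (illustrative values).
import math

mu0 = 50        # hypothesised population mean
sigma = 8       # population standard deviation
n = 25          # sample size
xbar = 53.2     # observed sample mean (made up for illustration)

z = (xbar - mu0) / (sigma / math.sqrt(n))
print(z)        # 2.0 -> the sample mean is 2 standard errors above mu0
```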



Hypothesis test between two normal distributions

Let $T_1,T_2,\ldots,T_n$ be i.i.d. observations, each drawn from a common normal distribution with mean zero. With probability $1/2$ this normal distribution has variance $1$, and with probability $1/2$ it has variance $4$. Based on the observed values $t_1,t_2,\ldots,t_n$, we use the MAP rule to decide whether the normal distribution from which they were drawn has variance $1$ or variance $4$. The MAP rule decides that the underlying normal distribution has variance $1$ if and only if $$\left| c_1 \sum_{i=1}^{n} t_i^2 + c_2 \sum_{i=1}^{n} t_i \right| < 1.$$

Find the values of $c_1\geq 0$ and $c_2\geq 0$ such that this is true. Express your answer in terms of $n$.

I'm finding it conceptually difficult to understand the nature of the posterior distribution of the variance of a normal distribution, given the observed $t_i$. I'd greatly appreciate it if someone would please provide some helpful hint. Thanks.


Using Bayes' rule, the posterior distribution of the variance is given by the product of the likelihood and the prior distribution of the variance divided by the marginal likelihood:
$$p(\sigma^2\mid t_1,t_2,\cdots,t_n)=\frac{p(t_1,t_2,\cdots,t_n\mid\sigma^2)\,p(\sigma^2)}{p(t_1,t_2,\cdots,t_n)}=\frac{p(t_1,t_2,\cdots,t_n\mid\sigma^2)\,p(\sigma^2)}{\int p(t_1,t_2,\cdots,t_n\mid\sigma^2)\,p(\sigma^2)\,d(\sigma^2)}$$

As the observed values $t_i$ are i.i.d. (with zero mean), the likelihood function is
$$p(t_1,t_2,\ldots,t_n\mid\sigma^2)=\prod_{i=1}^n\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{t_i^2}{2\sigma^2}\right)=\frac{1}{\sigma^n(2\pi)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^nt_i^2\right)$$
while the prior on the variance is simply the probability mass function $p(\sigma^2=1)=\frac{1}{2},\ p(\sigma^2=4)=\frac{1}{2}$.

Given the discrete nature of the prior, the marginal likelihood simplifies to
$$p(t_1,t_2,\cdots,t_n)=\frac{1}{2}\left[\frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^nt_i^2\right)+\frac{1}{2^n(2\pi)^{n/2}}\exp\left(-\frac{1}{8}\sum_{i=1}^nt_i^2\right)\right]$$

In order for the MAP rule to decide that the samples were drawn from the distribution with variance $1$, we need the following inequality to hold (have a look at Equations $3.4$ and $3.5$ in the linked reference):
$$\frac{p(t_1,\cdots,t_n\mid\sigma^2=4)\,p(\sigma^2=4)}{p(t_1,\cdots,t_n)}<\frac{p(t_1,\cdots,t_n\mid\sigma^2=1)\,p(\sigma^2=1)}{p(t_1,\cdots,t_n)}$$
which, given the uniform prior on the variance, simplifies to $p(t_1,\cdots,t_n\mid\sigma^2=4)<p(t_1,\cdots,t_n\mid\sigma^2=1)$, that is,
$$\frac{1}{2^n(2\pi)^{n/2}}\exp\left(-\frac{1}{8}\sum_{i=1}^nt_i^2\right)<\frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^nt_i^2\right)$$

Taking the logarithm of both sides and simplifying results in the inequality
$$\frac{3}{8n\log 2}\sum_{i=1}^nt_i^2<1$$
Thus, we have $c_1=\frac{3}{8n\log 2}$ and $c_2=0$.
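A quick numerical sanity check of this rule (our own sketch): with $c_1 = 3/(8n\log 2)$ and $c_2 = 0$, the decision should agree with a direct comparison of the two log-likelihoods.

```python
# Check that the derived MAP rule matches a direct likelihood comparison.
import numpy as np

rng = np.random.default_rng(0)
n = 50
for true_var in (1.0, 4.0):
    t = rng.normal(0.0, np.sqrt(true_var), size=n)
    s = np.sum(t**2)
    rule_says_var1 = (3.0 / (8.0 * n * np.log(2.0))) * s < 1.0
    # log-likelihoods with the common -(n/2)log(2*pi) term dropped
    ll_var1 = -s / 2.0
    ll_var4 = -n * np.log(2.0) - s / 8.0
    assert rule_says_var1 == (ll_var1 > ll_var4)
    print(true_var, rule_says_var1)  # typically True for var 1, False for var 4
```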




Confidence Intervals for the Difference Between Two Population Means (Independent Samples)

Hypothesis tests are used to provide evidence that one population mean is larger than, smaller than, or different from another population mean. Recall that when the interest is in estimating how much larger one quantity is than another, a confidence interval is used instead. In the same way that a confidence interval for a single population mean is constructed, a confidence interval for the difference between two population means can be constructed when the samples are independent of each other. The center of the confidence interval is the difference between the sample means, and the margin of error is the product of the corresponding value of \(t\) and the standard error. Putting this together gives the following formula.

Confidence Interval for the Difference Formula

\(\left(\bar{x_2}-\bar{x_1}-t_{\frac{\alpha}{2}}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}},\bar{x_2}-\bar{x_1}+t_{\frac{\alpha}{2}}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\right)\)

Although one can always theoretically use the formula, in practical applications technology is used.  When using the TI84+, for example, the menu item to go to is the 2-SampTInt.  Then just enter the sample means, sample standard deviations, sample sizes, confidence level, and hit ENTER and the confidence interval will appear.  If data is provided instead of statistics, then enter the data in L1 and L2 and within the 2-SampTInt select Data and just indicate that the first data set is L1 and the second is L2.  Enter in the confidence level, hit ENTER, and the confidence interval will appear.  The computer calculators built into this section will also compute the confidence interval without too much difficulty.

Example \(\PageIndex{4}\)

It is well known that college graduates make more money on average than students who do not graduate from college, but a college was interested in how much more money on average their graduates made compared to those who did not complete their degree.  Notice that a confidence interval rather than a hypothesis test will be helpful in providing information about this.  Suppose a college tracked a total of 300 students who entered the college.  242 of these students successfully completed their degree and ten years after they entered college their mean salary was $72,148 and their standard deviation was $22,263.  The 58 students who did not successfully complete their degree had a mean salary of $47,972 ten years after they entered the college and their standard deviation was $16,845.

Answer the following questions:

  • Is this a confidence interval for the difference of two means or two proportions?
  • What is the lower bound for the 95% confidence interval?
  • What is the upper bound for the 95% confidence interval?
  • State and interpret the 95% confidence interval.
  • Interpret the lower bound for the 95% confidence interval.
  • Interpret the upper bound for the 95% confidence interval.
  • Explain what it means to be 95% confident in the context of the study.
  • The 95% confidence interval is [18955, 29397]. With 95% confidence, the population mean salary ten years after entering college is between $18,955 and $29,397 higher for those who successfully complete their degree compared to those who do not complete their degree.
  • With 95% confidence, we can state that ten years after entering college, those who complete their degree, on average, make at least $18,955 more money per year than those who do not complete their degree.
  • With 95% confidence, we can state that ten years after entering college, those who complete their degree, on average, have a salary that is at most $29,397 higher than those who do not complete their degree.
  • If many samples of 242 students who completed their degree and 58 students who did not complete their degree were looked at, then a different confidence interval would result from each of these samples.  95% of these confidence intervals will contain the true population mean difference in salaries and 5% of these confidence intervals will fail to contain the true population mean difference in salaries.
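The interval above can be reproduced in a few lines (our own sketch, using the Welch degrees of freedom from earlier in the section rather than the calculator's 2-SampTInt):

```python
# Confidence interval for the difference of two means, variances not pooled.
import math
from scipy import stats

def diff_ci(x1, s1, n1, x2, s2, n2, cl=0.95):
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))   # Welch df
    t_star = stats.t.ppf(1 - (1 - cl) / 2, df)
    diff = x2 - x1
    return diff - t_star * se, diff + t_star * se

# Summary statistics from the salary example (non-completers vs. completers):
print(diff_ci(47972, 16845, 58, 72148, 22263, 242))   # ~ (18955, 29397)
```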

Exercise \(\PageIndex{5}\)

How much more exercise do Californians (CA) get each week compared to Alabamians (AL)?  The table below shows the results of a study on exercise.  Assume that the distributions of exercise time are normal for both California and Alabama.  Come up with and interpret the 90% confidence interval for the difference.

CA 162 134 97 168 88 72 297 141 130
AL 37 111 28 75 92 187 12 62  

The 90% confidence interval is [15.74, 119.71].  With 90% confidence it can be concluded that the population mean amount of exercise that Californians get each week is between 15.74 and 119.71 minutes longer than the population mean amount of exercise that Alabamians get each week.


Contributors

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .


Two-sample test for multivariate normal distributions under the assumption that means are the same

Let $\{x_i\}_{i=1}^n$ be a sample from a multivariate Gaussian distribution ${\cal N}(0, \Sigma_X)$ and $\{y_i\}_{i=1}^m$ be a sample from ${\cal N}(0, \Sigma_Y)$.

Are there hypothesis tests for $\Sigma_X = \Sigma_Y$? Pointers to relevant literature would be very appreciated.


Mauchly's test checks whether a given covariance matrix is proportional to a reference matrix (the identity or another) and is available through mauchly.test() in R. It is mostly used in repeated-measures designs, to test (1) whether the dependent-variable variance-covariance matrices are equal or homogeneous, and (2) whether the correlations between the levels of the within-subjects variable are comparable; altogether, this is known as the sphericity assumption .

Box’s M statistic is used (in MANOVA or LDA) to test for homogeneity of covariance matrices, but as it is very sensitive to normality it will often reject the null (R code is not available in standard packages).

Covariance structure models as found in Structural Equation Modeling are also an option for more complex stuff (although in multigroup analysis testing for the equality of covariances makes little sense if the variances are not equal), but I have no references to offer actually.

I guess any textbook on multivariate data analysis would have additional details on these procedures. I also found this article for the case where normality assumption is not met:

Aslam, S and Rocke, DM. A robust testing procedure for the equality of covariance matrices , Computational Statistics & Data Analysis 49 (2005) 863-874
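For two groups, Box's M is simple enough to sketch by hand. The following is our own rough illustration of the textbook formula with its chi-square approximation, not a reference implementation; verify it against an established package before relying on it.

```python
# Rough sketch of Box's M test for equality of two covariance matrices.
import numpy as np
from scipy import stats

def box_m(x, y):
    n1, p = x.shape
    n2, _ = y.shape
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    sp = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)   # pooled covariance
    m = ((n1 + n2 - 2) * np.log(np.linalg.det(sp))
         - (n1 - 1) * np.log(np.linalg.det(s1))
         - (n2 - 1) * np.log(np.linalg.det(s2)))
    # small-sample correction for the chi-square approximation (k = 2 groups)
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1))) * (
        1 / (n1 - 1) + 1 / (n2 - 1) - 1 / (n1 + n2 - 2))
    chi2 = m * (1 - c)
    df = p * (p + 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 3))
y = rng.normal(size=(120, 3))
print(box_m(x, y))   # same true covariance, so a large p-value is expected
```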



Normal Hypothesis Testing (Edexcel A Level Maths: Statistics)


How is a hypothesis test carried out with the normal distribution?

  • The population mean is tested by looking at the mean of a sample taken from the population.
  • A hypothesis test is used when the value of the assumed population mean is questioned.
  • Make sure you clearly define µ before writing the hypotheses, if it has not been defined in the question.
  • The null hypothesis will always be H 0 : µ = ...
  • The alternative hypothesis will depend on whether it is a one-tailed or two-tailed test.
  • For a one-tailed test, the alternative hypothesis will be H 1 : µ > ... or H 1 : µ < ...
  • For a two-tailed test, the alternative hypothesis will be H 1 : µ ≠ ...
  • Remember that the variance of the sample mean distribution will be the variance of the population distribution divided by n, and the mean of the sample mean distribution will be the same as the mean of the population distribution.
  • The normal distribution is used to calculate the probability of the test statistic taking the observed value or a more extreme value, either by calculating this probability (the p-value ) and comparing it with the significance level, or by finding the critical region.
  • Finding the critical region can be more useful for considering more than one observed value or for further testing.

How is the critical value found in a hypothesis test for the mean of a normal distribution?

  • The probability of the observed value being within the critical region, given a true null hypothesis, will be the same as the significance level.
  • To find the critical value(s) find the distribution of the sample means, assuming H 0 is true, and use the inverse normal function on your calculator
  • For a two-tailed test you will need to find both critical values, one at each end of the distribution

Can I use the standard normal distribution, Z , to perform a hypothesis test?

  • Find the critical value(s) for the Z distribution using the percentage points table.
  • If the z-value is further away from 0 than the critical value, then reject H 0 .
  • Step 1.  Find the distribution of the sample means, assuming H 0 is true.
  • Step 2.  Standardise the observed sample mean to obtain an expression for its z-value.
  • Step 3.  Use the percentage points table to find the z-value for which the probability of Z being equal to or more extreme than that value is equal to the significance level.
  • Step 4.  Equate this value to your expression found in step 2.
  • The symmetry of the normal distribution means that the z-values will have the same absolute value.
  • Check that the two critical values are the same distance from the mean.

What steps should I follow when carrying out a hypothesis test for the mean of a normal distribution?

  • Following these steps will help when carrying out a hypothesis test for the mean of a normal distribution (a short sketch in code follows this list):
  • Step 1.  Define the population parameter µ in context.
  • Step 2.  Write the null and alternative hypotheses clearly using the form

H 0 : μ = ...

H 1 : μ ... ...

  • Step 3.  Assuming H 0 is true, state the distribution of the sample mean.
  • Step 4.  Calculate either the critical value(s) or the p-value (probability of the observed value) for the test.
  • Step 5.  Compare the observed value of the test statistic with the critical value(s), or the p-value with the significance level.
  • Step 6.  Decide whether there is enough evidence to reject H 0 or whether it has to be accepted.
  • Step 7.  Write a conclusion in context.
  • Alternatively, if you have used the standard normal distribution method, then in steps 4 and 5 you could compare the z-value corresponding to the observed value with the z-value corresponding to the critical value.
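Here is the promised sketch of those steps for a one-tailed test; every number below is our own illustrative assumption, not taken from the revision note.

```python
# One-tailed z-test for a normal mean: X ~ N(mu, 4^2), H0: mu = 30,
# H1: mu > 30, n = 16, observed sample mean 32.1, 5% significance level.
from scipy import stats

mu0, sigma, n, xbar, alpha = 30, 4, 16, 32.1, 0.05

se = sigma / n**0.5                  # sd of the sample mean is sigma/sqrt(n)
z = (xbar - mu0) / se                # observed value of the test statistic
p_value = stats.norm.sf(z)           # P(Z >= z) for a right-tailed test
z_crit = stats.norm.ppf(1 - alpha)   # critical value, ~1.6449

print(z, p_value, z > z_crit)        # 2.1, ~0.018, True -> reject H0
```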


Exam tip: use a diagram to help, especially if looking for the critical value and comparing this with an observed value of a test statistic.


Hypothesis testing for normal distributions: a unified framework and new developments

Tiejun Tong (Hong Kong Baptist University), Statistics and Its Interface 13(2):167–179, January 2020.

Discover the world's research

  • 25+ million members
  • 160+ million publication pages
  • 2.3+ billion citations
  • Ladislav Vagner
  • Jiahao Yang
  • Jianjun Chen
  • Jiajun Zhang
  • ANN OPER RES

Suparat Niwitpong

  • Roger S. Pinkham
  • S. S. Wilks
  • TECHNOMETRICS
  • Josef Schmee
  • T. W. Anderson

Sung Huhn Kim

  • G. K. Robinson

Youn-Min Chou

  • Recruit researchers
  • Join for free
  • Login Email Tip: Most researchers use their institutional email address as their ResearchGate login Password Forgot password? Keep me logged in Log in or Continue with Google Welcome back! Please log in. Email · Hint Tip: Most researchers use their institutional email address as their ResearchGate login Password Forgot password? Keep me logged in Log in or Continue with Google No account? Sign up

COMMENTS

  1. 5.3.2 Normal Hypothesis Testing

    A two-tailed test would test to see if the value of µ has changed. The alternative hypothesis, H 1 will be H 1 : µ ≠ .. To carry out a hypothesis test with the normal distribution, the test statistic will be the sample mean,

  2. Normal Distribution Hypothesis Tests

    When to do a Normal Hypothesis Test. There are two types of hypothesis tests you need to know about: binomial distribution hypothesis tests and normal distribution hypothesis tests.In binomial hypothesis tests, you are testing the probability parameter p.In normal hypothesis tests, you are testing the mean parameter \mu.This gives us a key difference that we can use to determine what test to ...

  3. Tests in the Two-Sample Normal Model

    In this section, we will study hypothesis tests in the two-sample normal model and in the bivariate normal model. ... From properties of normal samples, \( M(\bs{X}) \) has a normal distribution with mean \( \mu \) and variance \( \sigma^2 / m \) and similarly \( M(\bs{Y}) \) has a normal distribution with mean \( \nu \) and variance \( \tau^2 ...

  4. PDF Chapter 9 Chapter 9: Hypothesis Testing

    Chapter 9 9.6 Comparing the Means of Two Normal Distributions Power function is now a function of 3 parameters: ˇ( 1; 2;˙2j ) The two-sample t-test is a likelihood ratio test (see p. 592) Important difference: Paired t test vs. two sample t test Two-sample t test with unequal variances Proposed test-statistics do not have known distribution, but

  5. PDF Hypothesis Testing Based on Two Samples

    3 intuitively, if the null hypothesis was correct, the samples means X„ m and Y„n should be very close. Therefore, we can use T as the test statistic, and the distribution of the statistic T is tm+n¡2 when H0 is correct. Now suppose we want to test the hypotheses at the signiflcant level fi, that is under the null hypothesis „1 = „2, the probability of rejecting H0 is fi.

  6. 9.2: Tests in the Normal Model

    From these basic statistics we can construct the test statistics that will be used to construct our hypothesis tests. The following results were established in the section on Special Properties of the Normal Distribution. Define Z = M − μ σ /√n, T = M − μ S /√n, V = n − 1 σ2 S2. Z has the standard normal distribution.

  7. PDF Hypothesis testing: two samples

    Pearson chi2 Goodness of Fit Test. Assume there is a sample of size n from a population with k classes (e.g. 6 M&M colors) Null hypothesis H 0: class i has frequency f. in the population. Alternative hypothesis H 1: some population frequencies are inconsistent with f.

  8. PDF Hypothesis testing for normal distributions: a unified framework and

    A unified framework of hypothesis testing for normal distributions. (I): two-sample mean test; (II): two-sample variance test; (III): one-sample mean test, (IV): one-sample variance test; (V): the equality test of two normal distributions with one mean known; (VI): the equality test of two normal distributions with one variance known.

  9. PDF 10-1 Introduction

    A special case of the two-sample t-tests of Section 10-3 occurs when the observations on the two populations of interest are collected in pairs. 1j Each pair of observations, say (X 2j , X ), is taken under homogeneous conditions, but these conditions may change from one pair to another. The test procedure consists of analyzing the differences ...

  10. Chapter 5 Hypothesis Testing with Normal Populations

    5.1 Bayes Factors for Testing a Normal Mean: variance known. Now we show how to obtain Bayes factors for testing hypothesis about a normal mean, where the variance is known.To start, let's consider a random sample of observations from a normal population with mean \(\mu\) and pre-specified variance \(\sigma^2\).We consider testing whether the population mean \(\mu\) is equal to \(m_0\) or not.

  11. Two-sample hypothesis testing

    In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, ... For example, in many situations it may be assumed that the underlying distributions are normal distributions. In other cases the data are categorical, ...

  12. 9.4: Distribution Needed for Hypothesis Testing

    If you are testing a single population mean, the distribution for the test is for means: X¯ ∼ N(μx, σx n−−√) (9.4.1) (9.4.1) X ¯ ∼ N ( μ x, σ x n) or. tdf (9.4.2) (9.4.2) t d f. The population parameter is μ μ. The estimated value (point estimate) for μ μ is x¯ x ¯, the sample mean. If you are testing a single population ...

  13. 10: Hypothesis Testing with Two Samples

    Either the matched pairs have differences that come from a population that is normal or the number of difference; 10.6: Hypothesis Testing for Two Means and Two Proportions (Worksheet) A statistics Worksheet: The student will select the appropriate distributions to use in each case. The student will conduct hypothesis tests and interpret the ...

  14. 8.1.3: Distribution Needed for Hypothesis Testing

    If you are testing a single population mean, the distribution for the test is for means: X¯ ∼ N(μx, σx n−−√) (8.1.3.1) (8.1.3.1) X ¯ ∼ N ( μ x, σ x n) or. tdf (8.1.3.2) (8.1.3.2) t d f. The population parameter is μ μ. The estimated value (point estimate) for μ μ is x¯ x ¯, the sample mean. If you are testing a single ...

  15. Statistical test for the difference in mean for two normal distributions

    It sounds like you've measured two samples and want to hypothesis test if they are different from each other. In which case, use chi-squared test if population standard deviation is known (unlikely), in R it's chisq.test.Or if population standard deviation is unknown (very common) then use the t test instead. In R it's just t.test(vector1, vector2).If you go with the T test then your null ...

  16. PDF §5.1 HYPOTHESIS TESTS USING NORMAL DISTRIBUTIONS

    Definition (Normal Distribution) The normal distributions are a family of distribution curves. Each member of the family is specified by two parameters: the standard deviation, denoted by σ. normal distribution follows a bell-shaped curve. For shorthand we often use the notation N (μ, σ) to specify a normal distribution with parameters μ ...

  17. 9.3 Distribution Needed for Hypothesis Testing

    Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's t-distribution. (Remember, use a Student's t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.)

  18. 9.2: Comparing Two Independent Population Means (Hypothesis test)

    This is a test of two independent groups, two population means. Random variable: ˉXg − ˉXb = difference in the sample mean amount of time girls and boys play sports each day. H0: μg = μb. H0: μg − μb = 0. Ha: μg ≠ μb.

  19. Data analysis: hypothesis testing: 4.1 The normal distribution

    A normal distribution is a probability distribution that is symmetric about the mean, indicating that data near the mean are more likely to occur than data far from it. In graph form, a normal distribution appears as a bell curve. The values in the x-axis of the normal distribution graph represent the z-scores. The test statistic that you wish ...

  20. Hypothesis test between two normal distributions

    Based on the observed values t1, t2, …, tn, we use the MAP rule to decide whether the normal distribution from which they were drawn has variance 1 or variance 4. The MAP rule decides that the underlying normal distribution has variance 1 if and only if |c1 n ∑ i = 1t2i + c2 n ∑ i = 1ti| < 1. Find the values of c1 ≥ 0 and c2 ≥ 0 such ...

  21. hypothesis testing

    Allingham and Rayner[1] suggest a test based off a Wald test for the differences (rather than the ratio) which has much better level-robustness than the F-test on the heavier-tailed-than-normal distributions considered (often almost as good as the Levene on level, but erring on the conservative side while Levene tends to exceed the level) with ...

  22. 10.2: Two Population Means with Unknown Standard Deviations

    Distribution for the test: Use tdf where df is calculated using the df formula for independent groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the variances. Calculate the test statistic and the p-value using a Student's t-distribution: t = − 3.1424 , p-value = 0.0054.

  23. Likelihood Ratio Test for Common Variance from Two Normal Distribution

    This case is simple enough that there might be a tractable exact distribution -- especially as it's a textbook example. Since the data are Normal with variance $\sigma^2_0$ under the null, the residual sum of squares after estimating two means has a null $\sigma^2_0\chi^2_{n+m-2}$ distribution.

  24. hypothesis testing

    The Mauchly's test allows to test if a given covariance matrix is proportional to a reference (identity or other) and is available through mauchly.test() under R. It is mostly used in repeated-measures design (to test (1) if the dependent variable VC matrices are equal or homogeneous, and (2) whether the correlations between the levels of the within-subjects variable are comparable--altogether ...

  25. 5.3.2 Normal Hypothesis Testing

    How is the critical value found in a hypothesis test for the mean of a normal distribution? The critical value(s) will be the boundary of the critical region. The probability of the observed value being within the critical region, given a true null hypothesis will be the same as the significance level; For an % significance level: In a one-tailed test the critical region will consist of % in ...

  26. Hypothesis testing for normal distributions: a unified framework and

    Hypothesis testing for normal distributions: a unifie d framework and new developments 169 2.1.3 Equality tests of two no rmal distributions In this section, we review the equality tests of tw o ...