Comparing Two Means Hypothesis Testing

When facing different scenarios, you will need to adapt your hypothesis testing method. One scenario that frequently arises is one where you wish to test whether there is a difference between two means. You might have done this already using the normal distribution. But what happens if you don't know the variances of these populations and your sample sizes are small?


StudySmarter Editorial Team

Team Comparing Two Means Hypothesis Testing Teachers

  • 11 minutes reading time
  • Checked by StudySmarter Editorial Team

    That's where the \(t\)-distribution comes in. This article will take you through a hypothesis test for the difference in means of two independent, normally distributed populations.

    Comparing Two Means: Hypothesis Testing Overview

    The \(t\)-distribution can also be used to test the means of two independent normal distributions when the variances are unknown and the sample sizes are small. To do so, you will need to assume the populations have the same variance and therefore need to use a pooled estimate of variance.

    For a reminder on the \(t\)-distribution and its properties, see the article T-distribution.

    Unlike the paired \(t\)-test, where you are comparing the results of an experiment before and after some treatment, here you are comparing two independent normal distributions.

    Describe the kind of hypothesis test you would use in each of the following scenarios.

    1. A mobile phone company has released a new software update. They have asked you to find statistical evidence to support their claim that the software update has improved battery life.

    2. A pet store sells Welsh Corgi puppies from two different breeders. They wish to determine whether there is a significant difference between the weights of the puppies from each breeder.

    Solution

    1. In order to conduct this experiment, you would need to collect samples of information on phone battery life before and after the software update. Since the samples will be taken from the same population after a change has been made, they are not independent. Therefore, you need to use a paired t-test.

    2. In this case, you would take samples of weights from two different breeders, so the samples come from two independent distributions. Assuming that the populations have the same variance, you would use a pooled estimate of variance to find the \(t\)-value rather than a paired \(t\)-test.

    Hypothesis testing for the difference of two means

    The hypothesis test for the difference of two means follows these steps:

    1. Find the null hypothesis and alternative hypothesis, \(H_0\) and \(H_1\).

    2. Determine the significance level, \(\alpha\), from the question.

    3. Determine the number of degrees of freedom, \(\upsilon\).

    4. Find the critical region.

    5. Calculate the pooled estimate of the variance, \(s^2_p\).

    6. Calculate \(t\).

    7. Compare the value of \(t\) with your critical region and state your conclusion, addressing whether the result is significant, and what this means in the context of the question.
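
    The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed method: the function name `pooled_t_test` and its parameters are made up for this example, and the critical value `t_crit` is assumed to be a positive number that you have already looked up in tables for your significance level and degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(x, y, t_crit, hypothesised_diff=0.0, tail="two"):
    """Sketch of steps 3 to 7 (hypothetical helper, not a library function).

    t_crit is the positive table value for the chosen significance level.
    hypothesised_diff is the value of mu_x - mu_y stated by H0.
    """
    n_x, n_y = len(x), len(y)
    dof = n_x + n_y - 2  # step 3: degrees of freedom

    # Step 5: pooled variance (statistics.variance divides by n - 1).
    s2_p = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / dof

    # Step 6: test statistic.
    t = (mean(x) - mean(y) - hypothesised_diff) / sqrt(s2_p * (1 / n_x + 1 / n_y))

    # Steps 4 and 7: check whether t lies in the critical region.
    if tail == "two":
        reject = abs(t) > t_crit
    elif tail == "greater":
        reject = t > t_crit
    elif tail == "less":
        reject = t < -t_crit
    else:
        raise ValueError("tail must be 'two', 'greater' or 'less'")

    return {"dof": dof, "s2_p": s2_p, "t": t, "reject_h0": reject}
```

    Steps 1 and 2 (choosing the hypotheses and the significance level) are decisions you make from the question, so they appear here only as the `tail`, `hypothesised_diff` and `t_crit` arguments.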

    Next let's take a look at the hypotheses you will need to do the test.

    Null hypothesis for comparing two means

    While comparing two means, your null hypothesis will state that the difference between the two populations you are testing is equal to zero. In other words, the null hypothesis is that there is no difference in the population means.

    Samples are taken from two distributions, \(X\) and \(Y\), under the assumption that they are independent and normally distributed.

    To perform a hypothesis test for the difference between the means of these distributions, use the following null hypothesis,

    \[H_0:\, \mu _x =\mu _y.\]

    What about the alternative hypothesis?

    Alternative hypothesis for comparing two means

    The alternative hypothesis for comparing two means will depend on whether you wish to test whether one particular distribution is greater than the other (a one-tailed test), or simply whether there is any difference at all (a two-tailed test).

    When using a two-tailed test, remember to divide the significance level between the two tails!

    Remember to read the question carefully to determine which sort of alternative hypothesis to use.

    Samples are taken from two distributions, \(X\) and \(Y\), under the assumption that they are independent and normally distributed.

    In the case that you wish to test whether the means are different (that is a two-tailed test), you will have the following alternative hypothesis,

    \[H_1:\, \mu _x \neq \mu _y.\]

    In the case that you wish to test whether the mean of \(X\) is greater than the mean of \(Y\) (that is a one-tailed test), you will have the following alternative hypothesis,

    \[H_1:\, \mu _x > \mu _y.\]
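
    As a rough illustration of how the choice of alternative hypothesis changes the critical region, here is a small Python sketch. The function names and the `tail` labels (`"two"`, `"greater"`, `"less"`) are assumptions made for this example only.

```python
def tail_probability(alpha, tail):
    """Probability to look up in the t-table for each tail of the critical region."""
    if tail == "two":
        return alpha / 2  # the significance level is split between both tails
    if tail in ("greater", "less"):
        return alpha      # all of the significance level sits in one tail
    raise ValueError("tail must be 'two', 'greater' or 'less'")


def in_critical_region(t, t_crit, tail):
    """Return True if the test statistic t falls in the critical region.

    t_crit is the positive table value found using tail_probability().
    """
    if tail == "two":       # H1: mu_x != mu_y
        return abs(t) > t_crit
    if tail == "greater":   # H1: mu_x > mu_y
        return t > t_crit
    if tail == "less":      # H1: mu_x < mu_y
        return t < -t_crit
    raise ValueError("tail must be 'two', 'greater' or 'less'")
```

    For instance, at the \(10\%\) significance level a two-tailed test uses \(0.05\) in each tail, while a one-tailed test uses the full \(0.10\) in a single tail.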

    Next let's see some of the calculations involved.

    Comparing Two Population Means Hypothesis Testing: Calculations

    When testing for the difference between means, there are some extra calculations that you'll need to perform to find the pooled estimate of the variance and the value of \(t\) that you wish to test.

    Using sample variances, \(s^2_x\) and \(s^2_y\), and the size of each sample, \(n_x\) and \(n_y\), the pooled estimate of the variance is given by the formula

    \[s^2_p=\frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}.\]
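
    The pooled variance formula translates directly into code. The sketch below assumes you already have the two sample variances (calculated with the \(n-1\) divisor) and the sample sizes; the function name `pooled_variance` is chosen just for illustration.

```python
def pooled_variance(s2_x, n_x, s2_y, n_y):
    """Pooled estimate of variance from two sample variances and sample sizes."""
    if n_x < 2 or n_y < 2:
        raise ValueError("each sample needs at least two observations")
    # Each sample variance is weighted by its degrees of freedom, n - 1.
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / ((n_x - 1) + (n_y - 1))
```

    Because the weights are the degrees of freedom, the larger sample has more influence on \(s^2_p\). When both sample variances are equal, the pooled estimate is that same value.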

    Once you have found \(s^2_p\), you can calculate the test statistic, \(t^*\), which you will later compare with the critical value.

    Given the sample means \(\bar{x}\) and \(\bar{y}\), the sample sizes \(n_x\) and \(n_y\), and the pooled estimate of variance \(s^2_p\), the test statistic \(t^*\), where \(\mu _x-\mu _y\) is the difference in population means stated by the null hypothesis, is:

    \[t^*=\frac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]
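
    A minimal Python sketch of this calculation is shown below. It assumes the pooled variance has already been found, and the parameter name `hypothesised_diff` (the value of \(\mu _x-\mu _y\) under the null hypothesis) is introduced here purely for illustration.

```python
from math import sqrt


def t_statistic(mean_x, mean_y, s2_p, n_x, n_y, hypothesised_diff=0.0):
    """Test statistic for the difference of two means with pooled variance."""
    # Estimated standard error of the difference between the sample means.
    standard_error = sqrt(s2_p * (1 / n_x + 1 / n_y))
    return ((mean_x - mean_y) - hypothesised_diff) / standard_error
```

    When the null hypothesis is \(\mu _x=\mu _y\), the default `hypothesised_diff=0.0` applies and the numerator is simply the difference between the sample means.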

    Hypothesis Testing Two Population Means Examples

    Next, let's look at a couple of examples on how to use and calculate these statistics within an actual hypothesis test.

    A pet store sells Welsh Corgi puppies on behalf of two puppy breeders, \(X\) and \(Y\). They have sampled the weights of puppies from each breeder.

    Fig. 1 - Puppies always make math better!

    Weights of puppies from breeder \(X\) in kilograms: \(5.44,5.32,5.21,5.67.\)

    Weights of puppies from breeder \(Y\) in kilograms: \(5.02,4.99,5.42,5.21,5.11.\)

    The pet store wishes to know whether there is a statistically significant difference between the weights of the puppies from each breeder.

    a. If you wanted to test the difference in the weights of the puppies, what assumptions need to be made?

    b. Test whether the mean weights of puppies from the two breeders are different at the \(10\%\) significance level.

    Solution

    a. In order to test the difference in the weights of the puppies, you need to assume that the weights of puppies from each breeder are normally distributed, that the two samples are independent, and that the two populations have the same variance.

    b. The test is two-tailed, so the hypotheses are,

    \[ \begin{align} &H_0:\, \mu _x=\mu _y \\ &H_1: \,\mu _x \neq \mu _y.\end{align}\]

    This is a two-tailed test since the alternative hypothesis is that the mean weights are different. The significance level is \(10\)%, so the critical region will have the probability of \(0.05\) in each tail of the distribution.

    The number of degrees of freedom is

    \[\upsilon = (4-1)+(5-1)=7.\]

    To find degrees of freedom in this case, you need to add together the degrees of freedom from each sample. Or, you can use the formula \(\upsilon = n_x+n_y-2\).

    The critical value can be found using a calculator or probability tables:

    \[t_{\upsilon =7}(0.05)=1.895.\]

    Next, find the pooled estimate of variance. You should have \(\bar{x}=5.41\) and \(\bar{y}=5.15.\)

    The sample variances are \(s^2_x=0.038867\) and \(s^2_y=0.03015\).

    Therefore, the pooled estimate of variance is,

    \[\begin{align} s^2_p &= \frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)} \\&= \frac{(4-1)0.038867 +(5-1)0.03015 }{(4-1)+(5-1)} \\&=0.033886 \text{ to 5 s.f.} \end{align}\]

    Under the null hypothesis, \(\mu _x-\mu _y=0\), so your value of \(t^*\) is then:

    \[\begin{align} t^*&=\frac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}\\&=\dfrac{(5.41-5.15)-(0)}{\sqrt{0.033886\left(\dfrac{1}{4}+\dfrac{1}{5}\right)}}\\&=2.1055\end{align}\]

    Since \(t^*=2.1055>1.895=t_\upsilon\), your value of \(t^*\) falls within the critical region. Therefore, at the \(10\)% significance level, you can reject the null hypothesis.

    In conclusion, there is evidence to suggest there is a difference between the means of the weights of Welsh Corgi puppies from the two breeders.
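
    If you want to check the arithmetic in this example, the following Python sketch recomputes each quantity from the raw weights. The variable names are illustrative, and the critical value is simply typed in from the table rather than calculated. Small differences from hand calculations can appear because the code does not round intermediate values.

```python
from math import sqrt
from statistics import mean, variance

x = [5.44, 5.32, 5.21, 5.67]        # breeder X weights in kg
y = [5.02, 4.99, 5.42, 5.21, 5.11]  # breeder Y weights in kg
T_CRIT = 1.895                      # table value: 7 degrees of freedom, 0.05 in each tail

n_x, n_y = len(x), len(y)
dof = n_x + n_y - 2

# Pooled estimate of variance (statistics.variance uses the n - 1 divisor).
s2_p = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / dof

# Test statistic under H0: mu_x - mu_y = 0.
t_star = (mean(x) - mean(y)) / sqrt(s2_p * (1 / n_x + 1 / n_y))

# Two-tailed test: reject H0 if t* lies in either tail of the critical region.
reject_h0 = abs(t_star) > T_CRIT

print(f"mean x = {mean(x):.2f}, mean y = {mean(y):.2f}")
print(f"dof = {dof}, s2_p = {s2_p:.6f}, t* = {t_star:.4f}, reject H0: {reject_h0}")
```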

    This second example is slightly different to the first. The method will need to be adapted slightly.

    A food delivery service, \(A\), claims that their average food delivery time is more than \(5\) minutes faster than the delivery time of their competitor, \(B\).

    A random sample of delivery times from each company is collected:

    • Food delivery time for \(A\), in minutes: \(22,16,45,23,39,32.\)
    • Food delivery time for \(B\), in minutes: \(34,42,63,18,25,46,47.\)

    Food delivery service \(B\) hires you to test whether this claim is statistically significant at the \(10\%\) significance level. Complete a hypothesis test for the difference between means and explain what this means for the two food delivery services.

    Solution

    Since the samples are independent, the null hypothesis would normally be that the two means are the same. However, the claim is that service \(A\) averages more than \(5\) minutes faster than its competitor, so the null hypothesis is instead \(\mu _A=\mu _B -5 \). Since you are only interested in whether the mean delivery time for \(A\) is more than \(5\) minutes below the mean delivery time for \(B\), the hypotheses are:

    \[ \begin{align} &H_0:\,\mu _A=\mu _B -5 \\ &H_1: \,\mu_A < \mu _B-5. \end{align}\]

    This is a one-tailed test. The significance level is \(10\)%, so the critical region will have the probability of \(0.10\) in the left tail of the distribution.

    The number of degrees of freedom is

    \[\upsilon = (6-1)+(7-1)=11.\]

    The critical value can be found using a calculator or probability tables,

    \[t_{\upsilon =11}(0.10)=1.363.\]

    Since you are only interested in whether \(\mu _A\) is less than \(\mu _B -5\), the critical value is \(t_\upsilon = -1.363\).

    If the alternative hypothesis had been greater than, you would have used \(t_\upsilon = 1.363\) instead.

    Next, find the pooled estimate of variance. You have \(\bar{a}=29.5\) and \(\bar{b}=39.3\) (to 3 s.f.). The sample variances are \(s^2_a=123.50 \) and \(s^2_b=226.57\). Therefore, the pooled estimate of variance is:

    \[\begin{align} s^2_p &= \frac{(n_a-1)s^2_a+(n_b-1)s^2_b}{(n_a-1)+(n_b-1)} \\&= \frac{(6-1)123.50 +(7-1)226.57 }{(6-1)+(7-1)} \\&=179.72\text{ to 5 s.f.} \end{align}\]

    The value of \(t^*\) is therefore,

    \[\begin{align} t^*&=\frac{(\bar{a}-\bar{b})-(\mu _A - \mu _B)}{\sqrt{s^2_p\left(\dfrac{1}{n_a}+\dfrac{1}{n_b}\right)}}\\&=\dfrac{(29.5 -39.3)-(-5)}{\sqrt{179.72 \left(\dfrac{1}{6}+\dfrac{1}{7}\right)}}\\&=-0.64357.\end{align}\]

    Since the null hypothesis states that \(\mu _A=\mu _B-5\), you will have \(\mu _A-\mu _B=-5\).

    Since \(t^*=-0.64357>-1.363=t_\upsilon \), the value of \(t^*\) does not fall within the critical region. Therefore, at the \(10\%\) significance level, you fail to reject the null hypothesis.

    This means that there is not sufficient evidence to suggest that delivery service \(A\) delivers more than \(5\) minutes faster, on average, than delivery service \(B\).
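
    The same check can be done for this example. The sketch below is illustrative only: the variable names are made up, the critical value is typed in from the table, and `HYPOTHESISED_DIFF` holds the value of \(\mu _A-\mu _B\) stated by the null hypothesis.

```python
from math import sqrt
from statistics import mean, variance

a = [22, 16, 45, 23, 39, 32]        # delivery times for A in minutes
b = [34, 42, 63, 18, 25, 46, 47]    # delivery times for B in minutes
T_CRIT = 1.363                      # table value: 11 degrees of freedom, 0.10 in one tail
HYPOTHESISED_DIFF = -5              # H0: mu_A - mu_B = -5

n_a, n_b = len(a), len(b)
dof = n_a + n_b - 2

# Pooled estimate of variance (statistics.variance uses the n - 1 divisor).
s2_p = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof

# Test statistic, measured relative to the difference stated by H0.
t_star = ((mean(a) - mean(b)) - HYPOTHESISED_DIFF) / sqrt(s2_p * (1 / n_a + 1 / n_b))

# Left-tailed test: reject H0 only if t* is below the negative critical value.
reject_h0 = t_star < -T_CRIT

print(f"dof = {dof}, s2_p = {s2_p:.2f}, t* = {t_star:.4f}, reject H0: {reject_h0}")
```

    The code uses the unrounded mean of \(B\) (about \(39.29\)), so its value of \(t^*\) is roughly \(-0.642\) rather than the hand-calculated value based on \(\bar{b}=39.3\). The conclusion is the same either way.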

    For a more detailed explanation of the pooled estimate of variance, check out the article Pooled Estimate of Variance.

    Comparing Two Means Hypothesis Testing - Key takeaways

    • The \(t\)-distribution can be used to test the means of two independent normal distributions when the variances are unknown
    • The assumptions are that the populations are independent, normal and have the same variance
    • The pooled estimate of variance formula is \[s^2_p=\frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}.\]
    • The \(t^*\) value is \[t^*=\dfrac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]
    Frequently Asked Questions about Comparing Two Means Hypothesis Testing

    How do you compare two mean values? 

    It depends on if the samples are independent or not.  If they are not independent then you can use a paired t-test.  If they are independent then you can use a test for the difference of two means.

    Which hypothesis test is appropriate for comparing two sample means? 

    If the two samples are independent, you can use the \(t\)-test for the difference of two means with a pooled estimate of variance. Its null hypothesis is that the difference in the population means is zero.

    Are 2 means significantly different? 

    The two means are significantly different if the test statistic, \(t^*\), falls within the critical region for the significance level selected for the hypothesis test.

    What is the null hypothesis for the comparison of two means using a t-test? 

    Assuming that the samples are independent, the null hypothesis will be that the difference in the means is zero.  The alternative hypothesis will depend on whether you want to see if one mean is larger than the other, or if they are just different from each other. 

    What is a comparison of means test? 

    A comparison of means test is a hypothesis test of whether two population means differ. When you have two independent samples, it uses a pooled estimate of variance.
