Hypothesis Test of Two Population Proportions

Suppose you surveyed employees at corporations in your country and found that, out of \(1300\) full-time employees and \(290\) part-time employees, \(40\%\) of the full-time employees and \(38\%\) of the part-time employees were putting aside at least twelve percent of their earnings as savings. Could you draw any conclusions about the difference in savings habits between full-time and part-time employees? Hypothesis testing to the rescue! This is an example of comparing two population proportions, and here you will see how to do a hypothesis test and draw conclusions from this kind of sampling.

StudySmarter Editorial Team

Team Hypothesis Test of Two Population Proportions Teachers

  • 14 minutes reading time
  • Checked by StudySmarter Editorial Team

    Hypothesis Test for the Difference of Two Population Proportions

    Let's start by listing what you know from the example at the start of this article.

    | Population | Population Proportion | Sample Size | Sample Proportion |
    |---|---|---|---|
    | Full-time employees of corporations in your country. | \(p_1 = \) proportion of all full-time employees who put aside at least twelve percent of their earnings in savings. | \(n_1 = 1300\) | \(\hat{p}_1 = 0.40\) |
    | Part-time employees of corporations in your country. | \(p_2 = \) proportion of all part-time employees who put aside at least twelve percent of their earnings in savings. | \(n_2 = 290\) | \(\hat{p}_2 = 0.38\) |

    It is clear looking at the table that the sample sizes are very different, and their sample proportions are different as well. However, it will be very rare for you to find an example where the sample proportions are the same. Why might the sample proportions be different, even if you might eventually be able to conclude that the proportion of people who put aside at least twelve percent of their earnings is the same between part-time and full-time employees?

    Differences that occur between two samples just by chance are called sampling variability.

    One of the main questions that a hypothesis test for two population proportions tries to answer is whether the difference in your sample proportions happens because of sampling variability or because of an actual difference in the populations.
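To get a feel for sampling variability, here is a minimal simulation sketch. It assumes, purely for illustration, that both populations share the same true proportion of \(0.39\) (a made-up value, not something the survey tells you), and then repeatedly draws samples of the sizes used in the savings example.

```python
import random

def simulate_difference(p, n1, n2, rng):
    """Draw two independent samples from populations with the SAME
    proportion p and return the difference in sample proportions."""
    successes1 = sum(rng.random() < p for _ in range(n1))
    successes2 = sum(rng.random() < p for _ in range(n2))
    return successes1 / n1 - successes2 / n2

rng = random.Random(0)  # fixed seed so the run is repeatable
# 0.39 is an assumed common population proportion, chosen only for illustration
diffs = [simulate_difference(0.39, 1300, 290, rng) for _ in range(1000)]

# Share of simulated surveys whose gap is at least as large as the observed 0.02
share_large = sum(abs(d) >= 0.02 for d in diffs) / len(diffs)
print(f"share of simulated gaps of at least 0.02: {share_large:.2f}")
```

Even though the two simulated populations are identical, a sizeable share of the simulated surveys show a gap of two percentage points or more, which is exactly what sampling variability means.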

    Comparing Two Population Proportions with Dependent Samples

    One of the assumptions you will need is that your samples are independent.

    Two samples are independent if picking members for one sample doesn't influence how members of the second sample are picked.

    In the example involving employees, picking a person who is a full-time employee doesn't influence who you pick as a part-time employee, so they are independent. That is very different from dependent samples.

    Two samples are dependent if picking members for one sample automatically determines the members of the second sample.

    If you were doing a study on twins then picking a twin for one sample would automatically put the other twin in the second sample. Twins are a common example of dependent samples. This is called matched-pair data, and it requires a different form of hypothesis testing than you will see here.

    Forming Your Hypothesis

    There are many ways that \(p_1\) can be different from \(p_2\). It might be that \(p_1 < p_2\), or that \(p_1>p_2\). Rather than try and list all of the ways they are different and do a hypothesis test for each, you can look at the difference between the two population proportions. In fact, a hypothesis test for two population proportions is often called a hypothesis test for the difference between two population proportions for this very reason!

    In this kind of hypothesis test, your null hypothesis will almost always be that the two population proportions are the same. If you state that in terms of their difference you get:

    \[ H_0:\; p_1 - p_2 = 0.\]

    Then there are three varieties of alternative hypotheses outlined in the next table.

    | Question | Alternative hypothesis | Test Type |
    |---|---|---|
    | Is \(p_1\) different from \(p_2\)? | \(H_a:\; p_1 - p_2 \ne 0\) | Two-tailed test. |
    | Is \(p_1\) smaller than \(p_2\)? | \(H_a:\; p_1 - p_2 < 0\) | Left-tailed test. |
    | Is \(p_1\) larger than \(p_2\)? | \(H_a:\; p_1 - p_2 > 0\) | Right-tailed test. |

    Let's go back to the example from the start of this article.

    Your goal here is to figure out if full-time employees and part-time employees have different saving habits, so the hypotheses would be:

    \[ \begin{align} &H_0:\; p_1 -p_2 = 0 \\ & H_a: \; p_1-p_2 \ne 0, \end{align} \]

    and it would be a two-tailed test.

    Next, let's look at the test statistic for this type of hypothesis test.

    Significance Test Statistic for Two Population Proportions

    It is important that your samples are independent, or the test statistic will be different from the one shown here. Since you are using independent samples, remember that

    \[ \mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2.\]

    For a reminder on why this is true, see the articles Transforming Random Variables and Combining Random Variables.

    For the standard deviation,

    \[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} }.\]
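As a short sketch of where this formula comes from, assume the standard result that a single sample proportion \(\hat{p}\) from a sample of size \(n\) has variance \(p(1-p)/n\). Because the samples are independent, the variances add:

\[ \begin{align} \sigma^2_{\hat{p}_1 - \hat{p}_2} &= \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2} \\ &= \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}. \end{align} \]

Taking the square root of both sides gives the standard deviation shown above. Note that the variances add even though the proportions themselves are subtracted.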

    For the savings example, you have that \(n_1 = 1300\), \(n_2 = 290\), \(\hat{p}_1 = 0.40\), and \(\hat{p}_2 = 0.38\). The population proportions \(p_1\) and \(p_2\) are unknown, so the sample proportions are used to estimate them. Estimating the mean of the sampling distribution of \(\hat{p}_1 - \hat{p}_2 \) gives you:

    \[\begin{align} \mu_{\hat{p}_1 - \hat{p}_2} &= p_1 - p_2 \\ &\approx 0.40 - 0.38 \\ &= 0.02 \end{align}\]

    The estimated standard deviation for \(\hat{p}_1 - \hat{p}_2 \) is:

    \[ \begin{align} \sigma_{\hat{p}_1 - \hat{p}_2} &= \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} } \\ &\approx \sqrt{ \frac{0.40(1-0.40)}{1300} + \frac{0.38(1-0.38)}{290} } \\ &= \sqrt{\frac{0.24}{1300} + \frac{0.2356}{290} } \\ &\approx 0.03158 \end{align} \]
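The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch in which the sample proportions stand in for the unknown population proportions; the function and variable names are made up for illustration.

```python
from math import sqrt

n1, n2 = 1300, 290
p_hat1, p_hat2 = 0.40, 0.38

def std_dev_of_difference(p1, n1, p2, n2):
    """Standard deviation of the difference of two independent sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# The sample proportions are plugged in as estimates of p1 and p2
estimated_mean = p_hat1 - p_hat2
estimated_sd = std_dev_of_difference(p_hat1, n1, p_hat2, n2)
print(round(estimated_mean, 2), round(estimated_sd, 4))
```

The standard deviation comes out at roughly \(0.0316\), noticeably larger than the observed difference of \(0.02\).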

    So far you have only assumed that the samples are independent. For the next part, you will need to assume that the sample sizes are large enough. If they are, you can use the Central Limit Theorem to conclude that the sampling distribution of \(\hat{p}_1 - \hat{p}_2 \) is approximately normal.

    How do you know if your samples are large enough? If all four of the following conditions are satisfied, then your samples are large enough for the sampling distribution of \(\hat{p}_1 - \hat{p}_2 \) to be approximately normal:

    • \(n_1\hat{p}_1 \ge 10\),

    • \(n_2\hat{p}_2 \ge 10\),

    • \(n_1(1-\hat{p}_1) \ge 10\), and

    • \(n_2(1-\hat{p}_2) \ge 10\).

    It isn't too hard to check that the sample sizes in the savings example are large enough for the sampling distribution to be approximately normal.
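For completeness, the check can be sketched in Python as follows. The helper name `large_enough` is hypothetical, and the threshold of \(10\) is the one used in this article.

```python
def large_enough(n, p_hat, minimum=10):
    """Return True when a sample has at least `minimum` expected
    successes (n * p_hat) and failures (n * (1 - p_hat))."""
    return n * p_hat >= minimum and n * (1 - p_hat) >= minimum

samples = {"full-time": (1300, 0.40), "part-time": (290, 0.38)}
for label, (n, p_hat) in samples.items():
    successes = round(n * p_hat, 1)
    failures = round(n * (1 - p_hat), 1)
    print(label, successes, failures, large_enough(n, p_hat))
```

Both samples have several hundred or at least over a hundred successes and failures, so all four conditions hold comfortably.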

    The last condition to use this type of hypothesis test is that each sample is less than \(10\%\) of the population it was drawn from. In this case, the sample sizes of \(1300\) and \(290\) are certainly less than \(10\%\) of all full-time and part-time employees of corporations in your country, so this condition is satisfied as well.

    Z-test for Difference in Population Proportions

    When doing a hypothesis test for the difference in population proportions, a \(z\)-test is used. To do this, you will need to calculate the test statistic, which uses the difference in the two proportions. To make calculations a little easier, it is helpful to find:

    \[ \begin{align}\hat{p}_c &= \frac{\text{number of successes in the two samples} }{\text{total of the two sample sizes}} \\ &= \frac{n_1\hat{p_1} + n_2\hat{p_2} }{n_1 + n_2} \end{align}\]

    Combining counts to get an overall proportion is called pooling, and \(\hat{p}_c\) is called the pooled (or combined) proportion.

    Going back again to the savings example, \(n_1 = 1300\), \(n_2 = 290\), \(\hat{p}_1 = 0.40\), and \(\hat{p}_2 = 0.38\), which means that:

    \[ \begin{align}\hat{p}_c &= \frac{n_1\hat{p}_1 + n_2\hat{p}_2 }{n_1 + n_2} \\ &= \frac{1300(0.40)+ 290(0.38) }{1300+ 290} \\ &= \frac{630.2}{1590} \\ & \approx 0.3964 \end{align}\]
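The same pooling step as a small Python sketch (the function name is an illustrative choice):

```python
def pooled_proportion(n1, p_hat1, n2, p_hat2):
    """Total successes across both samples divided by the total sample size."""
    successes = n1 * p_hat1 + n2 * p_hat2
    return successes / (n1 + n2)

p_c = pooled_proportion(1300, 0.40, 290, 0.38)
print(round(p_c, 4))  # 0.3964
```

Because the full-time sample is much larger, the pooled value sits much closer to \(0.40\) than to \(0.38\).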

    As long as your null hypothesis is \(H_0:\; p_1 -p_2 = 0 \), the test statistic can be calculated using the formula:

    \[ z = \frac{\hat{p_1} - \hat{p_2} }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } }\]
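To see why the pooled proportion appears in the denominator, here is a brief sketch of the reasoning. If the null hypothesis is true, then \(p_1 = p_2\); call the common value \(p\). Substituting into the standard deviation formula gives:

\[ \begin{align} \sigma_{\hat{p}_1 - \hat{p}_2} &= \sqrt{ \frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2} } \\ &= \sqrt{ p(1-p)\left( \frac{1}{n_1} + \frac{1}{n_2} \right) }. \end{align} \]

Since \(p\) is unknown, it is estimated by \(\hat{p}_c\), which uses the data from both samples. The mean of \(\hat{p}_1 - \hat{p}_2\) under the null hypothesis is \(0\), which is why nothing is subtracted in the numerator.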

    Calculating the test statistic for the savings example:

    \[ \begin{align} z &= \frac{\hat{p}_1 - \hat{p}_2 }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } } \\ &= \frac{0.40 - 0.38 }{\sqrt{ \dfrac{0.3964 (1-0.3964 ) }{1300} +\dfrac{0.3964 (1-0.3964 ) }{290} } } \\ & \approx 0.63,\end{align} \]

    rounded to \(2\) decimal places.
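A minimal Python sketch of this calculation, assuming the null hypothesis \(H_0:\; p_1 - p_2 = 0\) (the function name is hypothetical):

```python
from math import sqrt

def two_proportion_z(n1, p_hat1, n2, p_hat2):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p_c = (n1 * p_hat1 + n2 * p_hat2) / (n1 + n2)           # pooled proportion
    standard_error = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p_hat1 - p_hat2) / standard_error

z = two_proportion_z(1300, 0.40, 290, 0.38)
print(round(z, 2))  # 0.63
```

The sketch keeps the pooled proportion unrounded, which here gives the same value to two decimal places as the hand calculation.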

    Let's finish up the hypothesis test for the savings example. No significance level was given, so you will need to consider the Type I and Type II error consequences. See Errors in Hypothesis Testing for more information and examples. In this example, a Type I error would be deciding that the savings proportions are not the same for the two groups when in fact they are the same.

    A Type II error would be failing to conclude that the savings proportions differ between the two groups when in fact they are not the same. Neither error is very costly here (unlike in a medical trial, where the consequences of an error matter much more), so choosing a significance level of \(\alpha = 0.05\) would be fine.

    Remember that this is a two-tailed test! So the \(P\)-value is twice the area under the \(z\)-curve and to the right of the \(z\)-value. In other words:

    \[ \begin{align} P\text{-value} &= 2(\text{area under curve to the right of }0.63) \\ &= 2\cdot P(z>0.63) \\ &= 2(0.2643) \\ &\approx 0.529 \end{align} \]

    The \(P\)-value is greater than the significance level of \(\alpha = 0.05\), so you will fail to reject the null hypothesis.
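Instead of a standard normal table, the tail area can be computed directly. The sketch below uses the complementary error function from Python's standard library; the helper name `upper_tail` is made up for illustration.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * erfc(z / sqrt(2))

z = 0.63
alpha = 0.05
p_value = 2 * upper_tail(abs(z))  # two-tailed test: double the tail area
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 3), decision)
```

With \(z = 0.63\) this gives a \(P\)-value of about \(0.529\), in line with the table lookup, and the decision is to fail to reject the null hypothesis.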

    Remember that you never say things like "the null hypothesis is true". For a reminder on why, see the article Hypothesis Testing.

    Communicating your conclusion can be the most challenging part of doing a hypothesis test. What does it mean to fail to reject the null hypothesis?

    Solution:

    The original goal was to find out if there is a difference in savings habits between full-time and part-time employees at corporations in your country. The null hypothesis is that there is no difference in the savings habits between the two groups. In failing to reject the null hypothesis, what you are saying is that there is no convincing evidence that there is a difference in savings habits between full-time and part-time employees.

    Why was there a difference in the sample proportions then? It might have been from sampling variability. All you can say from the sample proportions is that you are not convinced there is a difference between the two population proportions.

    Hypothesis Testing of Two Population Proportions Example

    Let's look at another example of hypothesis testing for the difference in two population proportions.

    Many bulldog owners report that their pet snores, and that their bulldog snores more frequently as it gets older.

    Figure: Sleeping bulldog puppy.

    You have decided to do a test to see if this is actually true or maybe just a matter of perception. So you break down bulldogs into two groups, those under three years of age and those over three years of age, and choose a random sample of \(700\) bulldog owners to ask them about their dog's snoring. From the survey responses (not everyone responds to surveys), you create the following table:

    | Population | Population Proportion | Sample Size | Sample Proportion |
    |---|---|---|---|
    | Bulldogs under the age of \(3\). | \(p_1 = \) proportion of bulldogs under the age of \(3\) who snore more than five times a week. | \(n_1 = 300\) | \(\hat{p}_1 = 0.26\) |
    | Bulldogs over the age of \(3\). | \(p_2 = \) proportion of bulldogs over the age of \(3\) who snore more than five times a week. | \(n_2 = 291\) | \(\hat{p}_2 = 0.392\) |


    Before going any further, let's check to make sure that the conditions for doing a hypothesis test for two population proportions are satisfied. First, the samples are independent since a bulldog can't be both under \(3\) years old and over \(3\) years old at the same time. In addition, there are certainly far more than \(591\) people worldwide that own bulldogs, so the number of bulldog owners sampled is less than \(10\%\) of the overall population of people who own bulldogs. Also,

    • \(n_1\hat{p}_1 = 300(0.26)=78 \ge 10\),

    • \(n_2\hat{p}_2 = 291(0.392) \approx 114.1 \ge 10\),

    • \(n_1(1-\hat{p}_1) = 300(1-0.26) = 222 \ge 10\), and

    • \(n_2(1-\hat{p}_2) = 291(1-0.392) \approx 176.9 \ge 10\),

    so all of the conditions for applying the test are met.

    The next step is deciding on the null and alternative hypotheses. The null hypothesis would be:

    \[ H_0: \; p_2-p_1 = 0\]

    or in other words that there is no difference in snoring between the two groups. The alternative hypothesis would be that there is a difference in the snoring rates of the two groups, so:

    \[H_a:\; p_2-p_1 \ne 0\]

    Calculating the pooled success rate (sometimes called the combined success rate):

    \[ \begin{align}\hat{p}_c &= \frac{n_1\hat{p_1} + n_2\hat{p_2} }{n_1 + n_2} \\ &= \frac{300(0.26)+291(0.392)}{300+291} \\ &\approx 0.325 . \end{align}\]

    Then the test statistic is:

    \[\begin{align} z &= \frac{\hat{p_2} - \hat{p_1} }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } } \\ &= \frac{ 0.392 - 0.26 }{\sqrt{ \dfrac{0.325 (1-0.325) }{300} +\dfrac{0.325 (1-0.325) }{291} } } \\ &\approx 3.425 \end{align}\]

    Notice that here you are using \(p_2-p_1\) in the null hypothesis simply for the convenience of having \(\hat{p}_2 - \hat{p}_1 \) be positive. It actually doesn't matter which order you choose for the difference, as long as you are consistent throughout your work and you make sure your \(z\) calculation matches.

    Remember that this is a two-tailed test! So the \(P\)-value is twice the area under the \(z\)-curve and to the right of the \(z\)-value. In other words:

    \[ \begin{align} P\text{-value} &= 2(\text{area under curve to the right of }3.425) \\ &= 2\cdot P(z>3.425) \\ &\approx 2(0.0003) \\ &= 0.0006, \end{align} \]

    where the value of \(P(z>3.425)\) can be found using a standard normal table or calculator.

    So at the \(\alpha = 0.05\) significance level, you can reject the null hypothesis, and conclude that there is convincing evidence of a difference in bulldog snoring based on age.
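The whole bulldog calculation can be sketched as one Python function. The function name is hypothetical, the difference is taken as \(\hat{p}_2 - \hat{p}_1\) to match the hypotheses above, and the conditions are assumed to have been checked already.

```python
from math import erfc, sqrt

def two_proportion_z_test(n1, p_hat1, n2, p_hat2):
    """Return (z, two-tailed P-value) for H0: p2 - p1 = 0."""
    p_c = (n1 * p_hat1 + n2 * p_hat2) / (n1 + n2)           # pooled proportion
    standard_error = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p_hat2 - p_hat1) / standard_error
    p_value = erfc(abs(z) / sqrt(2))                         # equals 2 * P(Z > |z|)
    return z, p_value

z, p_value = two_proportion_z_test(300, 0.26, 291, 0.392)
print(round(z, 2), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

This gives a \(z\)-value of about \(3.43\) and a \(P\)-value of about \(0.0006\), so the null hypothesis is rejected at the \(\alpha = 0.05\) level.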

    Would your conclusion have been any different if the alternative hypothesis had been:

    \[H_a:\; p_2-p_1 > 0?\]

    Solution:

    The main change would have been in calculating the \(P\)-value. Since it would be a one-tailed test, in this case, the calculation would be:

    \[ \begin{align} P\text{-value} &= \text{area under curve to the right of }3.425 \\ &= P(z>3.425) \\ &\approx 0.0003 \end{align} \]

    At the \(\alpha = 0.05\) significance level, you would still reject the null hypothesis and conclude that bulldogs over the age of \(3\) do snore more than bulldogs under the age of \(3\).
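The three kinds of alternative hypothesis differ only in which tail area is used. Here is a minimal sketch; the labels `"two-sided"`, `"less"`, and `"greater"` are names chosen for this illustration.

```python
from math import erfc, sqrt

def p_value(z, alternative):
    """P-value for a z statistic under a two-tailed, left-tailed, or right-tailed test."""
    upper = 0.5 * erfc(z / sqrt(2))   # area to the right of z
    lower = 1 - upper                 # area to the left of z
    if alternative == "greater":      # right-tailed test
        return upper
    if alternative == "less":         # left-tailed test
        return lower
    if alternative == "two-sided":    # two-tailed test
        return 2 * min(upper, lower)
    raise ValueError(f"unknown alternative: {alternative}")

for alternative in ("two-sided", "greater", "less"):
    print(alternative, round(p_value(3.425, alternative), 4))
```

For the bulldog data, the right-tailed \(P\)-value is half the two-tailed one, while the left-tailed \(P\)-value is close to \(1\), since the sample difference points in the opposite direction from that alternative.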

    Hypothesis Test of Two Population Proportions - Key takeaways

    • Two samples are independent if picking members for one sample doesn't influence how members of the second sample are picked.
    • Two samples are dependent if picking members for one sample automatically determines the members of the second sample.
    • For a hypothesis test for two population proportions, the null hypothesis will almost always be that the two population proportions are the same.
    • The conditions for applying a hypothesis test for the difference of two population proportions are:
      • The samples are independent.
      • Each sample is less than \(10\%\) of the population it was drawn from.
      • \(n_1\hat{p}_1 \ge 10\), \(n_2\hat{p}_2 \ge 10\), \(n_1(1-\hat{p}_1) \ge 10\), and \(n_2(1-\hat{p}_2) \ge 10\), where \(n_1\) is the size of the first sample, \(n_2\) is the size of the second sample, \(\hat{p}_1\) is the proportion of successes in the first sample, and \(\hat{p}_2\) is the proportion of successes in the second sample.
    • The pooled proportion formula is \[ \begin{align}\hat{p}_c &= \frac{\text{number of successes in the two samples} }{\text{total of the two sample sizes}} \\ &= \frac{n_1\hat{p_1} + n_2\hat{p_2} }{n_1 + n_2}. \end{align}\]
    • The formula for the test statistic is \[ z = \frac{\hat{p_1} - \hat{p_2} }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } }\]
    Frequently Asked Questions about Hypothesis Test of Two Population Proportions

    How to find p value for difference in proportions? 

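    First you will need to find the pooled success proportion, then use the formula for the test statistic. The \(P\)-value is then the area under the standard normal curve beyond that \(z\)-value: one tail for a one-tailed test, or twice the tail area for a two-tailed test.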

    How to compare percentages statistically? 

    You would use a hypothesis test for two population proportions, also called a hypothesis test for the difference of population proportions.

    When to use a two proportion z test? 

    When you have two independent samples, where each sample is less than 10% of its population, and there are at least 10 successes and at least 10 failures in each of the two samples.

    What is proportion test? 

    It is a statistical inference test.  It can be done with a single population proportion, or as a difference of two population proportions. 

    How to tell if proportions are equal? 

    Perform a hypothesis test for the difference of two population proportions.
