Chi Square Test for Goodness of Fit


So, you've familiarized yourself with the concept of Chi-square distributions and been introduced to the concept of Chi-square tests. Well, now you've come to the good bit: it's time to learn how to apply these handy concepts to perform real statistical tests on sets of data. The first Chi-square test you'll meet is the Chi-square test for goodness of fit. In this explanation, you'll learn how to use this cool little test to check whether data observed in reality follow a projected distribution, or whether they differ from the projection in a statistically meaningful way.

Last Updated: 06.01.2023

If you don't feel totally comfortable with the idea of a Chi-Square Distribution or the basic concept of Chi-Square Tests, don't sweat it, there are StudySmarter explanations for both!

No point waiting around then, let's dive into it!

Chi-Square Test for Goodness of Fit Definition

What is the Chi-square test for goodness of fit then? Well...

The Chi-square test for goodness of fit is a statistical hypothesis test used to determine whether an expected distribution of outcomes is significantly different from the actual observed distribution of outcomes.

This is a lot of talk of outcomes and distributions and all sorts of statistics talk, but what does it all mean?

Well, imagine if you rolled a \(6\)-sided die \(100\) times. You would expect it to land on each of the sides roughly an equal number of times.

If you actually carried this out and recorded the results, you could then use the Chi-square goodness of fit test to check whether the real-life data matched your expectation, within reasonable limits of course.

Useful, right? Now that you're hopefully familiar with the what and the why of Chi-square tests for goodness of fit, let's get into the good stuff: the how.

Chi-Square Test for Goodness of Fit Hypotheses

The Chi-square test for goodness of fit is a hypothesis test. This means that, of course, it must start with a set of hypotheses.

Now, to conduct a hypothesis test like this, you need a null hypothesis and an alternative hypothesis.

A null hypothesis states that any difference between what you observe and what you expect is down to random chance alone. In a goodness-of-fit test, it is the claim that the data follow the hypothesized distribution. For instance:

\(H_0:\) A flipped coin will land on heads \(50\%\) of the time.

If the test finds strong enough evidence against the null hypothesis, the null hypothesis is rejected in favor of the alternative hypothesis:

\(H_a:\) A flipped coin will not land on heads \(50\%\) of the time.

How does the Chi-square test for goodness of fit decide between the two? It measures how likely a result at least as extreme as the sample's would be if the null hypothesis were true. If that probability is low enough, the null hypothesis is rejected and the data are taken as evidence for the alternative hypothesis. If it is not low enough, the null hypothesis is not rejected, which is not the same as proving it true.

For instance, say your sample was \(100\) coin flips and you got the following result.

Heads	Tails
\(99\)	\(1\)

Table 1. Heads vs tails test.

If there was a \(50\%\) chance of flipping heads with each flip, as the null hypothesis states, then how likely would this result be? It makes sense intuitively that the probability is so low that it's bordering on impossible.

What about if you obtained these results?

Heads	Tails
\(58\)	\(42\)

Table 2. Heads vs tails test.

Well, this is a bit closer, so it's hard to judge by eye, but the Chi-square test for goodness of fit can determine whether this result gives enough evidence to reject the null hypothesis.
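To make the comparison concrete, here is a minimal Python sketch that scores both coin samples with the Chi-square statistic \(\sum \frac{(O_i-E_i)^2}{E_i}\), which is covered in detail further on. It assumes \(H_0\) (a fair coin, so \(50\) expected heads and \(50\) expected tails out of \(100\)) and uses \(3.84\), the standard \(5\%\) critical value for one degree of freedom. The name `coin_statistic` is just an illustrative helper, not part of any library.

```python
def coin_statistic(heads, tails):
    """Chi-square statistic for a coin under H0: P(heads) = 0.5."""
    n = heads + tails
    expected = n / 2  # 50 heads and 50 tails for 100 flips
    return sum((observed - expected) ** 2 / expected for observed in (heads, tails))


CRITICAL_5_PERCENT_DF1 = 3.84  # 5% critical value for 1 degree of freedom

for heads, tails in [(99, 1), (58, 42)]:
    stat = coin_statistic(heads, tails)
    verdict = "reject H0" if stat > CRITICAL_5_PERCENT_DF1 else "do not reject H0"
    print(f"{heads}/{tails}: statistic = {stat:.2f} -> {verdict}")
```

Under these assumptions the \(99/1\) split gives a statistic of \(96.04\), far above \(3.84\), while the \(58/42\) split gives only \(2.56\), so it would not be enough evidence to reject a fair coin at the \(5\%\) level.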

Chi-Square Test for Goodness of Fit Assumptions and Conditions

The Chi-square test for goodness of fit is not appropriate for all data. In fact, there are four conditions (sometimes referred to as assumptions) that must hold true:

The sampling method is simple random sampling.
The variable under study is categorical.
The expected number of observations in each category must be at least five.
Each outcome in the variable under study must be independent.

Let's take a look at each of these conditions a little more closely.

Random Sampling

For the Chi-square test for goodness of fit, the members of the sample being analyzed must have been chosen at random.

Say you wished to try and predict the frequency at which different types of candy appear in a mixed bag. Well, if you wished to see if your prediction was accurate, you could potentially use a Chi-square test for goodness of fit only if the bags you take to check this are chosen completely at random.

Categorical Variable

What is a categorical variable? Well, let's take the example of the mixed bags of candy from before. Each candy in the bag can be categorized by what type of sweet it is. There is no inherent ordering to these categories, so the variable is categorical (more precisely, nominal). If, instead, your data categories were school years, they would have a natural order from low to high, making the variable ordinal.

The candy variable therefore fits the Chi-square test for goodness of fit naturally. The test can still be applied to ordinal categories such as school years, but it treats them as unordered, so any information carried by the ordering is ignored.

Expected Value

The next condition for a Chi-square test for goodness of fit is that the expected number of observations in each category is at least five. This one is nice and simple: the test can only be used on large enough samples. Your hypothesis might be that each type of sweet is equally common across the bags. If your sample includes \(200\) candies and there are five types of candy, then the expected number of each sweet in the sample would be \(\frac{200}{5} = 40\). This is above five, so the sample meets this condition for the test.
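The expected-count check is easy to automate. The sketch below, assuming the hypothesized proportions are given as fractions that sum to \(1\), multiplies the sample size by each proportion and then applies the at-least-five rule. Both function names are hypothetical helpers chosen for illustration.

```python
def expected_counts(sample_size, proportions):
    """Expected count per category: sample size times hypothesized proportion."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [sample_size * p for p in proportions]


def meets_expected_count_condition(expected, minimum=5.0):
    """True if every category's expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Candy example: 200 sweets, five types assumed equally common.
candy = expected_counts(200, [1 / 5] * 5)
print(candy, meets_expected_count_condition(candy))

# Hypothetical smaller sample of 20 sweets: expected counts of 4 fail the rule.
small = expected_counts(20, [1 / 5] * 5)
print(small, meets_expected_count_condition(small))
```

With \(200\) candies and five equally likely types, every expected count is \(40\) and the condition holds; a sample of only \(20\) candies would give expected counts of \(4\) and fail it.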

Outcome Independence

The final condition for the Chi-square test for goodness of fit is outcome independence. All this means is that the probability of each outcome is not affected by the outcomes that came before. For instance, if all five types of candy are equally common, each time a sweet is picked from a bag there is a \(\frac{1}{5}\) chance that it is a cola bottle. This is true no matter how many cola bottles or gummy bears have been picked before. The previous outcomes have no effect on this one, so the outcomes are independent and the condition is met.

Formula for Chi-Square Goodness of Fit Test Statistic

Once the hypotheses have been formulated and the conditions confirmed to have been met, it's time to calculate the Chi-square test statistic. This is done with the following formula:

\[\chi^2 = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i}\]

Where \(O_i\) is the \(i^{th}\) observed value, \(E_i\) is the \(i^{th}\) expected value, and \(n\) is the number of categories.

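For example, with the following expected and observed values, the calculation would be carried out as follows. (This table only illustrates the arithmetic: in a real test, the expected counts must add up to the same total as the observed counts, which is not the case here.)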

	Cola Bottle	Flying Saucer	Gummy Bear	Fruit Lace	Toffee
Expected	\(40\)	\(40\)	\(40\)	\(40\)	\(40\)
Observed	\(20\)	\(25\)	\(15\)	\(18\)	\(22\)

Table 3. Expected and observed values, chi-square test.

\[\begin{align} \chi^2 &= \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i} \\ &= \frac{(20-40)^2}{40} + \frac{(25-40)^2}{40} + \frac{(15-40)^2}{40} + \frac{(18-40)^2}{40} + \frac{(22-40)^2}{40} \\ &= 10 + 5.625 + 15.625 + 12.1 + 8.1 \\ &= 51.45 \end{align} \]
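The formula translates almost directly into code. This Python sketch (the function name `chi_square_statistic` is our own, not a library API) sums \(\frac{(O_i-E_i)^2}{E_i}\) over the categories of Table 3. It only performs the arithmetic: it does not check that the observed and expected totals agree.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Table 3: cola bottle, flying saucer, gummy bear, fruit lace, toffee
expected = [40, 40, 40, 40, 40]
observed = [20, 25, 15, 18, 22]
print(round(chi_square_statistic(observed, expected), 2))  # 51.45
```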

Performing the Test for Goodness of Fit

Firstly, you will need to know the significance level, \(\alpha\). The significance level sets how strong the evidence must be before you reject the null hypothesis: it is the probability of rejecting the null hypothesis when it is actually true. Significance levels are often set at \(5\%\) (\(\alpha=0.05\)). A lower significance level means that stronger evidence is required.

Secondly, you will need to know the number of degrees of freedom of the problem. For a goodness-of-fit test, this is the number of categories minus one, because once the total and all but one of the category counts are known, the last count is fixed. For example, a variable with five categories has four degrees of freedom.

The next step in the test is to find either the critical Chi-square value or the \(p-\)value. Either of these values can be used to complete the test.

Performing the Test With the Chi-Square Value

From the Chi-square table, you can look up the critical Chi-square value for the significance level and degrees of freedom of your specific problem. Below is a small segment of the table.

Degrees of Freedom	\(\alpha = 0.2\)	\(\alpha = 0.1\)	\(\alpha = 0.05\)	\(\alpha = 0.025\)	\(\alpha = 0.01\)
\(1\)	\(1.64\)	\(2.71\)	\(3.84\)	\(5.02\)	\(6.64\)
\(2\)	\(3.22\)	\(4.61\)	\(5.99\)	\(7.38\)	\(9.21\)
\(3\)	\(4.64\)	\(6.25\)	\(7.82\)	\(9.35\)	\(11.35\)
\(4\)	\(5.99\)	\(7.78\)	\(9.49\)	\(11.14\)	\(13.28\)

Table 4. Chi-square values.

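So, back to the candy example. If the significance level is set at \(5\%\), what is the critical Chi-square value? There are five types of candy, so there are \(4\) degrees of freedom, and the value where the \(\alpha = 0.05\) column meets the row for \(4\) degrees of freedom is \(9.49\).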

The question now is whether the test statistic is greater or smaller than this critical value. If your test statistic is smaller than the critical value, there is not enough evidence to reject the null hypothesis. If it is greater, the null hypothesis is rejected in favor of the alternative hypothesis.
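The decision rule can be sketched in a few lines. The Python below hard-codes the values printed in Table 4 (so it only covers \(1\) to \(4\) degrees of freedom and those five significance levels), computes the degrees of freedom as the number of categories minus one, and rejects \(H_0\) only when the statistic exceeds the critical value. The names `CRITICAL_VALUES` and `critical_value_test` are illustrative.

```python
# Critical values as printed in Table 4, indexed as CRITICAL_VALUES[df][alpha].
CRITICAL_VALUES = {
    1: {0.2: 1.64, 0.1: 2.71, 0.05: 3.84, 0.025: 5.02, 0.01: 6.64},
    2: {0.2: 3.22, 0.1: 4.61, 0.05: 5.99, 0.025: 7.38, 0.01: 9.21},
    3: {0.2: 4.64, 0.1: 6.25, 0.05: 7.82, 0.025: 9.35, 0.01: 11.35},
    4: {0.2: 5.99, 0.1: 7.78, 0.05: 9.49, 0.025: 11.14, 0.01: 13.28},
}


def critical_value_test(statistic, num_categories, alpha=0.05):
    """Return (reject_h0, critical_value) for a goodness-of-fit test.

    Degrees of freedom = number of categories - 1. Only the df and alpha
    values in the small table above are supported.
    """
    df = num_categories - 1
    try:
        critical = CRITICAL_VALUES[df][alpha]
    except KeyError:
        raise ValueError(f"no table entry for df={df}, alpha={alpha}") from None
    # Right-tailed test: reject H0 only when the statistic exceeds the critical value.
    return statistic > critical, critical


# Candy example: test statistic 51.45 from Table 3, five types of candy.
reject, critical = critical_value_test(51.45, num_categories=5, alpha=0.05)
print(f"critical value = {critical}, reject H0: {reject}")
```

Applied to the candy example, the test statistic of \(51.45\) is far larger than the critical value of \(9.49\), so under this sketch the null hypothesis of equally common candies would be rejected at the \(5\%\) level. Note that `alpha` has to be passed as exactly one of the table's column values, such as `0.05`.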

Performing the Test With the P-value

The \(p-\)value is the probability, assuming the null hypothesis is true, that sampling variation alone would produce a test statistic at least as large as the one found in the current sample. In other words, it's the probability that random sampling would produce a result at least as far from the expected distribution as the current one.

Once again, the table is consulted. This time, find where your test statistic falls in the row for your degrees of freedom, and read off the significance levels of the columns on either side of it. For example, for a test statistic of \(5\) with \(3\) degrees of freedom, \(5\) lies between \(4.64\) and \(6.25\), so \(0.1 < p < 0.2\). If the \(p-\)value is greater than the significance level, there is not enough evidence to reject the null hypothesis; if it is smaller, the null hypothesis is rejected.
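Reading the table only brackets the \(p-\)value. To compute it exactly you need the right tail of the Chi-square distribution, which equals the regularized upper incomplete gamma function \(Q\left(\frac{df}{2}, \frac{\chi^2}{2}\right)\). The standard-library-only Python sketch below evaluates it with the usual series and continued-fraction expansions (a common textbook numerical approach, not something specified in this explanation); if SciPy is installed, `scipy.stats.chi2.sf(statistic, df)` gives the same quantity. All function names here are illustrative.

```python
import math


def _regularized_lower_gamma_series(s, x):
    """P(s, x) from its power series; converges quickly when x < s + 1."""
    term = 1.0 / s
    total = term
    k = 0
    while abs(term) > abs(total) * 1e-15 and k < 10_000:
        k += 1
        term *= x / (s + k)
        total += term
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def _regularized_upper_gamma_cf(s, x):
    """Q(s, x) from a continued fraction (modified Lentz); suited to x >= s + 1."""
    tiny = 1e-300
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + s * math.log(x) - math.lgamma(s))


def chi_square_p_value(statistic, df):
    """Right-tail probability P(X >= statistic) for X ~ chi-square(df)."""
    if statistic <= 0:
        return 1.0
    s, x = df / 2.0, statistic / 2.0
    if x < s + 1.0:
        return 1.0 - _regularized_lower_gamma_series(s, x)
    return _regularized_upper_gamma_cf(s, x)


# Worked example from the text: statistic 5 with 3 degrees of freedom.
p = chi_square_p_value(5, 3)
print(f"p = {p:.3f}")  # lies between 0.1 and 0.2, matching the table lookup
```

For the example above, this gives \(p \approx 0.17\), which agrees with the table reading \(0.1 < p < 0.2\).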

Chi-square Test for Goodness of Fit Example

(1) A biologist hypothesizes that each of the three types of fish occurs in equal numbers in a pond. They take a random sample of \(120\) fish to test the hypothesis, and the results are as follows:

Bass	Crappie	Sunfish
\(32\)	\(52\)	\(36\)

Table 5. Fish data table.

Degrees of Freedom	\(\alpha = 0.2\)	\(\alpha = 0.1\)	\(\alpha = 0.05\)	\(\alpha = 0.025\)	\(\alpha = 0.01\)
\(1\)	\(1.64\)	\(2.71\)	\(3.84\)	\(5.02\)	\(6.64\)
\(2\)	\(3.22\)	\(4.61\)	\(5.99\)	\(7.38\)	\(9.21\)
\(3\)	\(4.64\)	\(6.25\)	\(7.82\)	\(9.35\)	\(11.35\)
\(4\)	\(5.99\)	\(7.78\)	\(9.49\)	\(11.14\)	\(13.28\)

Table 6. Degrees of freedom and significance level.

(a) State the hypotheses being tested.

(b) Does the data being tested meet the conditions for a Chi-square test for goodness of fit?

(c) Calculate the Chi-square test statistic.

(d) Find the critical Chi-square value for the data, given the significance level is \(5\%\).

(e) Does the sample provide sufficient evidence to reject the null hypothesis?

Solution:

(a) The first step is to define the hypotheses.

\(H_0\): Each type of fish occurs in equal numbers in the pond.

\(H_a\): The three types of fish do not occur in equal numbers in the pond.

(b) The question states that the sample is random, so the first condition is met. The variable is categorical, as it is made up of unordered groups, so the second condition is met. The expected value of each group is \(\frac{120}{3} = 40\), which is over five, so the third condition is met. Finally, under the null hypothesis each fish caught has a \(\frac{1}{3}\) chance of being any given type, regardless of which fish were caught before, so each outcome is independent and the fourth condition is met. So yes, the data meets all four conditions.

(c) \[\begin{align} \chi^2& = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i} \\ &=\frac{(32-40)^2}{40} +\frac{(52-40)^2}{40} + \frac{(36-40)^2}{40} \\ &= 1.6 + 3.6 + 0.4 \\ &= 5.6 \end{align}\]

(d) \[\begin{align} df &= n - 1 \\ &= 3 - 1 \\ &= 2\end{align}\]

With a significance level of \(5\%\) (\(\alpha = 0.05\)) and \(2\) degrees of freedom, the critical Chi-square value from the table is \(5.99\).

(e) As the test statistic is less than the critical Chi-square value \((5.6 < 5.99)\), there is not sufficient evidence to reject the null hypothesis.
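The whole of Example (1) can be reproduced end to end in a short Python sketch. It assumes equal expected counts under \(H_0\), checks the expected-count condition, and compares the statistic with the \(5.99\) read from Table 6. The random-sampling and independence conditions cannot be checked from the counts alone, so they are taken from the question as given.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = {"Bass": 32, "Crappie": 52, "Sunfish": 36}
n = sum(observed.values())   # 120 fish in the sample
k = len(observed)            # 3 categories
expected = [n / k] * k       # H0: equal numbers, so 40 of each

if min(expected) < 5:
    raise ValueError("expected-count condition not met")

stat = chi_square_statistic(list(observed.values()), expected)
df = k - 1
critical = 5.99              # Table 6: df = 2, alpha = 0.05
reject = stat > critical

print(f"statistic = {stat:.2f}, df = {df}, critical value = {critical}")
print("reject H0" if reject else "do not reject H0")
```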

(2) A school does a study about the occurrence of different colored eyes in its pupils. It is hypothesized that \(15\%\) of pupils will have green eyes, \(25\%\) of pupils will have blue eyes, and \(60\%\) of pupils will have brown eyes. Of the \(1000\) pupils, \(80\) are chosen at random. The results of the sample are as follows.

Green	Blue	Brown
\(18\)	\(28\)	\(34\)

Table 7. Colour data.

Degrees of Freedom	\(\alpha = 0.2\)	\(\alpha = 0.1\)	\(\alpha = 0.05\)	\(\alpha = 0.025\)	\(\alpha = 0.01\)
\(1\)	\(1.64\)	\(2.71\)	\(3.84\)	\(5.02\)	\(6.64\)
\(2\)	\(3.22\)	\(4.61\)	\(5.99\)	\(7.38\)	\(9.21\)
\(3\)	\(4.64\)	\(6.25\)	\(7.82\)	\(9.35\)	\(11.35\)
\(4\)	\(5.99\)	\(7.78\)	\(9.49\)	\(11.14\)	\(13.28\)

Table 8. Degrees of freedom and significance level.

Answer:

(a) \(H_0\): \(15\%\) of pupils will have green eyes, \(25\%\) of pupils will have blue eyes, and \(60\%\) of pupils will have brown eyes.

\(H_a\): It is not the case that \(15\%\) of pupils will have green eyes, \(25\%\) of pupils will have blue eyes, and \(60\%\) of pupils will have brown eyes

(b) The question states that the sample is random, so the first condition is met. The variable is categorical, as it is made up of unordered groups, so the second condition is met. The expected value of each group can be calculated as follows:

\[Green = 80 \cdot 0.15 = 12\]

\[Blue = 80 \cdot 0.25 = 20\]

\[Brown = 80 \cdot 0.6 = 48\]

As the expected value of each group is greater than \(5\), the third condition is met. Finally, the color of one student's eyes is not affected by the color of any other student's eyes, therefore the fourth condition is met.

(c) \[\begin{align} \chi^2& = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i} \\ &=\frac{(18-12)^2}{12} +\frac{(28-20)^2}{20} + \frac{(34-48)^2}{48} \\ &= 3 + 3.2 + 4.08 \\ &= 10.28 \end{align}\]

(d) First, find the degrees of freedom

\[\begin{align} df &= n - 1 \\ &= 3-1 \\ &=2 \end{align}\]

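Now, since the test statistic, \(10.28\), is greater than \(9.21\), the table value for \(2\) degrees of freedom at \(\alpha = 0.01\), the \(p-\)value must be smaller than \(0.01\):

\[p < 0.01 \]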

(e) As the \(p-\)value is smaller than the \(5\%\) significance level, there is sufficient evidence to reject the null hypothesis.

\[p < 0.01 < 0.05\]
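Example (2) can also be checked in code, and here the \(p-\)value can be computed exactly rather than read from the table: a Chi-square distribution with exactly \(2\) degrees of freedom has right-tail probability \(P(\chi^2 \ge x) = e^{-x/2}\). The Python sketch below relies on that shortcut, which does not hold for any other number of degrees of freedom, and uses the \(5\%\) significance level from part (e).

```python
import math

# Hypothesized proportions and the sample of 80 pupils from the example.
proportions = {"Green": 0.15, "Blue": 0.25, "Brown": 0.60}
observed = {"Green": 18, "Blue": 28, "Brown": 34}

n = sum(observed.values())                                        # 80 pupils
expected = {colour: n * p for colour, p in proportions.items()}   # 12, 20, 48
stat = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

# Shortcut valid only for 2 degrees of freedom: the right tail is exp(-x / 2).
if df != 2:
    raise ValueError("the exp(-x / 2) shortcut only holds for df = 2")
p_value = math.exp(-stat / 2)

alpha = 0.05
print(f"statistic = {stat:.2f}, df = {df}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The exact value, \(p \approx e^{-10.28/2} \approx 0.006\), is consistent with the table reading \(p < 0.01\).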

Chi-Square Test for Goodness of Fit - Key takeaways

The Chi-square test for goodness of fit is a statistical hypothesis test used to determine whether an expected distribution of outcomes is significantly different from the actual observed distribution of outcomes.
The Chi-square test for goodness of fit can only be carried out on data that meets the four conditions.
The Chi-square test for goodness of fit can be carried out either by comparing the test statistic with the critical Chi-square value or by comparing the \(p-\)value of the data with the significance level.


Frequently Asked Questions about Chi Square Test for Goodness of Fit

What is the purpose of a goodness-of-fit test?

A Chi-square test for goodness of fit can be conducted to confirm or deny a hypothesis about the distribution of a categorical data set.

When to use the chi-square test for goodness of fit?

The Chi-square test for goodness of fit can be used when you wish to test hypotheses about categorical data sets.

What is a chi-square test for goodness of fit?

A chi-square test for goodness of fit can be conducted to confirm or deny a hypothesis about the distribution of a categorical data set.

What conditions are necessary to use the Chi-square test for goodness of fit?

The sampling method must be random
The variable under study must be categorical
The expected value of observations for each category must be at least five
Each outcome in the variable under study must be independent.

What are the limitations of applying the chi-square test?

The Chi-square test for goodness of fit can only be conducted on data that meets the four conditions.

How many categorical variables are needed for the Chi-square test for goodness-of-fit?

One.

Why are Chi-square tests for goodness-of-fit always right-tailed?

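The Chi-square test for goodness of fit is right-tailed because the test statistic is a sum of squared differences divided by positive expected counts, so it can never be negative, and only large values indicate a poor fit between the observed and expected distributions.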


Content Creation Process:

Lily Hulatt is a Digital Content Specialist with over three years of experience in content strategy and curriculum design. She gained her PhD in English Literature from Durham University in 2022, taught in Durham University’s English Studies Department, and has contributed to a number of publications. Lily specialises in English Literature, English Language, History, and Philosophy.


Content Quality Monitored by:

Gabriel Freitas is an AI Engineer with a solid experience in software development, machine learning algorithms, and generative AI, including large language models’ (LLMs) applications. Graduated in Electrical Engineering at the University of São Paulo, he is currently pursuing an MSc in Computer Engineering at the University of Campinas, specializing in machine learning topics. Gabriel has a strong background in software engineering and has worked on projects involving computer vision, embedded AI, and LLM applications.

