Categorical Variables

How satisfied are you with this app? Please rate it on the following scale:

StudySmarter Editorial Team

Team Categorical Variables Teachers

  • 9 minutes reading time
  • Checked by StudySmarter Editorial Team

    • \(1\) very unsatisfied

    • \(2\) somewhat unsatisfied

    • \(3\) neither satisfied nor unsatisfied

    • \(4\) somewhat satisfied

    • \(5\) very satisfied

    You have just seen categorical variables!

    What are Categorical Variables?

    Remember that univariate data, also known as one-variable data, are observations that are made on the individuals in a population or sample. That data comes in different types, like qualitative, quantitative, categorical, continuous, discrete, and so on. In particular, you will be looking at categorical variables, which are also often called categorical data. Let's first look at the definition.

    A variable is called a categorical variable if the collected data falls into categories. In other words, categorical data is data which can be divided into different groups instead of being measured numerically.

    Categorical variables are qualitative variables because they deal with qualities, not quantities. So, some examples of categorical data would be hair colour, the type of pets someone has, and favourite foods. On the other hand, things like height, weight, and the number of cups of coffee that someone drinks per day are measured numerically, and so are not categorical data.

    To see the various types of data and how they are used you can take a look at One-Variable Data and Data Analysis.

    Categorical vs. Quantitative Data

    Now you know what categorical data is, but how is that different from quantitative data? It helps to look at the definition first.

    Quantitative data is data that is a count or a measurement: it records how many or how much of something there is.

    Quantitative data usually answers questions like "how many" or "how much". For example, quantitative data would be collected if you wanted to know how much people spent on a cell phone. Quantitative data is often used to compare multiple sets of data. For a more complete discussion of quantitative data and what it is used for, take a look at Quantitative Variables.

    Categorical data is qualitative, not quantitative!

    Categorical vs. Continuous Data

    All right, what about continuous data? Can that be categorical? Let's take a look at the definition of continuous data.

    Continuous data is data that is measured on a scale of numbers, where the data could be any number on the scale.

    A good example of continuous data is height. For any of the numbers between \(4 \, ft.\) and \(5 \, ft.\) there could be someone of that height. In general, categorical data is not continuous data.

    Types of Categorical Variables

    There are two main types of categorical variables, nominal and ordinal.

    Ordinal Categorical Variables

    A categorical variable is called ordinal if it has an implied order to it.

    An example of ordinal categorical data is the survey at the start of this article. It asked you to rate your satisfaction on a scale of \(1\) to \(5\), so there is an implied order to your rating. The survey labels its categories with numbers, so survey data like this can look numerical while still being ordinal: the numbers record the order of the categories, not a measured amount.
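    As a minimal sketch of what the implied order allows, the Python snippet below stores the satisfaction scale from the start of the article and finds the middle response by sorting on rank. The responses are hypothetical, made up for illustration, and the helper name median_category is an assumption of this sketch rather than anything from the article.

```python
# Satisfaction scale from the survey: the numbers label ordered categories.
SATISFACTION_SCALE = {
    1: "very unsatisfied",
    2: "somewhat unsatisfied",
    3: "neither satisfied nor unsatisfied",
    4: "somewhat satisfied",
    5: "very satisfied",
}


def median_category(responses):
    """Return the label of the middle response after sorting by rank."""
    ranked = sorted(responses)
    # For an even number of responses, take the lower of the two middle ranks.
    middle = ranked[(len(ranked) - 1) // 2]
    return SATISFACTION_SCALE[middle]


# Hypothetical responses, made up for illustration.
responses = [4, 2, 5, 3, 4, 1, 4]
print(median_category(responses))
```

    Sorting and picking the middle category only uses the order of the ratings, which is exactly the information an ordinal variable carries.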

    Nominal Categorical Variables

    A categorical variable is called nominal if the categories are named and have no intrinsic order, i.e. if the data does not have ordered numbers assigned.

    Suppose a survey asked you what kind of housing you live in, and the options you could pick from were dorm, house, and apartment. Those are examples of named categories, so that is nominal categorical data. In other words, if it has a named category but isn't numerically ordered, then it is a nominal categorical variable.
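    A short Python sketch of how nominal data like this might be summarised: since the categories have no order, the natural summaries are the count in each category and the most frequent category. The answers below are hypothetical, made up for illustration.

```python
from collections import Counter

# Hypothetical survey answers, made up for illustration.
housing = ["dorm", "house", "apartment", "dorm", "apartment", "dorm"]

# Count how many responses fall into each named category.
counts = Counter(housing)

# The most frequent category (the mode) is a sensible summary for nominal data.
most_common_type, most_common_count = counts.most_common(1)[0]

for category, n in counts.items():
    print(category, n)
print("Most common:", most_common_type)
```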

    Categorical Variables in Statistics

    Before you go on to look at more examples of categorical variables, let's look at some of the advantages and disadvantages of categorical data.

    On the advantage side are:

    • The results are very straightforward because people only get a few options to choose from.

    • Because the options are laid out ahead of time, there are no open-ended questions that need to be analyzed. Categorical data is called concrete because of this property.

    • Categorical data can be much easier to analyze (and less expensive to analyze) than other kinds of data.

    On the disadvantage side are:

    • In general, you need to get quite a few samples to make sure the survey accurately represents the population. This can be expensive to do.

    • Because the categories are laid out at the start of the survey, it isn't very sensitive. For example, if the only two options for hair colour on a survey are brown hair and white hair, people will have trouble deciding which category to put their hair colour in (assuming they have any at all). This can lead to non-responses, and to people making unanticipated choices about their hair colour, which skews the data.

    • You can't do quantitative analysis on categorical data! Because it isn't numerical data you can't do arithmetic on it. For example, you can't take a survey satisfaction of \(4\), and add it to a survey satisfaction of \(3\) to get a survey satisfaction of \(7\).

    You can see a summary of the advantages and disadvantages of categorical variables in statistics in the following table:

    Table 1. Advantages and disadvantages of categorical variables
    Advantages                             | Disadvantages
    Results are straightforward            | Large samples needed
    Concrete data                          | Not very sensitive
    Easier and less expensive to analyse   | No quantitative analysis

    Collecting Categorical Data

    How do you collect categorical data? This is often done through interviews (either in person or on the phone) or surveys (either online, in the mail, or in person). In either case, the questions asked are not open-ended. They will always ask people to choose between a specific set of options.

    Categorical Data Analysis

    The collected data then needs to be analysed, so how do you analyse categorical data? Often it is done with proportions or percentages, and the results can be presented in tables or graphs. Two of the most frequent ways to look at categorical data are bar charts and pie charts.

    Suppose you were asked to give a survey to decide whether people liked a particular soft drink and got back the following information:

    • 14 people liked the soft drink; and
    • 50 people did not like it.

    First, we should figure out if this is categorical data.

    Solution

    Yes. You can divide up the answers into two categories, in this case "liked it" and "didn't like it". This would be an example of nominal categorical data.

    Now, how could we represent this data? We could do so with a bar or a pie chart.


    Figure: Bar chart of the number of people who liked the soda (smaller bar) and the number who didn't like it (larger bar).

    Figure: Pie chart of the percentage of people who liked the soda (smaller wedge) and the percentage who didn't like it (larger wedge).

    Either one gives you a visual comparison of the data. For many more examples of how to construct a chart for categorical data, see Bar Graphs.
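    The percentages behind the pie chart can be worked out directly from the two counts in the soft drink survey. The Python sketch below does that and also prints a rough text version of the bar chart. The function names and the choice of one '#' per two responses are assumptions made for illustration.

```python
# Counts from the soft drink survey in the example.
survey = {"liked it": 14, "didn't like it": 50}


def percentages(counts):
    """Convert category counts into percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


def text_bar_chart(counts, scale=2):
    """One row per category, with one '#' per `scale` responses (rounded down)."""
    width = max(len(category) for category in counts)
    rows = []
    for category, n in counts.items():
        rows.append(f"{category.ljust(width)} | {'#' * (n // scale)} {n}")
    return "\n".join(rows)


for category, share in percentages(survey).items():
    print(f"{category}: {share:.1f}%")
print(text_bar_chart(survey))
```

    With \(14 + 50 = 64\) responses in total, the shares are \(14/64 \approx 21.9\%\) and \(50/64 \approx 78.1\%\).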

    Examples of Categorical Variables

    Let's look at some examples of what categorical data can be.

    Suppose you are interested in seeing a movie, and you ask a bunch of your friends whether they liked it or not in order to decide whether you want to spend money on it. Of your friends, \(15\) liked the movie and \(50\) didn't like it. What is the variable here, and what kind of variable is it?

    Solution

    First of all, this is categorical data. It is divided into two categories, "liked" and "didn't like". There is one variable in the data set, namely your friends' opinions of the movie. In fact, this is an example of nominal categorical data.

    Let's look at another example.

    Going back to the movie example, suppose you asked your friends whether or not they liked a particular movie, and what state they live in. How many variables are there, and what kind are they?

    Solution

    Just like in the previous example, your friends' opinion of the movie is one variable, and it is categorical. Since you also asked what state your friends live in, there is a second variable here, and it is the name of the state they live in. There are only so many states in the US, so there is a finite number of places they could list as their state. So the state is a second nominal categorical variable you have collected data on.

    Let's change what you are asking in your survey a bit.

    Now suppose you have asked your friends about how much they are willing to pay to see the movie, and you give them three price ranges: less than $5; between $5 and $10; and more than $10. What kind of data is this?

    Solution

    This is still categorical data because you laid out the categories your friends can answer in before you asked them to answer your survey. However, this time it is ordinal categorical data, since you can order the categories by price (which is a number).
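    To make the link between a numerical price and its ordinal category concrete, here is a small Python sketch that sorts a price into one of the three ranges. It assumes that prices of exactly $5 and $10 belong to the middle range, which the survey wording does not spell out, and the name price_category is hypothetical.

```python
# The three price ranges from the survey, listed from cheapest to most expensive.
PRICE_RANGES = ["less than $5", "between $5 and $10", "more than $10"]


def price_category(price):
    """Place a price in dollars into one of the ordered price ranges.

    Assumption: exactly $5 and exactly $10 count as "between $5 and $10".
    """
    if price < 5:
        return PRICE_RANGES[0]
    if price <= 10:
        return PRICE_RANGES[1]
    return PRICE_RANGES[2]


# Hypothetical answers, made up for illustration.
answers = [12.50, 4.00, 7.25]
categories = [price_category(p) for p in answers]

# Because the ranges are ordered, the categories can be sorted by position.
print(sorted(categories, key=PRICE_RANGES.index))
```

    Once a price has been placed in a range, only the range and its position in the order are kept, and the exact amount is lost.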

    So how do you compare categorical variables anyway?

    Correlation Between Categorical Variables

    Suppose you asked your friends whether or not they liked a particular movie, and whether they paid less than \(\$5\), between \(\$5\) and \(\$10\), or more than \(\$10\) to see it. Those are two categorical variables, so how can you compare them? Is there any way to see if how much they paid to see the movie influenced how much they liked it?

    One thing you can do is look at comparative bar charts of the data, or at a two-way table. You can find more information about those in the article Bar Graphs. The other thing you can do is a formal statistical test, called a chi-square test. This topic can be found in the article Inference for Distributions of Categorical Data.
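    As a rough sketch of what a chi-square test does with a two-way table, the Python snippet below compares each observed count with the count you would expect if opinion and price paid were unrelated. The counts in the table are hypothetical, made up for illustration, and the function name is an assumption. The p-value shortcut at the end only applies because a table with 2 rows and 3 columns has 2 degrees of freedom.

```python
import math

# Hypothetical two-way table, made up for illustration (not from the article).
# Rows: liked the movie, didn't like the movie.
# Columns: paid less than $5, between $5 and $10, more than $10.
observed = [
    [10, 6, 4],
    [5, 10, 15],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


stat = chi_square_statistic(observed)

# A 2-by-3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom. With exactly
# 2 degrees of freedom, the chi-square p-value equals exp(-stat / 2).
p_value = math.exp(-stat / 2)

print(f"chi-square statistic: {stat:.3f}")
print(f"p-value: {p_value:.3f}")
```

    A large statistic (and a small p-value) suggests the two categorical variables are related. For tables of other sizes, the p-value would need a chi-square distribution table or a statistics library instead of the shortcut used here.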

    Categorical Variables - Key takeaways

    • A variable is called a categorical variable if the data collected falls into categories.
    • Categorical variables are qualitative variables because they deal with qualities, not quantities.
    • A categorical variable is called ordinal if it has an implied order to it.
    • A categorical variable is called nominal if the categories are named.
    • Ways to look at categorical variables include tables and bar charts.

    Frequently Asked Questions about Categorical Variables

    What is a categorical variable? 

    A categorical variable is one where the data collected isn't a measurement. For example, hair color is a kind of categorical data, but pounds of produce bought per week is not.

    What are examples of categorical variables? 

    Hair color, educational level, and customer satisfaction on a scale of 1 to 5 are all categorical variables.

    What are nominal categorical variables?

    A nominal categorical variable is one that can be put into categories, but the categories aren't intrinsically ordered. For example, whether you live in a house, an apartment, or someplace else is categorical, but those categories don't have an intrinsic number associated with them.

    What's the difference between categorical and quantitative? 

    Quantitative data is data that represents an amount, like height in inches. Categorical data is data that is collected in categories, for example if a survey asked someone if they were less than 4 feet tall, between 4 and 6 feet tall, or more than 6 feet tall.

    How to measure categorical variables? 

    The most common way to measure categorical data is with percentages that are displayed graphically, as in bar graphs.
