Data representation meaning
In physics, 'data' usually refers to the information that we observe and collect over the course of an experiment. How we represent the data we obtain matters, because a good representation lets the data be understood easily and quickly, especially when there is a large amount of it. Also, if data is represented clearly then it is easier to spot patterns and draw conclusions from it.
Data representations illustrate and summarise data, assisting us to understand the meaning of the data and identify features in results.
Types of data representation
There are two types of data - qualitative data and quantitative data - and these are represented in different ways.
Quantitative data is information that can be quantified, which means that it can be counted or measured.
Quantitative data can be assigned numerical values and will often answer questions that begin with 'how much' or 'how many'. For example, the question 'how many people went to the moon each year in the last century?' would return a numerical value for each year. This set of data would be quantitative.
Qualitative data is information that is described in words rather than numbers.
Qualitative data is expressed in terms of language instead of numbers and the information cannot be measured or counted. An example is if you asked a group of people an open-ended question in an interview, such as "How do you feel about the taste of this new product?". This would lead to a qualitative data set.
These two types can be further classified into four categories of data.
Discrete data
Discrete data is a type of quantitative data. It refers to data that can only take certain values and not any of the values in between. You get discrete data by counting something. For example, you could put a group of people into categories based on their age. This data would be discrete because you cannot have anything other than an integer (whole) number of people! Discrete data is often best represented by a bar chart, such as the one shown below.
Fig. 1: A bar chart can be used to represent discrete data.
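As a rough illustration of how counting produces discrete data, the short Python sketch below tallies people into age groups. The ages, the group width of 20 years and the helper name `age_group` are all made up for this example and do not come from the figure above.

```python
from collections import Counter

# Hypothetical ages of ten people (made-up values for illustration only)
ages = [12, 25, 31, 47, 18, 64, 29, 35, 8, 52]

def age_group(age, width=20):
    """Return a label such as '20-39' for the group that contains `age`."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Counting the people in each group can only ever give whole numbers
counts = Counter(age_group(age) for age in ages)

# Print the groups in order of increasing age
for group in sorted(counts, key=lambda label: int(label.split("-")[0])):
    print(group, counts[group])
```

Each count is an integer, which is why these numbers are discrete and could be drawn as the heights of the bars in a bar chart.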
Continuous data
Continuous data is another type of quantitative data. Continuous data can take any value on a certain scale. An example is how the force due to Earth's gravity varies with distance from the surface of the Earth. Continuous data is normally represented by a line graph, like the one shown below.
Fig. 2: The force on a unit mass due to the Earth's gravity can be plotted against the distance from Earth's surface as a line graph.
Continuous data is also sometimes represented by a scatter graph, which involves many points plotted on a graph. If there is a pattern to the points - we say that the variables are correlated - then a line of best fit can be drawn through them. How to draw a line of best fit is explained further below.
Categoric data
Categoric data is a type of qualitative data and is information that can be placed into categories, as its name suggests. Categoric data is often represented by pie charts, which are a useful way of visualising the proportions of the different data components in a whole data set.
Fig. 3: A pie chart showing the proportion of the world's energy resources currently in use.
Ordered data
Ordered data (also called ordinal data) is a type of qualitative data that is very similar to categoric data. The only difference for ordered data is that the data can be put in a specific order. An example of an ordered data set is the electromagnetic spectrum - the frequencies are split into ranges and each range has a name, such as visible light or ultraviolet. This is shown below.
Fig. 4: The electromagnetic spectrum is grouped into categories based on wavelength (or frequency).
Data representation examples
Examples of how each category of data can be represented were mentioned above. Some of these are very important in physics, and you will use them a lot for representing the data found in experiments. Quantitative data is used much more in physics than qualitative data, as we often measure different quantities in experiments and observe how they depend on each other.
Line graphs
A line graph consists of a straight line or curve that shows a relationship between two different quantities - one must depend on the other. The data plotted on a line graph can take any point in a certain range so it is continuous.
Consider the graph from earlier showing the force due to Earth's gravity at different distances from the surface. The horizontal line is the x-axis and the independent variable is plotted on this axis. In the force-distance graph the distance from the Earth's surface is the independent variable, because it does not depend on the force.
The independent variable is the cause of changes of other variables in the experiment. The independent variable does not change if any of the other variables being considered change.
The vertical line is the y-axis and we normally plot the dependent variable on this axis. In this case, the force due to gravity is the dependent variable as it depends on the distance from the Earth.
The dependent variable is the effect. It depends on the value of the independent variable.
Histograms
Histograms are also used to present quantitative data. Histograms are a type of bar chart, but they differ from regular bar charts because the intervals on a histogram do not all have to be the same width. Also, a histogram displays continuous data, which is shown by the absence of gaps between the bars.
Fig. 5: A histogram showing the results of a maths test.
In normal bar charts, the y-axis represents the number of data elements in each category, also called the frequency for that category. However, in histograms, the y-axis represents the frequency density, which is the frequency divided by the interval length.
\[\text{frequency density}=\dfrac{\text{frequency}}{\text{interval length}}\]
A histogram is better for representing data when it is irregularly grouped - when the categories cover intervals of different widths, for example. Look at the bar chart showing the number of people in different age ranges in a group of people that was shown above. If one category was 10-100 years and the other was 0-10 years, then the larger interval category would have a much taller bar than the other even if the ages of the people were evenly spread throughout the entire range. On the other hand, if this new situation was plotted as a histogram then the bars would be of similar height, which would show that the ages were spread evenly. This shows how a histogram is more useful to someone analysing the data.
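The comparison above can be checked with a few lines of Python. This is only a sketch: it assumes a hypothetical group of 100 people whose ages are spread evenly from 0 to 100 (about one person per year of age), and the function name `frequency_density` is chosen just for this example.

```python
# Hypothetical group: 100 people with ages spread evenly from 0 to 100,
# i.e. about one person per year of age (illustrative numbers only)
groups = [
    {"label": "0-10", "lower": 0, "upper": 10, "frequency": 10},
    {"label": "10-100", "lower": 10, "upper": 100, "frequency": 90},
]

def frequency_density(frequency, lower, upper):
    """Frequency divided by the interval length."""
    return frequency / (upper - lower)

densities = {
    g["label"]: frequency_density(g["frequency"], g["lower"], g["upper"])
    for g in groups
}

for g in groups:
    print(g["label"], "frequency:", g["frequency"],
          "frequency density:", densities[g["label"]])
```

The frequencies are 10 and 90, so a bar chart would show one bar nine times taller than the other. The frequency densities are both 1 person per year, so the two histogram bars would have the same height.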
Stages of data representation
Correct data representation is very important when carrying out experiments. It makes your results clearer and it means that your experiment is easy to replicate. There are several stages to producing a good data representation.
Stage 1 - Identify the variables
The majority of experiments that you will carry out will involve measuring how one variable changes due to another. You should first identify what these variables are and how to measure them. Also note which one is the independent variable and which is the dependent variable, using the definitions above.
Let's take the experiment to test the validity of Hooke's law as an example. An experimental setup is shown below. Hooke's law says that the extension of a spring is proportional to the force applied to it. This means that the extension depends on the force, so the extension is the dependent variable and the force is the independent variable.
Fig. 6: In the experimental setup to test Hooke's law shown, the force due to the weights is the independent variable and the extension is the dependent variable.
Stage 2 - Data collection
When performing your experiment, it is usually best to record your results in a table. Draw up a table with one column for the independent variable and one for the dependent variable. Make sure you leave extra columns if you are taking repeat measurements of the dependent variable.
For the Hooke's law experiment, we should hang different weights from the spring and see how the extension changes. The measurements should be taken multiple times for each weight and recorded in a table along with the value of the weight each time.
Stage 3 - Data analysis
Once you have recorded all of your results in a suitable table, you can use them to plot a graph. As mentioned above, the independent variable should usually be on the x-axis and the dependent variable on the y-axis. If there is a relationship between the two variables then the graph should have a line of best fit; if there is not, the graph should simply be a scatter graph with no correlation.
Fig. 7: An ideal spring would show a perfectly directly proportional relationship between force and extension.
A straight line graph of force against extension can be plotted for the Hooke's law experiment. Perfect results would allow you to draw a graph like the one shown. This is very difficult to achieve, but you can come close by using certain techniques during the data representation stages.
Data representation tools
There are many tools and techniques that you can use in your experiments both when collecting and representing data in order to make the data more accurate and clearer.
Mean
When possible, it is always best to repeat your measurements and take the mean of them. The mean is a type of average that you will certainly have come across before in your studies. It can be found from the equation:
\[\text{mean}=\dfrac{\text{sum of all values}}{\text{total number of values}}\]
This reduces the potential for error in your results. Taking repeat measurements also reduces the risk of a certain measurement being an anomaly, and helps to identify these in the data.
An anomaly is a data point that seems unusual and does not fit the pattern of the other points in the data set.
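As a small worked example of the mean equation, the Python sketch below averages five repeat readings. The readings are invented for illustration (imagine the extension of a spring, in cm, measured five times for the same weight).

```python
def mean(values):
    """Sum of all values divided by the total number of values."""
    return sum(values) / len(values)

# Hypothetical repeat readings of a spring's extension in cm for one weight
readings_cm = [4.1, 3.9, 4.0, 4.2, 3.8]

mean_extension = mean(readings_cm)
print(round(mean_extension, 2))
```

The readings add up to 20.0 cm and there are 5 of them, so the mean extension is 4.0 cm.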
Lines of best fit
When you plot a graph for two variables that have a linear relationship, a line of best fit can be drawn through the data points. A line of best fit is drawn below for some data points you might obtain by carrying out the experiment on Hooke's law.
Fig. 8: A line of best fit should go through as many points as possible and have roughly as many points above and below it.
A good method for drawing a line of best fit is to make sure that it passes through as many data points as possible and that there are approximately as many points above the line as there are below. If there are anomalous points, you should not consider them when drawing the line. Lines of best fit are useful because they allow you to find the gradient, which can often be used to find the value of a constant under investigation. In the Hooke's law experiment, when force is plotted on the y-axis against extension on the x-axis, the gradient of the line tells us the spring constant of the test spring. If the axes are swapped, the gradient is instead one divided by the spring constant.
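On paper a line of best fit is drawn by eye, as described above. A computer usually finds it with a different technique, a least-squares fit, which picks the straight line that makes the squared vertical distances to the points as small as possible. The Python sketch below uses that approach on invented Hooke's law readings. The data values and the function name `line_of_best_fit` are assumptions made for this example, and force is taken as the y-variable with extension as the x-variable so that the gradient is the spring constant.

```python
# Hypothetical readings for a spring (illustrative values, not real data)
extension_m = [0.00, 0.02, 0.04, 0.06, 0.08]   # extension in metres (x)
force_n = [0.0, 1.1, 1.9, 3.0, 4.0]            # force in newtons (y)

def line_of_best_fit(x, y):
    """Return (gradient, intercept) of the least-squares straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y vary together, and how x varies on its own
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

gradient, intercept = line_of_best_fit(extension_m, force_n)
print(f"spring constant = {gradient:.1f} N/m")
print(f"intercept = {intercept:.2f} N")
```

For these made-up readings the gradient comes out at about 49.5 N/m, which would be the estimate of the spring constant. Any anomalous points should be removed from the lists before fitting, just as they are ignored when drawing the line by hand.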
Data Representation - Key takeaways
- When we talk about 'data' in physics, we mean the information that is observed and collected over the course of an experiment.
- Data representations illustrate and summarise data, assisting us to understand the meaning of the data.
- The two types of data are qualitative data and quantitative data.
- Quantitative data is information that can be quantified, which means that it can be counted or measured.
- Qualitative data is information that is described in words rather than numbers.
- Quantitative data can be split up into discrete and continuous data.
- Qualitative data can be split up into categoric and ordered data.
- A line graph is a straight line or curve that shows the relationship between two variables plotted on perpendicular axes.
- A histogram looks similar to a bar chart but it can have intervals of different lengths on the x-axis, and it represents continuous data.
- The main stages of data representation are identifying the variables, collecting the data, and analysing the data.
- Examples of tools that can be used to improve data representation include taking the mean of several measurements and drawing a line of best fit on scatter graphs.