This is where the statistician's notion of estimator bias comes in. Since your estimate is based on an average of how things have gone in the past, you can use an estimator for that average, and from there work out how biased or unbiased the estimator is.
Comparing estimators and finding the variance or standard error of an estimator are explained in the article Quality of Estimators.
Definition of the Bias of an estimator
Say, for example, you wanted to find the mean length of fish in an aquarium. Not only are there a huge number of fish you'd need to measure, but it's also very difficult to catch and measure all the fish.
Instead of measuring every single fish in the population (which is referred to as a census), a better approach would be to take a sample of fish and use a rule based on that sample to estimate the mean length. Such a rule is referred to as an estimator.
First, however, you need to know what a statistic is.
The statistic, \(T\), is a function of \(n\) samples of a random variable \(X\) (i.e. \(X_1,X_2,X_3,\dots,X_n\)). These observations are independent and identically distributed.
Mathematically, this means that the statistic used to estimate a parameter, \(T\), is built from \(n\) independent, identically distributed samples taken from a random variable, \(X\). (The related term test statistic refers specifically to a statistic used in a hypothesis test.)
An estimator is a statistic used to estimate a population parameter. An estimate is the value of the estimator when taken from a sample.
You might also see an estimator called a point estimator, and its value called a point estimate. It is important to be able to recognise what estimators are. Have a look at the following example.
Explain why the following functions are or are not estimators where \(X_1, X_2,...,X_n\) are taken from a population with parameters \(\mu\) and \(\sigma\).
i) \(\dfrac{X_3+X_6}{2}\)
ii) \(\dfrac{\sum(X_i-\mu)^2}{n}\)
Solution:
i) The function
\[\dfrac{X_3+X_6}{2}\]
is an estimator, since it is a function only of independent, identically distributed samples.
ii) On the other hand,
\[\dfrac{\sum(X_i-\mu)^2}{n}\]
is not an estimator since it contains \(\mu\) which is not a sample. In fact, this potential estimator is not even a statistic. The variable \(\mu\) is the population parameter! You can't use a formula involving the population parameter to estimate the population parameter.
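To make the distinction between an estimator and an estimate concrete, here is a minimal Python sketch; the fish lengths are made-up, illustrative values.

```python
# Hypothetical sample of n = 10 fish lengths in cm (illustrative only).
lengths = [12.3, 15.1, 9.8, 14.2, 11.7, 13.5, 10.9, 12.8, 14.0, 11.2]

# The estimator from part i): the average of the 3rd and 6th observations.
def estimator_i(sample):
    return (sample[2] + sample[5]) / 2  # X_3 and X_6, 1-indexed as in the text

# Applying the estimator to this particular sample yields an estimate.
print(estimator_i(lengths))  # (9.8 + 13.5) / 2 = 11.65
```

The estimator is the rule `estimator_i`; the number 11.65 it returns for this sample is the estimate.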
Let's take a look at a quick overview.
Overview of estimator bias
Not all statistics are reliable estimators. To judge how well a statistic estimates a parameter, you need to find the expected value of that statistic.
If the expectation of the statistic is different to the parameter that you want to estimate, then this tells you that the statistic is biased.
You can think of bias as a measure of how far, on average, your estimator falls from the population parameter: the further the centre of the estimator's sampling distribution lies from the parameter, the greater the bias. Bias concerns where the sampling distribution is centred rather than how skewed its shape is.
For more information on the shape of distributions, see the article Skewness.
Bias of an estimator explanation
You can write the definition of an estimator being biased or unbiased using simple mathematical notation.
If \(\hat{\theta}\) is a statistic used to estimate population parameter \(\theta\), \(\hat{\theta}\) is unbiased when
\[\text {E}(\hat{\theta})=\theta\]
where \(\text{E}\) is the notation for expected value. Any statistic which is not unbiased is called biased.
If \(\hat{\theta}\) is biased, the bias can be found using the following formula:
\[\text{Bias}(\hat{\theta})=\text{E}(\hat{\theta})-\theta.\]
Notice that if \(\text{E}(\hat{\theta})=\theta \) then \(\text{Bias}=0\).
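Because the bias is defined through an expectation, you can approximate it by averaging an estimator over many simulated samples. Below is a minimal Python sketch of this idea; the function name `simulated_bias` and the deliberately biased toy estimator \(\bar{X}+1\) are illustrative choices, not anything from the text.

```python
import random

def simulated_bias(estimator, theta, draw_sample, trials=100_000):
    """Approximate Bias(theta_hat) = E(theta_hat) - theta by Monte Carlo."""
    total = 0.0
    for _ in range(trials):
        total += estimator(draw_sample())
    return total / trials - theta

random.seed(42)
mu, sigma, n = 10.0, 2.0, 5
draw = lambda: [random.gauss(mu, sigma) for _ in range(n)]

# A deliberately biased toy estimator of mu: the sample mean plus 1.
shifted_mean = lambda xs: sum(xs) / len(xs) + 1

print(simulated_bias(shifted_mean, mu, draw))  # approximately 1
```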
Let's put the definition to use.
Show that the sample mean
\[\bar{X}=\frac{X_1+X_2+\dots+X_n}{n} \]
is an unbiased estimator of the population mean \(\mu\); that is, show that \(\text {E}(\bar{X})=\mu\).
Solution:
Keeping in mind that \(\text {E}(aX)=a\,\text {E}(X)\), you have
\[\begin{align}\text {E}(\bar{X})&=\frac{1}{n}\text{E}(X_1+\dots +X_n)\\&=\frac{1}{n}(\text {E}(X_1)+\dots +\text {E}(X_n))\end{align}\]
Since \(\text {E}(X_i)=\mu\) for all \(i\), you have
\[ \begin{align} \text {E} (\bar{X}) &= \frac{\mu +\mu +\dots + \mu}{n} \\ &= \frac{n \mu}{n}\\ &=\mu .\end{align}\]
This shows that \(\text {E}(\bar{X})=\mu\), which means \(\bar{X}\) is an unbiased estimator of parameter \(\mu\). This means that on average, this statistic will give the correct value for the estimated parameter.
For a reminder on why \(\text {E}(aX)=a\,\text {E}(X)\), see the article Sum of Independent Random Variables.
The fact that the previous example gives you an unbiased estimator is why you will see it used to construct confidence intervals.
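You can also see this unbiasedness numerically with a minimal simulation sketch; normal data is just an illustrative assumption here, since the proof above holds for any distribution with mean \(\mu\).

```python
import random

random.seed(3)
mu, sigma, n, trials = 5.0, 1.5, 8, 100_000

# Average the sample mean over many simulated samples; by the result
# above, this long-run average should land on mu.
total = 0.0
for _ in range(trials):
    total += sum(random.gauss(mu, sigma) for _ in range(n)) / n

print(total / trials)  # approximately mu = 5.0
```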
Estimator Bias example
Not all estimators are unbiased!
You are given
\[T=\frac{X_1+2X_2}{n}\]
as a candidate for an estimator of the mean of a distribution, \(t\), where \(n\) is the total number of samples taken. Find the bias of this statistic.
Solution:
In this problem, the population parameter is the mean, \(t\). So to find the bias, you can use the formula
\[\text{Bias}(T)=\text {E}(T)-t,\]
giving you
\[ \begin{align} \text{Bias} (T) &= \text {E} \left(\frac{X_1+2X_2}{n}\right) -t \\&= \frac{\text {E} (X_1)+2\text {E} (X_2)}{n} -t \\&= \frac{3t}{n}-t\\&= \frac{t(3-n)}{n} .\end{align}\]
Therefore the bias of estimator \(T\) is
\[\text{Bias}(T) = \dfrac{t(3-n)}{n}.\]
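As a sanity check, here is a minimal simulation of the bias of \(T\), assuming for illustration that the \(X_i\) are exponentially distributed with mean \(t=4\) and that \(n=10\); any distribution with mean \(t\) would give the same bias.

```python
import random

random.seed(7)
t, n, trials = 4.0, 10, 200_000

# Monte Carlo estimate of E(T) for T = (X_1 + 2*X_2) / n.
total = 0.0
for _ in range(trials):
    x1 = random.expovariate(1 / t)  # exponential with mean t
    x2 = random.expovariate(1 / t)
    total += (x1 + 2 * x2) / n

print(total / trials - t)  # simulated bias
print(t * (3 - n) / n)     # exact bias: 4 * (3 - 10) / 10 = -2.8
```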
Bias of estimator formula
While the sample mean is one way to get an unbiased estimator, it is not the only one. Let's apply the bias formula to estimators of the variance instead.
To find an estimator for the population variance, you might first try the variance of the sample, denoted
\[V=\frac{\sum\limits_{i=1}^n(X_i-\bar{X})^2}{n}.\]
However, since this formula measures spread about the sample mean, \(\bar{X}\), rather than the population mean, \(\mu\), it systematically underestimates the population variance, making \(V\) a biased estimator.
Instead, you can use a different statistic: the sample variance. This will give you an unbiased estimator for the population variance, \(\sigma^2\).
An unbiased estimator for the population variance, \(\sigma ^2\), is the sample variance, \(S^2\):
\[S^2=\frac{\sum\limits^n_{i=1} (X_i-\bar{X})^2}{n-1}.\]
This formula isn't always the easiest to use when calculating the sample variance. There are other ways to find \(s^2\).
These are the ways that you can calculate the sample variance:
\[\begin{align} s^2 &= \frac{\sum\limits^n_{i=1} (x_i-\bar{x})^2}{n-1} \\&= \frac{\sum\limits_{i=1}^n x_i^2-n\bar{x}^2}{n-1} \\&=\frac{S_{xx}}{n-1} ,\end{align} \]
where \(S_{xx}=\sum\limits_{i=1}^n x_i^2-n\bar{x}^2\).
In general, \(S^2\) is used to denote the estimator for the population variance, and \(s^2\) is used to denote a particular estimate. It's worth learning the above two equivalent formulas as they are significantly easier to apply than the first one.
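A quick numeric check, on made-up data, that the equivalent formulas return the same value of \(s^2\):

```python
# Illustrative data only; any small sample works.
xs = [4.1, 5.3, 3.8, 6.0, 5.6]
n = len(xs)
x_bar = sum(xs) / n

# First formula: squared deviations from the sample mean, over n - 1.
form1 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Second and third formulas coincide once S_xx is computed.
s_xx = sum(x ** 2 for x in xs) - n * x_bar ** 2
form2 = s_xx / (n - 1)

print(form1, form2)  # equal up to floating-point rounding
```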
Let's take a look at the proof that \(S^2\) is an unbiased estimator for \( \sigma ^2\). In other words, the goal is to show that \(\text {E}(S^2)=\sigma ^2\).
To do this, you need to write the expectation of the sample variance
\[\text{E}(S^2) = \text{E}\left(\frac{\sum\limits_{i=1}^n X_i^2-n\bar{X}^2}{n-1}\right) \]
in terms of \(\sigma\) and \(\mu\). Notice that you have already used one of the alternate ways of calculating the sample variance.
First, using the definition of \(\sigma ^2\), you have
\[\begin{align} \sigma ^2 &=\text{Var}(X) \\ &=\text {E}(X^2)-\mu ^2, \end{align} \]
therefore \(\text{E}(X^2)=\sigma ^2 +\mu ^2.\)
You also know that \(\text{Var}(\bar{X})=\dfrac{\sigma ^2}{n}\), since \(\text{Var}(\bar{X})=\frac{1}{n^2}\text{Var}(X_1+\dots +X_n)=\frac{n\sigma^2}{n^2}\), and that \(\text{E}(\bar{X})=\mu\), so you can write \(\text{Var}(\bar{X})\) as
\[\begin{align} \text{Var}(\bar{X}) &= \frac{\sigma ^2}{n} \\ &=\text {E}(\bar{X} ^2)-\mu ^2, \end{align}\]
so
\[\text {E}(\bar{X}^2)=\frac{\sigma ^2}{n}+\mu ^2.\]
The expectation of the sample variance is given by:
\[\begin{align} \text {E}(S^2) &= \frac{ \text {E}\left(\sum\limits_{i=1}^n X_i^2-n\bar{X}^2\right)}{n-1} \\&= \frac{ \text {E}\left(\sum\limits_{i=1}^n X_i^2\right)-\text {E}(n\bar{X}^2)}{n-1} .\end{align} \]
Since the \(X_i\) each have the same distribution as \(X\),
\[\begin{align} \text {E}\left(\sum\limits_{i=1}^n X_i^2\right)&=\sum\limits_{i=1}^n \text {E}(X_i^2)\\ &=n\text {E}(X^2), \end{align}\]
you have
\[\begin{align} \text {E}(S^2) &= \frac{ n\text {E}(X^2)-\text {E}(n\bar{X}^2)}{n-1} \\ &= \frac{n(\sigma ^2 +\mu ^2)-n\left(\dfrac{\sigma ^2}{n} +\mu ^2\right)}{n-1}\\ &=\frac{n\sigma^2 +n\mu ^2 -\sigma ^2 -n\mu ^2 }{n-1} \\&= \frac{(n-1)\sigma ^2}{n-1} \\ &=\sigma^2 . \end{align} \]
Since \(\text {E}(S^2)=\sigma ^2\), you have shown that \(S^2\) is an unbiased estimator for the population variance, \(\sigma ^2\).
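To see the proof reflected numerically, here is a minimal simulation sketch comparing the divide-by-\((n-1)\) estimator \(S^2\) with the divide-by-\(n\) statistic \(V\) from earlier; normal data is an illustrative assumption, since the result holds for any distribution with finite variance.

```python
import random

random.seed(0)
mu, sigma, n, trials = 0.0, 3.0, 5, 200_000

total_s2 = total_v = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
    total_s2 += ss / (n - 1)                # unbiased sample variance S^2
    total_v += ss / n                       # biased statistic V

print(total_s2 / trials)  # approximately sigma^2 = 9
print(total_v / trials)   # approximately (n-1)/n * sigma^2 = 7.2
```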
While you may not need to memorise the proof, it is always good to read and understand the steps to ensure you have a good understanding of the topic.
Estimator Bias - Key takeaways
- An estimator is a statistic used to estimate a population parameter. An estimate is the value of the estimator when taken from a sample.
- The statistic, \(T\), is a function of \(n\) samples of random variable \(X\) (i.e. \(X_1,X_2,X_3,\dots ,X_n\)). These observations are independent and identically distributed.
- If \(\hat{\theta}\) is a statistic used to estimate population parameter \(\theta\), \(\hat{\theta}\) is unbiased when \(\text {E}(\hat{\theta})=\theta\).
- If \(\hat{\theta}\) is biased, the bias can be quantified using the following formula:\[\text{Bias}(\hat{\theta})=\text {E}(\hat{\theta})-\theta.\]
Frequently Asked Questions about Estimator Bias
What are examples of biased estimators?
A biased estimator is one whose expectation differs from the parameter it is meant to estimate. For example, the statistic \(V=\frac{\sum (X_i-\bar{X})^2}{n}\), the version of the sample variance that divides by \(n\), is a biased estimator of the population variance.
How do you find the bias of an estimator?
The bias of an estimator is the difference between the expectation of the estimator and the parameter it is supposed to estimate.
What does it mean if an estimator is biased?
If an estimator is biased, then it will have an expected value that is different from the parameter it is supposed to estimate.
What is bias and variance of an estimator?
The bias of an estimator measures whether, on average, it differs from the parameter it is supposed to estimate. The variance of an estimator measures how much its value varies from sample to sample.
Are estimators always unbiased?
No, estimators are not always unbiased. It is preferable to use an unbiased estimator, but in some situations a biased estimator may be the best option available.