Transcript

Hi and welcome to EasyStats! Today we are going to learn how to calculate variance.

In the previous video we defined variance as a measure of how data is spread out around its mean.

Population variance is the true variance, but it is usually impractical to gather data for an entire population. Instead, one can take a sample from the population and estimate the population variance by finding the sample variance.

(0:31/7:44)

To begin we are first going to learn how to calculate sample variance.

Sample variance is denoted by s². I will talk about why I chose s² instead of s in a bit.

To calculate sample variance, the first step is to find the difference between each data point and the mean. This difference can be represented as xᵢ minus x̄, where xᵢ represents each data point, which in our case is the weight of each individual Smartie, whereas x̄ represents the sample mean, which is simply the average of our entire data, that is, the average weight of all the Smarties.

(1:24/7:44)

Once we have found the difference between each data point and the mean, we can square each difference and add up the squared results for all the data points, which is represented by the sum of (xᵢ minus x̄) squared, for i going from 1 to n.

Here n represents the number of data points.

The square is important because it always gives us a positive answer for this difference. If we did not use the square, then some of the differences would be positive while others would be negative, and if we simply added them they would cancel each other out and give us an incorrect answer. That is the reason we have to square them.
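To see the cancelling effect for yourself, here is a minimal sketch. The weights below are made-up values for illustration, not the Smarties data from the video.

```python
# Hypothetical weights in grams (illustrative only, not the video's data).
weights = [1.0, 1.2, 0.9, 1.1, 1.3]

mean = sum(weights) / len(weights)

# Difference between each data point and the mean.
deviations = [x - mean for x in weights]

# Adding the raw differences: positives and negatives cancel out.
raw_sum = sum(deviations)

# Adding the squared differences: every term is zero or positive.
squared_sum = sum(d ** 2 for d in deviations)

print("sum of differences:", raw_sum)
print("sum of squared differences:", squared_sum)
```

The first sum comes out as zero, up to floating-point rounding, no matter how spread out the data is, while the second sum grows as the data spreads out.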

(2:08/7:44)

Now you might think that the next step is to divide the expression by n and that this will give you your variance, but this is not true. We do not divide this expression by n. Instead, we divide it by (n-1). So dividing this expression by (n-1) gives us our formula for sample variance: s² equals the sum of (xᵢ minus x̄) squared, divided by (n-1).

The proof behind using (n-1) is out of the scope of this video but will be discussed further in future videos. For now we will say that the primary reason for using (n-1) is that, had I divided by n, the sample variance would be an underestimate compared to the population variance.
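The steps so far can be sketched as a small Python function. The function name sample_variance and the weights are assumptions made for this illustration. The result is compared against statistics.variance from the Python standard library, which also divides by (n-1).

```python
import statistics


def sample_variance(data):
    """Sum of squared differences from the mean, divided by (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


# Hypothetical weights in grams (illustrative only, not the video's data).
weights = [1.0, 1.2, 0.9, 1.1, 1.3]

print(sample_variance(weights))
print(statistics.variance(weights))  # standard-library check
```

Both lines should print the same value, apart from possible floating-point rounding in the last digits.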

(2:50/7:44)

Now that we have our formula, we are ready to calculate the sample variance of the Smarties' weights.

Here I have already added these numbers and found out the total weight of Smarties, and the number of Smarties is nine.

The first step is to calculate the sample mean, which is represented by x̄. To find my sample mean I just need to divide my total weight by the number of Smarties. So if I divide 9.8 by 9 … and this number is going to be the same for all the data points.

(3:29/7:44)

The next step is to find the difference between each individual Smartie weight and the average Smartie weight. We do this by subtracting x̄ from xᵢ. So for my first Smartie I can subtract 1.0889 from 1 and that gives me... Similarly I can calculate the difference for the remaining Smarties.

Now I’m just going to square these numbers so that they are all positive.  -0.0889 squared, gives me …. Now I can do this for the remaining Smarties.
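Here is a minimal sketch of these first steps for the first Smartie, using only the numbers mentioned in the video: a total weight of 9.8 grams, nine Smarties, and a first Smartie weighing 1 gram. The variable names are assumptions for illustration. Note that the code squares the unrounded difference, so its last digits can differ slightly from squaring the rounded -0.0889.

```python
total_weight = 9.8   # grams, total weight of the Smarties in the video
n = 9                # number of Smarties

# Sample mean: total weight divided by the number of Smarties.
mean = total_weight / n

# Difference between the first Smartie (1 gram) and the mean.
first_difference = 1.0 - mean

# Squaring makes the difference positive.
first_squared = first_difference ** 2

print(round(mean, 4))
print(round(first_difference, 4))
print(round(first_squared, 6))
```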

(4:19/7:44)

Now if you calculate the sum of these numbers, it will come out to be: the summation of (xᵢ minus x̄) squared equals 0.04889 grams squared. It is this number, together with (n-1), that we need to find our sample variance. So this gives me…and my final answer comes out to be…which is the sample variance of the Smarties' weights.
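The final division can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the sum of squared differences (0.04889 grams squared) and the nine Smarties from the video. The variable names are assumptions.

```python
sum_of_squares = 0.04889   # grams squared, sum of (x_i - x_bar)^2 from the video
n = 9                      # number of Smarties

# Sample variance: divide by (n - 1), not by n.
s_squared = sum_of_squares / (n - 1)

print(round(s_squared, 6), "grams squared")
```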

(5:08/7:44)

We are now going to talk about population variance.

This formula is used to calculate population variance. I have differentiated my population variance from my sample variance by using sigma instead of s. Note that I have used the population mean mu, which is not the same as my sample mean x̄. Mu is a representation of the population mean. For example, I would get my population mean if I could measure the weight of all the Smarties in the world and find their average.

Note that I’m using n and not (n-1), and this is the formula for population variance.
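As a rough sketch of the difference between the two formulas, the code below treats a small made-up list as if it were an entire population and divides by n. The function name population_variance and the data are assumptions for illustration. The standard library's statistics.pvariance and statistics.variance are used as checks.

```python
import statistics


def population_variance(data):
    """Sum of squared differences from the population mean, divided by n."""
    n = len(data)
    if n < 1:
        raise ValueError("population variance needs at least one data point")
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


# Hypothetical weights in grams, treated here as a complete population.
weights = [1.0, 1.2, 0.9, 1.1, 1.3]

print(population_variance(weights))    # divides by n
print(statistics.pvariance(weights))   # standard-library check, divides by n
print(statistics.variance(weights))    # sample formula, divides by (n - 1)
```

For the same numbers, dividing by n always gives a smaller result than dividing by (n-1), unless every data point is identical.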

(5:56/7:44)

Did you notice that the unit of sample variance and population variance is always squared? This is because the difference between the data point and the mean is squared. To express the spread in the same units as the original data rather than in squared units, we can find the standard deviation.

The sample standard deviation is simply the square root of the sample variance, and this is the reason why I denoted variance by s², so that I could denote my sample standard deviation by s. The square root cancels out the square. So for our example we can find the square root of 0.006111 grams squared, and that gives me…

In the case of sample standard deviation my units are not squared, and they match the units of my data. But in the case of sample variance my units are squared, and they do not match the units of my sample data.

As you might have guessed by now, my population standard deviation, just like my sample standard deviation, is the square root of my population variance. The square root cancels out the square and gives us sigma.
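The square-root step can be sketched in Python as well, starting from the sample variance of 0.006111 grams squared quoted in the video. The variable names are assumptions for illustration.

```python
import math

s_squared = 0.006111   # grams squared, sample variance from the video

# Sample standard deviation: the square root of the sample variance.
s = math.sqrt(s_squared)

print(round(s, 4), "grams")
```

The result is in grams rather than grams squared, so it can be compared directly with the weights themselves.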

(7:28/7:44)

For your homework, measure your room several times and find the sample variance and standard deviation of your results.

Remember, if you want to find out the mystery of (n-1), tune in to our upcoming videos.

Thanks for watching EasyStats! Bye for now.