Transcript

The chi square, or chi squared, distributions describe how the variance of samples drawn from Normal populations is distributed.

Consider the n-1 version of the variance formula:

s_{n-1}^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]
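
As a concrete check on this formula, here is a minimal Python sketch (the data values are made up purely for illustration) that computes the n-1 version of the variance by hand and compares it with the built-in sample variance routine:

    import statistics

    # Made-up sample data, purely for illustration.
    data = [4.1, 5.3, 3.8, 5.0, 4.6]

    n = len(data)
    x_bar = sum(data) / n

    # The n-1 version of the variance formula.
    s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

    # statistics.variance also uses the n-1 (sample) definition,
    # so the two results should agree.
    assert abs(s2 - statistics.variance(data)) < 1e-12
    print(s2)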

If the population from which the sample elements are drawn is Normal, then each of the data values x1, x2, and so on will be Normally distributed, as will their mean, x-bar, since the mean is simply a scaled sum of variables that have Normal distributions.
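
Stated in symbols, assuming the population has mean mu and variance sigma squared, that claim is:

    x_i \sim N(\mu, \sigma^2) \quad \Rightarrow \quad \bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i \;\sim\; N\!\left(\mu, \tfrac{\sigma^2}{n}\right)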

So, if the sample variance is the sum of a bunch of Normally-distributed quantities squared, what shape will its distribution take?

(0:48/4:16)

To answer that question, let’s look at the shape we get if we square the values from a single Normal distribution. For simplicity, we will give it a mean of zero and a variance of one.

If we square the values of points that fall between 1 and 1.5 in this distribution, they will map to points that range from 1 to 1.5 squared, which is 2.25. When squared, the corresponding negative points will map to the same final range as their positive counterparts. The members of the distribution that fall between 2 and 2.5 will map to the new range 4 to 6.25, the squares of their original values, as will the matching negative numbers. Because those squared values are more spread out, the likelihood of getting any particular value in that squared range will be lower. Next, points that started between zero and ½, or between zero and -½, will map to the narrow zone between their squared values, namely between zero and ¼, so their density will be concentrated there.

If we continue this process for all ranges in the original graph, we obtain the graph shown. This figure is the chi squared distribution for k=1 degree of freedom.
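
The spreading and narrowing just described can be made precise with a change of variables. A sketch of that calculation, for a standard Normal variable X with density phi and Y equal to X squared, is:

    f_Y(y) \;=\; \frac{d}{dy}\,P(X^2 \le y) \;=\; \frac{d}{dy}\bigl[\Phi(\sqrt{y}) - \Phi(-\sqrt{y})\bigr] \;=\; \frac{\varphi(\sqrt{y})}{\sqrt{y}} \;=\; \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}, \qquad y > 0,

which is exactly the chi squared density for k=1.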

(2:03/4:16)

Here is a histogram of random values drawn from a Normal distribution with a mean of zero and a variance of one, along with its theoretical Normal curve. If we now square each of the data values and plot a histogram of those squared values, we obtain this second histogram, which matches well with the chi squared distribution for k=1.
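
A short Python sketch of that simulation, using NumPy, Matplotlib, and SciPy (the sample size and bin count are arbitrary choices), might look like this:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(0)
    z = rng.standard_normal(100_000)   # draws from a Normal with mean 0, variance 1
    z_squared = z ** 2                 # square each data value

    # Histogram of the squared values, scaled to a density.
    plt.hist(z_squared, bins=200, density=True, alpha=0.5, label="squared Normal draws")

    # Theoretical chi squared density with k = 1 degree of freedom.
    y = np.linspace(0.01, 8, 400)
    plt.plot(y, stats.chi2.pdf(y, df=1), label="chi squared, k = 1")

    plt.legend()
    plt.show()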

If you calculate the variance for a sample of size n=2, the calculation simplifies as shown in this slide, and as you can see, you really have only one independent Normally distributed quantity squared, namely the one associated with x1 minus x2.

The variance associated with a sample of size n=3 can be simplified to the sum of two squared quantities, each of which has the same variance.
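
One standard way to write the simplifications referred to on these slides (a sketch, easy to verify by expanding both sides) is:

    n = 2: \quad s_{n-1}^2 \;=\; (x_1-\bar{x})^2 + (x_2-\bar{x})^2 \;=\; \left(\frac{x_1 - x_2}{\sqrt{2}}\right)^{2}

    n = 3: \quad 2\,s_{n-1}^2 \;=\; (x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2 \;=\; \left(\frac{x_1 - x_2}{\sqrt{2}}\right)^{2} + \left(\frac{x_1 + x_2 - 2x_3}{\sqrt{6}}\right)^{2}

In each case, every quantity inside the parentheses is itself Normally distributed with mean zero and the same variance as the population.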

(2:49/4:16)

You might not be surprised to learn that the variance associated with a sample of size n gives rise to the sum of n-1 independent squared Normal distributions and is characterized by the chi squared distribution with k = n-1 degrees of freedom.

You can see a progression in these shapes as you increase n from 2 to 3 to 5 to 30; when n is large, the curve approaches a Normal distribution. This last result should not be surprising, since the Central Limit Theorem says that when you add together a sufficient number of independent random variables, their sum approaches a Normal distribution. These chi squared curves also match well with histograms from our Sampling Distributions Spreadsheet.
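
If you want to reproduce that progression yourself, here is a minimal Python sketch (the sample counts and bin choices are arbitrary); with a population variance of 1, (n-1) times the sample variance follows the chi squared distribution with k = n-1:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(1)
    n_samples = 50_000                       # number of simulated samples (arbitrary)

    for n in (2, 3, 5, 30):
        data = rng.standard_normal((n_samples, n))
        s2 = data.var(axis=1, ddof=1)        # the n-1 version of the sample variance
        # With population variance 1, (n-1) * s^2 follows chi squared with k = n-1.
        plt.hist((n - 1) * s2, bins=200, density=True, alpha=0.4, label=f"n = {n}")
        y = np.linspace(0.01, 60, 600)
        plt.plot(y, stats.chi2.pdf(y, df=n - 1))

    plt.legend()
    plt.show()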

In case you are curious, the general formula for the chi squared family of distributions is the one shown here, and the distribution for k degrees of freedom has a mean of k and variance of 2k.
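
For reference, the form of that density most commonly quoted, for k degrees of freedom, is:

    f(x; k) \;=\; \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}, \qquad x > 0,

and this distribution has mean k and variance 2k, as stated above.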

But don’t worry, most spreadsheets and computer libraries have built-in functions for the chi squared distribution, so you will probably never have to use this expression to program your own function.
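
For example, in Python the scipy.stats library already provides the chi squared distribution; the degrees of freedom and input values below are purely illustrative:

    from scipy import stats

    k = 4                                   # degrees of freedom (illustrative)
    print(stats.chi2.pdf(3.0, df=k))        # density at x = 3
    print(stats.chi2.cdf(3.0, df=k))        # probability that X <= 3
    print(stats.chi2.ppf(0.95, df=k))       # 95th percentile
    print(stats.chi2.mean(df=k), stats.chi2.var(df=k))  # should print k and 2k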

(3:58/4:16)

If the mean of the chi squared distribution for k=1 is equal to 1, can you explain why the mean of the chi squared distribution with k degrees of freedom is equal to k? Can you do the parallel calculation that relates their variances?