Transcript

In this video we introduce the idea of statistical inference, a concept that crime scene investigators, medical doctors and scientists use all the time. Statistical inference allows us to determine the probability that an observed event has a particular cause. For example, it can help us to determine the probability that a fiber found at a murder scene came from a particular suspect, or that a patient's chest pain indicates a heart attack, or that certain physical measurements demonstrate the existence of the long-hypothesized Higgs boson.

(0:39/7:34)

Suppose that a certain population had a mean mu1. The probability of that population giving rise to a sample with a mean of x-bar is denoted using this expression, which reads, "the probability of x-bar given mu1." In this expression, x-bar represents the event that a sample with mean x-bar occurred, and mu1 represents the event that the sample came from a population with mean mu1. This is called a conditional probability, as it gives the probability of a certain event, namely obtaining a sample with mean x-bar, subject to another condition, namely that the source of that sample had a mean mu1.

If we had another population with mean mu2, then the probability of it producing a sample with a mean x-bar is denoted like this. And for a third population, with mean mu3, we write this.

(1:33/7:34)

If we know the probability of each population occurring, we can calculate the total probability of obtaining a sample with a mean x-bar. 

To calculate the probability that the sample came from a population with a certain mean mu-i, we use Bayes' theorem. If we assume that each population is equally likely, the expression simplifies to this one, and if the sampling distribution is Normal, we get this expression, where N is the probability density function for a Normal distribution evaluated at the first argument shown and having the variance indicated by the second argument.
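The calculation described here can be sketched in a few lines of Python. All of the numbers below (the three candidate means, the population variance, the sample size and the observed sample mean) are illustrative assumptions, not values from the video:

```python
import math

def normal_pdf(x, mean, var):
    """Probability density of a Normal(mean, var) distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative values (assumed, not from the video):
mus = [4.0, 6.0, 9.0]   # candidate population means mu1, mu2, mu3
sigma2 = 4.0            # common population variance sigma^2
n = 4                   # sample size
x_bar = 6.5             # observed sample mean

# The sampling distribution of the mean has variance sigma^2 / n,
# so each conditional probability P(x_bar | mu_i) is read from
# a Normal centred at mu_i with that reduced variance.
likelihoods = [normal_pdf(x_bar, mu, sigma2 / n) for mu in mus]

# Total probability of observing x_bar, with equal priors on the
# three populations; Bayes' theorem then reduces to normalization.
total = sum(likelihoods)
posteriors = [lik / total for lik in likelihoods]

for mu, p in zip(mus, posteriors):
    print(f"P(mu = {mu} | x_bar = {x_bar}) = {p:.3f}")
```

Note that the conditional probabilities themselves need not sum to one, but the posteriors do, because dividing by the total probability normalizes them.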

(2:11/7:34)

We can illustrate this process graphically. Consider the population with mean mu1. A single element taken from the population is likely to have a value near the mean, and so the probability density function has a relatively high value there; values further from the mean are relatively unlikely, as reflected by the lower heights of the PDF.

Assuming that a sample consists of two or more elements, that is n≥2, the sampling distribution will be narrower and taller than the population distribution, and the blue words describe the likelihoods of producing samples with particular means. As you can see, it is unlikely that the population shown will produce the particular x-bar we observed. One can quantify the probability by reading the height of the sampling distribution over x-bar. Note that we have bent some of the mathematical rules in making this statement, since for continuous distributions, the probability of obtaining a particular exact value is actually zero.
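The claim that the sampling distribution is narrower than the population distribution can be checked by simulation: the standard deviation of the sample mean is sigma divided by the square root of n. The population mean, standard deviation and sample size below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)
mu = 5.0      # population mean (assumed)
sigma = 2.0   # population standard deviation (assumed)
n = 9         # sample size

# Draw many samples of size n from the population and record each mean.
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(20000)
]

# The distribution of sample means is narrower than the population:
# its standard deviation is close to sigma / sqrt(n) = 2/3.
print(round(statistics.stdev(sample_means), 3))
```

Because the spread shrinks as n grows, a larger sample pins down the population mean more tightly, which is what makes the inference in this video sharper for bigger samples.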

A population with mean mu2 could look like this, and its sampling distribution like this. This population has a mean closer to that of the observed sample, and it is not surprising that the probability of this population producing the observed sample is higher than for the previous one. Finally, we consider a population with mean mu3, its associated sampling distribution, and the probability of it producing the observed sample.

(3:52/7:34)

If we imagine obtaining these three conditional probabilities from their corresponding Normal distributions graphically, we can calculate the probabilities of the observed sample having come from each of the three possible populations. Note that the conditional probabilities need not sum to one. However, the probabilities of the sample having come from different source populations must sum to one, since the event of obtaining a mean of value x-bar is a certainty – it has already happened.

Your intuition may have told you that it was more likely that the sample came from the population with a mean of mu2 than from the other two populations, but by using statistical inference, we can quantify the probabilities associated with each of the three possible sources of the sample. It tells us that there is nearly an 82% chance that the sample came from the population with a mean of mu2, but the chances are only about 5% that it came from the population with a mean of mu3.

In this example, we assumed that the sample came from one of three possible populations, that is, it could have come from any one of three different sources.

(5:05/7:34)

Suppose that the police are investigating three suspects, each of whom, they say, had an equal opportunity to commit the murder. Let's assume that the three sampling distributions we discussed earlier describe the thicknesses of the fibers in the coats of the three murder suspects, while x-bar gives the mean thickness of the fibers found on the victim. Who do you think is the most likely perpetrator of the crime? Why?

The statistical inference principles we just discussed can be extended to situations where an observed sample might have come from any one of many potential sources, perhaps even an infinite number of them.

(5:41/7:34)

Suppose that the probability density functions of each of those populations are the same, except that their means are different. Assume further that the means of samples from those populations are distributed according to this asymmetric distribution. That is, if you took a whole bunch of samples from any one of those populations, the histogram of the sample means would have this shape.

It is not difficult to find the probability that a sample with a particular mean came from a population with any specified mean. Consider a population with the mean shown by the green curve. The probability of that population producing the observed sample is proportional to the height of this green bar, which we position at the mean of that population, since it indicates the probability that that population produced the observed mean. We can do the same for the population shown by the aqua-coloured sampling distribution, the light blue one, the dark blue one and, finally, the purple one.

What we discover is that the resulting curve, which gives the probability of a sample with the observed mean being produced by a population with a particular mean, has the shape of the sampling distribution, but flipped backwards.
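This "flipped" relationship can be verified numerically: if the sampling density depends only on the offset x-bar minus mu, then reading it as a function of mu traverses the same curve in reverse. The asymmetric density f below is a made-up Gamma-like stand-in for the distribution in the video, and x-bar is an assumed value:

```python
import math

def f(t):
    # Made-up asymmetric density for the offset of the sample mean
    # from its population mean: rises quickly, decays slowly
    # (a Gamma(2, 1) shape; not the distribution from the video).
    return t * math.exp(-t) if t > 0 else 0.0

x_bar = 3.0                                # observed sample mean (assumed)
grid = [k / 10 for k in range(51)]         # offsets 0.0 .. 5.0

# The shape of the sampling distribution, read left to right:
density_shape = [f(t) for t in grid]

# The likelihood of candidate population means, swept left to right:
mus = [x_bar - 5.0 + k / 10 for k in range(51)]
likelihood = [f(x_bar - mu) for mu in mus]

# The likelihood curve over mu matches the sampling distribution
# traversed backwards, up to floating-point rounding.
flipped = list(reversed(density_shape))
max_gap = max(abs(a - b) for a, b in zip(likelihood, flipped))
print(max_gap < 1e-9)
```

For a symmetric distribution the flip would be invisible; it is only because this sampling distribution is asymmetric that the mirrored shape stands out.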

(6:56/7:34)

Suppose that a series of populations each contained elements that were normally distributed with variance sigma^2 and a wide range of means. What would the sampling distribution be for each of these populations if a sample of size n=3 were taken? These curves would correspond to the coloured curves in this image. Finally, what shape would the probability density function that relates population means to the sample mean have? That curve corresponds to the black curve in this image.
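For this Normal case, each sampling distribution is Normal with the same mean as its population and variance sigma^2/n, and because the Normal curve is symmetric, flipping it changes nothing: the black curve is itself Normal with variance sigma^2/n, centred at x-bar. A quick check, using assumed values for sigma^2, x-bar and one candidate mean:

```python
import math

def normal_pdf(x, mean, var):
    """Probability density of a Normal(mean, var) distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

sigma2 = 9.0   # population variance sigma^2 (assumed)
n = 3          # sample size from the question above
x_bar = 10.0   # observed sample mean (assumed)
mu = 8.0       # one candidate population mean (assumed)

# Each coloured curve: the sampling distribution for one population,
# Normal with that population's mean and variance sigma^2 / n.
var_xbar = sigma2 / n

# Height of x_bar on that population's sampling distribution,
# versus the height of mu on the black curve centred at x_bar:
height_on_sampling_curve = normal_pdf(x_bar, mu, var_xbar)
height_on_black_curve = normal_pdf(mu, x_bar, var_xbar)

# By the symmetry of the Normal PDF these are identical.
print(height_on_sampling_curve == height_on_black_curve)
```

This symmetry is why, for Normal populations, "the probability of x-bar given mu" and the curve relating candidate means to the observed x-bar have exactly the same bell shape.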