# Sampling variation and sampling distributions

## Learning outcomes

On watching this video, students should be able to:

1. Describe the abstract idea of a sampling distribution and how it reflects the sample-to-sample variability of a sample statistic or point estimate.
2. Identify the standard error as the standard deviation of the sampling distribution and explain how it is a measure of the precision of a point estimate or sampling variability.
3. Distinguish between the uses of the standard deviation and uses of the standard error.
4. Infer that although the sampling distribution is a theoretical construct that we never empirically observe, we can estimate the precision of a point estimate using the standard error, which is estimated from a single sample.
5. Confirm that point estimates from larger samples are subject to less sampling variation and are thus more precise, and that estimates from larger samples are more likely to be close to the true population value (assuming there is no systematic bias).

A thought experiment about sampling distributions:

• Imagine you take a random sample of individuals from a target population, measure something and then calculate a sample statistic, the “mean” let’s say. You calculate the mean in the sample because what you really want to know is the mean in the population, and the sample mean is a point estimate of this population parameter.

• Imagine you take another independent random sample and calculate another mean. It is highly likely that it would be different to the first mean because it is a different sample - the sample was selected completely independently of the first sample, and individuals were selected by a random process.

• Imagine you keep doing this over and over again, each time calculating a mean and recording its value. The sample means would vary from sample to sample and you could plot their distribution with a histogram. We call this distribution the sampling distribution. We call it sampl-ing because it is the distribution from “sampl-ing” lots of times. This is different to the “sample” distribution which is the distribution of the observed data.

• The spread or standard deviation of this sampling distribution would capture the sample-to-sample variability of your estimate of the population mean. It would thus be a measure of the amount of uncertainty in your estimate of the population mean, also called “sampling variation” or “sampling error”. You can also see it as a measure of the precision of the point estimate, in this case the mean.

• You might imagine that means calculated from bigger samples would vary less from sample to sample and, likewise, that means calculated from samples taken from populations with less variation would vary less from sample to sample. Less sample-to-sample variation means more precise point estimates.

• We call the standard deviation of the sampl-ing distribution the “standard error” to distinguish it from the standard deviation of the sample distribution. You might find it helpful to remember this by interpreting the word “error” in standard error as reflecting sampling “error”.

• You've had to imagine all this because we almost always do only one experiment or take only one sample, so we never observe the sampling distribution. A sampling distribution is abstract: it describes variability from sample to sample, not variability within a sample.
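Although we never observe the sampling distribution in practice, the thought experiment above can be acted out on a computer. The sketch below is only an illustration under stated assumptions: the population is hypothetical (100,000 simulated values with mean about 50 and standard deviation about 10), and the names `simulate_sample_means`, `population` and `results` are invented for this example. It draws many independent random samples, records each mean, and compares the standard deviation of those means with the textbook value for the standard error of a mean, the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples independent random samples and return the mean of each."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population (an assumption made purely for illustration).
population = [rng.gauss(50, 10) for _ in range(100_000)]
population_sd = statistics.pstdev(population)

results = {}
for sample_size in (10, 40, 160):
    # Repeat the "take a sample, calculate the mean" step 2,000 times.
    means = simulate_sample_means(population, sample_size, 2_000, rng)

    # The standard deviation of the simulated means approximates the standard error.
    empirical_se = statistics.stdev(means)

    # Textbook standard error of a mean: population SD / sqrt(sample size).
    theoretical_se = population_sd / sample_size ** 0.5

    results[sample_size] = (empirical_se, theoretical_se)
    print(
        f"n={sample_size:4d}  SD of sample means={empirical_se:.3f}  "
        f"sigma/sqrt(n)={theoretical_se:.3f}"
    )
```

The spread of the simulated means should shrink as the sample size grows, roughly halving each time the sample size is quadrupled. Plotting `means` as a histogram would give a picture of the sampling distribution itself.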

Uses of the sampling distribution:

• Since we often want to draw conclusions about something in a population based on only one sample, understanding how our sample statistics vary from sample to sample, as captured by the standard error, is really useful. It allows us to answer questions such as: what is a plausible range of values for the mean in this population given the mean that I have observed in this particular sample? What is the probability of seeing a difference in means between these two treatment groups as big as I have observed just due to chance? Does my study provide any evidence for changing best practice?
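As a rough illustration of the first question, the sketch below estimates the standard error from one sample and uses it to build a plausible range for the population mean. Everything specific here is an assumption made for the example: the sample is simulated (50 hypothetical measurements), and the multiplier 1.96 comes from the normal approximation for an approximate 95% range, which is reasonable only when the sample is not too small. It also shows the distinction between the standard deviation, which describes the spread of the observed data, and the standard error, which describes the precision of the mean.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical single sample of 50 measurements (illustrative values only).
sample = [rng.gauss(120, 15) for _ in range(50)]

n = len(sample)
sample_mean = statistics.fmean(sample)

# Standard deviation: how much individual observations vary within the sample.
sample_sd = statistics.stdev(sample)

# Estimated standard error: how much the sample mean would vary from sample to sample.
standard_error = sample_sd / n ** 0.5

# Approximate 95% range for the population mean (assumes the normal approximation).
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"mean = {sample_mean:.2f}")
print(f"standard deviation = {sample_sd:.2f}")
print(f"standard error = {standard_error:.2f}")
print(f"approximate 95% range for the population mean: {lower:.2f} to {upper:.2f}")
```

Note that the standard error is always smaller than the standard deviation once the sample has more than one observation, because it is the standard deviation divided by the square root of the sample size.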