
4 Ideas of statistical inference

Concepts

Jargon

Although not a concept, there is some important jargon that you need to be familiar with in order to learn statistical inference. Two key terms are point estimates and population parameters. A point estimate is a statistic that is calculated from the sample data and serves as a best guess of an unknown population parameter. For example, we might be interested in the mean sperm concentration in a population of males with infertility. In this example, the population mean is the population parameter and the sample mean is the point estimate, which is our best guess of the population mean. Population parameters are typically unknown because we rarely measure the whole population.

What is statistical inference?

The practice of statistics falls broadly into two categories: (1) descriptive and (2) inferential. When we are just describing or exploring the observed sample data, we are doing descriptive statistics (see topic 1). However, we are often also interested in understanding something that is unobserved in the wider population. This could be the average blood pressure in a population of pregnant women, for example, or the true effect of a drug on pregnancy rate, or whether a new treatment performs better or worse than the standard treatment. In these situations we have to recognise that we almost always observe only one sample or do only one experiment. If we took another sample or did another experiment, the result would almost certainly differ. This means that there is uncertainty in our result: if we based our conclusion solely on the observed sample data, a different sample or experiment might even lead us to a different conclusion!
The purpose of statistical inference is to estimate this sample-to-sample variation, or uncertainty. Understanding how much our results may differ if we did the study again, or how uncertain our findings are, allows us to take this uncertainty into account when drawing conclusions. It allows us to provide a plausible range of values for the true value of something in the population, such as a mean or the size of an effect, and it allows us to make statements about whether our study provides evidence to reject a hypothesis.
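
To make sample-to-sample variation concrete, here is a minimal sketch in Python. It assumes a made-up population of blood pressure values (the mean of 120 and standard deviation of 15 are illustrative, not real data) and draws five samples of 50 people from it. Each sample gives a slightly different sample mean, even though the population mean never changes.

```python
import random
import statistics

# Hypothetical population: 100,000 systolic blood pressure values (mmHg).
# The mean of 120 and SD of 15 are illustrative assumptions, not real data.
rng = random.Random(1)
population = [rng.gauss(120, 15) for _ in range(100_000)]

# The population parameter. In a real study this would be unknown.
population_mean = statistics.mean(population)

# Draw five independent random samples of 50 people and compute each
# sample mean (the point estimate of the population mean).
sample_means = [
    statistics.mean(rng.sample(population, 50)) for _ in range(5)
]

print(f"population mean: {population_mean:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.1f}")
```

In practice we would only ever see one of these five sample means, which is why we need a way to estimate how much it could have varied.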

Estimating uncertainty:

Almost all of the statistical methods you will come across are based on something called the sampling distribution. This is an abstract concept: it is the theoretical distribution of a sample statistic, such as the sample mean, over infinitely many independent random samples of the same size from the same population. We typically do only one experiment or one study, and we certainly don't replicate a study so many times that we could observe the sampling distribution empirically. However, we can estimate what the sampling distribution looks like for our sample statistic or point estimate of interest based on only one sample, experiment or study.

The spread of the sampling distribution is captured by its standard deviation, just as the spread of a sample distribution is captured by its standard deviation. Do not confuse the sample distribution with the sampling distribution: one is the distribution of the individual observations that we observe or measure, and the other is the theoretical distribution of the sample statistic (e.g., the mean), which we don't observe. To keep the two standard deviations apart, we call the standard deviation of the sampling distribution the standard error. The name is apt because it captures the error due to sampling. It is thus a measure of the precision of the point estimate or, put another way, a measure of the uncertainty of our estimate.

Since we often want to draw conclusions about something in a population based on only one study, understanding how our sample statistics may vary from sample to sample, as captured by the standard error, is really useful. The standard error allows us to try to answer questions such as: what is a plausible range of values for the mean in this population, given the mean that I have observed in this particular sample? And what is the probability of seeing a difference in means between these two treatment groups at least as big as the one I have observed, just due to chance? The standard error is thus integral to statistical inference: it underlies the hypothesis tests and confidence intervals that you are likely to come across.
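
The relationship between the sampling distribution and the standard error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the measurement is taken to be normally distributed with a true mean of 50 and a true standard deviation of 10, and the sample size is 40 (all made-up values). It uses the usual formula for the standard error of a mean, the sample standard deviation divided by the square root of the sample size, and compares that single-sample estimate with the spread of sample means across many simulated studies.

```python
import math
import random
import statistics

rng = random.Random(2)
n = 40  # sample size (an assumption for this sketch)


def draw_sample():
    # Hypothetical measurement: true mean 50, true SD 10 (assumptions).
    return [rng.gauss(50, 10) for _ in range(n)]


# What we can do in practice: estimate the standard error from ONE sample
# as (sample standard deviation) / sqrt(sample size).
one_sample = draw_sample()
estimated_se = statistics.stdev(one_sample) / math.sqrt(n)

# What we cannot do in practice: repeat the study thousands of times and
# look at how the sample mean varies. The SD of these sample means
# approximates the SD of the sampling distribution.
means = [statistics.mean(draw_sample()) for _ in range(5000)]
empirical_se = statistics.stdev(means)

# Known only because we chose the population ourselves.
theoretical_se = 10 / math.sqrt(n)

print(f"standard error estimated from one sample: {estimated_se:.2f}")
print(f"SD of 5000 simulated sample means:        {empirical_se:.2f}")
print(f"theoretical standard error:               {theoretical_se:.2f}")
```

The three numbers should be close to one another, which is the point: a single sample is enough to give a reasonable estimate of how much the sample mean would vary across repeated studies.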

Confidence intervals:

Confidence intervals are computed from a random sample and are therefore also random. The long run behavior of a 95% confidence interval is such that we’d expect 95% of the confidence intervals estimated from repeated independent sampling to contain the true population parameter. The population parameter (e.g., the population mean) is not random; it is fixed (but unknown). The point estimate of the parameter (e.g., the sample mean) is random (but observable). When the sampling distribution is approximately normal, a 95% confidence interval is given approximately by the point estimate plus or minus 2 standard errors (more precisely, 1.96). If the estimate is likely to be within two standard errors of the parameter, then the parameter is likely to be within two standard errors of the estimate. This is the foundation on which the correct interpretation and understanding of a confidence interval rests.

Therefore it is okay to interpret a 95% confidence interval as "a range of plausible values for our parameter of interest" or "we're 95% confident that the true value lies between these limits". It is not okay to say "there's a 95% probability that the true population value lies between these limits". The true population value is fixed, so it is either within those limits or not: once the interval has been calculated, the probability is either 0 (not in the CI) or 1 (in the CI). That is difficult to get your head around, but if you manage it you will have reached a milestone in understanding statistical ideas.
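
The long-run interpretation can be checked with a simulation. The sketch below assumes a normally distributed measurement with a true mean of 100 and a true standard deviation of 20, samples of size 50, and 2000 repeated studies (all illustrative choices). Each study computes an interval as the sample mean plus or minus 1.96 standard errors, which is the normal approximation to the "two standard errors" rule, and we count how often the interval contains the true mean.

```python
import math
import random
import statistics

rng = random.Random(3)
true_mean, true_sd, n = 100.0, 20.0, 50  # illustrative assumptions
n_studies = 2000

covered = 0
for _ in range(n_studies):
    # One simulated study: a random sample of size n.
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)

    # Approximate 95% confidence interval: estimate +/- 1.96 standard errors.
    lower, upper = mean - 1.96 * se, mean + 1.96 * se

    # The true mean is fixed; it is the interval that changes between studies.
    if lower <= true_mean <= upper:
        covered += 1

coverage = covered / n_studies
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The proportion should come out close to 95%. Any single interval either contains the true mean or does not; the 95% describes the procedure over many repetitions, not one particular interval.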

Hypothesis tests:

A hypothesis test asks the question: could the difference we observed in our study be due to chance?

We can never prove a hypothesis, only falsify it, or fail to find evidence against it.

The statistical hypothesis being tested is called the null hypothesis and is typically stated as no effect or no difference. This is often the opposite of the research hypothesis that motivated the study.

You can see a hypothesis test as a way of quantifying the evidence against the null hypothesis. The evidence against the null hypothesis is estimated based on the sample data and expressed using a probability (p-value).

A p-value is the probability of getting a result at least as extreme as the one observed if the null hypothesis is true. All correct interpretations of a p-value concur with this statement.

Therefore, if p=0.04, it is correct to say "the chance (or probability) of getting a result at least as extreme as the one we observed is 4% if the null hypothesis is true". It is not correct to say "there's a 4% chance that the null hypothesis is true". The hypothesis is fixed and the data (from the sample) are random, so the hypothesis is either true or it isn't: it has no probability other than 0 (not true) or 1 (true). As with confidence intervals, understanding this means you have reached a milestone in understanding statistical concepts.
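
One way to see what a p-value measures is a permutation test, which is used here purely for illustration and is not necessarily the test you would use in practice. The sketch assumes made-up outcome values for two treatment groups. Under the null hypothesis of no difference, the group labels are arbitrary, so we shuffle them many times and count how often the shuffled difference in means is at least as extreme as the one observed.

```python
import random
import statistics

# Hypothetical outcome values for two treatment groups (made-up numbers).
standard = [5.1, 4.8, 6.0, 5.5, 4.9, 5.2, 5.8, 5.0]
new = [5.9, 6.3, 5.7, 6.8, 6.1, 5.6, 6.5, 6.0]

# The result we actually observed: the difference in sample means.
observed = statistics.mean(new) - statistics.mean(standard)

rng = random.Random(4)
pooled = standard + new
n_new = len(new)
n_permutations = 10_000

# If the null hypothesis is true, group labels carry no information, so we
# reassign them at random and recompute the difference each time.
at_least_as_extreme = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)
    diff = statistics.mean(pooled[:n_new]) - statistics.mean(pooled[n_new:])
    if abs(diff) >= abs(observed):  # two-sided comparison
        at_least_as_extreme += 1

p_value = at_least_as_extreme / n_permutations
print(f"observed difference in means: {observed:.3f}")
print(f"approximate two-sided p-value: {p_value:.4f}")
```

The p-value here is a statement about the data under the assumption that the null hypothesis is true. It says nothing directly about the probability that the null hypothesis itself is true.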

Statistical significance is not the same as practical (or clinical) significance.
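
The distinction can be illustrated with a simulation. The sketch below assumes two very large groups (100,000 people each) whose true mean blood pressures differ by only 0.5 mmHg, a difference chosen here as an example of something unlikely to matter clinically. It uses a two-sided z-test based on the normal approximation; all numbers are made up.

```python
import math
import random
import statistics

rng = random.Random(5)
n = 100_000  # very large groups (an assumption for this sketch)

# Hypothetical blood pressures (mmHg): true means differ by only 0.5.
group_a = [rng.gauss(120.0, 15.0) for _ in range(n)]
group_b = [rng.gauss(120.5, 15.0) for _ in range(n)]

# Point estimate of the difference and its standard error.
diff = statistics.fmean(group_b) - statistics.fmean(group_a)
se_diff = math.sqrt(
    statistics.variance(group_a) / n + statistics.variance(group_b) / n
)

# Two-sided p-value from the normal approximation: P(|Z| >= |z|).
z = diff / se_diff
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"estimated difference: {diff:.2f} mmHg")
print(f"standard error:       {se_diff:.3f}")
print(f"two-sided p-value:    {p_value:.2g}")
```

With samples this large the standard error is tiny, so even a very small difference gives a very small p-value. Whether a difference of about half a mmHg matters is a clinical judgement that the p-value cannot make for us.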

Connections with other material

All point estimates (statistics calculated from the sample data) are subject to sampling variation, and all methods of statistical inference seek to quantify this uncertainty in some way.
The ideas of a confidence interval and a hypothesis test form the basis of quantifying uncertainty. Almost all statistics in the published literature (excluding descriptive statistics) will report a p-value and/or a measure of effect or association with a confidence interval.
The probability distribution of a statistic is its sampling distribution.
Much of the critical appraisal of the methodology of a study can be seen as a special case of evaluating bias or precision.
