# e-lecture: Introduction to statistical modelling

This e-lecture is by Dr Katie Pike. It has been adapted from another MSc course, so please ignore Katie’s references to previous lectures. The principles are the same, however, and it covers all of the learning outcomes stated below. The beginning of the lecture revisits correlation and includes a quick discussion of Spearman’s rank correlation, which is a non-parametric alternative to the correlation we described in Topic 2 (Pearson’s). We have already covered much of this section of the lecture, but it can be useful to revisit it, in particular to contrast correlation with regression. You may want to skip this part if you're comfortable with the material. Likewise, there is some discussion of measures of effect for proportions, such as relative risks and absolute risk reduction, which you may be familiar with and wish to skip.

## Learning outcomes

On watching this video students should be able to:

- Explain the distinction between correlation and regression in terms of their purpose (or what each can tell us).
- Interpret a regression coefficient for a linear model as the amount of change in Y for a one-unit change in X. We can interpret this as the average (mean) difference in Y for a one-unit increase in X.
- Understand that just because a regression model can describe a relationship between two variables, it does not mean that the relationship is causal – “association does not imply causation”.
- Explain why a 95% CI for a regression coefficient should always be presented, and interpret it in the same way as the other confidence intervals described, namely as a range of plausible values for the true relationship (association) between two variables.
- State the null hypothesis (value) for a regression coefficient, and explain that the p-value for a regression coefficient is the probability of observing a regression coefficient at least as large as the one estimated if the null hypothesis were true.
- Interpret the R-squared statistic as an indication of the fit of the whole statistical model to the data: it represents the proportion of variation in the outcome explained by the model.
- List the basic assumptions for a regression model and understand that if these are not met then we may not get valid statistical inference.
- Interpret regression coefficients for categorical predictors, and recognise the role of the reference category.
- Recall that we can use a multivariable regression model to examine the relationship between an exposure and an outcome while controlling (in the statistical sense) for potential confounders.
- Interpret regression coefficients from multivariable regression models.
- Identify when a logistic regression model can be used and interpret the exponentiated regression coefficients from a logistic regression model as odds ratios, using all the same principles of inference with respect to confidence intervals and p-values for the regression coefficients.
- Identify situations when a survival analysis can be used (when the outcome is a time to an event).
- Explain what censoring is in a survival analysis, and in particular that all subjects are included in a survival analysis up to the point when they either experience the event or are censored.
- Interpret a Kaplan-Meier survival plot and a log-rank hypothesis test for a difference between survival curves. With respect to the log-rank test, be able to state what the null hypothesis is and appreciate that this test does not describe the relationship or tell us anything about the direction of the relationship.
- Interpret the exponentiated regression coefficients from a Cox proportional hazards model as hazard ratios, which are similar to risk ratios.
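
As a companion to the outcomes on regression coefficients and R-squared, here is a minimal sketch of a simple linear regression fitted by ordinary least squares in plain Python. It is not taken from the lecture: the function name `simple_linear_regression` and the small data set are made up for illustration, and in practice you would normally use a statistics package, which also reports confidence intervals and p-values.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration only.
    """
    x_bar, y_bar = mean(x), mean(y)

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Regression coefficient: mean change in y for a one-unit increase in x.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Pearson correlation: strength of the linear association (unit-free).
    pearson_r = sxy / (sxx * syy) ** 0.5

    # R-squared: proportion of variation in y explained by the model.
    residual_ss = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    r_squared = 1 - residual_ss / syy

    return {
        "slope": slope,
        "intercept": intercept,
        "pearson_r": pearson_r,
        "r_squared": r_squared,
    }


# Made-up data, purely illustrative (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

fit = simple_linear_regression(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```

The slope is expressed in units of Y per unit of X, whereas the correlation has no units. With a single predictor, R-squared equals the square of the Pearson correlation, which is one way to see how the two ideas are linked while serving different purposes.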