# 2 Understanding, describing & exploring data

## Concepts

• To answer a research question (ultimately the end goal), data must be organised. This involves both numerical and graphical summaries.
• Summaries have several roles. They help us to understand and communicate features of our data, such as patterns and measures of the middle and spread of values. They highlight exceptions to these features and patterns, such as extreme values. They may also indirectly suggest appropriate statistical models for answering our research question. Lastly, they are sometimes used for designing further studies: for example, statistics such as the standard deviation are used to calculate the sample size needed for a study.
• Numerical and graphical summaries should be chosen so that they capture the features of the data and any unusual observations. Important features include the centre and spread of the distribution.
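
As a minimal sketch of the numerical summaries described above, the following Python computes measures of centre (mean, median) and spread (standard deviation, interquartile range) and flags extreme values. The function name `summarise`, the example data, and the 1.5 × IQR rule for flagging extreme values are assumptions made for illustration; they are one common convention, not the only way to summarise data.

```python
import statistics


def summarise(values):
    """Return measures of centre and spread, plus any flagged extreme values."""
    ordered = sorted(values)
    # Quartiles computed with the "inclusive" method (data treated as the full range).
    q1, _median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: values beyond 1.5 * IQR from the quartiles are "extreme".
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "sd": statistics.stdev(ordered),  # sample standard deviation
        "iqr": iqr,
        "extreme": [v for v in ordered if v < lower or v > upper],
    }


# Hypothetical measurements with one unusually large value.
measurements = [4, 5, 5, 6, 6, 7, 7, 8, 30]
summary = summarise(measurements)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up example the single large value pulls the mean well above the median, which is the kind of exception a summary is meant to highlight before a statistical method is chosen.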

## Connections with other material

• The methods of statistical analysis used to answer research questions or make statistical inferences (e.g., comparing groups to see whether a difference may be real or just due to chance) depend on the type of data and their features. Understanding different variable types and ways of summarising them is essential to choosing the correct statistical method.
• Every analysis begins with an exploration and description of the data using numerical and graphical summaries. This is the first part of the process of comparing groups or assessing associations, and it helps to avoid choosing the wrong method. It is also part of the ethos of transparency in the reporting of science.
• Describing the data collected in a sample is often not the ultimate goal of a research project. Questions often remain, such as "Is the difference observed in these data real, or could it just be due to chance?" Answering such questions is the role of statistical inference.
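
To make the "real or just due to chance" question concrete, here is a hedged sketch of one way it can be approached: a permutation test on the difference in group means. This is an illustrative choice, not necessarily the method used later in this material. The function name `permutation_p_value`, its parameters, and the data are all hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often chance alone gives a mean difference at least as
    large as the one observed, by repeatedly shuffling the group labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Re-split the shuffled values into two groups of the original sizes.
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical data: two clearly separated groups.
group_a = [1, 2, 3, 4, 5, 6]
group_b = [11, 12, 13, 14, 15, 16]
p_value = permutation_p_value(group_a, group_b)
print(f"Estimated p-value: {p_value}")
```

A small estimated proportion suggests that a difference this large would rarely arise from random relabelling alone; a large proportion suggests that chance is a plausible explanation.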