Introduction To Statistics

In this section we will cover statistics, loosely defined as ways of using numbers, tables and figures to represent the data you collect. There are two main types of statistics: descriptive and inferential. Descriptive statistics only describe what you have found. They include means, standard deviations, and graphical representations of your data. Inferential statistics allow us to make statements about populations. Imagine we set up a study to look at reading skill and educational placement among deaf children. We cannot test every possible child; we can only select a subset of all the deaf children in mainstream and sign bilingual schools. This subset is called our sample; all of the deaf children in mainstream and sign bilingual schools are called our population. Descriptive statistics describe our sample; inferential statistics allow us to make statements about the population from which that sample was selected.

Samples, Populations and Generalisability

Whatever study you choose to conduct, it will probably have a target population. The target population is the group of people who could be involved in your study. For example, if you wanted to do some research on British Sign Language learning by hearing people, then your target population would be all hearing people who are currently learning British Sign Language. That’s a lot of people! So, in practice, you will probably select a smaller group of people for your study – say 30-50 people in the Bristol area. These 30-50 people are your sample. Maybe you test the number of learning strategies they use, and find that on average they use 3.4 strategies. This average is usually called a mean in research terms. It is an example of a descriptive statistic – it describes the average score for your sample. Other types of descriptive statistic include standard deviation and sample size. Standard deviation is a statistic that measures how much the scores of the sample vary. If everyone in the sample gets the same score, then the standard deviation will be 0 (zero) – the more the scores vary, the higher the standard deviation. Sample size indicates how many people there are in the sample – in this example it may be 35 (indicating 35 different hearing BSL learners).

Descriptive statistics are reported in the Results section. You will not normally report all of your raw data in the Results section; include it in an Appendix instead. Often the descriptive statistics are incorporated in a Table, or presented as a Figure (such as a graph or diagram). More on this below.

Normally, you do not want to confine your findings to this sample – you want to be able to generalise your findings to all hearing BSL learners (i.e. the target population). In order to be able to do this, you need two things:

A representative sample. If your sample looks nothing like the target population, then it is hard to generalise your findings. For example, if all of your participants are women aged over 70 years studying in Bristol, do they truly represent BSL learners around the UK? No. So, you may carefully construct your sample so that it is a smaller version of the population. Using information from the Census you may calculate that you need 52% women and 48% men, certain quotas of people in different age categories, and so on. The extent to which you do this will depend upon your research question and hypothesis, and you should consult carefully with your dissertation supervisor before recruiting participants to your study.

Inferential statistics. Your descriptive statistics will describe your sample, but will not allow you to draw conclusions about the population. For example, let us assume you found that male hearing BSL learners use an average of 2.6 strategies, compared with women who use 4.3 strategies on average. In your sample, women clearly use more learning strategies than men do. But is this true for the target population? That is, can you generalise your finding to all hearing BSL learners in the UK? To do this, you use inferential statistics. Put succinctly, inferential statistics allow you to make a probability statement – if you say that women use more strategies than men (in the target population), then what is the probability you made a mistake in saying this?

Inferential statistics are beyond the scope of this short introductory course, and again you will need to discuss with your supervisor the appropriateness of using them and how to go about doing so. We will make one important point here, though. You want to minimise the chance of making a mistake when you say two groups are different. The more participants you have, the less likely you are to make a mistake, so a large number of participants is good. This holds regardless of the size of the target population, for statistical reasons you do not need to worry about. Focus upon your sample, and try to make it as large as possible.

Descriptive Statistics

More now on descriptive statistics, and how to report what you find in your study. As we saw in the previous section, descriptive statistics do just what they say – describe the data you have collected. In your results section, you will state the kind of data you collected and then present it in a form which the reader can understand. Merely presenting a large table with all your raw data will not help the reader. You must help them make sense of what you found. Normally this is done with a Table or a Figure.

There are 3 different types of descriptive statistic that you should report:

Measures of central tendency. These are ways of measuring the average score for your sample, or groups within your sample. There are three different ways of calculating an average, the one you choose depending upon your level of measurement (see earlier section):

Mean. This is the most common way of measuring an average. You add up all of the scores, and divide by the number of scores you added together. So, if your sample scored 3, 4, 2, 4, 5, 2, 3, 4, 5 and 2, the mean would be 3.4. This is all of the scores added together (34), and then divided by the number of scores (in this case, 10). Means are best used where the data are measured on a ratio or interval scale.
Median. The median is the middle score. How is this calculated? You start by arranging the scores in ascending order. Using the example scores from above, this would give us: 2, 2, 2, 3, 3, 4, 4, 4, 5, 5. Next you locate the middle score – that is the median value. In this example, the middle falls between two numbers: 2 2 2 3 3 * 4 4 4 5 5. In this case, we take the average of the two numbers either side of the middle: (3 + 4) / 2 = 3.5. The median is best used when the data are measured using an ordinal level of measurement. This is because it often does not make sense to talk about a fraction of a rank. In a horse race, it would not make much sense to say that on average Irish horses finished in position 2.2 or 2.5. It is better to say the median finishing place was 2nd, or 2nd-3rd.
Mode. The final measure of central tendency is the mode. This is the most frequently occurring score. Using the example from above, there are two modal scores: 2 and 4. This is because the scores are bimodal – there are two peaks, with both 2 and 4 occurring three times each. The mode is best used when the data are nominal – that is when you are measuring the number of times you observe something, or when you are measuring how many participants belong in different categories.
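
The three averages described above can be checked with a few lines of code. The following is only a minimal sketch, assuming Python 3.8 or later and its standard statistics module; the variable names are our own and are purely illustrative. It uses the same ten example scores as the text.

```python
import statistics

# The ten example scores used in the text above
scores = [3, 4, 2, 4, 5, 2, 3, 4, 5, 2]

# Mean: all of the scores added together, divided by the number of scores
mean_score = statistics.mean(scores)

# Median: the middle score once sorted (here, the average of the two middle scores)
median_score = statistics.median(scores)

# Mode(s): multimode returns every most-frequent score, so bimodal data are handled
modes = sorted(statistics.multimode(scores))

print(f"Mean:   {mean_score:.1f}")
print(f"Median: {median_score}")
print(f"Modes:  {modes}")
```

Using multimode rather than mode is a deliberate choice here: with bimodal data such as these, a function that returns a single value would hide one of the two peaks.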

Measures of dispersion. These indicate variation in scores. Not everybody in your sample will obtain the same score. As a result, the average score can be misleading. The only measure of dispersion you need to concern yourself with is the standard deviation, which you should use whenever you report a mean score. Each mean will have a standard deviation associated with it. So, how do you calculate the standard deviation?

Make a table with 4 columns, as shown below.

In the first column, enter the scores for each participant.

Score	Mean	Score-Mean	(Score-Mean)²
6
7
4
5
3
5
6
8
2
4

Next, calculate the mean score and enter that in the second column.

Score	Mean	Score-Mean	(Score-Mean)²
6	5
7	5
4	5
5	5
3	5
5	5
6	5
8	5
2	5
4	5

Subtract the mean from each score, and enter the differences in the third column.

Score	Mean	Score-Mean	(Score-Mean)²
6	5	1
7	5	2
4	5	-1
5	5	0
3	5	-2
5	5	0
6	5	1
8	5	3
2	5	-3
4	5	-1

Now square each of these differences (multiply them by themselves) and enter the new value in the fourth column.

Score	Mean	Score-Mean	(Score-Mean)²
6	5	1	1
7	5	2	4
4	5	-1	1
5	5	0	0
3	5	-2	4
5	5	0	0
6	5	1	1
8	5	3	9
2	5	-3	9
4	5	-1	1

Add all of the numbers in the fourth column together.

1 + 4 + 1 + 0 + 4 + 0 + 1 + 9 + 9 + 1 = 30

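Divide this total by the number of scores minus one (10 - 1 = 9). The result is called the variance.

30 / 9 = 3.33

Finally, take the square root of the variance.

1.83 = standard deviation

(Some textbooks divide by the number of scores itself, here 10, which gives 1.73. Dividing by one less than the number of scores is the usual choice when the scores come from a sample.)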
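
The table-based calculation can also be followed in code. This is a minimal sketch in Python, not a required method; the variable names are our own. Note that it divides the sum of squared differences by the number of scores minus one before taking the square root, which is the standard formula for a sample, and it compares the result with the standard library's statistics.stdev as a check.

```python
import math
import statistics

# Column 1: the score for each participant
scores = [6, 7, 4, 5, 3, 5, 6, 8, 2, 4]

# Column 2: the mean score
mean_score = sum(scores) / len(scores)

# Column 3: each score minus the mean
differences = [score - mean_score for score in scores]

# Column 4: each difference squared
squared_differences = [d ** 2 for d in differences]

# Add up the fourth column
sum_of_squares = sum(squared_differences)

# Divide by (number of scores - 1) to get the sample variance
variance = sum_of_squares / (len(scores) - 1)

# The standard deviation is the square root of the variance
sd = math.sqrt(variance)

print(f"Sum of squares:     {sum_of_squares:.0f}")
print(f"Variance:           {variance:.2f}")
print(f"Standard deviation: {sd:.2f}")
print(f"statistics.stdev:   {statistics.stdev(scores):.2f}")
```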

The larger the standard deviation, the more variation there is in the group of scores. It is important to know this, as it gives you extra information about whether two groups of scores are really different. Consider the data given in the Table below:

	Group A	Group B
Mean	4.3	6.4
Standard deviation	3.5	4.2

Looking at the means alone, you may conclude that group B scored higher than group A. While this is true on average, by looking at the standard deviations we can see that the scores within each group varied a lot. This means that many members of group A will have scored higher than many members of group B. If we performed an inferential statistical test on these data, we might well find that there was no reliable difference between the two groups, as the scores varied widely within each of them.
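
One rough way to see how much the two groups overlap is to work out the range from one standard deviation below each mean to one standard deviation above it. The sketch below does this in Python using the figures from the table; the function name one_sd_range is our own invention. This is only an informal illustration, and it is not a substitute for an inferential test.

```python
# Means and standard deviations taken from the table above
groups = {"A": (4.3, 3.5), "B": (6.4, 4.2)}


def one_sd_range(mean, sd):
    """Return the range from one SD below the mean to one SD above it."""
    return (mean - sd, mean + sd)


ranges = {name: one_sd_range(mean, sd) for name, (mean, sd) in groups.items()}

for name, (low, high) in ranges.items():
    print(f"Group {name}: {low:.1f} to {high:.1f}")

# Two ranges overlap if each one starts before the other one ends
low_a, high_a = ranges["A"]
low_b, high_b = ranges["B"]
ranges_overlap = low_a <= high_b and low_b <= high_a

print(f"Ranges overlap: {ranges_overlap}")
```

The two ranges share a large stretch of scores, which is why the difference between the means should be treated with caution.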

Sample statistics. The only sample statistic you will need to report is the sample size. For each group of participants for whom you report a mean and standard deviation, you should also indicate how many participants were in that group. This tells the reader whether your calculations are based upon a lot of people or just one person!

There are commonly accepted abbreviations for the mean, standard deviation and sample size. The ones that you should use are M (mean), SD (standard deviation) and N (sample size).
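
As an illustration of how the three abbreviations fit together, the following Python sketch produces a one-line summary for a group of scores. The helper name describe and the two-decimal formatting are our own assumptions, not a required reporting style, and the standard deviation uses the usual sample formula (dividing by the number of scores minus one).

```python
import statistics

# Example scores for one group of participants
scores = [6, 7, 4, 5, 3, 5, 6, 8, 2, 4]


def describe(values):
    """Summarise a group of scores as mean (M), standard deviation (SD) and sample size (N)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    n = len(values)
    return f"M = {m:.2f}, SD = {sd:.2f}, N = {n}"


summary = describe(scores)
print(summary)
```

Calling describe once for each group in your study gives you the rows of a results table.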

Internet Resources

Generalisability of Research Data http://trochim.human.cornell.edu/tutorial/ward/tutorial.htm

Central Tendency & Dispersion http://www.psychstat.smsu.edu/introbook/sbk13.htm

Statistics Every Writer Should Know http://nilesonline.com/stats/