Misinterpreting Key Stage 1 test scores

Harvey Goldstein and Peter Mortimore, Institute of Education, University of London.


There has been a worrying recent trend whereby inadequately designed or interpreted research is launched without any proper evaluation of its worth. Whether such research is carried out by OFSTED, the DfEE or individual academics, it is normal practice for it to be reviewed by peers. This may occur, for example, at seminars or via refereeing, and allows inconsistencies or inappropriate interpretations to be remedied before the results are made public. We are concerned that policy makers themselves seem to lack a proper understanding of this need. This can lead to policies being implemented on the basis of flawed evidence, whether in the teaching of Reading and Mathematics, the publication of league tables, the effects of working mothers or the imposition of homework. A recent report illustrates the problem well.

The annual testing of all English seven year olds at key stage 1 (KS1) of the National Curriculum provides large quantities of information on pupils, schools and local education authorities (LEAs). Politicians of many persuasions now appear to accept that these data should be used to compare institutions and authorities. The ease with which this can be done raises a real danger that it will be implemented without careful thought about how legitimate it may be to do so.

The report, from the Social Market Foundation (Marks, 1997), analyses the KS1 test score results from 1995. It produces tables based on three criteria: mean test scores in Reading, Spelling and Mathematics; the proportion of children scoring no more than level 1; and the proportion scoring at level 3 or above. In all, just under 600,000 children are included in about 15,000 schools (containing at least 10 pupils with test scores). The main thrust of the report is the study of variations between schools and between local education authorities, and it provides detailed LEA tables. Its main aim is to draw policy conclusions. The purpose of the following critique is to decide whether these conclusions can be supported by the evidence provided.

Variations between schools

The first key issue in any interpretation of KS1 test scores is the nature of the reporting scale. The notional equivalence between a level (1, 2, 3 or 4) and age is set out by the DfEE so that, for example, an average 7 year old is supposed to achieve level 2 and an average 9 year old level 3. Unfortunately there is no objective evidence to validate such a claim, since the tests were only given to children aged seven. Thus the report's use of age equivalence when describing differences among schools is unreliable and potentially misleading. The report shows, for example, that the top 25% of schools have 45% or more of their children at level 3 or higher, whereas the bottom 25% have only 22% or fewer of their children at this level. The report assumes that level 3 or higher is associated with achievement two years or more above average. It describes school differences in terms of years of learning, but this is purely speculative: we simply do not know how wide the learning gap really is.

The use of age equivalencies to report results leads the report's author into further important errors. First, since results for Reading, Spelling and Mathematics are all reported in this way, it is implied that performance between subjects can be compared. This is of course incorrect, since the levels do not have a common interpretation across subjects. If a proper age standardisation were to be carried out for each subject then, by definition, the distribution of standardised scores among students would be the same for each subject. The variation among schools, however, might not be the same, and this could be of some interest. In the present case, however, we are unable to determine this. The second error occurs when the report compares the KS1 results to the previous year's KS2 (11 year old) results in terms of age differences. The report concludes that the variation among schools is greater at KS1 than at KS2, but again this is an invalid conclusion, because there is no proper age standardisation.

Finally, the report compares the results for LEA, voluntary and GM schools, showing the latter two categories getting better results, as does a small sample of private schools. In this context the report fails to mention that such comparisons say nothing about the relative performance of each type of school, because they fail to adjust for intake differences. Later in the report, however, attention is drawn to the strong association between KS1 achievements and intake achievement, but the report then goes on to insist that the observed differences 'merit further attention from all those involved in education'. If the author accepts that intake achievement may explain some or all of the differences, as other research has repeatedly demonstrated, then he cannot also claim that the results 'bespeak a major crisis'.

Variations between LEAs

The same general criticisms about the use of age equivalent scales apply to the LEA comparisons. Here, however, each local authority's performance is identified and the reader is clearly invited to pass judgement on how well each one is performing. This again ignores the fact that no such judgement is possible without controlling for intake achievement.

The report's recommendations

The report quotes with approval the OFSTED study of Reading in 45 Inner London primary schools as providing supporting evidence for its recommendations. The methodology of this study, however, has already been shown to be faulty (Mortimore and Goldstein, 1996).

The report claims that 'What can and should be done follows directly from the evidence given above, which constitutes a major indictment of what has passed for good primary practice over the last two decades'.

It goes on to make claims about effective teaching methods, using the OFSTED study and some unnamed research findings to support this claim. Unfortunately there is no properly argued case to support this assertion, and the KS1 results by themselves cannot provide the evidence. Whatever the rights and wrongs of the arguments, inferences about teaching methods cannot be derived from these results.

Finally the report pleads for the publication of school league tables on the grounds that parents should 'know which schools are achieving reasonable results for their pupils and which are not'. The problem is that, because they fail to adjust for intake achievement, the league tables do not tell parents this! Those who advocate publication of league tables also have a responsibility to emphasise their limitations. If this is not done then parents and others are being denied access to relevant and important information. In the current educational climate this issue seems to be overlooked, yet it is crucial if citizens are to exercise their full democratic rights.

In conclusion

From what we have already stated it is clear that this report has little to offer policy makers or practitioners. It makes a number of misleading claims and purports to derive conclusions about teaching styles and the publication of school test score information from an analysis of KS1 data. The report misunderstands the nature of the data it is dealing with: in particular, such data allow no causal conclusions to be drawn about LEA or school performance. To do so requires, as an essential (though not sufficient) prerequisite, that pupils' intake achievements are available and are incorporated in any analyses.

Good research needs careful attention to detail, exposure to potentially critical peer review and a clear statement of whatever caveats are necessary. The rapid, high-profile publication of poorly executed studies is in nobody's long-term interests.
