Evaluating the evaluators: a critical commentary on the final evaluation of the English national literacy and numeracy strategies

Introduction

The final report of the Ontario Institute for Studies in Education (OISE) evaluation of these strategies has now been published.

There have been two interim reports, in 2000 and 2001, and the present report draws on these, adds from its recent investigations and provides conclusions. Critiques of the two previous reports are available and I shall draw on these here. The report's 8-page executive summary brings together its main conclusions. Since this will provide the main source for any media and political comment, I shall centre the following critique on this summary, but will also draw from the main report.

The principal author of the report, Lorna Earl, has been invited to place a response to my critique on this web site and this will be published if the invitation is accepted.

Aims of the evaluation

In the course of their work the team visited schools, talked to central and local government policy makers, carried out surveys of teachers and gave presentations of their work at seminars. A great deal of effort was expended and the final report in one sense is an impressive summary of this work. Simply as a factual history of a government policy, at least from the government's point of view, it is useful and gives insight into certain aspects of government thinking, especially its determination to pursue its agenda on 'standards'. It contains much that will be of interest to those in English education and there are also lessons here for other educational systems, although there is no attempt in the final report to draw any such wider inferences.

In their first interim report the authors state that they did not wish to carry out an evaluation 'in the typical sense' but acted instead as 'critical friends'. It isn't entirely clear what such an evaluation really means but since all their reports offer judgements about the success of the strategies, it seems reasonable to judge them by the usual standards of intellectual coherence and adherence to good evidence. As I shall argue, the authors sadly fail on both these counts.

Strikingly, the OISE report fails to mention the pilot studies that preceded the introduction of the strategies. These were set up by the New Labour government to test out the efficacy of different approaches in a sample of schools together with a 'control' sample. Yet, before the pilot for the literacy strategy ended, and before its results could have any effect, the strategy itself was introduced, and a similar thing occurred with the numeracy strategy. Such a cavalier attitude towards evidence based policy implementation would seem to be highly relevant to any evaluation of the strategies themselves. To avoid any discussion of this is an extraordinary omission.

Criteria for success

It is quite difficult to be clear exactly what the report means by success. The authors point to the pervasiveness of the strategies in primary school classrooms and the priority given to them by teachers. But they also appear to include the increase of 'whole class teaching' as a success, without qualification. They claim that teaching has improved (Page 3 of summary), but this seems to be based upon their surveys, and one should be very wary of drawing conclusions from these. Firstly, overall 47% of schools in the sample refused to participate; of those that did, the response rates for heads and teachers were respectively 80% and 56%. Thus the true response rates were about 38% for heads and 26% for teachers. The report doesn't consider the problems of bias from such low response rates and, in fact, in the chapter where the survey results are presented the authors misleadingly mention only the 80% and 56% response rates from participating schools.
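
The compounding of school-level and within-school non-response can be sketched as below. This is a minimal illustration rather than anything taken from the report: the function name is hypothetical, and the only inputs are the percentages quoted above. The quoted figures of about 38% and 26% are reproduced when 47% is taken as the proportion of schools that participated; if 47% is instead the proportion that refused, the same calculation gives about 42% and 30%. On either reading the true rates are far below the 80% and 56% cited in the report.

```python
def overall_response_rate(school_participation: float,
                          within_school_response: float) -> float:
    """Share of the originally sampled group that actually responded.

    A respondent is only reached if their school took part AND they
    replied, so the two proportions multiply.
    """
    return school_participation * within_school_response


# Within-school response rates quoted for participating schools.
HEADS = 0.80
TEACHERS = 0.56

# Two readings of the 47% figure (an assumption made for illustration).
readings = [
    ("47% of schools participating", 0.47),
    ("47% of schools refusing", 0.53),
]

for label, participation in readings:
    heads = overall_response_rate(participation, HEADS)
    teachers = overall_response_rate(participation, TEACHERS)
    print(f"{label}: heads {heads:.0%}, teachers {teachers:.0%}")
```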

The report's authors show considerable uncertainty about whether they can use KS2 test score changes as a measure of the success of the strategies. The summary reflects this, but also says "The gap (over time) has narrowed substantially between pupil results in the most and least successful schools … if this improvement in low attaining schools continues, it would be a significant measure of success" (Page 3). There are two curious aspects to this statement. First, in the main report (Page 128) they state that "the gap between low achieving and high achieving LEAs … has narrowed". This is not at all the same thing, and the gap between schools could be as wide as before or wider. Secondly, the use of the term 'successful schools' is very misleading. The term is widely recognised as applying to institutions where students make the most progress and not to those which simply have high test scores. Elsewhere the report does make reference to 'value added' but this error does seem to betray a careless lack of precision, if not of understanding.
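
The difference between a narrowing gap between LEAs and a narrowing gap between schools can be shown with a toy calculation. The scores below are invented purely for illustration and are not drawn from the report or from any real data; the LEA names and function names are likewise hypothetical. In this sketch the gap between LEA averages falls from 20 to 14 points, while the gap between the highest and lowest scoring schools grows from 30 to 43 points.

```python
from statistics import mean

# Invented average test scores for three schools in each of two LEAs.
year1 = {"LEA A": [50, 55, 60], "LEA B": [70, 75, 80]}
year2 = {"LEA A": [45, 60, 78], "LEA B": [62, 75, 88]}


def lea_gap(scores):
    """Difference between the highest and lowest LEA mean score."""
    lea_means = [mean(schools) for schools in scores.values()]
    return max(lea_means) - min(lea_means)


def school_gap(scores):
    """Difference between the highest and lowest individual school score."""
    all_schools = [s for schools in scores.values() for s in schools]
    return max(all_schools) - min(all_schools)


for label, scores in [("year 1", year1), ("year 2", year2)]:
    print(f"{label}: LEA gap {lea_gap(scores)}, school gap {school_gap(scores)}")
```

Averaging over schools within an LEA hides the spread inside it, which is why a statement about LEAs cannot be read as a statement about schools.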

On page 6 the report concludes that 'initial gains in achievement scores' could be ascribed to changes in teaching practice. In the main report (page 45) the authors are very clear that they think that the rise in test scores reflects the 'success' of the strategies. They attempt to confront the parallel increase in science test scores (not part of any strategy) by vaguely appealing to the possibility that there is some kind of spin off in terms of better lesson planning and delivery. They fail to countenance the possibility that the rises in all test scores are largely influenced by the high stakes nature of the tests and the pressure to perform well in league tables. There is certainly evidence from several places about such effects (see, for example, the evidence from Texas in 'What do test scores in Texas tell us?'), of which the authors are surely aware. There is also hardly a mention of the problems associated with interpreting the year on year changes in test scores, which have bedevilled all large scale repeat assessment schemes. With different tests every year there can be no foolproof way of measuring absolute trends.

The summary and main reports are written using language that allows the authors to claim that they are presenting all sides of a case, yet manages to convey an aura of general support for the strategies and their effects. In the main report (Pages 35-36), for example, the authors briefly discuss the problem of 'high stakes' assessment and mention controversy, but pass no opinion on this and then go on to quote approvingly from QCA guidelines about the use of test results for comparing schools, without mentioning that these are highly contentious.

Context

The main report has a section (Pages 26-30) on the political context. This has a very brief history of accountability – and even here they imply that the 'naming and shaming' episodes pre-dated the 1997 New Labour government, whereas they were in fact quickly adopted by that government. The rest of the section is little more than a summary of government policy pronouncements about its aspirations for standards. There is no serious discussion of government strategies to weaken LEAs, exercise increased centralised control through financial and other means or to promote a market mechanism with all its implications for learning and teaching. There is also no reference to continued government support to the head of OFSTED, Chris Woodhead, who systematically attacked the teaching profession and whom many consider to be a significant factor in contributing to low teacher morale.

Conclusion

In this critique, and in the two earlier ones, I have argued that this evaluation of the literacy and numeracy strategies is scientifically unsound. What strikes me forcibly, reading the documents, is how the authors appear as subservient to the present government and its policies. Various quotations from Prime Minister Blair border on the sycophantic (for example, Pages 26-27 main report) and it is easy to see the report as part of a very sophisticated public relations exercise in support of government policies – which is presumably not what they intended. It is a pity that this comes from academics who, elsewhere, have produced good research. I have no doubt that the researchers will dispute this conclusion, but it would be very interesting to know just how and why they have produced such a mockery of good practice.

Harvey Goldstein, February 2003