Tests and standards: what is the evidence?
This government is extremely fond of claiming that its education policies are 'raising standards'. It has not gone out of its way to define just what it means by 'standards', but since it typically places great weight on rising key stage test scores over time, it is reasonable to assume that these provide its main criterion.
One problem with using test scores is that doing so excludes potentially negative effects on those aspects of education that are not regularly monitored or measured, for example the artistic and humanities curriculum subjects. The government is largely silent about such 'side effects', and it appears to have few plans to explore these possibilities systematically. (Note that OFSTED inspections are not really designed to monitor national trends in a consistent way, and in any case they still depend heavily on key stage test and exam results.)
In addition to this, however, there is also a question mark over how the test results themselves should be interpreted, and I want to take the rest of this article to explore this. The first aspect I will deal with is whether we can really trust increasing test scores to reflect increased achievement, as opposed to the tests themselves being made easier over time. The second is the way in which concentration on the testing may have narrowed what is taught, and whether, within a broader definition of mathematics and literacy, we can really conclude that achievement has changed at all.
Judging trends over time
As with public exams, making comparisons over time depends upon expert judgements. For key stage test scores this involves judgements by the agency developing the tests and by QCA as to where level thresholds (e.g. level 4 at key stage 2) are to be set each year. When there are substantial underlying changes over time such judgements may be acceptable, but when relatively subtle changes are involved, the essentially subjective judgements of the experts are less obviously definitive and will depend, among other things, on expectations about the underlying real changes. It would be helpful if QCA were to publish the range of judgements about changes over time, rather than only the final verdict, which is inevitably some kind of average across those involved. This would allow a much more informed judgement about whether any changes were real.
The second concern is with the nature of the tests and their tendency to encourage 'teaching to the test'. When the results are so important to the 'league table' positions of schools, and even to teachers' promotion and salary prospects, the natural tendency is to prepare pupils to do well in the tests at the expense of other aspects of education. More specifically, does the test content itself focus teacher and pupil attention on just those aspects of mathematics and literacy that occur in the tests? It is difficult to find evidence from England on this, since there has been no reliable attempt to monitor it, but recent evidence from the USA throws some interesting light on the issue.
In the state of Texas a very high profile testing programme was instituted in 1990 for grades 3-10, and it was strongly championed by former Governor George W. Bush. The results are used to rank schools in league tables, and certain funds are allocated on the basis of the test results. Over the 1990s very large gains in student test scores were observed, and certain ethnic minority differences were reduced. Dubbed the 'Texas miracle', these results have been used to justify testing programmes that reward schools for their performance on the tests.
Recently, researchers from the RAND Corporation compared the results of this intensive testing programme with those from a national testing programme, the National Assessment of Educational Progress (NAEP), which is carried out across the whole of the USA. They found that, for mathematics and reading, the gain over time of Texas students on the national test, relative to the rest of the USA, was much smaller than that implied by the Texas test scores, and in some cases no different at all from changes found in the US as a whole. Moreover, the ethnic results from NAEP showed that, if anything, the differences in Texas were increasing rather than decreasing. The researchers conclude that the concentration on preparation for the Texas state tests may be hindering an all-round development of mathematics and reading skills, especially for minority students.
So what does all of this tell us about testing in England and about the emphasis on reaching key stage 'targets'? We do need to be careful about generalising from a rather different educational system such as that of Texas. Nevertheless, there are enough similarities to give us reason to doubt whether the current policy in this country is either fair or efficient. At the very least, a thorough and independent appraisal of the 'side effects' of current policy on schools, teachers and pupils needs to be carried out. Ideally, the government should drop its obsession with test results and recognise that there is much more to education and learning than striving to boost test scores to meet arbitrary targets. Hopefully, the new Secretary of State for Education will have the courage to change direction towards a more rational policy.
(This commentary first appeared in the Newsletter of the Socialist Education Association, November 2001)