The reliability and other aspects of national curriculum tests


HMCI Chris Woodhead has been expressing his views about the reliability and general usefulness of National Curriculum tests (TES December 18, 1998). He has attracted some support from teachers and condemnation from QCA and the NFER, the latter being responsible for providing many of these tests (TES January 8, 1999).

Let's assume that Woodhead uses the term reliability in its technical sense, namely that a test score or level incorporates a certain amount of random noise. Then, as Chris Whetton points out (TES, January 8th, 1999), one can only increase reliability by testing children more intensively or possibly by reducing the validity of the test, that is, the extent to which it reflects what is intended. Yet if a test contains too much noise it may not be very useful for judging the achievements of individual children. Its usefulness will depend on the extent of the reliability, which therefore needs to be known and taken into account when the test is used. The response of Nick Tate (head of QCA, TES, January 8th 1999) does not address this issue; his purported discussion of reliability is in fact a discussion of test validity! This illustrates the sad lack of guidance available to schools about how to understand and allow for problems with test reliability. Guidance from OFSTED might be useful here.
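To make the point about noise concrete, here is a minimal sketch under the usual assumptions of classical test theory, which the commentary itself does not spell out: an observed score is a true score plus independent random error, and reliability is the share of score variance that is not error. The function names and all the numbers are hypothetical illustrations, not figures from any National Curriculum test. The sketch shows how a known reliability translates into an uncertainty band around one child's score, and how lengthening a test with comparable items (the Spearman-Brown formula) raises reliability.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Typical size of the random noise in one observed score.

    Assumes classical test theory: SEM = SD * sqrt(1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_interval(observed, score_sd, reliability, z=1.96):
    """Approximate band around an observed score (z=1.96 gives roughly 95%)."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


def spearman_brown(reliability, length_factor):
    """Predicted reliability if the test is made length_factor times longer.

    Assumes the added items behave like the existing ones.
    """
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


if __name__ == "__main__":
    # Hypothetical test: score SD of 15 points, reliability of 0.80.
    low, high = score_interval(observed=60, score_sd=15, reliability=0.80)
    print(f"Band around a score of 60: {low:.1f} to {high:.1f}")
    print(f"Reliability if the test is doubled: {spearman_brown(0.80, 2):.3f}")
```

With figures like these, the band around a single child's score can be wide enough to straddle a level boundary, which is why the reliability needs to be known before a result is used to judge an individual.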

The second issue raised by Woodhead is that because the tests change from year to year, one cannot make objective judgements about whether standards of achievement have really changed over time; it may be that any changes observed are simply a result of tests becoming more or less difficult. As Whetton points out, this echoes criticisms made by others, yet Whetton fails to answer those criticisms. In fact Woodhead is simply reiterating what many researchers have often pointed out, and the Government and QCA would do well to take this issue seriously in view of the statements by Blunkett and others about targets to be achieved based upon changes in test results. This problem applies both to National Curriculum tests and to public examinations, and experience from other countries, notably the USA, confirms the difficulty, if not impossibility, of objectively measuring standards of achievement over time. A more detailed discussion of this issue is contained in a critique of the 1997 education white paper.

Another of Woodhead's claims is that 'standardised tests' would be more useful than current National Curriculum tests. Here he misunderstands the nature of standardised tests. The 'standardisation' refers to the process of administration and to the use of age and other relevant adjustments when interpreting scores. In these senses, as Whetton points out, National Curriculum tests are effectively standardised tests. The validity of these tests is of course another matter, and that is a legitimate subject for informed debate.

Finally, Woodhead claims that tests are being administered 'creatively' in schools in order to improve results. While it may be difficult to obtain good evidence to support such a suggestion, it is quite plausible to suppose that this does occur. It accords with views often expressed, here and in other countries, by those working with schools when teachers are exposed to certain kinds of external testing systems. The issue only really arises when schools are confronted with 'high stakes' assessment which is used to judge them in terms of their pupils' achievements. Thus when National Curriculum tests constitute the basis of published league tables which are so important for the fate of schools, there is every encouragement for schools to 'play the system'. To then say that teachers who do this should be 'condemned', as Whetton does, is totally to misunderstand the nature of what is happening.

Many people believe that league tables are educationally misleading and divisive. In this sense the rules which govern the administration and reporting of the tests will lead on to bad decisions, so that there can be a moral legitimacy in refusing to abide by such unjust rules. If the rules of test administration are being broken by 'creative' teachers then blame should be allocated not to the teachers but to those responsible for imposing the rules. This will be particularly pertinent for those teachers and schools with relatively disadvantaged children, where the rules will tend to operate rather harshly. Such teachers may perceive their actions as the only ones remaining to them which can be used to protect themselves, and their children, against an unfair system.

Harvey Goldstein, January 15 1999
