The 1997 Education white paper
A critical commentary
This commentary concerns two key elements of the White Paper: the setting of targets in literacy and numeracy to be achieved over 5 years, and the proposals for publication of performance tables.
The literacy and numeracy targets
What the white paper says
- In the 1996 National tests only 6 in 10 of 11-year-olds reached the standard in Maths and English expected for their age (our italics)
- By 2002, 80% of 11-year-olds will be reaching the standards expected for their age in English and 75% will in Maths.
What does the white paper mean?
There are few details in the White Paper about what the 'expected' standards are. The statistical appendix to the White Paper, however, chooses level 4 or above of the National Curriculum at age 11 and level 2 at age 7 as the criteria. At GCSE the criterion of 5 or more A-C passes is chosen.
We must assume therefore that the White Paper regards level 4 in 1996 as a standard that, in a properly functioning system, virtually all pupils should reach. It cannot be using the word 'expected' in its technical sense of a mean, since that would imply that performance was on average above that mean!
If, therefore, level 4 is to be interpreted as having been set as a target, this raises the issue of how that was done and whether, in some sense, such a target can be maintained consistently over time.
How was level 4 specified?
The original TGAT report (DES, 1988, para. 108) stated that "the average expectation for an age 11 pupil will be level 4". It must be assumed that this expectation has informed the design of Key Stage 2 tests, and that the results quoted above therefore simply reflect this aim.
In reality, of course, there is no absolute criterion which determines what pupils will be able to do. At any age, in every educational system, there is a large variation among pupils and to use level 4 or any other level as a 'benchmark' involves a contestable judgement about what is desirable. To quote a figure of 6 out of 10 with disapproval is both unhelpful and strictly meaningless. Nevertheless, it might still be argued that a specific target for improvement could be useful and this is examined in the next section.
Maintaining level 4 over time
The White Paper lays great stress on its proposals for achieving the 80% and 75% targets by 2002: it invites the public to judge its policies in terms of its ability to meet those targets. Hence, there is clearly considerable interest in knowing precisely how it is to ensure that the target is maintained consistently over time, without 'shifting'. The White Paper gives no indication whatsoever how this is to be done and presumably the Government does not view this as problematical: the Literacy Task Force report made a reference to the task being handed over to SCAA (QCA).
Yet all the evidence suggests that it is actually impossible to define such a consistent standard!
The debate in the UK over how to measure standards over time dates back at least to the early 1970s with concern over apparent declines in reading standards (Start and Wells, 1972). It was a major concern of the APU and the debate is summarised in Gipps and Goldstein (1983). In the USA a similar debate surfaced around the comparison of reading performances between 1984 and 1986 in NAEP (Beaton and Zwick, 1990). In the 1990s in particular the debate has tended to centre around trends in GCSE performance from year to year.
In all of these cases the conclusion essentially has been the same, namely that the attempt to measure absolute standards over time is doomed to fail. In effect, it is impossible to distinguish 'real' changes in the performance of pupils from changes in the (different) tests or exams that are used over time. For example, it was shown by the APU research that even apparently minor changes in question format or small changes to content could greatly affect correct response rates: NAEP concluded that the ordering and placing of questions could affect responses. The only possible way to ensure that the same thing is being measured is to use precisely the same test in precisely the same way over time. This, of course, would be unacceptable, so the conclusion must be that there is literally no way in which level 4 (in 1996 or 1997) could be maintained as a standard through to 2002.
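The difficulty can be illustrated with a minimal sketch. It assumes, purely for illustration, a simple logistic relation in which the share of pupils reaching a level depends only on the difference between average pupil ability and the difficulty of the test at the cut-off. The function `pass_rate` and every number in it are hypothetical and are not taken from the White Paper, the APU or NAEP.

```python
import math


def pass_rate(mean_ability: float, cut_difficulty: float) -> float:
    """Share of pupils reaching the level under an assumed logistic model.

    Only the difference (mean_ability - cut_difficulty) enters the formula.
    """
    return 1.0 / (1.0 + math.exp(-(mean_ability - cut_difficulty)))


# Hypothetical baseline year, with values chosen to give roughly 6 in 10.
baseline = pass_rate(mean_ability=0.405, cut_difficulty=0.0)

# Scenario 1: pupils really improve and the test is equally hard.
real_gain = pass_rate(mean_ability=0.405 + 0.98, cut_difficulty=0.0)

# Scenario 2: pupils do not change, but the new test is easier by the same amount.
easier_test = pass_rate(mean_ability=0.405, cut_difficulty=-0.98)

print(f"baseline:    {baseline:.3f}")
print(f"real gain:   {real_gain:.3f}")
print(f"easier test: {easier_test:.3f}")
```

Under these assumptions both scenarios move the published figure from about 60% to about 80%. The percentage alone cannot show which of the two has occurred, which is the point made above about different tests being used in different years.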
There are, of course, other useful targets to aim for, some of which are mentioned in the White Paper. Reducing the gaps between ethnic groups, between social classes and between the sexes is a legitimate and measurable aim. There is a strong case for the Government to set up rigorous evaluations of policy initiatives to see what does and doesn't work in terms of improving performance. It would also be worthwhile using highly trained experts to judge extensive random samples of pupil work in English, Maths etc. every year, with a view to making informed judgements about basic levels of literacy or numeracy. Such judgements, however, would not be able to provide precise indicators of 'standards': rather they would act as a mechanism which aimed to detect any large shift which might occur, possibly as a result of policy or external factors, so that further investigation could be undertaken. In other words, such a mechanism would have a crude, but potentially useful, screening function.
We see, therefore, that the White Paper aim is simply unachievable because there exists no way to measure what is happening. The Government needs to recognise this and to drop these targets, rather than trying to do the impossible. If it persists there is a real danger that attention will become narrowly, and entirely misleadingly, focussed on the numbers achieving the levels each year, while ignoring more important issues.
Finally, I should make it clear that the argument here is not a new one, nor is it merely a debating point. Much of the White Paper is predicated on being able to achieve these level targets: this cannot be delivered and the sooner this is recognised the better.
The publication of performance tables
What does the white paper say?
Chapter 3.3 says 'schools with similar intakes of pupils achieve widely differing results'. The Appendix says 'schools with broadly similar intakes (here measured by the proportion of pupils taking up free school meals) have widely differing achievements' (our italics).
Chapter 3.7 says 'The publication of performance data benefits parents and acts as a spur to improve performance. We will publish more such data….supplementing "raw" results with a measure of the progress which pupils have made. Data on prior attainment, which could form the basis of true measures of "value added", are not yet available…but can be introduced progressively from 1998.'
Chapter 3.9 says 'We intend to speed up the publication of information on Primary schools' performance by requiring 11 year olds assessment results to be prepared and published locally, but in a form which continues to make national comparisons possible and which allows additional information to be published by individual authorities. Getting the information to parents sooner will make it more useful to them when choosing schools.'
What does this mean?
First, it must be pointed out that the White Paper appears to have misunderstood what is meant by 'intake adjustment'. Adjusting for the proportion of children taking up free school meals is a very poor substitute for adjusting for achievements when they start school: there is a wealth of evidence from school effectiveness research which shows this quite clearly. This is important because the White Paper says (Chapter 3.13) that individual school targets are to be determined partly with reference to the performance of 'similar schools'. The only sensible way to do this is by properly adjusting for intake achievements and it would be of considerable concern if judgements were unable to take these into account but instead pretended that factors such as the proportion with free school meals were an adequate substitute. The Government needs to pay close attention to the research findings in this area.
It is absolutely clear that the Government is committed to continuing to publish league tables of test and exam results based on raw data, with the emphasis on local comparisons. It believes that these tables are useful to parents in choosing schools. It admits that proper value added tables are not yet available, and there is no certainty that these will ever be feasible on a widespread basis. The policy of publishing league tables, begun by the previous Government, has been shown to be detrimental in a number of ways and there has been a considerable public debate about this. It is clear that the Government recognises the force of this debate since it talks about the desirability of 'value added' tables. Yet the whole point of 'value added' tables is that they correct the false impressions given by 'raw' tables which take no account of intake! It is therefore logically inconsistent to maintain that 'raw' tables are misleading and yet to recommend them to parents for choosing schools!
The White Paper nowhere provides any indication that league tables, of whatever kind, have serious drawbacks. It is reasonable to believe that parents and others have a democratic right to be provided with information about any potentially misleading inferences which could be drawn from published tables. A 'health warning' seems essential and I believe that government has a major responsibility for ensuring that this is done (see Goldstein and Myers, 1996, for a detailed discussion). In addition to what I have already mentioned, the other major factor is that any published ranking of schools is subject to large measures of uncertainty. Again, the research is very clear on this: up to two thirds of institutions cannot be separated due to 'sampling variability'. This information considerably diminishes the usefulness of league tables of whatever variety, other than as crude initial screening instruments.
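The effect of sampling variability on a table of schools can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the research referred to above: the number of schools, the cohort size, the spread of underlying pass rates and the use of a simple approximate 95% interval are all hypothetical choices.

```python
import math
import random

random.seed(1997)  # fixed seed so the sketch is repeatable

N_SCHOOLS = 100          # hypothetical number of schools in one table
PUPILS_PER_SCHOOL = 60   # hypothetical cohort size
TRUE_SD_BETWEEN = 0.05   # assumed spread of schools' underlying pass rates
OVERALL_RATE = 0.60      # assumed overall share reaching the level


def simulate_school() -> tuple[float, float]:
    """Return (observed pass rate, half-width of an approximate 95% interval)."""
    true_rate = random.gauss(OVERALL_RATE, TRUE_SD_BETWEEN)
    true_rate = min(max(true_rate, 0.01), 0.99)  # keep it a valid probability
    passes = sum(random.random() < true_rate for _ in range(PUPILS_PER_SCHOOL))
    observed = passes / PUPILS_PER_SCHOOL
    std_error = math.sqrt(observed * (1 - observed) / PUPILS_PER_SCHOOL)
    return observed, 1.96 * std_error


schools = [simulate_school() for _ in range(N_SCHOOLS)]
mean_rate = sum(obs for obs, _ in schools) / N_SCHOOLS

# A school is counted as not separable from the average when its
# interval covers the overall mean of the observed rates.
inseparable = sum(abs(obs - mean_rate) <= half for obs, half in schools)
print(f"{inseparable} of {N_SCHOOLS} schools cannot be separated from the average")
```

With cohorts of this size the intervals are wide compared with the assumed differences between schools, so most simulated schools cannot be distinguished from the average even though their positions in a ranked table would differ. The exact count depends entirely on the assumed values.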
The White Paper discussion of accountability processes is marred by its failure to understand properly the limitations of the performance indicators it proposes. To continue to promote league tables as described in the White Paper is likely further to damage schools and pupils and impede any attempts to raise standards generally.
Harvey Goldstein, 02 August 1997
- Beaton, A. E. and Zwick, R. (1990). Disentangling the NAEP 1985-1986 reading anomaly. Princeton, Educational Testing Service.
- DES (1988). Task Group on Assessment and Testing. London, Department of Education and Science.
- Gipps, C. and Goldstein, H. (1983). Monitoring Children. London, Heinemann.
- Goldstein, H. and Myers, K. (1996). Freedom of information: towards a code of ethics for performance indicators. Research Intelligence (57): 12-16.
- Start, B. and Wells, K. (1972). The trend of reading standards. Slough, NFER.