Improving schools - what is the evidence?

A major part of the continuing debate about school performance centres on the detection of those institutions that are 'improving' or 'deteriorating' over time. Many feel that such detection, if it can be done, would provide useful feedback and be important when specific interventions occur, such as following an OFSTED inspection. Past attempts to do this have relied upon crude measures of performance based upon exam scores or test results that take no account of the possibility that changes in pupils' intake characteristics may be responsible for what is observed. To rectify this, 'value added' analyses have been suggested which adjust for intake differences, so that fair comparisons can be made.

To investigate this, John Gray, Sally Thomas and I have carried out extensive analyses of A level (and AS level) exam results for all students in 1994-1997 – some 700,000 in 2500 maintained and private institutions, including schools, sixth form colleges and further education colleges. The data were provided by the DfEE as part of an ESRC funded project, and the results are to be published later this year in a special issue of the British Educational Research Journal. The value added aspect comes from linking each student's A level results to their GCSE results, which are used to adjust for different achievements at the start of A level courses.

The first 'raw' set of analyses, not involving a value added element, looked at the average A level scores for each institution in each year. From year to year the between-school correlation of A level scores was very high, about 0.94, dropping to about 0.88 across a three year gap, between 1994 and 1997. When a value added analysis was carried out, however, the corresponding correlations were about 0.75 and 0.55. This reflects the fact that over time institutions will tend to maintain the achievement level of their intakes, but that once this is allowed for, the progress made by students during the 2 year A level course is less predictable from year to year.
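The distinction between a 'raw' and a value added comparison can be illustrated with a minimal sketch. It is not the modelling used in the study: it assumes a single straight-line adjustment of A level score on GCSE score fitted across all students, and it treats an institution's value added score as the mean of its students' residuals from that line. The function names and the four records are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def raw_and_value_added(records):
    """records: iterable of (institution, gcse_score, a_level_score).

    Returns two dicts keyed by institution: the raw mean A level score,
    and the mean residual after adjusting for GCSE score (a simplified
    stand-in for a value added score).
    """
    records = list(records)
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    raw_scores = defaultdict(list)
    residuals = defaultdict(list)
    for institution, gcse, a_level in records:
        raw_scores[institution].append(a_level)
        residuals[institution].append(a_level - (a + b * gcse))

    def mean(values):
        return sum(values) / len(values)

    return (
        {k: mean(v) for k, v in raw_scores.items()},
        {k: mean(v) for k, v in residuals.items()},
    )


# Hypothetical records: institution B has a higher-achieving intake,
# but students in both institutions make the same progress from GCSE.
records = [
    ("A", 4.0, 10.0), ("A", 5.0, 12.0),
    ("B", 7.0, 16.0), ("B", 8.0, 18.0),
]
raw, va = raw_and_value_added(records)
print(raw)  # raw means differ between A and B
print(va)   # adjusted scores do not
```

In this made-up example the raw averages separate the two institutions purely because of their intakes, while the adjusted scores are identical. That is the sense in which raw scores track intake from year to year, whereas the adjusted scores isolate the less stable progress component.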

The next set of analyses looked at the overall performance trend for each institution over the four years. For both the 'raw' and value added analyses, only a small minority (less than 10% in the value added case) of institutions had trends that could be separated statistically from the average trend. This echoes a common finding of school effectiveness studies of single cohorts: the 'confidence intervals' are so wide that performance for most institutions is indistinguishable from the average. These findings are underlined in another analysis, where the value added trend calculated from the first three years was used to predict the year 4 value added score, resulting in a correlation of only 0.51.
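Why so few trends can be separated from the average becomes clearer when one sees how wide an interval four yearly scores produce. The sketch below is an assumption-laden simplification and not the analysis carried out in the study: it fits an ordinary least squares line to one institution's four scores, expressed relative to the average (so that a slope of zero means 'the average trend'), and forms a 95% interval for the slope using the t distribution with 2 degrees of freedom. The scores are hypothetical.

```python
import math

# Two-sided 95% quantile of the t distribution with 2 degrees of freedom,
# which is what remains after fitting a line to four points.
T_CRIT_DF2 = 4.303


def trend_interval(scores):
    """Slope per year and its 95% interval for four yearly scores."""
    n = len(scores)
    if n != 4:
        raise ValueError("this sketch is written for exactly four years")
    years = list(range(n))
    mean_x = sum(years) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(years, scores)
    ) / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(years, scores)
    )
    std_error = math.sqrt(rss / (n - 2) / sxx)
    half_width = T_CRIT_DF2 * std_error
    return slope, slope - half_width, slope + half_width


# Hypothetical value added scores for one institution, 1994 to 1997,
# measured relative to the average institution.
scores = [0.0, 0.3, 0.1, 0.4]
slope, low, high = trend_interval(scores)
print(f"slope {slope:.2f}, 95% interval ({low:.2f}, {high:.2f})")
```

Here the institution appears to be 'improving', with a positive fitted slope, yet the interval comfortably includes zero, so the trend cannot be distinguished from the average on this evidence.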

The implications from all these analyses seem fairly clear, and the existing evidence is that the general conclusions from this study of A level results apply also to other stages of schooling. As is now generally accepted, 'raw' unadjusted comparisons (league tables) are misleading measures of the effectiveness of institutions. Nevertheless, even where value added analyses are available, past performance is not a very precise predictor of future performance. Thus, if a parent or pupil wished to use the most recent value added scores for institutions in order to make a choice, they face a dilemma: the correlation between the most recent scores and the value added scores at the end of the future 2-year A level courses is the 3-year correlation of just 0.55, and this imposes an inherent constraint upon any advice that can be given to parents and others about choosing institutions on the basis of their 'effectiveness'.
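To give a feel for what a correlation of 0.55 means for prediction, the following sketch works through the arithmetic. It assumes, purely for illustration, that current and future value added scores are standardised (mean 0, standard deviation 1) and jointly normally distributed; neither assumption comes from the study, and the function name is hypothetical.

```python
import math


def predict_future(z_now, r):
    """Predicted future standardised score and an approximate 95% range.

    Assumes current and future scores are standardised and bivariate
    normal with correlation r (an illustrative assumption).
    """
    centre = r * z_now                       # regression towards the mean
    spread = 1.96 * math.sqrt(1 - r ** 2)    # residual SD is sqrt(1 - r^2)
    return centre, centre - spread, centre + spread


r = 0.55
share_explained = r ** 2  # proportion of variance in future scores explained
centre, low, high = predict_future(1.0, r)
print(f"variance explained: {share_explained:.2f}")
print(f"predicted score {centre:.2f}, range ({low:.2f}, {high:.2f})")
```

Under these assumptions the most recent score accounts for only about 30% of the variation in the future score, and an institution currently one standard deviation above average has a predicted range that extends well below the average.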

Only a minority of institutions can be detected as consistently 'improving' or 'deteriorating' over time. Thus, crude improvement indicator league tables that purport to compare institutions across time will be misleading, as are the existing league tables. Even where such tables are based upon value added analyses, the provision of confidence intervals will show just how imprecise the comparisons really are.

In the case of interventions by Government or other bodies, judgements about changes in performance need to be based upon value added analyses and also need to take account of the considerable lack of precision that we have shown. There are also, of course, important implications for the use of 'improvement' measures in the implementation of performance related pay and in the DfEE's (March 2001) cash awards to schools based upon unadjusted trends.

If government is truly concerned about evidence based policy then it should take careful note of these findings, and especially should take the opportunity to tell the public the whole story about the limitations of performance measures.

Harvey Goldstein, Institute of Education