Developments in GCSE
Caroline Gipps, University of London Institute of Education, 20 Bedford Way, London WC1H 0AL
Gordon Stobart, National Council for Vocational Qualifications, 222 Euston Road, London NW1 2BZ
The General Certificate of Secondary Education (GCSE) is the public examination taken by pupils at 16+. It is a relatively new examination, with the first papers taken in 1988. The main changes brought in with GCSE were as follows: it uses coursework assessment rather than relying 100 per cent on examination, so that oral, practical and extended project work play an important part in the assessment; it is aimed at the whole ability range, so that differentiated exam papers (pitched at different levels) are required for some subjects; and it was intended eventually to be criterion-referenced, so that candidates could be graded in relation to their own performance rather than in relation to how others performed.
The GCSE has, it is generally acknowledged, brought about changes in teaching style and content resulting in a broadening of students' curricular and pedagogic experience. A higher proportion of the age group takes it than was the case with the previous 16+ exams (over 85 per cent of the age group enters at least one subject). Coursework assessment has had a powerful effect in many schools; 100 per cent coursework-assessed syllabuses have proved popular in English and are available in a number of other subjects. The move towards criterion-referencing has been problematic and pupils are graded on the basis of rather loose grade descriptions.
One of the reasons for the interest of the DES and Secondary Examinations Council (SEC) in the development of criterion-referencing within GCSE was concern over comparability, or rather the lack of it, in GCSE grades from different examination boards. With a single, consistent system of clearly defined grades, the idea was that all the boards would apply the same standards in awarding grades (Orr and Nuttall, 1983).
Another reason for the introduction of GCSE was an attempt to boost standards: Sir Keith Joseph's aim was to get 80%-90% of 16-year-olds up to the level previously deemed to be average. On norm-referenced tests there is no point in trying to get every pupil to achieve an average or above-average score since, by definition, these tests are designed to have half the population scoring above and half below the mean. With criterion-referenced assessment, in theory, everyone can achieve the top grade.
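The contrast between the two kinds of referencing can be illustrated with a minimal sketch. The cohort scores, the 20-mark improvement and the fixed standard of 50 are all hypothetical, and the cohort midpoint is taken here to be the median, which is an assumption made for simplicity.

```python
from statistics import median

def share_above_midpoint(scores):
    """Norm-referenced view: fraction of pupils scoring above the cohort median."""
    midpoint = median(scores)
    return sum(score > midpoint for score in scores) / len(scores)

def share_meeting_criterion(scores, criterion):
    """Criterion-referenced view: fraction of pupils at or above a fixed standard."""
    return sum(score >= criterion for score in scores) / len(scores)

# Hypothetical cohort of eight pupils, before and after a uniform 20-mark improvement.
before = [35, 42, 48, 55, 61, 67, 73, 80]
after = [score + 20 for score in before]
CRITERION = 50  # hypothetical fixed standard, not taken from any syllabus

print(share_above_midpoint(before), share_above_midpoint(after))
print(share_meeting_criterion(before, CRITERION), share_meeting_criterion(after, CRITERION))
```

In this sketch the share of pupils above the midpoint stays at one half however much the whole cohort improves, whereas the share meeting the fixed standard can rise to include everyone.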
However, it is important not to understate the difficulties in designing criterion-referenced assessment, particularly in relation to advanced levels of assessment in complex subject areas. The main problem is that, as the requirements become more abstract and demanding, so the task of defining the performance clearly becomes more complex and unreliable. Thus while criterion-referencing may be ideal for simply defined competencies ('can swim 50 metres'), it is less so as the task becomes more complex: either the assessment must become more complex (for example, the driving test requires intensive one-to-one assessment) or the criteria must become more general. If the criteria are more general they are less reliable, since differences in interpretation are bound to occur.
Specifying the criteria has proved to be a particular problem in GCSE. Sir Keith Joseph's announcement of the new GCSE in 1984 included a reference to grade-related criteria - the criteria which students would have to meet in order to gain a particular grade:
'examination grades should have a clearer meaning and pupils and teachers need clearer goals. We accordingly need grade-related criteria which will specify the knowledge, understanding and skills expected for the award of particular grades.' (DES, 1987b)
There were already grade descriptions in the GCSE subject criteria which gave a broad idea of the performance likely to have been shown by a candidate awarded a particular grade, but what was wanted were more specific descriptions of performance. Working parties were set up in each of the main subjects to develop grade criteria. These working parties first identified 'domains': coherent and defined areas of knowledge, understanding and skills within each subject. The groups then broke the domains down into abilities and produced definitions of performance, or criteria, required for achievement at different levels. These draft grade criteria were then put out for consultation, in 1985. Two key issues emerged from this.
First, the complexity of the draft grade criteria militated against their usefulness, 'particularly to employers' (DES, 1987b, op cit). What had happened was that the working parties had produced numerous and often complex criteria which made their assessment unmanageable. In history, for example, there were ten sub-elements across three domains and criteria were given for four levels of performance within each sub-element, adding up to 40 statements of performance to be used not only by those doing the assessment but also by those interpreting candidates' performance. In English the problem lay in the rather broad criteria formulated, which made reliable differentiation between performance at different grades very difficult. An example taken from the domain of writing will make this clear: to get a grade A/B a candidate should 'Give a coherent and perceptive account of both actual and imagined experience', while to get a grade F/G he or she should 'Give a coherent account of personal experience'. At the other extreme, the maths group produced eighty detailed criteria for one domain at a single grade level (Brown, 1988).
Second, care was needed to make sure that teaching and assessment strategies based on the draft grade criteria would not lead to the breaking down of subjects into isolated tasks. This, of course, is bound to be a danger where there is a highly specified curriculum and/or assessment. The SEC had been aware of this problem for some time (Murphy, 1986) and in the briefing paper to the draft grade criteria subject working parties said that:
'The rigorous specification of full criterion-referencing for assessment in the GCSE would result in very tightly defined syllabuses and patterns of assessment which would not allow the flexibility of approach that characterises education in this country.' (SEC, 1984, p. 2)
In order to refine and develop the draft grade criteria the SEC funded a re-marking exercise. This involved the re-marking of the 1986 joint O-level/CSE exam scripts according to the draft criteria. This exercise threw up a host of problems. First, there was a poor match between the domains and levels produced by the working parties and the content of the exam papers studied; this was not particularly surprising since these exam papers had not been designed to cover the domains and levels. More importantly, however, there were ambiguities in the criteria: the hierarchies of performance given by the draft grade criteria bore little relationship to the actual responses of candidates to specific questions; and there was a lack of equivalence between the same levels of performance on different questions and across different domains. It was concluded that the draft grade criteria were largely unworkable (Kingdon and Stobart, 1987).
At this point, the draft grade criteria were dropped. As the DES paper put it:
'In the light of this outcome from the re-marking exercise, the SEC decided to approach from a different angle the task of making GCSE grades more objective.' (DES, 1987b, op cit, para 13.)
This new angle involved the development of performance matrices, which meant starting the other way round. The starting point was some of the existing approved GCSE syllabuses and the task was to develop, for these particular syllabuses, specific descriptions of performance, 'attributes', at different levels. These attributes, defined as 'a quality developed in students who follow a particular course', are described at different levels of performance (e.g. Grades A, C and F) and combined into domains. (Quite how the introduction of the concept of attributes was to help in an already hugely complicated area is not at all clear!) The point about performance matrices is that they relate to individual syllabuses rather than the whole subject, and they are based on examiners' articulation of their implicit judgements in awarding grades.
The final reports of the working parties were produced in mid-1988 with varied reactions to the viability of performance matrices. But when the SEC was superseded by the School Examinations and Assessment Council (SEAC) at the end of 1988, one of the first things it did was to freeze work on performance matrices (SEAC, 1989).
In the summer of 1988, the first GCSE papers were graded on the basis of the original (loose) grade descriptions. This approach continued throughout the first five years of GCSE awarding and looks likely to continue, though these grade descriptions would have to be revised to bring them into line with the appropriate National Curriculum Statements of Attainment (SoAs). These SoAs were the assessment criteria for the National Curriculum.
With the advent of a 'criterion-referenced' national assessment system there seemed little point in continuing the search for performance matrices or grade-related criteria. The GCSE was to follow the attainment targets (ATs) and SoAs, and these were to form the basis for criterion-referencing. However, as we have intimated already, the imprecision of the ATs and SoAs and the problems of aggregation are likely, at best, to produce only a loosely criterion-referenced system.
Syllabuses were to be based on the national curriculum programmes of study and assessment reported on the ten-level scale. The model proposed by SEAC for the 1994 Key Stage 4 (age 16) awards was mark-based (SEAC, 1992). Pupils were to receive their subject level on their combined marks from each AT, not from aggregating and averaging the AT levels as at Key Stage 3 (age 14). Similarly, within an AT the level was to be determined by the total marks across all the levels. For example, if the level 8 boundary is set at 70 per cent, pupils may achieve it by gaining high marks on the lower level questions and fewer marks on questions targeted at levels 9 and 10. The problem with this approach is that this degree of compensation weakens the criterion-referenced basis of the assessment. A pupil might even gain a level 8 simply by being highly consistent at lower levels and gaining hardly any marks at level 8.
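The compensation effect described above can be shown with a small worked sketch. Only the level 8 boundary of 70 per cent comes from the example in the text; the other boundaries, the mark allocations and the pupil's marks are hypothetical, and the function is an illustration of a mark-based rule, not the SEAC procedure itself.

```python
def percentage(marks_by_level):
    """Total marks gained as a percentage of total marks available."""
    gained = sum(g for g, _ in marks_by_level.values())
    available = sum(a for _, a in marks_by_level.values())
    return 100 * gained / available

def awarded_level(marks_by_level, boundaries):
    """Highest level whose percentage boundary the overall total reaches."""
    pct = percentage(marks_by_level)
    reached = [level for level, cutoff in boundaries.items() if pct >= cutoff]
    return max(reached) if reached else None

# Hypothetical boundaries; only level 8 at 70 per cent is taken from the text.
BOUNDARIES = {7: 55, 8: 70, 9: 80, 10: 90}

# Hypothetical pupil: targeted level -> (marks gained, marks available).
pupil = {5: (20, 20), 6: (19, 20), 7: (19, 20), 8: (1, 10), 9: (0, 5), 10: (0, 5)}

print(percentage(pupil))                 # 73.75
print(awarded_level(pupil, BOUNDARIES))  # 8
```

Here the pupil is awarded level 8 with 59 of the 80 marks available, even though only one of the ten marks targeted at level 8 was gained.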
Another example of the difficulty in aggregating complex data is the process of 'compensation' in which a poor performance in one part can be offset by a good performance elsewhere, a process which is traditionally offered in examination marking. In a mark-based approach, it simply means that low and high marks are combined to produce a 'middling' overall mark. Strict criterion-referencing does not work like this - pilots, for example, are expected to master every aspect of flying, and failure on one part leads to overall failure. It would be of little comfort to know that a pilot is extremely good at taking off and that this compensates for poor landing skills. If strict criterion-referencing were translated into exam performance, however, it would mean that the final subject level would be determined by the worst skill areas. For example, if algebra was only grade F, then the subject grade would have to be F; to give an overall grade C because geometry was grade A would be misleading, as grade C level competence has not been shown in all areas of the subject.
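The two aggregation rules can be set side by side in a short sketch using the algebra and geometry example. The strict rule follows the text directly; the compensation rule shown is only one hypothetical way of averaging grades (grade positions are averaged and any fraction is resolved towards the better grade), chosen for illustration.

```python
GRADES = "ABCDEFG"  # ordered from best to worst

def strict_grade(area_grades):
    """Strict criterion-referencing: the subject grade is the worst area grade."""
    return max(area_grades.values(), key=GRADES.index)

def compensated_grade(area_grades):
    """Hypothetical compensation rule: average the grade positions,
    resolving any fraction towards the better grade."""
    positions = [GRADES.index(grade) for grade in area_grades.values()]
    return GRADES[sum(positions) // len(positions)]

candidate = {"algebra": "F", "geometry": "A"}

print(strict_grade(candidate))       # F
print(compensated_grade(candidate))  # C
```

Under the strict rule the weak algebra result fixes the subject grade at F, while the compensation rule reports a grade C that the candidate has not demonstrated in either area.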
In 1993 the whole process of shifting to reporting GCSE in terms of national curriculum levels was halted, awaiting the outcome of a review of the National Curriculum and its assessment.
Grade descriptions have not proved particularly effective in GCSE: they were referred to in the first year by examiners and Chief Examiners, but the tendency is for them to become internalised, and examiners then rely on memory. One difficulty is that grade descriptions are descriptions of typical performance at that grade and therefore they are not very useful at the grade boundaries.
Recent developments
From the announcement of the National Assessment programme until summer 1991, discussion centred around how the GCSE, with its grade descriptions, might be brought into line with the 10-level scale for national assessment. In July 1991, however, there was a development which overshadowed this focus. The Prime Minister decided that the amount of coursework should be reduced, so that any award in any subject would depend in part on an externally set examination. One hundred per cent coursework schemes would therefore no longer be approved. The Prime Minister, in a speech to the Centre for Policy Studies, a right-wing think tank, said:
'It is clear that there is now far too much coursework, project work and teacher assessment in GCSE. The remedy surely lies in getting GCSE back to being an externally assessed exam, which is predominantly written. I am attracted to the idea that for most subjects, a maximum of 20% of the marks should be obtained from coursework.' (quoted in Daugherty, 1994)
Then, shortly after the publication of the GCSE results in summer 1992, the Secretary of State quoted from an unpublished HMI report and used the increase in A to C grades to cast doubt on the quality of the GCSE and to announce an urgent enquiry. This was, in some ways, an unexpected event, but in other ways not. The critics of the GCSE, those in the government and on the extreme right, had long felt that it was too easy, and coursework was perceived to be inappropriate, although the previous year's results had been welcomed by ministers as evidence that the GCSE was raising standards of performance of the nation's 16-year-olds. However, the continuing picture of more students receiving higher grades in 1992 was interpreted as showing that standards were dropping and that the GCSE was too easy. The HMI's report suggested that there was some evidence for erosion of standards and this was the professional item upon which John Patten fixed his attack. The eventual result of this enquiry was the publication, in January 1993, of a Mandatory Code of Practice for GCSE which lays down in detail the lines of accountability for examination procedures and standards; the processes by which examination papers and coursework assignments are set; and how examination boundaries are to be set and aggregated to a subject grade.
Meanwhile, in the core National Curriculum subjects, Maths, Science and English, syllabuses were due to be ready for the first cohort of students to start in September 1992, who were then to be certificated against the 10-level scale in summer 1994. This work was being hampered by difficulties in shifting from a mark-based and grade description-related system of grading, towards one based entirely on Statements of Attainment achieved and National Curriculum levels.
The review of the National Curriculum and Assessment programme was announced in April 1993, in the face of a massive teacher boycott of National Assessment at Key Stage 3 and Key Stage 1, and was undertaken by Sir Ron Dearing, an ex-civil servant and businessman.
One of the first things that he did was to recommend that the original GCSE A to G grading scale be retained. By January 1994 he recommended, and the Secretary of State agreed, that this scale should be confirmed for the years beyond 1995, with the addition of a 'Starred A' grade in order to raise the ceiling of achievement. Dearing eventually recommended that the 10-level scale should not be used at all at Key Stage 4 and that the achievement of less able students, who were not able to be awarded a GCSE certificate, might have to be recognised in Records of Achievement. This uncoupling of grading at Key Stage 4 from the 10-level scale is hugely significant for the future of the National Curriculum and its assessment structure, since the 10-level scale now effectively finishes at 14. The syllabuses and assessment schemes for GCSE will, of course, have to be related to the (new, slimmed down) National Curriculum, but the details of how this will work are not yet clear. The idea of a national entitlement curriculum was also damaged by the Dearing recommendations, since a minimal core curriculum was proposed for Key Stage 4. As a result, only the core subjects (English, mathematics and science) are compulsory for all students, together with short courses in technology and in a modern foreign language. These short courses cannot be assessed by GCSE but will need some other qualification, either as a stand-alone certificate, presumably reported in terms of National Curriculum levels, or as part of a full GCSE qualification (e.g. Design and Technology).
Other subjects such as history and geography will now be optional at Key Stage 4 (14-16 years), a far cry from the original proposals in which both were compulsory to 16+, then modified to choosing one of them, or taking a 'short course' in each.
The rationale for this has been to 'free-up' curriculum time so that schools will be free to choose what to offer for around 40% of the timetable. A very recent development here is that schools may be able to offer the coursework-based General National Vocational Qualification (GNVQ). This modular approach is far more strongly criterion-referenced in its approach to demonstrating mastery in each unit and will provide a distinct contrast to the increasingly examination-based GCSE.
Thus despite Sir Keith Joseph's goal of 'grade-related criteria' for the GCSE and the introduction of a National Curriculum which was intended to be criterion-referenced, the GCSE looks set to remain a largely mark-based examination in which grades are awarded by a mix of total marks allocated and examiner judgements about the overall quality of performance in the subject.
That said, most people who are involved with the GCSE, as teachers, examiners, and assessment specialists, feel that it is a good model for an external exam, as long as we can keep some coursework in it.
References
Brown M (1988) 'Issues in formulating and organising attainment targets in relation to their assessment' in National Assessment and Testing: a research response ed H Torrance, BERA.
Daugherty R (1994) National Curriculum Assessment, London, Falmer Press.
DES (1987) 'Improving the basis for awarding GCSE grades', unpublished paper, September 1987 (made available to TGAT).
Kingdon M and Stobart G (1987) The draft grade criteria: a review of LEAG research, LEAG discussion paper.
Murphy R (1986) '"The Emperor has no clothes": grade criteria and the GCSE' in The GCSE: an uncommon exam ed C Gipps, ULIE, Bedford Way Paper 29.
Orr R and Nuttall D (1983) 'Determining standards in the proposed system of examining at 16+', Comparability in Examinations, Occasional Paper 2, London, Schools Council.
SEAC (1989) Progress report on the GCSE, July 1989.
SEAC (1992) National Curriculum Assessment. Assessment arrangements for core and other foundation subjects, London, SEAC.
SEC (1984) The development of grade-related criteria for the GCSE. A briefing paper for working parties, London, SEC.