
Comparable Outcomes: Setting the standard?

What is the comparable outcomes framework, how does it underpin grade standards and are there alternatives?

Why is the role of comparable outcomes being debated?
What is the comparable outcomes framework?
Why is the comparable outcomes framework used?
What is the principle underlying the comparable outcomes framework?
How does the comparable outcomes approach work in practice?
What are the benefits of the comparable outcomes framework?
What are the challenges with applying the comparable outcomes framework in setting grade boundaries?
What are the alternatives to standardising GCSE, AS and A-level grades?
What is the future for the comparable outcomes framework?
Date: 18/11/21
Authors: Corina Balaban (Researcher), James Lloyd (Head of Policy and Public Affairs) & Phoebe Surridge (Researcher), AQA

Why is the role of comparable outcomes being debated?

The comparable outcomes framework has been at the heart of standards and grade setting for GCSEs, AS and A-levels in England over the last decade. Politicians credit it with having “halted” grade inflation.[1]

However, over time, stakeholders have raised concerns about how it is used and, in particular, about its perceived impact on schools.

This briefing describes how the comparable outcomes framework is used, its key benefits and challenges, and alternative approaches to setting standards.

What is the comparable outcomes framework?

Comparable outcomes is a framework that exam boards in England, under the oversight of the exams regulator Ofqual, use to guide the setting of grade boundaries for GCSEs, AS and A-levels. By applying the comparable outcomes framework, exam boards and the regulator are better able to standardise the grades awarded each year at a national level. The aim of the comparable outcomes framework is to ensure that grades are comparable over time and across exam boards.[2]

Why is the comparable outcomes framework used?

For students, the value of GCSE, AS and A-level grades depends on their currency with employers and education institutions, such as universities.

Without any attempt at standardisation, qualifications would lose their value. If an employer or university cannot be confident as to what level of attainment is represented by different grades awarded to a student, the value of that student’s qualifications is undermined.

The comparable outcomes framework is used in England as the preferred method for enabling greater standardisation of grades.

Crucially, applying the comparable outcomes framework means a grade in one subject in one year can more meaningfully be compared:

  • with other years, e.g. a grade 6 in GCSE Maths in 2018 will reflect a similar level of attainment to a grade 6 in GCSE Maths in another year;
  • across exam boards, e.g. the outcome of a student taking an AQA exam should be comparable to the outcome of a student taking the same qualification from another exam board.[3]

What is the principle underlying the comparable outcomes framework?

The basic principle that underpins the comparable outcomes framework is that for a qualification where the entry cohort is similar to previous years, the overall proportion of students achieving each grade should also be similar to previous years, with some adjustment for differences in the difficulty of the papers.[4]
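To illustrate the intuition only (the prior-attainment bands, grade groupings and figures below are entirely hypothetical, and the methodology actually used by exam boards and Ofqual is considerably more detailed), the following Python sketch shows how a predicted grade distribution could be derived by weighting a reference year's grade rates by the prior-attainment mix of this year's cohort:

```python
# Minimal illustrative sketch only: hypothetical bands, figures and grade
# groupings. This is not the exam boards' or Ofqual's actual model.

# Share of this year's entry cohort in each hypothetical prior-attainment band.
cohort_mix = {"low": 0.30, "middle": 0.45, "high": 0.25}

# Hypothetical reference-year grade rates within each band: the proportion of
# students in that band who achieved grades 7+, 4-6 and 3-1.
reference_rates = {
    "low":    {"7+": 0.05, "4-6": 0.55, "3-1": 0.40},
    "middle": {"7+": 0.20, "4-6": 0.65, "3-1": 0.15},
    "high":   {"7+": 0.55, "4-6": 0.42, "3-1": 0.03},
}

# Predicted grade distribution for this year's cohort: weight each band's
# reference-year rates by that band's share of this year's entry.
predicted = {grade: 0.0 for grade in ("7+", "4-6", "3-1")}
for band, share in cohort_mix.items():
    for grade, rate in reference_rates[band].items():
        predicted[grade] += share * rate

print(predicted)  # roughly {'7+': 0.24, '4-6': 0.56, '3-1': 0.20}
```

If this year's cohort had a stronger prior-attainment profile than the reference year, the predicted proportions at the higher grades would rise accordingly: the prediction tracks the cohort, it does not impose a fixed quota.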

How does the comparable outcomes approach work in practice?

Each exam board uses the same statistical process to model grade outcomes for each of their qualifications. Ofqual oversees the application of comparable outcomes by exam boards, and describes the approach as follows:[5]

“Predictions are based on the relationship between prior attainment and national results in a reference year. Exam boards use prior attainment at Key Stage 2 when predicting GCSE outcomes, and prior attainment at GCSE when predicting AS and A level outcomes.”

These predictions are used to generate statistically recommended grade boundaries, i.e. where the statistics suggest the grade boundaries should sit. Senior examiners then look at student exam papers around each boundary to check whether they represent an appropriate level of performance, and adjust the boundaries where they consider it necessary. The weight given to the statistical predictions in deciding grade boundaries also depends on how robust the statistics are.

Final grade boundaries are therefore decided by a combination of the examiner perspective and the statistics.
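As a purely illustrative sketch of the statistical step (the mark distribution and prediction below are hypothetical, the rule is deliberately simplified with no tolerances, and in practice the recommendation is only a starting point for examiner judgement), a recommended boundary can be thought of as the lowest mark at which the share of students scoring at or above it does not exceed the predicted cumulative percentage for that grade:

```python
# Minimal illustrative sketch only: hypothetical marks and prediction.
# In practice, senior examiners review scripts around the boundary and may
# move it; the statistics provide a recommendation, not a final decision.

def recommended_boundary(marks, predicted_cumulative_pct):
    """Lowest mark such that the proportion of students scoring at or above it
    does not exceed the predicted cumulative percentage for the grade."""
    total = len(marks)
    for boundary in range(0, max(marks) + 1):
        share_at_or_above = sum(1 for m in marks if m >= boundary) / total
        if share_at_or_above <= predicted_cumulative_pct:
            return boundary
    return max(marks) + 1  # prediction below any achievable share

# A toy cohort of 20 students and a prediction that 25% should reach the grade.
marks = [12, 25, 31, 38, 40, 44, 47, 52, 55, 58, 61, 63, 66, 70,
         74, 78, 81, 85, 88, 92]
print(recommended_boundary(marks, 0.25))  # -> 75: five of the 20 students (25%) score 75 or more
```

In this toy example, examiners would then review scripts on marks around 75 and move the boundary if, in their judgement, the work at that mark did not reflect the expected standard.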

What are the benefits of the comparable outcomes framework?

In addition to improving comparability over time and across exam boards, there are other key benefits to the application of the comparable outcomes framework.

First, the comparable outcomes approach enables new qualifications – and new versions of existing qualifications – to be introduced without penalising the earliest cohorts who take these qualifications.[6]

When new qualifications are introduced, there can be a sudden drop in performance followed by a slow recovery, often referred to as the ‘Sawtooth Effect’.[7] There are a number of reasons why the first students to take a new qualification might be expected to perform worse than later cohorts.[8] For example, teachers may be less familiar with teaching the content of the new specification, and there may be fewer supporting materials available, such as textbooks and sample exam papers.[9]

The comparable outcomes framework ensures there is a smooth transition between old and new qualifications (e.g. in the wake of a reform of the national curriculum).

Second, the comparable outcomes approach makes it easier to separate out changes in performance caused by differences in the difficulty of the paper from changes caused by differences in the students taking the paper. For example, if a GCSE Physics paper is harder than the paper from the previous year – despite the many quality assurance processes exam boards use to make the difficulty of papers as similar as possible – the comparable outcomes approach is used to help ensure students sitting the harder paper aren’t unfairly penalised for doing so.[10]

Third, the comparable outcomes approach has proved very effective in addressing concerns about grade inflation. In periods of stability, there has been a historical tendency for average grades to increase over time, damaging trust in the qualifications.[11] The comparable outcomes framework helps stabilise the value of the qualifications for students, universities and employers.[12]

What are the challenges with applying the comparable outcomes framework in setting grade boundaries?

A number of stakeholders have criticised the use of the comparable outcomes approach to inform grade boundaries for GCSE, AS and A-levels in England.

These criticisms typically do not stem from concerns with the use of the comparable outcomes framework itself, nor the policy decision to attempt to standardise grades.

Instead, these concerns typically relate to the interaction of these policy choices with other aspects of the school and accountability system.

In particular, there is concern that the comparable outcomes approach does not reflect any genuine rise (or fall) in attainment in the grades awarded.[13] [14] Because outcomes are kept stable at a national level, it is sometimes perceived that individual schools cannot demonstrate improvements, or can only do so if another school’s results fall, even though that school’s standards have not actually declined.[15] [16] It is therefore felt that comparable outcomes cannot fully reflect improvements in teaching and professional development.

Another concern arises from the way in which applying the comparable outcomes framework can mean a broadly comparable proportion of students is awarded GCSE grades 3 to 1 each year. Although exam boards do not award ‘fail’ grades, the government’s policy of describing grade 4 as a ‘standard pass’ means grades 3 to 1 are often viewed implicitly as fail grades. Some stakeholders argue that, in effect, some students will ‘inevitably’ fail, and that this is unacceptable in the context of student trust and experience, as well as damaging to student motivation.

Several points can be made in response to these concerns.

First, in practice, large cohort-level changes in performance in a single year are rare.[17] [18] Small changes at a national level can and do occur even within comparable outcomes: the process aims for stability over time; it does not fix outcomes. Over several years, trends in performance can still be observed.

Second, as an additional safeguard, Ofqual has introduced the National Reference Test (NRT) to detect any improvements in teaching standards over time in GCSE English Language and Maths, at a national level.[19] The test is administered to a sample of Year 11 students in the March before they sit their GCSE examinations. The results could be used to amend the GCSE grade boundaries if any large changes in performance were observed.

Historically, prior to the adoption of the comparable outcomes approach, two other processes were used to ensure comparability of grades within the exam system in England. Both processes are, to some extent, still applied alongside the comparable outcomes framework, but in a more nuanced way.

Norm-referencing sees each grade awarded to the same percentage of each subject cohort, consistently year on year and across exam boards, without adjusting results based on previous years. This approach was popular in the 1950s but fell out of favour towards the 1980s.[20]

However, norm-referencing faces a number of the challenges associated with the comparable outcomes approach – e.g. the inability to demonstrate systemic improvements in learning outcomes – without the additional flexibility that National Reference Test data provides for comparable outcomes.

The second approach, criterion-referencing, was popularised in the 1980s as a replacement for norm-referenced tests.[21] It aims to measure a candidate’s performance against a set of pre-defined assessment criteria; each grade is awarded to all those who satisfy the performance-related criteria stated for that grade.[22] In contrast to norm-referenced tests, in criterion-referenced tests the performance of other students does not affect a student’s score.[23]
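The contrast can be sketched with a toy example (hypothetical marks, threshold and a single notional ‘top’ grade; neither function reflects how any real qualification is or was graded):

```python
# Purely illustrative contrast: hypothetical marks, share and threshold.

def norm_referenced_top_grades(marks, top_share=0.10):
    """Norm-referencing: a fixed share of the cohort receives the top grade,
    whatever absolute standard they reach."""
    n_top = max(1, round(top_share * len(marks)))
    cutoff = sorted(marks, reverse=True)[n_top - 1]
    return ["top" if m >= cutoff else "other" for m in marks]

def criterion_referenced_top_grades(marks, threshold=80):
    """Criterion-referencing: every student who meets the pre-defined standard
    receives the top grade; other students' performance is irrelevant."""
    return ["top" if m >= threshold else "other" for m in marks]

marks = [55, 62, 68, 71, 74, 79, 82, 86, 90, 95]
print(norm_referenced_top_grades(marks))       # exactly one 'top': 10% of 10 students
print(criterion_referenced_top_grades(marks))  # four 'top's: every mark of 80 or more
```

Under norm-referencing the number of top grades stays fixed even if the whole cohort improves; under criterion-referencing it can rise or fall freely with the cohort’s performance.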

Crucially, however, unlike the comparable outcomes framework, when used alone, criterion referencing does not protect against the Sawtooth Effect described above.

What are the alternatives to standardising GCSE, AS and A-level grades?

If policymakers decided to end the use of the comparable outcomes framework and any attempt to standardise grades, and instead rely entirely on subjective examiner judgement, this would result in the loss of the benefits described above in relation to the currency of grades, grade inflation and the smooth transition between cohorts.

In addition to the loss of such benefits, policymakers would likely need to accept accompanying consequences. For example, employers and education institutions that no longer felt able to rely on GCSE, AS and A-level grades would be likely to develop bespoke entrance tests to differentiate between applicants.

However, the use of bespoke entrance tests would undoubtedly lead to concerns around fairness. For example, variations in the ability of individual schools (or families) to prepare students for the entrance tests of specific institutions would be likely to create challenges for social mobility.

What is the future for the comparable outcomes framework?

The principal beneficiaries of comparable outcomes are students, given the critical role the framework has had in underpinning the currency of GCSE, AS and A-level qualifications.

As noted, many criticisms of the use of the comparable outcomes approach to standardise grades do not reflect objections to the approach itself – or the policy decision to standardise GCSE, AS and A-level grades – but rather, they relate to the interaction of these decisions with policy decisions around school accountability and the GCSE ‘pass’ framework.

There is no doubt that the comparable outcomes approach represents a complex process that is prone to misunderstanding by stakeholders,[24] with accompanying risks to public confidence.

Ultimately, policymakers may conclude public confidence in comparable outcomes cannot be maintained alongside the existing school accountability framework. In this situation, policymakers may need to choose between the relative importance of grading standards versus other policy objectives, or consider other options to meet those objectives.

[1] Gibb, N. (2016). Government Response to the Consultation on Ofqual’s National Reference Test. Department for Education. https://questions-statements.parliament.uk/written-statements/detail/2016-03-24/HCWS650

[2] Newton, P. E. (2020). What is the Sawtooth Effect? The nature and management of impacts from syllabus, assessment, and curriculum transitions in England. Ofqual.

[3] Ofqual (2017). Inter-board comparability of grade standards in GCSEs, AS and A levels 2017.

[4] Ofqual (2017). Inter-board comparability of grade standards in GCSEs, AS and A levels 2017.

[5] Ofqual (2017). Inter-board comparability of grade standards in GCSEs, AS and A levels 2017.

[6] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32

[7] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32

[8] Jadhav, C. (2016). Setting standards for new AS qualifications. The Ofqual Blog. https://ofqual.blog.gov.uk/2016/08/09/setting-standards-for-new-as-qualifications/

[9] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32

[10] Newton, P. E. (2020). What is the Sawtooth Effect? The nature and management of impacts from syllabus, assessment, and curriculum transitions in England. Ofqual.

[11] Benton, T. & Sutch, T. (2014). Analysis of use of Key Stage 2 data in GCSE predictions. ARD Research Division, Ofqual.

[12] Benton, T. (2016). Comparable Outcomes: Scourge or Scapegoat? Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment

[13] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32

[14] Benton, T. (2016). Comparable Outcomes: Scourge or Scapegoat? Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment

[15] Bousted, M. (2015). ‘England’s secondary heads and teachers are stuck in a zero-sum game from which it’s impossible to escape’. TES News.

[16] Benton, T. (2016). Comparable Outcomes: Scourge or Scapegoat? Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.

[17] Sizmur, J., Ager, R., Bradshaw, J., Classick, R., Galvis, M., Packer, J., ... & Wheater, R. (2019). Achievement of 15-year-olds in England: PISA 2018 results. Research report, December 2019. GOV.UK.

[18] Coe, R. (2007). Changes in standards at GCSE and A-level: Evidence from ALIS and YELLIS: A report for the ONS.

[19] Stacey, G. (2015). The national reference test. The Ofqual Blog.

[20] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32

[21] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32

[22] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32

[23] Prince, R. (2016). Predicting Success in Higher Education: The Value of Criterion and Norm-Referenced Assessments. Practitioner Research in Higher Education, 10(1), 22-38.

[24] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
