UNCOVERED: 1 in 4 EXAM GRADES IS WRONG
‘An important contribution to our thinking.’ – Sixth Form Colleges Association
‘An uncomfortable but important read.’ – Headmasters’ and Headmistresses’ Conference
‘Everyone in UK education should reflect upon the problems identified in this powerful book.’ – Higher Education Policy Institute
Every summer, one million GCSE and A-Level candidates receive results that define their school years and set them up for life. But those results are gravely unreliable.
In fact, about one grade in four in England is WRONG. That is 1.5 million grades every year.
An A-Level grade B might have been an A, or even a C, had a different examiner marked the script. Similarly, a GCSE grade 7 might have received a grade 8 or a 6.
For a decade, young people and their friends and families have been unable to grasp the full extent of this randomness. Now, in this definitive and easy-to-follow book, Dennis Sherwood explains why so many pupils receive final grades that don’t do them justice. He also suggests ways to regain trust, which apply to essay-based exams throughout the world.
FOREWORD by Robin Bevan,
Headteacher, Southend High School for Boys
NEU Past National President, 2020-21
CHAPTER 1: EXAM GRADES ARE IMPORTANT
THURSDAY, 15TH AUGUST 2019
A FACT THAT MIGHT BE A SURPRISE
WHAT THIS BOOK IS ABOUT
SOME RELEVANT EVIDENCE
CHAPTER 2: EXAMS IN ENGLAND
WHAT THIS CHAPTER IS ABOUT
THREE QUICK QUESTIONS
GCSE, AS AND A-LEVEL
EXAM CENTRES AND SCHOOLS
AWARDING BODIES
THE REGULATORS – OFSTED, THE DFE, AND OFQUAL
THE HOUSE OF COMMONS EDUCATION COMMITTEE
MARKING
THE RANK ORDER
GRADE STRUCTURES AND GRADE BOUNDARIES
CRITERION REFERENCING, COHORT REFERENCING AND NORM REFERENCING
Criterion referencing
Cohort referencing
Norm referencing
CHALLENGES AND APPEALS
HOW THE APPEALS PROCESS WORKS NOW
CHAPTER 3: ARE EXAM GRADES 99.2% ACCURATE?
SOME REALLY GOOD NEWS
EDEXCEL’S CLAIM
EDEXCEL’S 99.2% NUMBER
EDEXCEL ARE NOT ALONE…
…BUT OFQUAL KNEW THIS, CERTAINLY IN 2014
CHAPTER 4: TWO IMPORTANT WORDS: ‘ACCURATE’ AND ‘RELIABLE’
WHAT DOES ‘ACCURATE’ MEAN?
CAN EXAM MARKS EVER BE ACCURATE?
FUZZINESS
CAN EXAM GRADES EVER BE ACCURATE?
RELIABILITY
THE BIG QUESTION
CHAPTER 5 – SUMMER 2016: OFQUAL MAKE IT HARDER TO APPEAL
WHY THE APPEALS PROCESS IS IMPORTANT
THE ‘REASONABLENESS’ TEST
IS THE ‘REASONABLENESS’ TEST REASONABLE?
SOME NUMBERS
WHAT’S GOING ON?
WHAT HAPPENED IN 2016
THE OUTCOME
CHAPTER 6: OFQUAL’S FIRST MEASURES OF GRADE RELIABILITY
MARKING CONSISTENCY METRICS, NOVEMBER 2016
OFQUAL’S KEY FINDINGS
WHY ARE GRADES UNRELIABLE?
THE STING IN THE CAPTION
MAKING GCSE GRADES EVEN MORE UNRELIABLE
HOW THE PRESS REPORTED THE SUMMER 2017 RESULTS
‘IT ALL COMES OUT IN THE WASH’
AUGUST 2018
CHAPTER 7: OFQUAL’S REAL MEASURES OF GRADE RELIABILITY
MARKING CONSISTENCY METRICS – An Update, NOVEMBER 2018
THE REAL RELIABILITIES OF EXAM GRADES
WHAT IS THE AVERAGE RELIABILITY OVER ALL SUBJECTS?
WHAT THESE NUMBERS MEAN
GRADE RELIABILITY BY MARK
‘UNFORTUNATE’ AND ‘LUCKY’ STUDENTS
WHY OFQUAL’S MEASUREMENTS ARE UNDERESTIMATES
CHAPTER 8 – WHY GRADES ARE UNRELIABLE
THE STORY SO FAR…
THREE REASONS WHY MARKING IS NOT THE PROBLEM
‘COMMON SENSE’
A MORE POWERFUL EXPLANATION – FUZZINESS
FUZZINESS IS A PROPERTY OF THE SUBJECT ONLY
ONE WAY TO MEASURE FUZZINESS
WHY FUZZINESS IS IMPORTANT…
…BUT OFQUAL REFUSE TO ACKNOWLEDGE THIS
CHAPTER 9 – NOVEMBER 2018 TO SUMMER 2019
THE PRESS RESPONSE TO OFQUAL’S UPDATE
HEPI, 2019
THE DAILY TELEGRAPH, 10TH AUGUST 2019
THE SUNDAY TIMES, 11TH AUGUST 2019
THE DAILY TELEGRAPH, 13TH AUGUST 2019
THE BBC, GCSE RESULTS DAY, 22ND AUGUST 2019
WHAT HAPPENED NEXT
CHAPTER 10 – 2020: CAGS AND RANK ORDERS
EXAMS ARE CANCELLED
OFQUAL’S GUIDANCE AND CONSULTATION
WHAT SCHOOLS HAD TO DO
THE CAGS
THE RANK ORDER
STATISTICAL STANDARDISATION
CAN YOU GUESS THE ALGORITHM?
STATISTICAL STANDARDISATION, GRADE INFLATION AND NORM REFERENCING
ROUNDING
STATISTICS
WHAT OFQUAL SHOULD HAVE DONE
Define all the rules
Give every school the same spreadsheet
Expect, and allow for, exceptions and outliers
A PUZZLE
CHAPTER 11: THE GREAT CAG CAR CRASH
OFQUAL’S BLOG OF 18TH MAY 2020
ALAS, POOR ISAAC
EARLY WARNINGS
THE EDUCATION SELECT COMMITTEE REPORT OF 11TH JULY 2020
OFQUAL’S 2020 SUMMER SYMPOSIUM
THE GUARDIAN, 8TH AUGUST 2020
THE SCOTTISH PRECEDENT
OFQUAL CHANGES THE RULES FOR APPEALS
GAVIN WILLIAMSON’S APPEALS ‘TRIPLE LOCK’
A-LEVEL RESULTS DAY, 13TH AUGUST 2020
… AND THE NEXT FEW DAYS
THE FUSE BURNS…
THE EXPLOSION
CHAPTER 12 – THE AFTERMATH
THE REACTION
WHY WAS THE ALGORITHM THROWN AWAY?
WERE THE CAGS RIGHT? OR INDEED FAIR?
WILL THE REAL GRADE PLEASE STAND UP?
THE ALGORITHM
EXAM GRADES ARE ‘RELIABLE TO ONE GRADE EITHER WAY’
CHAPTER 13 – SUMMER 2021: THE TAGS
EXAMS ARE CANCELLED AGAIN
‘WE’RE TRUSTING TEACHERS, NOT ALGORITHMS’
PERHAPS TEACHERS REALLY CAN BE TRUSTED…
WERE TAGS FAIR?
TOWARDS 2022, AND BEYOND…
MARCH 2022
CHAPTER 14 – NINE WAYS TO DELIVER RELIABLE AND TRUSTWORTHY GRADES
SETTING THE SCENE
WHAT’S THE PROBLEM WE HAVE TO SOLVE?
THREE DIFFERENT STRATEGIES
Strategy 1 – Reduce fuzziness to zero
Strategy 2 – Accept fuzziness exists and change existing processes a little
Strategy 3 – Accept fuzziness exists and do something quite different
STRATEGIES THAT REDUCE FUZZINESS TO ZERO
Solution 1 – Only one examiner
Solution 2 – Artificial intelligence (AI)
Solution 3 – Multiple-choice exams
Solution 4 – Tighter mark schemes
Solution 5 – Better training of examiners, better quality control
STRATEGIES THAT ACCEPT THAT FUZZINESS EXISTS, AND CHANGE EXISTING PROCESSES A LITTLE
Solution 6 – Double marking
Solution 7 – Use grades
Solution 8 – Fewer, wider, grades
Solution 9 – Different grade structures for different subjects
CHAPTER 15 – FIVE FUNDAMENTALLY DIFFERENT WAYS TO DELIVER RELIABLE AND TRUSTWORTHY ASSESSMENTS
FIVE MORE SOLUTIONS
AN EASY WAY TO ESTIMATE ANY SUBJECT’S FUZZINESS f
SOLUTION 10 – AWARD GRADES DETERMINED BY m + f
SOLUTION 11 – AWARD GRADES DETERMINED BY m – f
SOLUTION 12 – TWO GRADES
SOLUTION 13 – THREE GRADES
SOLUTION 14 – THROW GRADES AWAY AND AWARD m ± f
A FINAL THOUGHT
CHAPTER 16 – OVER TO YOU…
APPENDIX – FUZZINESS, A DEEPER DIVE
REFERENCES
ACKNOWLEDGEMENTS
INDEX