When I wrote about an exam announcement last week it was out of date before I’d finished typing. This post too may now be out of date if the appeals system allows major changes, but I have seen so much false information that I thought I’d better get this out there.
Exams were not sat this year. The decision was made instead to predict what grades would have been given. This is probably the decision that should have been debated. Instead, the debate has centred on how grades were predicted, with much talk of an evil algorithm crushing children’s hopes. Some wished to predict grades deliberately inaccurately in order to let grade inflation hide the problems. Because opportunities such as university places and employment are finite, grade inflation doesn’t actually solve any problem. What it does is make sure that when people lose out on opportunities, it will not be clear that this year’s grades were the problem. I argued against the idea that grade inflation solves problems here and will not be going into it again now, but it is worth noting that most disagreement with any opinions I express in this post will come from advocates of using grade inflation to solve problems, rather than from anywhere else. In particular, it needs to be acknowledged that the use of teacher assessment would, on average, have led to more grade inflation.
However, because people seemed to think inaccuracy in grades would justify grade inflation, and because people objected to specific grades when they arrived, there has now been huge debate about how grades were given. Much of this has been ill-informed.
I intend to explain the following:
- How grades are predicted.
- Why predicted grades are inaccurate.
- What claims about the process are false or unproven.
Normally, I’d split this into 3 posts, but things are moving so fast I assumed people would want all this at once in one long post.
How grades are predicted.
Ofqual produced a statistical model that would predict the likeliest grades for each centre (usually a school or college). This used all the available data (past performance and the past grades of the current cohort) to predict what this year’s performance would have been. This was done in accordance with what previous data showed would predict grades accurately. A lot of comment has assumed that if people are now unhappy with these predictions or individual results, then there must have been a mistake in this statistical model. However, this is not something where one can simply point at things one doesn’t like and say “fix it”. You can test statistical models using old data, e.g. predict 2019 grades from the years before 2019. If you have a model that predicts better than Ofqual’s, then you win; you are right. If you don’t, and you don’t know why the Ofqual model predicts what it does, then you are probably wrong. In the end, proportions of grades were calculated from grades given in recent years, then adjusted in light of GCSE information about current students, and then the number of expected A-levels in each subject at each grade was calculated for each centre. Centres were given information about what happened in this process in their case.
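To make the shape of that process concrete, here is a minimal sketch in Python. Everything in it, including the function name, the crude one-grade “shift” standing in for the GCSE-based adjustment, and the example numbers, is my own illustrative assumption, not Ofqual’s actual model.

```python
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst

def predict_centre_counts(historical_shares, cohort_size, gcse_shift=0.0):
    """Turn a centre's historical grade shares into expected grade counts
    for this year's cohort. `gcse_shift` is a crude stand-in for the
    GCSE-based adjustment: the fraction of each grade's share moved one
    grade up (positive, stronger cohort) or down (negative, weaker)."""
    shares = dict(historical_shares)
    if gcse_shift > 0:
        for better, worse in zip(GRADES, GRADES[1:]):
            moved = shares[worse] * gcse_shift
            shares[worse] -= moved
            shares[better] += moved
    elif gcse_shift < 0:
        for better, worse in zip(GRADES, GRADES[1:]):
            moved = shares[better] * -gcse_shift
            shares[better] -= moved
            shares[worse] += moved
    # Rounding means counts may not sum exactly to the cohort size;
    # a real model would allocate the remainders carefully.
    return {g: round(p * cohort_size) for g, p in shares.items()}

history = {"A*": 0.05, "A": 0.15, "B": 0.30, "C": 0.30,
           "D": 0.12, "E": 0.06, "U": 0.02}
print(predict_centre_counts(history, cohort_size=40, gcse_shift=0.1))
```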
Although the model came up with the grades at centre level, which students got which grades was decided by the centres. Centres ranked their students in each subject and grades were given in rank order. Some commentary has overlooked this, talking as if the statistical model decided every student’s grade. It did not. It determined what grades were available to be given (with an exception to be discussed in the next paragraph), not which student should get which grade. As a result, the majority of grades were not changed, and where they were, it would often have been the result of the ranking as well as the statistical model.
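A toy version of that allocation step, with invented student names and counts, might look like this: the model fixes how many of each grade are available at the centre, and the centre’s ranking decides who gets them.

```python
def allocate_by_rank(ranked_students, grade_counts,
                     grade_order=("A*", "A", "B", "C", "D", "E", "U")):
    """ranked_students: best student first, as ranked by the centre.
    grade_counts: how many of each grade the model made available.
    Assumes the counts cover the whole cohort."""
    # Expand the counts into a pool of grades, best first.
    pool = [g for g in grade_order for _ in range(grade_counts.get(g, 0))]
    return {student: grade for student, grade in zip(ranked_students, pool)}

ranking = ["Asha", "Ben", "Carl", "Dina"]
print(allocate_by_rank(ranking, {"A": 1, "B": 2, "C": 1}))
# {'Asha': 'A', 'Ben': 'B', 'Carl': 'B', 'Dina': 'C'}
```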
Finally, there was an exception because of the problem of “small cohorts” taking exams, i.e. where centres had very few students taking a particular exam (or very few had taken it in the past). This is because, where there was less data, it would be harder to predict what grades were likely to be given. Centres had also been asked to predict grades (Centre Assessed Grades or CAGs) for each student, and for the smallest cohorts these were accepted. Slightly larger cohorts were given a compromise between the CAGs and the statistical model, and for cohorts that were larger still, the statistical model alone was used.
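One simple way such a compromise could work is to interpolate between the CAG and the model as cohort size grows. The thresholds and the linear blend in this sketch are my assumptions for illustration; the actual rules were more detailed.

```python
def blended_grade_value(cag_value, model_value, cohort_size, low=5, high=15):
    """Grades on a numeric scale (e.g. U = 0 ... A* = 6). At or below `low`
    entries the CAG is used; at or above `high` the model alone; in
    between, a linear blend of the two."""
    if cohort_size <= low:
        return cag_value
    if cohort_size >= high:
        return model_value
    weight = (cohort_size - low) / (high - low)  # 0 at `low`, 1 at `high`
    return weight * model_value + (1 - weight) * cag_value

# A cohort of 10 sits halfway between the thresholds, so a CAG of A (5)
# and a model value of B (4) blend to 4.5.
print(blended_grade_value(cag_value=5, model_value=4, cohort_size=10))
```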
It is important to understand this process if you think a particular grade is wrong. Without knowing whether the cohort was small, why the statistical model would have predicted what it did, how the distribution was calculated for a centre, and where a student was in the ranking, you do not know how a grade came to be given. For some reason, people have jumped to declare the evils of an “algorithm”. Didn’t get the result you wanted? It’s the result of an algorithm.
As a maths teacher, I quite like algorithms. Algorithms are the rules and processes used to solve a problem, perhaps best seen as the recipe for getting an answer. Every year algorithms are used after exams to decide grade boundaries and give grades. A mark scheme is also an algorithm. The alternative to algorithms deciding things is making arbitrary judgements that don’t follow rules. This year is different in that CAGs, a statistical model (also a type of algorithm), and centre rankings have replaced exams. The first thing that people need to do to discuss this sensibly is to stop talking about an algorithm that decided everything. If you mean the statistical model then say “the statistical model”. There are other algorithms involved in the process, but they are more like the algorithms used every year: rules that turn messy information into grades. Nobody should be arguing that the process of giving grades should not happen according to rules. Nobody in an exam board should be making it up as they go along.
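For comparison, here is the sort of algorithm used in any normal year once grade boundaries have been set after marking: a few lines that turn a raw mark into a grade. The boundary values here are invented purely for illustration.

```python
def grade_from_mark(mark, boundaries):
    """boundaries: (grade, minimum mark) pairs, best grade first."""
    for grade, minimum in boundaries:
        if mark >= minimum:
            return grade
    return "U"  # below the lowest boundary

boundaries = [("A*", 85), ("A", 70), ("B", 60), ("C", 50), ("D", 40), ("E", 30)]
print(grade_from_mark(72, boundaries))  # "A"
```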
Why predicted grades are inaccurate.
Predicted grades, whether from teachers or from a statistical model, are not likely to be accurate. That’s why exams are taken every year. The grades given will not have been the same as those that would have been given had exams been sat. Exam results are always influenced by what seem like random factors that nobody can predict (I will discuss this further in the next section). We can reasonably argue over what is the most accurate way to predict grades, but we cannot claim that there is a very accurate method. There are also situations where exam results are very hard to predict. Here is why I think this year’s results will be depressingly inaccurate.
Some students are exceptional. Some will get an A* in a school that’s never had an A*. Some will get a U in a school that’s never had a U. Predicting who these students are is incredibly difficult, and remains difficult even where historic A-level results are adjusted to account for the GCSE data of current students. Students will often have unfairly missed out (or unfairly gained) wherever very high or low grades were on the table (i.e. if students were at the top or the bottom of rankings). This is the most heartbreaking aspect of what’s happened. The exceptional is unpredictable. The statistical model will not pick up on these students. If a school normally gets some Us (or it gets Es but this cohort is weaker than usual) the model will predict Us. If a school doesn’t normally get A*s (or it does but this year’s cohort is weaker than usual) the model will not predict A*s. This will be very inaccurate in practice. You might then think that CAGs should be used to identify these students. However, just as a statistical model won’t pick up an A* or U student where normally there are none, a teacher who has never taught an A* or U student will not be able to be sure they have taught one this time. In the case of a U it might be more obvious, but why even enter a student for the exam if it was completely obvious they’d get a U? The inaccuracy in the CAGs for extreme grades was remarkable. In 2019, 7.7% of grades were A*; in 2020, 13.9% of CAGs were A*. In 2019, 2.5% of grades were Us; in 2020, 0.3% of CAGs were Us. Both the CAGs and the statistical model were likely to be wrong. There’s no easy way to sort this out; it’s a choice between two bad options.
As well as exceptional students, there are exceptional schools. There are schools that do things differently now, and their results will be different. Like exceptional students, these are hard to predict. Ofqual found that looking at the recent trajectory of schools did not tell them which were going to improve and so the statistical model didn’t use that information. Some of us (myself included) are very convinced we work in schools that are on the right track and likely to do better. However, no school is going to claim otherwise and few schools will admit grades are going to get worse, so again, CAGs are not a solution. Because exceptional schools and exceptional students are by their very nature unpredictable, this is where we can expect to find the biggest injustices in predicted grades.
Perhaps the biggest source of poor predictions is the one that people seem to be reluctant to mention. The rankings rely on the ability of centres to compare students. There is little evidence that schools are good at this, and I can guarantee that some schools I’ve worked at would do a terrible job. However, if we removed this part of the process, grades given in line with the statistical model would be ignoring everything that happened during the course. Few people would argue that this should happen, so this hasn’t been debated anywhere near as much as other sources of error. But for individual students convinced their grades are wrong, this is likely to be incredibly important. Despite what I said about the problems with A*s and Us, a lot of students who missed out on their CAG of A* will have done so because they were not highly ranked, and a lot of students who have got Us will have done so because they were ranked bottom and any “error” could be attributable to their school rather than an algorithm.
Finally, we have the small cohorts problem. There’s no real way round this, although obviously plenty of technical debate is possible about how it should be dealt with. If the cohort was so small that the statistical model would not work, something else needs to be done. The decision was to use CAGs fully or partially, despite the fact that these are likely to have been inflated. Inflated grades are probably better than random ones or ones based on GCSE results. But this is also a source of inaccuracy. It also favours centres with small cohorts in a subject and, therefore, it will allow systematic inaccuracy that will affect some institutions very differently to others. It is the likely reason that CAGs have not been adjusted downwards equally in all types of school. Popular subjects in large sixth forms are likely to have ended up with grades further below CAGs than obscure subjects in small sixth forms.
Which claims about the process are false or unproven
Much of what I have observed of the debate about how grades were given has consisted of calls for grade inflation disguised as complaints about inaccuracy, or emotive tales of students’ thwarted ambitions that assume that this was unfair or unusual without addressing the cause of the specific disappointment. As mentioned above, much debate has blamed everything on an “algorithm” rather than identifying what choices were made and why. Having accepted the problems with predicting grades and acknowledged the suffering caused by inaccuracies, it’s still worth trying to dispense with mistaken, misleading or inaccurate claims that I have seen on social media and heard on the news. Here are the biggest myths about what’s happened.
Myth 1: Exam grades are normally very accurate. A lot of attempts to emphasise the inaccuracies in the statistical model have assumed that there is more precision in exam grades than there actually is. In reality, the difference between a B grade student and a C grade student can be far less than the difference between two B grade students. Some types of exam marking (not maths, obviously) are quite subjective and have a significant margin of error, making luck a huge factor in what grades are given. Add to that the amount of luck involved in revising the right topics, or having a good day or a bad day in the exam, and it’s no wonder grades are hard to predict with accuracy. It’s not comforting to think that a student may miss out on a university offer because of bad luck, but that is not unique to this year; it is normal. The point of exam grades is not to distinguish between a B grade and a C grade, but between a B grade and a D grade or even an E grade. It’s not that every A* grade reflects the top 7.7% of ability; it’s more a way of ensuring that anyone in the top 1%, say, should get an A*. All grades are a matter of probability, not a definitive judgement. That does not make them useless or mean that there are better alternatives to exams, but it does mean everyone should interpret grades carefully every year.
Myth 2: CAGs would have been more accurate.
As mentioned above, CAGs were higher than they should have been, based on the reasonable assumption that a year group with an interrupted year 13 is unlikely to end up far more able than all previous year groups. There’s been a tendency for people to claim that aggregate errors don’t tell us anything about inaccuracies at the level of individual students. This is getting things backwards. It is possible to have inaccuracies for individual students that cancel each other out and aren’t visible at the aggregate level. So you could have half of grades being too high, and half too low, and on average the distribution of grades seems fair. You could even argue that this happens every year. But this does not work the other way. If, on average, grades were too high, it does tell us something about individual grades. It tells us that they are more likely to be too high than too low. This is reason enough to adjust downwards if you want to make the most accurate predictions.
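A toy simulation, with entirely invented numbers, illustrates the asymmetry: balanced individual errors can leave the aggregate looking right, but an inflated aggregate can only come from individual predictions that are more likely to be too high than too low.

```python
import random

random.seed(0)
true_grades = [random.choice(range(7)) for _ in range(10_000)]  # 0=U ... 6=A*

# Case 1: errors cancel. Every prediction is a grade out, half too high and
# half too low, yet the average error is roughly zero.
balanced = [g + random.choice([-1, 1]) for g in true_grades]
print(sum(balanced) / len(balanced) - sum(true_grades) / len(true_grades))

# Case 2: the average is clearly too high. That can only happen if each
# individual prediction is more likely to be too high than too low.
inflated = [g + random.choice([-1, 1, 1]) for g in true_grades]
print(sum(inflated) / len(inflated) - sum(true_grades) / len(true_grades))
```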
Myth 3: Individual students we don’t know getting unpredicted Us, or not getting predicted A*s, are examples of how the statistical model was inaccurate.
As argued above, the statistical model is likely to have been inaccurate with respect to the extremes. However, because we know CAGs are also inaccurate, and that bad rankings can also explain anomalies, we cannot blindly accept every story about this from kids we don’t know. I mention this because so much commentary and news coverage has been anecdotal in this way. If there were no disappointed school leavers, that would merely tell us that the results this year were way out compared to what they should have been, because disappointed school leavers are normal when exam grades are given out. Obviously, the better you know a student, the more likely you are to know a grade is wrong, but even then you need to know their ranking and the justification for the grade distribution to know the statistical model is the problem.
Myth 4: The system was particularly unfair on poor bright children.
This myth seems to have come from two sources, so I’ll deal with each in turn.
Firstly, it has been assumed that, as schools which normally get no A*s would not be predicted A*s (not quite true), this means poor bright kids in badly performing schools would have lost out. This misses the fact that, even with little history of getting A*s, they might still be predicted if the cohort has better GCSE results than usual, so the error is less likely if the poor bright kid had good GCSEs. It also assumes that it is normal for poor kids to do A-levels in institutions that get no A*s, which is unlikely for big institutions. Additionally, schools are not uniform in their intake. The bright kid who misses out at a school full of poor kids is not necessarily poor; in fact, because disadvantaged kids are likely to get worse results, they often won’t be. Finally, it’s not just low-achieving schools whose A* students are hard to predict. While a school that usually gets no A*s in a subject, but which would have got one this year, makes for a more dramatic story, the situation of that child is no different to that of the lowest ranked child in a school that normally gets 20 A*s in a subject and this year would have got 21.
The second cause of this myth is statistics about downgrading from CAGs like these.
Although really this shows there’s not a huge difference between children of different socioeconomic status (SES), it has been used to claim that poorer students were harder hit by downgrading and, therefore, that it is poor bright kids who will have been hit worse than wealthier bright kids. (Other arguments have looked at type of school, but I’ll deal with that next.) Whether this figure is a result of the problem of small cohorts, or of the fact that it is harder to overestimate higher achieving students, I don’t know. However, we do know that the claim that these figures reflect what happened to the highest achieving kids is incorrect. If we look at the top two grades, the proportion of kids who had a high CAG and had it downgraded is smaller for lower SESs (although, because fewer students received those grades overall, the chance of being downgraded given that you had a high CAG would show the opposite pattern).
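The distinction in that last sentence matters because the two measures can point in opposite directions. A worked example with invented numbers:

```python
# Hypothetical figures, chosen only to show how "share of all students
# downgraded from a high CAG" and "chance of being downgraded given a
# high CAG" can run in opposite directions.
cohorts = {
    "higher SES": {"students": 1000, "high_cags": 300, "downgraded": 60},
    "lower SES":  {"students": 1000, "high_cags": 100, "downgraded": 30},
}
for group, d in cohorts.items():
    share_of_all = d["downgraded"] / d["students"]
    given_high_cag = d["downgraded"] / d["high_cags"]
    print(f"{group}: {share_of_all:.1%} of all students downgraded from a "
          f"high CAG, but {given_high_cag:.1%} of those with a high CAG")
# higher SES: 6.0% of all students ..., but 20.0% of those with a high CAG
# lower SES: 3.0% of all students ..., but 30.0% of those with a high CAG
```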
Myth 5: The system was deliberately rigged to downgrade the CAGs of some types of students more than others.
I suppose it’s probably worth saying that it’s impossible to prove beyond all doubt that this is a myth, but I can note that the evidence is against it. The statistical model should not have discriminated at all. The problem of small cohorts, and the fact that it is easier to overestimate low-achieving students than high-achieving ones, seem to provide a plausible explanation of what we can observe about discrepancies in downgrading. Also, if we compare results over time, we would expect those types of institutions which on average had a fall in results last time to have a rise this year. Take those three factors into account and nobody should be surprised to see the following, or to think it sinister (although it would be useful to know to what extent each type of school was affected by downgrading and by small cohort size).
If you see anyone using only one of the above two sets of data, ignoring the change from 2018 to 2019, or deciding to pick and choose which types of centre matter (like comparing independent schools with FE colleges), suspect they are being misleading. Also, recall that these are averages and individual subjects and centres will differ a lot. You cannot pick a single school like, say, Eton and claim it will have done well in avoiding downgrading in all subjects this year.
Now for some general myth-busting.
- The evidence shows students were affected by rounding errors. False. Suggestions like this, often used to explain unexpected Us, seem entirely speculative and are not necessary to explain why students have got Us.
- Some students got higher results in further maths than maths. True. Still a tiny minority, but much higher than normal.
- No students at Eton were downgraded. Almost certainly false. This claim, which was all over Twitter, is extremely unlikely: it has been denied anecdotally and there is no evidence for it. We would expect large independent schools to have been downgraded in popular subjects.
- Something went wrong on results day. False. Things seem to have gone according to plan. If what happened was wrong, it was because it was the wrong plan. Nothing surprising happened at the system level.
- Students were denied the grades they needed by what happened. True for some students, but on average there is no reason to think it would have been more common to miss out on an offer than if exams had taken place, and some institutions might become more generous, if they can, due to the reduced reliability of the grades.
- Results were given according to a normal distribution. False.
- Rankings were changed by the statistical model. False. Or at least, if it did happen, it wasn’t supposed to and an error has been made.
- The stressful events of this year where exams were cancelled show that we shouldn’t have exams. False. Your logic does not resemble our earth logic.
And one final point. So many of the problems above come down to small cohort size that next week’s GCSE results should be far more accurate. Fingers crossed. And good luck.
September 6, 2020
Making the frontline the centre of the education system
The biggest difference in education is made by those at the frontline: the teachers (including school leaders), lecturers and support staff. They know who they are serving; they have a responsibility to their learners. They can also see more directly what is working and what isn’t. At every other level, and unfortunately sometimes in school leadership, there is a distance between the decisions made and their results in actual classrooms.
At other levels, the education system is its own worst enemy. This is not a whine about the political leadership of education: the politicians, the policy makers and the civil servants. For good or ill, their careers usually cover far more than just education, changing portfolios and moving departments as they progress. Whatever faults they bring to the system they usually take them with them when they go. What I am referring to is the way that parts of the education system itself seem to be perpetually focused on something other than education.
It’s a given that those responsible for tens of thousands of schools and other educational institutions are not trying to shape every single classroom. Whether they do their job well or not, it’s clear that their responsibility is to serve the interests of the public as a whole. It’s also clear that they can consult frontline staff if they wish to, and it’s not obvious that they have any particular reason not to. What concerns me are those parts of the system which seem to have a vested interest in keeping frontline staff out of sight and out of influence. There are parts of the system that tell frontline staff what to do, but do not have to do those frontline jobs themselves, often haven’t done them for years, and often look very uncomfortable if those at the frontline have any say in the matter.
In initial teacher training (ITT), education departments in universities overwhelmingly expect those who train teachers to be full-time academics, not to be teaching in schools. As a result, ITT staff are often concerned only with the political and pedagogical orthodoxies of educationalists, not with what works in schools. They have no ‘skin in the game’. On issues such as mixed ability teaching and the use of exclusion and discipline in schools, university education lecturers typically appear to have attitudes that are militant, extreme and entirely out of touch with teachers. While they would claim their positions are more evidence-informed than those of teachers, there are also issues, such as phonics, where it is noticeable how often educationalists stand against the evidence.
Frontline staff are not encouraged to have much say over their own professional development. CPD (continuing professional development) budgets are spent by schools and colleges, not by the individual professionals. While it is only appropriate for schools and colleges to provide some proportion of CPD (after all, schools need to train their staff in school-specific systems and expectations), this has left education workers unable to set their own priorities. As a result, a voluntary “shadow” system of CPD has developed that teachers take part in during their own time and often pay for out of their own pockets. After-school teach meets, BrewED events in pubs, and huge researchED conferences at weekends rely on speakers (often frontline staff themselves) speaking for free and teachers attending in their own time. Sometimes school staff can ask their schools to pay for tickets or travel (although I suspect most don’t), but attendance is on top of the time already spent on days of employer-directed CPD.
A considerable downside to too much employer-directed, and too little self-directed, CPD is that a market for a particular type of consultant has been created. Rather than concentrating on improving the effectiveness of frontline staff, these consultants concentrate on appealing to managers. Teachers find they are given training on how to help the school pass inspections, and on how to ensure that their response to bad behaviour doesn’t create work for those in charge, rather than being trained in how to teach or manage behaviour more effectively. They may even be employed simply to fill a gap in the schedule for an INSET day, or to give a motivational talk, rather than to provide meaningful professional development. This type of consultant then becomes another vested interest within the system, arguing against effective teaching methods and whole school behaviour systems.
And once you have consultants and educationalists earning a living without providing a benefit to frontline staff, they take an interest in capturing resources intended to serve the frontline. The marginalisation of the frontline is perhaps best illustrated by the way that, in recent years, new institutions have promised to change the balance of power only to replicate what already existed. Two recent examples of institutions funded by the DfE being created to serve the frontline and being captured by interests other than the frontline are:
The Education Endowment Foundation. This was apparently intended to move control over education research away from the ideologically motivated individuals in education academia. Michael Gove claimed it would “provide additional money for those teachers who develop innovative approaches to tackling disadvantage” and that “it is teachers who are bidding for its support and establishing a new research base to inform education policy” [my emphasis]. In practice, its chief executive is an educationalist who has been involved in writing papers on how setting children into ability groups is “symbolic violence”, based on the theories of Bourdieu. The EEF is now a law unto itself in the agendas it promotes. It recently squandered funds for research into the effectiveness of setting and mixed ability teaching by failing to compare them directly, and continues to share older research of doubtful provenance instead. And nobody can work out who, other than the opponents of phonics, wanted the EEF to spend money on the latest iteration of Reading Recovery.
The Chartered College of Teaching. This was created by government policy (and government funding) to be an independent, teacher-led professional body, “run by teachers, for teachers”. In practice, it is run largely by ex-teachers who already have or had positions of power in education; it is funded by employers; and it is now only too happy to campaign against government policy, even taking its lead from the trade unions. It now holds events in the daytime when most teachers can’t leave school, promotes educational fads and censors teachers who dare question educationalists.
Another issue is how difficult it is for frontline staff to express opinions. Teachers have been reported to their employers for expressing opinions on social media. Those training to teach have been reported to their training institutions. Without being able to divulge the details of specific cases, it’s hard to prove the trivial nature of such instances. But it doesn’t take long on teacher Twitter to discover that, whereas consultants and educationalists can heap online abuse on anyone they like, teachers find there are professional consequences for even disagreeing with fashionable opinions, and very often those making the complaints are the same consultants and educationalists who have complete freedom of speech themselves.
Finally, the education system promotes and protects the beliefs and interests of those who make the job at the frontline more difficult. Some of this, like the consultants described earlier, appears to be about self-interest. We have organisations that provide training to schools campaigning for the government to ban internal exclusions, suspensions and expulsion, thus creating behaviour problems which require more training for staff. We have organisations that provide mental health services and advice to schools, running public campaigns claiming there is a youth mental health crisis that requires schools to spend more money on mental health services and advice.
To be charitable, it’s not all self-interest, sometimes it’s ideological. When the newly appointed head of Goldsmiths Education department indicates that her department’s programmes focus on “inclusion and social justice in educational settings”, she is no doubt sincere, but it is far from clear why money from the education budget should fund an organisation with such openly political priorities. Similarly, when The Children’s Commissioner joins an online campaign that demonises schools, she is no doubt sincere in her belief that the campaigners are right that schools are cruel and internal exclusion is unnecessary. But it’s far from clear why the government should be funding ideologically motivated attacks on things that are perfectly normal in schools.
Here are my suggestions for changing the system to empower the frontline.