A Response to Ben Goldacre’s Building Evidence Into Education Report, Part 2
March 30, 2013
Here I continue with my response to Ben Goldacre’s report on evidence in education.
The usefulness of RCTs (Randomised Controlled Trials) in education cannot be determined without confronting a large number of prior questions which have largely been avoided.
1) What type of debates are to be resolved by RCTs?
It would appear that, despite the excellent job he did debunking Brain Gym, Goldacre has not realised that much, or even most, educational debate is about worthless nonsense that can already be shown to be wrong long before the RCT stage. He assumes that education is like medicine was in the 1970s, whereas it is probably more like medicine in the 1370s. He assumes that we have clear aims and sound theories which need to be refined with better empirical research to identify those situations where we have been misled. However, it would be fairer to say we are at war in education over our ultimate aims and over the underlying theories. We are not 1970s doctors needing information about the effectiveness of certain drugs; we are medieval doctors trying to find the correct balance of the four humours. Teachers (and educationalists) don’t agree on aims; they don’t agree on what existing evidence shows; and they don’t always have the time and resources to implement an idea even if it is proven to be effective (for instance, look at what most teachers agree is the most effective sort of marking and what marking most teachers can actually get done in a week).
What do we actually need to be finding out? Should we be testing ideas that are obviously wrong but popular (e.g. Brain Gym)? Should we be testing ideas that are not particularly plausible to teachers but are popular with educationalists or OFSTED inspectors (e.g. group work)? Should we be testing ideas that are supported by cognitive psychology but rarely applied (e.g. use of tests to aid recall), or ideas that are common but psychologically implausible (e.g. discovery learning)? What constitutes success for an intervention? I wrote here about the kinds of goalpost-shifting arguments we have over teaching methods, and RCTs do not provide an obvious end to that debate. Whether an RCT will give useful data or waste resources will depend on resolving these issues, issues which, as a profession, we seem to be getting nowhere with. Additionally, there are ethical issues. Goldacre suggests that these are not insurmountable, but that overcoming them “requires everyone involved in education to recognise when it’s time to honestly say ‘we don’t know what’s best here’”. Easily said, but how many debates are there in education where everyone, or even enough people to conduct research, actually say that?
2) What are the gains to accuracy from RCTs, relative to the costs?
A lot of the arguments for RCTs assume that they are justified by being more accurate than the alternatives and that this will apply in all cases. However, this needs to be considered alongside the practical difficulties of running RCTs in education. Most interventions will happen at a whole-class level, making it harder to isolate the effects. We will be unable to “blind” trials (i.e. ensure that those delivering an intervention don’t know whether they are doing so or instead just delivering a placebo). We don’t have the resources of the drug companies to fund trials. I think Goldacre is right to suggest that those who say RCTs in education cannot possibly work for reasons such as these have it wrong. But these reasons do mean that before we can begin we have to accept that the level of accuracy of any given RCT might not justify the cost. We cannot simply say that because RCTs are more accurate than non-randomised trials we can safely ignore non-randomised research, even if it is abundant and overwhelmingly pointing in one direction. We are not comparing the perfectly accurate with the utterly inaccurate; we are comparing differing degrees of accuracy. In the past I have heard Ben Goldacre declare that the effectiveness of phonics is, as yet, undecided because there have not been enough results from RCTs, even though there have been hundreds of non-randomised studies. This argument might work if RCTs were perfect and non-randomised studies were always useless. It is less convincing when we admit we are actually considering imperfect data, even when we look at RCTs. How much data can we afford to throw out in the hope that RCTs will provide a definitive answer at some point in the future? We can even take this point to the extreme of asking how much more accurate RCTs are compared with the opinions teachers form from experience.
It is often assumed that, just as doctors were fooled by the placebo effect or the natural spread of outcomes, teachers are constantly mistaken about the effectiveness of their teaching. This has not actually been established for every teacher or every kind of judgement. Most teachers would sooner seek advice from experienced peers than reserve judgement on all matters, waiting for RCTs that may never happen.
3) What research do we have the resources for?
Underlying the previous two questions is the issue of resources. If we can conduct enough RCTs it makes it easier to choose which questions to address. If we can make the RCTs large enough, or replicate them, then we can address questions of accuracy. There is money available for RCTs, but there will always need to be rationing because debates in education seem to expand continually, and even with unlimited funding there may still be a lack of trained researchers. We need to set priorities, which is what makes it so hard to proceed without resolving the previous two questions. We also need to have some idea of the potential benefits of RCTs. Too much of this discussion does seem to present RCTs as a magic bullet, where the benefits will definitely outweigh the costs, even if we use them indiscriminately. However, we constantly have to ask which applications of RCTs will provide the greatest benefit. Again, that requires asking the previous two questions. There is no point using RCTs to resolve questions where the answer is already available, or where teachers can find a reliable answer just from experience. There is little point in investing in RCTs in areas where the results from non-randomised trials are already convincing (e.g. phonics). And what if it turns out we can get reliable results from much cheaper methods, such as training teachers to better evaluate their own practice or from testing psychological theories in the lab?
4) Who are we trying to persuade here?
There’s no getting away from the fact that teachers have learned to be sceptical of education research. Often this has been for very good reason (as I explained in my previous blogpost). Sometimes it is purely out of stubbornness. Anyone following the debate over phonics will have noticed that there is a hardcore of educationalists and teachers who simply cannot be persuaded of the benefits of phonics by reason, evidence or even direct personal experience of its effectiveness. As well as those who are irrational about methods, I mentioned under the first question that there are also a variety of beliefs about aims. It’s all very well declaring that RCTs are the best evidence, but how often are disputes in education about the quality of evidence, and how often are they about ideology? There is no point in spending money on an RCT which will, whatever it shows, be ignored by all the people who most need to be persuaded. Does anyone think that even one phonics denialist will be persuaded if, on top of the hundreds of non-randomised studies showing the effectiveness of phonics, there were a few more RCTs? Perhaps this is what Goldacre means to address by talking about “culture change” in education; however, it does leave me wondering whether a focus on RCTs is like hoping a particularly accurate globe will persuade flat-earthers. The time and resources spent on conducting RCTs might be better spent on persuading teachers to accept the scientific method, or the basics of cognitive psychology, rather than looking for an unidentified degree of improvement in the quality of empirical studies.
Having followed some of the debate that has happened since the report came out I may be over-emphasising the issue of RCTs here. I do accept that, where practical, they may be the best form of evidence. There are certain questions, such as those about expensive interventions affecting individual students rather than whole classes, where they are both practical and suited to the task. However, there are enough practical difficulties with making good use of RCTs more widely that I do worry that pushing for them may distract from the more important debate. The real argument in education is often over the principle of using evidence rather than the question of which type of evidence. Pushing for the best possible evidence in all circumstances may turn out to make things worse for teachers faced with pseudo-science. Yes, if we set the bar for quality of evidence too low then we will be told that all sorts of nonsense is backed by research, but if we set it too high we will be told there is no evidence supporting interventions that have been shown to work time and time again. Either position will surrender territory to those who are convinced that their ideology must be enforced on the profession, and that research is to be used only to suit their pre-determined agenda rather than to ascertain the truth.