
A Response to Ben Goldacre’s Building Evidence Into Education Report. Part 2

March 30, 2013

Here I continue with my response to Ben Goldacre’s report on evidence in education. 

The usefulness of RCTs (Randomised Controlled Trials) in education cannot be determined without confronting a large number of prior questions that have largely been avoided.

1) What types of debate are to be resolved by RCTs?

It would appear that, despite the excellent job he did debunking Brain Gym, Goldacre has not realised that much, or even most, educational debate is about worthless nonsense that can already be shown to be wrong long before the RCT stage. He assumes that education is like medicine was in the 1970s, whereas it is probably more like medicine in the 1370s. He assumes that we have clear aims and sound theories which need to be refined with better empirical research to identify those situations where we have been misled. However, it would be fairer to say we are at war in education over our ultimate aims and over the underlying theories. We are not 1970s doctors needing information about the effectiveness of certain drugs; we are medieval doctors trying to find the correct balance of the four humours. Teachers (and educationalists) don’t agree on aims; they don’t agree on what existing evidence shows; and they don’t always have the time and resources to implement an idea even if it is proven to be effective (for instance, compare what most teachers agree is the most effective sort of marking with what marking most teachers can actually get done in a week).

What do we actually need to be finding out? Should we be testing ideas that are obviously wrong but popular (e.g. Brain Gym)? Should we be testing ideas that are not particularly plausible to teachers but are popular with educationalists or OFSTED inspectors (e.g. group work)? Should we be testing ideas that are supported by cognitive psychology but rarely applied (e.g. the use of tests to aid recall), or ideas that are common but psychologically implausible (e.g. discovery learning)? What constitutes success for an intervention? I wrote here about the kinds of goalpost-shifting arguments we have over teaching methods, and RCTs do not provide an obvious end to that debate. Whether an RCT will give useful data or waste resources will depend on resolving these issues, issues which, as a profession, we seem to be getting nowhere with. Additionally, there are ethical issues. Goldacre suggests that these are not insurmountable but that overcoming them “requires everyone involved in education to recognise when it’s time to honestly say ‘we don’t know what’s best here’”. Easily said, but how many debates are there in education where everyone, or even enough people to conduct research, actually says that?

2) What are the gains in accuracy from RCTs, relative to the costs?

A lot of the arguments for RCTs assume that they are justified by being more accurate than the alternatives, and that this will apply in all cases. However, this needs to be considered alongside the difficulties which make RCTs in education hard to run in practice. Most interventions will happen at a whole-class level, making it harder to isolate their effects. We will be unable to “blind” trials (i.e. ensure that those delivering an intervention don’t know whether they are doing so or instead just delivering a placebo). We don’t have the resources of the drug companies to fund trials. I think Goldacre is right to suggest that those who say RCTs in education cannot possibly work for reasons such as these have it wrong. But these reasons do mean that, before we can begin, we have to accept that the level of accuracy of any given RCT might not justify the cost. We cannot simply say that, because RCTs are more accurate than non-randomised trials, we can safely ignore non-randomised research even if it is abundant and overwhelmingly pointing in one direction. We are not comparing the perfectly accurate with the utterly inaccurate; we are comparing differing degrees of accuracy.

In the past I have heard Ben Goldacre declare that the effectiveness of phonics is, as yet, undecided because there have not been enough results from RCTs, even though there have been hundreds of non-randomised studies. This argument might work if RCTs were perfect and non-randomised studies were always useless. It is less convincing when we admit that we are actually considering imperfect data even when we look at RCTs. How much data can we afford to throw out in the hope that RCTs will provide a definitive answer at some point in the future? We can even take this point to the extreme of asking how much more accurate RCTs are compared with the opinions teachers form from experience. It is often assumed that, just as doctors were fooled by the placebo effect or by natural variation in outcomes, teachers are constantly mistaken about the effectiveness of their teaching. This has not actually been established for all teachers making all kinds of judgement. Most teachers would sooner seek advice from experienced peers than reserve judgement on all matters, waiting for RCTs that may never happen.

3) What research do we have the resources for?

Underlying the previous two questions is the issue of resources. If we can conduct enough RCTs, it becomes easier to choose which questions to address. If we can make the RCTs large enough, or replicate them, then we can address questions of accuracy. There is money available for RCTs, but there will always need to be rationing, because debates in education seem to expand continually, and even with unlimited funding there may still be a lack of trained researchers. We need to set priorities, which is what makes it so hard to proceed without resolving the previous two questions. We also need to have some idea of the potential benefits of RCTs. Too much of this discussion does seem to present RCTs as a magic bullet, where the benefits will definitely outweigh the costs even if we use them indiscriminately. However, we constantly have to ask which applications of RCTs will provide the greatest benefit. Again, that requires asking the previous two questions. There is no point using RCTs to resolve questions where the answer is already available, or where teachers can find a reliable answer just from experience. There is little point in investing in RCTs in areas where the results from non-randomised trials are already convincing (e.g. phonics). And what if it turns out we can get reliable results from much cheaper methods, such as training teachers to evaluate their own practice better, or from testing psychological theories in the lab?

4) Who are we trying to persuade here?

There’s no getting away from the fact that teachers have learned to be sceptical of education research. Often this has been for very good reason (as I explained in my previous blogpost); sometimes it is purely out of stubbornness. Anyone following the debate over phonics will have noticed that there is a hard core of educationalists and teachers who simply cannot be persuaded of the benefits of phonics by reason, evidence or even direct personal experience of its effectiveness. As well as those who are irrational about methods, I mentioned under the first question that there is also a variety of beliefs about aims. It’s all very well declaring that RCTs are the best evidence, but how often are disputes in education about the quality of evidence, and how often are they about ideology? There is no point in spending money on an RCT which will, whatever it shows, be ignored by all the people who most need to be persuaded. Does anyone think that even one phonics denialist would be persuaded if, on top of the hundreds of non-randomised studies showing the effectiveness of phonics, there were a few more RCTs? Perhaps this is what Goldacre means to address by talking about “culture change” in education; however, it does leave me wondering if a focus on RCTs is like hoping a particularly accurate globe will persuade flat-earthers. The time and resources spent on conducting RCTs might be better spent on persuading teachers to accept the scientific method, or the basics of cognitive psychology, rather than on looking for an unidentified degree of improvement in the quality of empirical studies.

*

Having followed some of the debate that has happened since the report came out, I may be over-emphasising the issue of RCTs here. I do accept that, where practical, they may be the best form of evidence. There are certain questions, such as those about expensive interventions affecting individual students rather than whole classes, where they are both practical and suited to the task. However, there are enough practical difficulties with making good use of RCTs more widely that I do worry that pushing for them may distract from the more important debate. The real argument in education is often over the principle of using evidence at all, rather than over which type of evidence. Pushing for the best possible evidence in all circumstances may turn out to make things worse for teachers faced with pseudo-science. Yes, if we set the bar for quality of evidence too low then we will be told that all sorts of nonsense is backed by research, but if we set it too high we will be told there is no evidence supporting interventions that have been shown to work time and time again. Either position will surrender territory to those who are convinced that their ideology must be enforced on the profession, and that research is to be used only to suit their pre-determined agenda rather than to ascertain the truth.

11 comments

  1. I think you may be too pessimistic about costs. Since so many teachers and schools are already taking all the actions and measurements required, the only extra work will often be a little organisation and analysis.


    • That comes down to what’s being tested. As I said, expensive interventions affecting individual students would be fairly easy to test. Whole-class interventions, on the other hand, might take a lot of organisation.


      • Wouldn’t it be reasonably straightforward to test things such as “no hands up”, desks in rows versus islands, working in groups versus practising individually during lessons, or single-sex versus mixed lessons? Just some ideas off the top of my head.

        Randomly allocate teachers or classes to a group, then have some controls which just carry on regardless, and wait for the results to roll in.

        You’re right that it’ll take a lot of organisation, because if you want statistically significant results you’ll want to include lots of classes, teachers and schools, and to avoid selection bias in populating the groups (a rough sketch of the allocation step follows at the end of this comment). But that organisation would be done by the people running the trial, just as it is in medicine.

        For individual teachers, the workload isn’t necessarily changed at all: for example, if you suddenly find that The Powers That Be have assigned you some single-sex classes, you don’t need to do anything. In other cases, you would need to change, for example by adopting a “no hands up” rule, or by changing lessons to include/exclude group work or individual practice, but would that be an intolerable burden? (That sounds like a rhetorical question, but it’s not. I don’t know.)

        I suppose it’s an upheaval if you’re used to one method, and especially so if you’re opposed in principle to the method you’ve been asked to use. But that’s rather like the teachers who are dubious about phonics and are now effectively forced to use it. It’s useful data to know how reluctant practitioners fare. And if your trial only includes the excited evangelists of a method, that’ll skew your results and over-estimate the benefits.
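
        The allocation step itself is mechanically simple; what follows is a rough sketch (entirely my own illustration, with made-up school and class names) of assigning whole classes at random to arms, stratified by school to reduce selection bias:

        ```python
        import random
        from collections import defaultdict

        def allocate(classes, seed=2013):
            """Assign whole classes at random to 'intervention' or 'control',
            stratified by school so every school contributes to both arms.
            A toy illustration; a real trial would pre-register its procedure."""
            rng = random.Random(seed)  # fixed seed so the allocation is auditable
            by_school = defaultdict(list)
            for school, class_id in classes:
                by_school[school].append(class_id)
            arms = {}
            for ids in by_school.values():
                rng.shuffle(ids)
                half = len(ids) // 2
                arms.update({c: "intervention" for c in ids[:half]})
                arms.update({c: "control" for c in ids[half:]})
            return arms

        # Hypothetical (school, class) pairs
        print(allocate([("School A", "7X"), ("School A", "7Y"),
                        ("School B", "7P"), ("School B", "7Q")]))
        ```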


        • It is going to be a huge imposition if you randomly allocate students to classes. It will be difficult (but not impossible) to find enough schools willing to alter their setting for you.

          If you accept that this is too tricky then you could do things at a whole-class level, and randomly allocate classes to intervention and control. However, in this comment http://www.guardian.co.uk/education/2013/mar/18/teaching-research-michael-gove#comment-22073935 Goldacre suggests that a valid experiment done this way would involve 400 classes. That is still a big deal to organise (a rough sketch of why the numbers get so large follows at the end of this comment).

          There’s also the possibility that schools already have classes organised in a way that would allow you to pick an intervention group and a control group without too much bias, making the extra effort to eliminate this one source of bias rather unnecessary.

          Of course, none of this prevents RCTs, but it does limit how many you can do, making it more important than ever which questions you investigate and whether the advantages in accuracy over non-randomised methods are really worth it.
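
          To see where a number of that order comes from, here is a back-of-envelope sizing sketch. Every figure below (effect size, class size, intra-class correlation) is my own illustrative assumption, not taken from Goldacre: because pupils within a class resemble one another, randomising whole classes inflates the required sample by a “design effect”, and a smallish effect measured on whole classes quickly demands hundreds of them.

          ```python
          from math import ceil

          # Back-of-envelope cluster-trial sizing; all figures are illustrative
          # assumptions, not taken from Goldacre's comment.
          z_alpha, z_beta = 1.96, 0.84   # 5% two-sided significance, 80% power
          d = 0.15                       # smallish effect size (Cohen's d)
          m = 25                         # pupils per class
          icc = 0.2                      # intra-class correlation: classmates resemble each other

          pupils_per_arm = 2 * (z_alpha + z_beta) ** 2 / d ** 2  # if pupils were independent
          design_effect = 1 + (m - 1) * icc                      # inflation for randomising whole classes
          classes_per_arm = ceil(pupils_per_arm * design_effect / m)
          print(2 * classes_per_arm)  # total classes across both arms: 324 on these assumptions
          ```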


  2. These posts are very good. I must admit to pulling my hair out over the piss-poor state of education, and the piss-poor state of the debate about education. Research in the area of education is, with a few notable exceptions, execrable, and would make any halfway serious scientist blush with embarrassment. Yes, it is hard to say anything in this area that is actually backed up by scientific evidence, but that doesn’t mean we should base our entire approach on whichever educational theorist (bullshitter) happens to be flavour of the month. One thing is very clear to me: at the moment, many state school students are being massively disadvantaged, in our supposedly meritocratic society, by the very poor education they receive up to the age of 18 (though the primary schools seem fairly good, IMHO). Meanwhile the fee-paying schools are laughing themselves silly.


  3. I agree that there are very important strategic questions to answer before embarking on expensive research. The biggest question at the moment is ‘what is education for?’. Until this is answered, the required outcomes cannot be set and the direction of the research cannot be defined. Research undertaken under these circumstances is often quickly rendered irrelevant by events or by changes in strategic direction.

    RCTs may be useful once the questions to be answered have been effectively defined.


    • This is an issue I have explored a lot. My answer is that education is for making children cleverer. When people then suggest that this is too vague because I haven’t defined what “clever” means, I would suggest that what it means to be clever is already defined by one’s culture and can easily be interpreted from it, and usually that will entail having certain forms of knowledge and being able to perform certain types of reasoning.

      However, I am the first to agree that, despite this being such an easy question to answer, it does tend to come up in debate a lot, because when people are advocating dumbing-down, one of the obvious moves is to bring in alternative aims for education, either completely unrelated to the intellect or based on an attempt to redefine the intellectual virtues as something vaguer and independent of knowledge (like “thinking skills” or “creativity”).


  4. You may be interested in this post from Dan Willingham, making very similar points to your own: http://www.danielwillingham.com/1/post/2013/03/a-new-push-for-science-in-education-in-britain.html


  5. […] my last effort to address this subject a few people have directed me to brief descriptions of RCTs in education, […]


  6. I am a paediatric physio and am currently undertaking a study into a small-group (Wave 2) motor skills intervention in mainstream junior schools. It is taking the form of pre-, post- and follow-up standardised measures.
    The results will be known early in 2014.
    Schools believe the programme works, but it needs to be proved.

    The discussion around RCTs is interesting, but with children it really only applies to drug trials, because of the ethical issues.


  7. […] have more than once disagreed with simply trying to ape medical research too closely (for instance here and here). Regardless, the reason we should listen to the empirical evidence on phonics is not […]


