A Response to Ben Goldacre’s Building Evidence Into Education Report. Part 1

March 29, 2013

Earlier this month, Ben Goldacre’s government-backed report into the use of evidence in education was published. In this, and a subsequent post, I will respond to some of the key points and arguments.

Firstly, I will highlight the part that is to be welcomed. The suggestion is made that teaching could become an evidence-based profession and that, as in medicine, evidence would aid informed decision-making. It is suggested that having expertise based on a grasp of the evidence would allow the profession to be more, rather than less, autonomous. It is suggested that teachers, by identifying the important questions from the frontline, could be the driving force in setting research priorities. There are practical suggestions, such as training teachers in research methods, finding ways to disseminate research findings and helping teachers to work with researchers.

This vision is a welcome change from the current dynamic between researchers and teachers. In my experience, the current situation is that teachers tend to encounter research in two unhelpful ways. Disastrously, dubious research is presented as the source of the latest initiatives and fads, as a reason to overrule professional judgement and embrace an idea suggested and endorsed by somebody who has either never taught full time or long since fled the classroom. In this model, research serves the interests and ideas of researchers and is used as a method of advocacy for, or an excuse for enforcement of, the latest fad. I can think of few bad ideas in teaching that weren’t, at some point, presented as the product of definitive research. Sometimes the research doesn’t really exist (e.g. Brain Gym). Sometimes the research itself is the worthless work of propagandists (e.g. Jo Boaler’s work on maths teaching). Sometimes the research is reputable but the interpretation is worthless (see countless forms of nonsense claiming to be based on Carol Dweck’s work). But the effect of this relationship has been to lower teacher autonomy and to make teachers instinctively sceptical of academics. Few teachers will change their ideas simply because of research, which is something of an irony given that often our ideas as a profession simply reflect the work of some long since discredited researcher from a previous generation. A lot of the bad teaching methods enforced by OFSTED may have their origins in this kind of relationship between quack researchers and teachers.

The other relationship we see between teaching and research is that which broadly goes under the title of “action research”. Under this headline, some poor teacher who has foolishly decided to embark upon a master’s degree in their spare time is persuaded to carry out their own research project. Typically, this will be statistically worthless, involve lots of questionnaires and be considered worthwhile only if it shows interest in some current initiative or gimmick. While this may provide the teacher with some insight into their own situation, it is unlikely ever to produce generalisable research results or anything more persuasive than the personal opinion of any other member of the teaching profession.

Overall, the “research architecture” suggested in the report is the most useful contribution to debate. The idea of a teaching profession setting the questions, and researchers investigating them, seems to be turning an upside-down situation the right way round.

The less helpful discussion prompted by the report is that about Randomised Controlled Trials (RCTs), which are experiments conducted by applying different interventions to different people, selected at random, and comparing the results. These have been particularly effective in medicine, where they are used to evaluate new drugs and other interventions. In the debate over RCTs that has followed the report, I have tended to see from the anti-RCT side responses which completely rule out evidence or RCTs, either arbitrarily or for a reason related to a genuine difficulty but without any proper analysis of how great that difficulty is. From the pro-RCT side I have tended to see arguments which amount to little more than a confidence that problems that were overcome in medicine can be overcome here, that more trials can overcome the difficulties, and the claim (without analysis) that the advantages will outweigh the costs. Goldacre acknowledges that qualitative research may be useful in explaining why certain interventions are effective (something I tend to doubt) but not that there are (quantitative) alternatives to RCTs that may prove more practical in many educational contexts.

I’m happy to accept most of the points in the report justifying RCTs as the best way to test an intervention. However, I feel that a lot of the debate in and around the paper on RCTs seems to ignore, or put off answering, some absolutely crucial questions about RCTs. These are mainly around which hypotheses are to be tested and what level of resources is to be devoted to testing them. I realise that it can be argued that these are debates for further down the road, and that the first step is simply to accept the principle of RCTs. However, I think that if we fail to look at these questions first then we end up simply talking at cross-purposes for most of the discussion. What we test, and how much we can spend testing it, shapes both the usefulness and the ethics of RCTs.

In my next blogpost I will consider some of the questions that need to be answered in order to evaluate Goldacre’s case for increased use of RCTs.


  1. To do research to prove a drug is effective in curing a condition starts from the position that eradicating the condition from the patient is the desired goal and that any side effects are a price worth paying. That is easy to gain consensus on. In education we might find a particular course of action improves exam performance and that any side effects along the way are a price worth paying. Apart from the education case being rather less easy to control than the medical, there is another fundamental difference. Medical problem solving is more to do with fixing health issues, not preventing those issues ever arising. Education is more like promoting health than fixing health problems. We know that exercise is good for health but how would we define whether distance running was more beneficial than strength training or vice versa? If you want to be an Olympic gold medallist in one you certainly won’t be in the other. We have to define very precisely what matters, and in education that is subject to constant political conflict. I think we should move to more evidence-based practice but I doubt we are ever going to do this much beyond saying what might make perhaps marginal gains in exam performance, and over-focus on that could well be counter-productive in other areas.

  2. Ben Goldacre has been responding to this point a lot. His basic argument is that a) even in medicine you are aiming for something less general than “health” and b) there are some clearly defined aims for some interventions. However, I think he probably does under-estimate how contentious debate around the aims of education is.

  3. I find myself in complete agreement with you here Andrew, in spite of some of our previous disagreements about research. The misappropriation of Dweck is hugely problematic in education at the moment – the same thing happened to Black and Wiliam’s work on AfL. I’m also glad you point out the neuromyth of Brain Gym – Goswami’s work on uncovering neuromyths was interesting on this count too. I’m also sceptical about relying on RCTs when dealing with the complex and emotive nature of learning.

  4. I look forward to the next part.

    The key to getting RCTs in education right is, I think, to choose interventions with the potential for large effect sizes, keep them simple, make them large enough and to follow through. Choose a simple, reproducible measure like, say, a well known measure of reading age. Measure also proxy variables for longer term effects, such as standardised enthusiasm questionnaires. Measure the longer term effects through other well known variables, GCSE grades or A level take-up to build understanding of the validity of proxies.

    It really does not matter that there is disagreement on the objectives of education. There always will be, so choose your own.

    I suggest trying interventions aimed at eliminating the summer born problem (the German model, perhaps), or using early interventions to greatly reduce the percentage of School Action / School Action Plus (I know a primary that is giving this a go).

  5. RCTs are problematic even in medicine. If an intervention – e.g. a drug – is to be tested in an RCT, suppose for example that it benefits 25%, harms 25% to the same extent and leaves 50% unaffected. This cannot be detected in an RCT – unless the hidden factor which determines who is in which group is at least guessed at[1], and hence controlled for. An RCT, in other words, is not a good way of finding out the unknown, only a way of – hopefully – validating the known.

    The analogy to education – or anything else – is obvious.

    In an ideal world there would be a way of dividing the subjects into groups who benefitted and were harmed by the intervention, so that those who would be harmed would not be subjected to it.

    Sometimes what is needed is an actual person, looking at an actual person, and making a judgment that “this patient (child) is not benefiting from this approach”. In medicine, we can often play both roles – observer and observed – “This treatment really doesn’t agree with me – have you got anything else?”. In education, the subject is a child, and children are by definition people who don’t know what is best for themselves – so the person making the judgement must be a parent or teacher.

    Perhaps these are just random thoughts of no use. But it seems to me that they distill down to some useful advice. 1) No one size fits all. 2) The personal judgement of a person intimately familiar with the patient/pupil can often/usually beat any amount of statistics or theory. 3) If they don’t seem to be responding to that method then bloody well try something else! 4) Have the confidence to do 3. Really. Everyone will back you up apart from the spineless bullies.

    [1] It is now accepted for example that drugs can have different average effects in different racial groups, due to differences in certain genes. Without testing for those genes (which may exist merely in different proportions in different racial groups, and in any case the gene in question may be unknown) the best way to determine the effect may be to simply ask the patient whether they feel better for the intervention.
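
The arithmetic in comment 5 can be checked with a small simulation. What follows is only a rough sketch under stated assumptions: the population size, the 10-point effect, the outcome scale and the function names (`simulate_trial`, `mean_difference`) are all invented for illustration and come from neither the report nor the comment. It suggests that the overall treated-versus-control difference sits close to zero, while splitting by the hidden factor (if it could be observed) recovers the opposite effects.

```python
import random
import statistics

def simulate_trial(n=20000, effect=10.0, seed=0):
    """Simulate a two-arm randomised trial with a hidden subgroup factor.

    Assumed for illustration: 25% of people are helped by `effect`,
    25% are harmed by the same amount, and 50% are unaffected.
    """
    rng = random.Random(seed)
    groups = ("helped", "harmed", "unaffected", "unaffected")
    shifts = {"helped": effect, "harmed": -effect, "unaffected": 0.0}
    rows = []
    for i in range(n):
        group = groups[i % 4]              # hidden factor, exact 25/25/50 split
        treated = rng.random() < 0.5       # random allocation to an arm
        baseline = rng.gauss(100.0, 15.0)  # outcome without any intervention
        outcome = baseline + (shifts[group] if treated else 0.0)
        rows.append((group, treated, outcome))
    return rows

def mean_difference(rows, group=None):
    """Treated mean minus control mean, optionally within one hidden group."""
    treated = [y for g, t, y in rows if t and (group is None or g == group)]
    control = [y for g, t, y in rows if not t and (group is None or g == group)]
    return statistics.mean(treated) - statistics.mean(control)

rows = simulate_trial()
overall = mean_difference(rows)           # close to zero: the effects cancel
helped = mean_difference(rows, "helped")  # close to +10
harmed = mean_difference(rows, "harmed")  # close to -10

print(f"overall difference: {overall:.2f}")
print(f"helped subgroup:    {helped:.2f}")
print(f"harmed subgroup:    {harmed:.2f}")
```

The subgroup lines are only available here because the simulation records the hidden factor; in a real trial that label would have to be guessed at and measured, which is the commenter's point.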

  6. I found this article by Engelmann interesting, first because of its strong endorsement of using classroom observations and impatience with Hirsch’s preferences for out of classroom research and second because it criticises Hirsch and his ideas but from a different perspective. Not sure if I agree with Engelmann’s criticisms.
    I use Engelmann’s maths programs with my children and they are brilliant and based on observation and refinement. The subsequent research on their effectiveness is routinely ignored because his methods are ideologically unacceptable to the educational establishment. No RCT will change that.

  7. An article on RCTs in education by Dr Richard Schutz, printed in Jan 2012 in Columbia University Teaching Record, is well worth reading:
    ‘The experiments, without exception, replicate a single finding: “No Impact.” Although the results are spun to put
    the best possible light on the initiatives/innovations being investigated, the studies consistently find that the
    instructional consequences sought were not obtained.
    In each of the studies, what is consistently seen is variability: variability within classes; variability between
    classes within schools; variability between schools within districts; variability between districts within states;
    and variability between states. What you see is what you get: variability. What you don’t see or get is any
    information about the instruction that the students actually received or how to make the instruction more
    effective and reliable.’
    14 well-conducted RCTs were commented on. I don’t have the link to hand but will post shortly.
    I agree with Heather’s point –
    “The subsequent research on their effectiveness is routinely ignored because his methods are ideologically unacceptable to the educational establishment. No RCT will change that.”
