Statistical data and the Education Debate Part 3: Errors and Gold Standards

June 18, 2013

I talked last time about how personal experience, or anecdote, cannot simply be dismissed as unreliable. Of course, there are exceptions. There are things where lay observation is very unreliable, for instance the effectiveness of a particular medicine, or assessment of the intelligence of one’s own children. But if something is easy to see, and you see it a lot, then it is not somehow unscientific to think it happens or even that it is common.

The misconception that we should dismiss people’s personal experiences is a common one. I have frequently heard politicians who listen to voters condemned for peddling “anecdotes” rather than listening to “evidence”. Sometimes it seems as though the opinion of one academic about what people should have experienced is treated as worth the unanimous testimony of 100 randomly selected members of the public about what they have experienced. If a politician is criticised for “ignoring the evidence” (or worse, “ignoring the lack of evidence”) about an issue, it is always worth wondering whether they have knocked on 100 doors recently and got a fair idea of what real people say they are experiencing.

One of the habits of mind that makes it so easy to dismiss real people’s experiences is that of believing in a gold standard. This is where it is imagined that evidence is either reliable (it meets the gold standard) or completely unreliable (it doesn’t) without actually looking into the probabilities that are involved. The dismissal of anecdotal or personal experience, without any consideration of likelihood, that I mentioned last time is one example of this. There are others, though. Sometimes it is based on authority: the opinion of an education professor meets the gold standard, the opinion of a teacher doesn’t. Sometimes it is based on where research appears: a peer-reviewed journal (no matter how partisan) meets the gold standard, a think tank report or a newspaper article doesn’t. But often it is based on methodology, which brings us to RCTs – Randomised Controlled Trials.

After my last effort to address this subject a few people have directed me to brief descriptions of RCTs in education, and while I haven’t got those links to hand, I did notice a disturbing tendency for the “controls” to consist of nothing. A “control” in an experiment testing an intervention is an alternative intervention which is thought unlikely to have the relevant advantages of the intervention being tested. In medicine it is often a placebo, i.e. a medical procedure that is thought to be of no clinical advantage in itself. It could, alternatively, be the most effective intervention that currently exists, thereby providing a useful direct comparison.

So what exactly is the problem here? Well, if an intervention is to be considered effective, it should not be the case that almost any alternative intervention of similar cost would be equally, or more, effective. Nor should it be the case that an intervention which has no direct effect beyond making the subjects aware they are part of an experiment would have the same effect as the intervention being tested. Of course, it could be argued that as long as we use the same control in a lot of experiments, we can use it as a baseline for similar interventions and account for any bias by only making comparisons between interventions. Or we could follow Hattie’s approach and assume that, as education research overwhelmingly finds positive effects, we can ignore effects below a certain size. The results of these RCTs might be biased, but that does not necessarily invalidate the research.

However, part of the argument for RCTs is that they are a gold standard. If some RCTs have a systematic inaccuracy then, while it does not make them worthless, it does make it possible that a non-RCT which avoided the error in these RCTs might be superior in quality. Randomisation improves the accuracy of research, but so does an appropriate choice of control. I would argue that we should attempt to measure the likely level of error caused by failing to randomise, or failing to choose an appropriate control, in order to evaluate research. Otherwise, consideration of RCTs as a gold standard might cause us to accept research because it happened to be an RCT even if it was of a lower quality than non-RCT research on the same topic. There is a strong argument that this is what happened here. If education research is to be considered reliable we need to start measuring errors, not simply classifying research as acceptable or unacceptable according to a “gold standard” which accepts some shortcomings but not others.
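To make the point about controls concrete, here is a rough simulation sketch rather than anything taken from a real study. Every number in it is an assumption chosen for illustration: a hypothetical intervention with a small true effect (0.1 standard deviations), and a larger hypothetical boost (0.3) that pupils get simply from taking part in something new. The function name `simulate_trial` and its parameters are made up for this example.

```python
import random
import statistics


def simulate_trial(n_per_arm=20000, true_effect=0.1,
                   participation_effect=0.3, seed=0):
    """Simulate a three-arm randomised trial with hypothetical effect sizes.

    All effect sizes are illustrative assumptions, in standard-deviation
    units; none of them come from actual education research.
    """
    rng = random.Random(seed)

    def arm(shift):
        # Each pupil's outcome is baseline noise plus whatever the arm adds.
        return [rng.gauss(0.0, 1.0) + shift for _ in range(n_per_arm)]

    # A "control" that consists of nothing at all.
    nothing = arm(0.0)
    # A placebo-style control: pupils take part in something, but it has
    # none of the relevant advantages of the intervention.
    placebo = arm(participation_effect)
    # The intervention being tested.
    treated = arm(participation_effect + true_effect)

    vs_nothing = statistics.mean(treated) - statistics.mean(nothing)
    vs_placebo = statistics.mean(treated) - statistics.mean(placebo)
    return vs_nothing, vs_placebo


vs_nothing, vs_placebo = simulate_trial()
print(f"estimated effect against a control of nothing: {vs_nothing:.2f}")
print(f"estimated effect against a placebo-style control: {vs_placebo:.2f}")
```

Under these assumptions both comparisons are properly randomised, yet the comparison against nothing reports the true effect plus the participation boost, while the comparison against the placebo-style control recovers something close to the true effect alone. Randomisation by itself does nothing to remove that kind of error.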


  1. Reblogged this on The Echo Chamber.

  2. Agree about the ‘gold standard’ bit; the research design should be appropriate for the research question – an RCT might not be appropriate.

    But surely the idea of a control is to ensure that a single variable is isolated in the experimental condition.

  3. I’ve seen lots of teachers do lots of ‘research’ in schools. The ‘research’ invariably resulted in a bunch of teachers coming up with the exact conclusions they’d wanted from the beginning anyway. It was really just an exercise in confirming their own existing beliefs. I think teachers should trust in their own eyes more, in their own experience more. But let’s not claim something is ‘research’ when it’s a little bit of a vanity project or even a PR exercise.

  4. I always thought RCTs were virtually impossible in a social science field – there are simply too many variables.

  5. I agree with regard to being cautious about RCTs. In fact I don’t think any methodology should be described as the gold standard. I also agree with chestnut that there are actually too many evident and often concealed variables for RCTs to be that reliable in social sciences, although that is not saying they are of no use. There is also a useful post on the hype of so-called ‘big data’ here that might be of interest: http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html
