Statistical data and the Education Debate Part 3: Errors and Gold Standards

June 18, 2013
I talked last time about how personal experience, or anecdote, cannot simply be dismissed as unreliable. Of course, there are exceptions: things where lay observation is very unreliable, for instance the effectiveness of a particular medicine, or the intelligence of one’s own children. But if something is easy to see, and you see it a lot, then it is not somehow unscientific to think it happens, or even that it is common.
The misconception that we should dismiss people’s personal experiences is a common one. I have frequently heard politicians who listen to voters condemned for peddling “anecdotes” rather than listening to “evidence”. Sometimes it seems as if the opinion of one academic about what people should have experienced is treated as outweighing the unanimous testimony of 100 randomly selected members of the public about what they have experienced. When a politician is criticised for “ignoring the evidence” (or worse, “ignoring the lack of evidence”) on an issue, it is always worth wondering whether they have knocked on 100 doors recently and got a fair idea of what real people say they are experiencing.
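The contrast between one opinion and 100 accounts can be put in rough numerical terms. This is only a back-of-the-envelope sketch with an invented error rate, and it leans on the generous assumption that the 100 accounts are independent (real anecdotes can share a common bias), but it shows why unanimity among many independent observers is hard to dismiss:

```python
# Illustrative only: assume each randomly selected respondent misreports
# their own experience with probability p_wrong (a made-up figure).
p_wrong = 0.2

# If the 100 reports are independent, the probability that ALL of them
# are wrong in the same direction at once is p_wrong to the power 100.
p_all_wrong = p_wrong ** 100

print(f"P(one respondent wrong)   = {p_wrong}")
print(f"P(all 100 wrong together) = {p_all_wrong:.3e}")
```

Even with each individual account wrong one time in five, the chance of all 100 being wrong together is astronomically small. The serious caveat is the independence assumption: if everyone’s error has a shared cause, the calculation collapses, which is why the likelihood needs considering rather than the testimony simply being waved away.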
One of the habits of mind that makes it so easy to dismiss real people’s experiences is that of believing in a gold standard. This is where it is imagined that evidence is either reliable (it meets the gold standard) or completely unreliable (it doesn’t), without actually looking into the probabilities involved. The dismissal of anecdote or personal experience, without any consideration of likelihood, that I mentioned last time is one example of this. There are others, though. Sometimes it is based on authority: the opinion of an education professor meets the gold standard, the opinion of a teacher doesn’t. Sometimes it is based on where research appears: a peer-reviewed journal (no matter how partisan) meets the gold standard, a think tank report or a newspaper article doesn’t. But often it is based on methodology, which brings us to RCTs: randomised controlled trials.
After my last effort to address this subject a few people directed me to brief descriptions of RCTs in education, and while I haven’t got those links to hand, I did notice a disturbing tendency for the “controls” to consist of nothing. A “control” in an experiment testing an intervention is an alternative intervention which is thought unlikely to have the relevant advantages of the intervention being tested. In medicine it is often a placebo, i.e. a medical procedure that is thought to be of no clinical advantage in itself. It could, alternatively, be the most effective intervention that currently exists, thereby providing a useful direct comparison.
So what exactly is the problem here? If an intervention is to be considered effective, it should not be the case that almost any alternative intervention of similar cost would be equally, or more, effective. Nor should it be the case that an intervention with no direct effect, beyond making the subjects aware they are part of an experiment, would have the same effect as the intervention being tested.

Of course, it could be argued that as long as we use the same control in a lot of experiments, we can use it as a baseline for similar interventions, and account for any bias that way by only making comparisons between interventions. Or we could follow Hattie’s approach and assume that, as education research overwhelmingly finds positive effects, we can ignore effects below a certain size. The results of these RCTs might be biased, but the bias does not necessarily invalidate the research.

However, part of the argument for RCTs is that they are a gold standard. If some RCTs have a systematic inaccuracy then, while that does not make them worthless, it makes it possible that a non-RCT which avoided the error in those RCTs might be superior in quality. Randomisation improves the accuracy of research, but so does an appropriate choice of control. I would argue that we should attempt to measure the likely level of error caused by failing to randomise, or by failing to choose an appropriate control, in order to evaluate research. Otherwise, treating RCTs as a gold standard might cause us to accept research simply because it happened to be an RCT, even if it was of lower quality than non-RCT research on the same topic. There is a strong argument that this is what happened here. If education research is to be considered reliable, we need to start measuring errors, not simply classifying research as acceptable or unacceptable according to a “gold standard” which accepts some shortcomings but not others.
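The do-nothing-control worry can be made concrete with a toy simulation. All the numbers below are invented: suppose the tested intervention’s entire benefit is a “taking part in an experiment” boost that any intervention of similar cost would also produce. An RCT with a do-nothing control still reports a positive effect size, while the same trial run against an active control shows essentially nothing:

```python
import math
import random
import statistics

random.seed(1)

def outcomes(n, participation_boost):
    # Simulated pupil scores: baseline mean 50, sd 10, plus whatever
    # boost comes merely from taking part in *some* intervention.
    # These figures are made up for illustration.
    return [random.gauss(50 + participation_boost, 10) for _ in range(n)]

def cohens_d(a, b):
    # Cohen's d with a pooled standard deviation.
    sa, sb = statistics.stdev(a), statistics.stdev(b)
    pooled = math.sqrt(((len(a) - 1) * sa**2 + (len(b) - 1) * sb**2)
                       / (len(a) + len(b) - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled

BOOST = 3.0  # participation effect shared by ANY intervention (invented)

tested      = outcomes(500, BOOST)  # the intervention under test
do_nothing  = outcomes(500, 0.0)    # control group receives nothing
active_ctrl = outcomes(500, BOOST)  # control group gets a rival intervention

print(f"effect size vs do-nothing control: {cohens_d(tested, do_nothing):.2f}")
print(f"effect size vs active control:     {cohens_d(tested, active_ctrl):.2f}")
```

Against the do-nothing control the trial reports a clearly positive effect size even though the intervention adds nothing beyond participation itself; against an active control the effect vanishes. Randomisation is intact in both versions, so this is an error that the RCT label alone does not guard against.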