Has new research on exclusions solved the problem of causation?

September 21, 2019

Earlier this week, Siobhan Benita, the Liberal Democrat candidate for London mayor, made a speech in which she declared that:

My “feel safe, be safe” plan for Londoners will give every young person a voice, activities and the security of good schooling… No child will be permanently excluded from mainstream schools.

Naturally plenty of teachers, ex-teachers and people whose kids go to mainstream schools were able to point out on social media that it might not be best to keep young people in schools if they are dangerous or out of control. But in the online discussions that followed, a number of people who work in education, but don’t teach in schools, seemed very convinced that there was great new evidence that exclusions were bad.

It turned out to be this conference paper by Bacher-Hicks et al which, while largely having the same glaring inadequacies as any other research in this area, did have a few innovations in its methodology. It found a link between suspensions from school and various negative outcomes. It is about suspensions rather than permanent exclusions. It is based in the US not the UK. It is based on a student population where 23% of students are suspended at least once per school year, and 19% go on to be arrested between the ages of 16 and 20. However, even given the general irrelevance of this research to the debate in England about permanent exclusion, it would still be interesting if it had solved a big problem with research about school discipline.

I’ve written before about how a big problem in reaching conclusions from data is being able to identify causation from correlation, i.e. being able to identify what is an effect and what is a cause where two statistics seem to be related.

The accepted method for researching causation is to use a Randomised Controlled Trial (RCT), where the proposed cause is assigned at random to some part of a sample, so that the effect can be isolated by comparing the “treated” and “untreated” parts of the sample. From a practical point of view it would be very easy to test policy on exclusions in this way: you could design an exclusion process where, after the decision to exclude had been made, it was only carried out after being confirmed by the toss of a coin. Unfortunately, while practically easy, it is ethically beyond the pale to apply punishments by chance. So there is no body of RCT-based evidence on exclusions. This is not an isolated problem: similar ethical considerations also leave us with huge gaps in our knowledge of the effects of other sanctions used in schools, not to mention criminal justice policy and parental discipline.
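To make the logic of randomisation concrete, here is a minimal simulated sketch of the coin-toss design described above. Everything in it is an assumption made for illustration: the function name `coin_toss_trial`, the range of baseline risks, and the size of the "true" effect are all invented, and no real data is involved. The point is only that, because the coin is unrelated to anything about the pupil, the gap between the two groups recovers whatever effect was built into the simulation.

```python
import random


def coin_toss_trial(n_decisions=20_000, true_effect=0.1, seed=1):
    """Simulate exclusion decisions that are only carried out on 'heads'.

    Returns the difference in bad-outcome rates between pupils whose
    exclusion was carried out and pupils whose exclusion was not.
    All numbers are hypothetical.
    """
    rng = random.Random(seed)
    carried_out, not_carried_out = [], []
    for _ in range(n_decisions):
        # Each pupil has their own baseline risk of a later bad outcome.
        baseline_risk = rng.uniform(0.2, 0.8)
        # The coin, not the pupil's behaviour, decides who is excluded.
        heads = rng.random() < 0.5
        risk = baseline_risk + (true_effect if heads else 0.0)
        bad_outcome = rng.random() < risk
        (carried_out if heads else not_carried_out).append(bad_outcome)
    rate_carried_out = sum(carried_out) / len(carried_out)
    rate_not_carried_out = sum(not_carried_out) / len(not_carried_out)
    return rate_carried_out - rate_not_carried_out


if __name__ == "__main__":
    print("built-in effect 0.1, estimated:", round(coin_toss_trial(true_effect=0.1), 3))
    print("built-in effect 0.0, estimated:", round(coin_toss_trial(true_effect=0.0), 3))
```

In this toy setup the estimated difference should sit close to whatever `true_effect` is set to, including zero, which is exactly what observational data on sanctions cannot promise.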

Where there is no chance of an RCT, researchers tend to look at existing data, and try to draw conclusions from it. This is not a futile endeavour where there are clear reasons to limit which hypotheses you are testing. If it seems reasonable to think that falling out of an aeroplane without a parachute is a cause of death, but that being about to die is not a cause of falling out of aeroplanes, then looking at the death rate of people who have fallen out of planes could provide good evidence for that hypothesis. People do use correlation evidence in order to reach conclusions about causation quite often and quite reasonably in cases where there is only one reasonable hypothesis about causation that can be made. Unfortunately, this can trick us into thinking that deducing causation from correlation is a reliable “second best” method to be used in all cases where an RCT cannot be used. It isn’t. If there are multiple competing hypotheses about how causation works in a particular instance, then correlation evidence can be, not just less reliable than RCTs, but utterly useless.

This is the situation we have throughout research on sanctions and behaviour (where RCTs are ruled out). We usually wish to know whether particular sanctions improve behaviour, that is, to test the hypothesis that applying sanction X improves behaviour. Unfortunately, because the sanction is a result of poor behaviour, there will always be a correlation between poor behaviour and sanctions, or between the poorly behaved and sanctions. We can pick groups to compare or we can vary the timescale, but almost every empirical claim that sanctions don’t work, or that harsher sanctions are less effective than lenient ones, runs into the problem that the punishments were a result of bad behaviour, and therefore the punished were always more likely to behave badly than the unpunished. There may also be other variables that cause both bad behaviour and sanctions, which will allow for further hypotheses about causation.
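A small simulation shows how strong this effect can be. The sketch below is purely illustrative: the function `simulate_sanctions`, the idea of a single "propensity" number per pupil, and all the probabilities are assumptions of mine, not anything taken from real data. Pupils are sanctioned only if they misbehave, and their later behaviour depends on their underlying propensity plus whatever `sanction_effect` we choose to build in.

```python
import random


def simulate_sanctions(n_pupils=10_000, sanction_effect=0.0, seed=0):
    """Return (later misbehaviour rate of sanctioned, of unsanctioned).

    sanction_effect is added to a sanctioned pupil's chance of later
    misbehaviour: 0.0 means the sanction does nothing, a negative
    value means the sanction improves behaviour. Hypothetical numbers.
    """
    rng = random.Random(seed)
    sanctioned, unsanctioned = [], []
    for _ in range(n_pupils):
        # Underlying tendency to misbehave, between 0 and 1.
        propensity = rng.random()
        # The sanction is a result of poor behaviour, nothing else.
        got_sanction = rng.random() < propensity
        chance_later = propensity + (sanction_effect if got_sanction else 0.0)
        chance_later = min(1.0, max(0.0, chance_later))
        misbehaved_later = rng.random() < chance_later
        (sanctioned if got_sanction else unsanctioned).append(misbehaved_later)
    return (sum(sanctioned) / len(sanctioned),
            sum(unsanctioned) / len(unsanctioned))


if __name__ == "__main__":
    print("sanction does nothing:", simulate_sanctions(sanction_effect=0.0))
    print("sanction helps:       ", simulate_sanctions(sanction_effect=-0.1))
```

In this toy model the sanctioned group behaves worse afterwards than the unsanctioned group even when the sanction has no effect at all, and still behaves worse when the sanction is built to improve behaviour. A naive comparison would report that sanctions "don't work" in both cases.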

The problems with causation don’t stop ideologues from making pronouncements (“non-custodial sentences are more effective than prison sentences” or “the best way to manage children’s behaviour is by explaining to them why their actions are wrong”) that the evidence cannot actually support. However, we simply cannot tell whether correlations between stricter sanctions and worse behaviour are a result of behaviour being negatively affected by harsher sanctions, or sanctions being made harsher due to worse behaviour (except perhaps by applying common sense). The same problem exists where we try to find the effects of sanctions (or systems of sanctions) on other outcomes. Do those outcomes result from the sanctions, or from the behaviour that resulted in the sanctions? The most egregious example of being unable to separate cause and effect we have seen recently has been over the supposed link between school exclusions and knife crime, where some well-intentioned people seemed unable to consider the possibility that a propensity to violent criminal behaviour is a cause of exclusions, and instead assumed that upstanding members of society are being excluded and stabbing people as a result.

There are various techniques that can be used to solve some problems involving correlation and causation. Sometimes the timing is a clue. Where young people are both permanently excluded and convicted of possessing a knife, it is far more common for a permanent exclusion to swiftly follow the conviction than for the conviction to swiftly follow the exclusion, suggesting that causation does not run from exclusion to involvement in knife crime. Sometimes alternative chains of causation can be eliminated by using multi-variate statistical methods (although this paper makes a pretty good argument that this doesn’t work as well as we think it does). Sometimes “natural experiments” occur, where just by luck, we have data that should resemble what we might expect from RCTs. Where people think that they have grounds for comparing the effects on two groups, without randomisation, research is often referred to as “quasi-experimental”.

And this brings us to the Bacher-Hicks et al paper mentioned earlier. A change in the boundaries of school districts allowed them to eliminate some variables that might confound efforts to identify causation involving suspensions. This enabled them to create a measure of schools’ willingness to suspend that controlled for student background, and to see what effect that had on outcomes for students who went to the school after the boundaries changed and, therefore, came from different backgrounds. Being able to control for some variables allows the paper to improve on papers that couldn’t control for those variables. Unfortunately, the paper does not even begin to address the problem of controlling for behaviour. It describes schools with high conditional suspension rates (i.e. suspension rates after controlling for student background) as “strict” and concludes that strictness results in negative outcomes. What justification there is for this approach is limited. The point is made that where principals have changed schools, conditional suspension rates have also changed in ways that suggest leadership is important. However, this does not go very far to disprove the obvious hypothesis that high suspension rates may be a result of bad behaviour that also leads to the negative outcomes the researchers found. To conclude that suspensions, not bad behaviour, are the cause of the negative outcomes that are correlated with high conditional suspension rates requires that one controls perfectly for bad behaviour. Anything less can result in a correlation that does not prove causation.
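The worry can be illustrated with a deliberately crude simulation. This is not the paper's model or data: the function `simulate_schools`, the three made-up school-level quantities (background, behaviour, willingness to suspend) and the way they are simply added together are all my own assumptions, chosen only to show the shape of the problem. Background is controlled for perfectly, behaviour is not observed, and suspensions are given no effect on outcomes whatsoever.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_schools(n_schools=500, seed=2):
    """Correlate 'conditional suspension rate' with negative outcomes.

    Hypothetical setup: suspensions have NO causal effect on outcomes.
    """
    rng = random.Random(seed)
    conditional_suspension, conditional_outcome = [], []
    for _ in range(n_schools):
        background = rng.gauss(0, 1)   # observed, so it can be controlled for
        behaviour = rng.gauss(0, 1)    # level of bad behaviour, NOT observed
        willingness = rng.gauss(0, 1)  # leadership's willingness to suspend
        suspension_rate = background + behaviour + willingness
        # Outcomes depend on background and behaviour only, plus noise.
        negative_outcome = background + behaviour + rng.gauss(0, 1)
        # "Controlling for background" here means subtracting it exactly.
        conditional_suspension.append(suspension_rate - background)
        conditional_outcome.append(negative_outcome - background)
    return pearson(conditional_suspension, conditional_outcome)


if __name__ == "__main__":
    print("correlation with zero causal effect:", round(simulate_schools(), 2))
```

Even with background removed exactly, the "strict" schools in this toy world show clearly worse outcomes, purely because unobserved behaviour drives both the suspensions and the outcomes.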

One additional point I’m going to make is that research that assumes you can control for behaviour by attributing suspension rates entirely to a mix of pupil background and schools’ willingness to suspend, rather than, say, school culture, makes assumptions that I think few teachers will agree with. We will get better education research when researchers start getting better at listening to teachers when theorising about whether correlation indicates causation.


  1. The number of suspensions, at least where I am, is unrelated to how strict a school is.

    There is another sort of experiment: the one where a school changes over time but the intake doesn’t. The school I am currently at has become much more firm on discipline over the years. There are fewer fights, less bullying and less stealing. The classrooms are quieter and the behaviour better.

    Suspensions haven’t increased. I put that down to the students being much more certain where the boundaries are, even as those boundaries have become stricter. Small incidents are more likely to be addressed actively.

    But we still have suspensions and exclusions. There are always some students who simply will not stay within the boundaries, no matter where those boundaries are.

  2. Thinking about more ethical ways to look at causation v correlation, one possibility would be to compare two sets of schools, one with a zero permanent exclusion policy and one without (the second set made up of schools as similar as possible to those in the first, which I suspect would be the more limited group), and then see whether there was a significant difference in the number of students involved in criminal activity.

    • The problem is with how you do that at random. If you pick apparently similar schools, but the policies have been deliberately chosen, the different policies may be a response to different levels of poor behaviour in the schools.

