Why is the EEF getting it so wrong about ability grouping?

March 3, 2018

For a while now, whenever the topic of setting and streaming has come up, people have referred me to the EEF toolkit, and particularly graphs like this one:

It shows that “ability grouping” has a negative effect on student achievement. Teachers in England are likely to take that term to mean setting (i.e. grouping by previous test scores in a particular subject) or streaming (grouping by a measure of general ability or a combination of test scores across subjects). For other topics I’m interested in, the effect sizes found by the EEF, based on meta-analyses, correspond to similar work done by John Hattie for his book Visible Learning. Here they differ: Hattie found a positive effect size of 0.12 for ability grouping. The EEF found a negative effect size: -0.09. This puzzled me when I first saw it.

The EEF – the Education Endowment Foundation – was set up by Michael Gove with the intention of providing a more solid empirical basis for education. It has been generously funded to conduct Randomised Controlled Trials according to an agreed protocol. It is therefore often treated as a neutral source of information, and its toolkit has been repeatedly cited to me as proof that setting does not work. One would have assumed that it would be subject to more checks than the work of a single researcher, such as Hattie. So how did the results end up so different?

These are the meta-analyses behind the figure:

| Meta-analyses | Effect size |
| --- | --- |
| Gutierrez, R., & Slavin, R. E. (1992) | -0.34 (mixed age attainment vs non-graded classes) |
| Kulik C-L.C & Kulik J.A. (1982) | 0.10 (on secondary pupils) |
| Kulik C-L.C & Kulik J.A. (1984) | 0.10 (on elementary/primary pupils) |
| Lou, Y., Abrami, P. C., Spence, J. C., Poulsen, C., Chambers, B., & d’Apollonia, S. (1996) | -0.12 (on low attainers) |
| Puzio, K., & Colby, G. (2010) | |
| Slavin, R. E. (1990) | -0.06 (on low attainers) |
| Indicative effect size (on low attainers) | -0.09 |

I had previously noticed a couple of issues here.

Firstly, in two cases they have cherry-picked data for low attainers rather than using an average effect size. While the effect on low attainers might be of particular concern, it is odd to feed figures specifically for low attainers into the calculation of a single effect size. Also, it has been claimed (and I’m not in a position to check whether this is correct) that figures for low attainers in this type of research include a systematic error. Secondly, the Lou, Abrami et al. (1996) meta-analysis was actually research into “within class” grouping, which is usually considered a form of mixed ability teaching. I don’t know how the EEF combines its effect sizes, so I don’t know what effect correcting for these two anomalies would have. It is noticeable that Hattie also includes within class grouping, so that alone can’t explain the differences. And I think some of the positive effect sizes are also from within class grouping, so this may cancel out.

But now I notice there’s a bigger anomaly. Hattie considers Gutierrez & Slavin (1992) to give ability grouping a positive effect size of 0.34. The EEF says -0.34. In absolute terms, this is the biggest effect size anyone found in either direction. The paper can be found here and it seems to agree with Hattie. The confusion may be that it is actually research into mixing year groups, and because year groups are called “grades” in the US, the ability grouped classes are called “non-graded”. However, the figure of 0.34 seems to be for ability grouping, not against.
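To get a feel for how much that one sign could matter, here is a rough sketch in Python. It assumes a plain unweighted mean of the five figures listed in the table above, which is my assumption only: I don’t know how the EEF or Hattie actually combine effect sizes, and a real meta-analysis would weight the studies. The names `eef_listed`, `sign_corrected` and `unweighted_mean` are made up for the illustration.

```python
from statistics import mean

# Effect sizes as listed in the EEF table above.
# Puzio & Colby (2010) is left out because no figure is listed for it.
eef_listed = {
    "Gutierrez & Slavin (1992)": -0.34,
    "Kulik & Kulik (1982)": 0.10,
    "Kulik & Kulik (1984)": 0.10,
    "Lou et al. (1996)": -0.12,
    "Slavin (1990)": -0.06,
}


def unweighted_mean(effects):
    """ASSUMPTION: a plain average, with no weighting by sample size or variance."""
    return round(mean(effects.values()), 3)


# The same figures, but with Gutierrez & Slavin taken as +0.34, as Hattie reports it.
sign_corrected = {**eef_listed, "Gutierrez & Slavin (1992)": 0.34}

print(unweighted_mean(eef_listed))      # -0.064
print(unweighted_mean(sign_corrected))  # 0.072
```

On this crude average, the single sign is enough to move the overall figure from slightly negative to slightly positive. That says nothing about what a properly weighted calculation would show; it only illustrates why the discrepancy is worth checking.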

The only effect size which Hattie and the EEF both agree is negative is Slavin (1990). In the abstract Slavin, an opponent of ability grouping, describes his results this way:

“Overall achievement effects were found to be essentially zero at all grade levels… Results were close to zero for students of all levels of prior performance.”

If that’s the best evidence the EEF found for negative effects of setting, then they have, intentionally or otherwise, misled us.

The EEF appears to have used a mix of irrelevant studies, and an incorrect figure, to get its assessment of ability grouping completely wrong. To be honest, the quality of the studies is generally so poor, and the practice of combining effect sizes so problematic, that I hadn’t been that bothered previously. Additionally, most of the effect sizes in the graph above, and in Hattie, assess interventions and come from research conducted by supporters of those interventions, meaning that any comparison with the effect size for ability grouping is not comparing like with like. I had generally concluded that this whole approach was hopeless.

However, it seemed likely that some good research was on its way. The EEF was doing some RCTs with researchers at UCL. And then this week, before any RCT results had come out, an incredibly poor journal article, “The Symbolic Violence of Setting”, attacking setting on the basis of minimal evidence, was published under the names of most of the UCL researchers. Greg Ashman discusses it here. It’s pretty much everything that gives educational research a poor name: worthless “qualitative” methods and a vehement ideological commitment to a particular conclusion.

This is a big problem for the EEF. If people are so against setting that they compare it with violence, then I don’t believe it can be ethical for them to conduct research into setting. From their perspective, honest and objective research that has a result that favours setting would actually encourage “symbolic violence”. Either they are willing to risk inflicting “symbolic violence” on children or they never planned to allow the results to go that way. Both alternatives would be completely unethical and researchers should not put themselves in a position where they have to make such a choice. This is all despite the claim: “The EEF is committed to maintaining independence and impartiality in all its work”.

The EEF and those funding it need to take a closer look at what it is doing. It was meant to help educators get evidence that is not tainted by the usual ideological nonsense of educational research. Here, either by error or bad intent, they seem to have encouraged exactly that.

As a final note, Dylan Wiliam recently claimed on Twitter that the problems with the research on setting might be even more fundamental:

This area remains one where there really is minimal good evidence, and I would recommend that teachers make their own decisions based on their experience.



  1. Your attack on “an incredibly poor journal article “The Symbolic Violence of Setting” attacking setting on the basis of minimal evidence” is disingenuous. If people take the time to look at the paper mentioned, they will see it appears in a very prestigious journal with an Impact Factor of 1.214. This is the article abstract:

    ‘Setting’ is a widespread practice in the UK, despite little evidence of its efficacy and substantial evidence of its detrimental impact on those allocated to the lowest sets. Taking a Bourdieusian approach, we propose that setting can be understood as a practice through which the social and cultural reproduction of dominant power relations is enacted within schools. Drawing on survey data from 12,178 Year 7 (age 11/12) students and discussion groups and individual interviews with 33 students, conducted as part of a wider project on secondary school grouping practices, we examine the views of students who experience setting, exploring the extent to which the legitimacy of the practice is accepted or challenged, focusing on students’ negative views about setting. Analyses show that privileged students (White, middle class) were most likely to be in top sets whereas working-class and Black students were more likely to be in bottom sets. Students in the lowest sets (and boys, Black students and those in receipt of free school meals) were the most likely to express negative views of setting and to question the legitimacy and ‘fairness’ of setting as a practice, whereas top-set students defended the legitimacy of setting and set allocations as ‘natural’ and ‘deserved’. This paper argues that setting is incompatible with social justice approaches to education and calls for the foregrounding of the views of those who are disadvantaged by the practice as a tool for challenging the doxa of setting.

    • I’m not sure what your point is.

      And it’s not from a prestigious journal, it’s from an education journal.

  2. I agree that Hattie and the EEF should be taken with a huge pinch of salt–especially when research is conducted by educators, who are far more likely to have a vested interest in the outcomes.

    There’s a huge difference between the way ability grouping plays out in the primary sector and in secondary. In primary, ability grouping increases the need for differentiation, whereas in secondary, it decreases it. This may not be true in the small minority of very large primary schools or very small secondaries, but for statistical purposes I’d guess that this generalisation holds.

    I’m now working on a primary maths fluency project, and we’re working on the assumption that the only differentiation should be by outcome–the major consideration is teacher workload, which can get pretty surreal by the end of KS2 if you have one group still struggling with fractions and another moving on to algebra.

  3. Thank you for this analysis. I was very surprised to see that in-class groupings were included as being the same as setting. Does that mean a mixed ability group is only mixed ability if all the children do the same work and there is no differentiation?

    From what I can see, there are many problems with finding out whether setting is effective or not. Firstly, unless students are set by ‘target grade’ (as per streaming), the sets will generally be attainment groupings – not quite the same thing.

    Secondly, the research is generally conducted around Maths and English and might not be relevant to other subjects.

    Thirdly, in the UK at least, it may well have been conducted whilst the tyranny of the C/D border focus ruled. As Dylan Wiliam indicated, the allocation of teachers is important. The ‘best teachers’ might be allocated to the students who would make the most difference to the school’s standing in the performance tables, with those in the top sets also given better teachers to encourage take up at A level, and those best at crowd control or teaching their second subject (especially with Maths) left with the poorest attaining students, who often have the worst work ethic.

    Fourthly, many option subjects are taught in mixed ability classes at GCSE, so setting is not the widespread practice the abstract states it to be. When people say there is little evidence, it can mean exactly that – not much research has been conducted.

    Surely the problem must be about expectations. If despite being in one of the lowest sets, the school and the teacher set high expectations without indulging in ‘pity’, or worse abandoning students to a mini ‘Lord of the Flies’ existence, students can and do achieve.

  4. Reblogged this on The Echo Chamber.

  5. Thanks for pointing out the weakness of the research around ability grouping. It annoys me that an organisation such as the EEF does not appear to have the capability to rigorously evaluate research evidence. In my opinion the outcomes of ability grouping in schools could well be negative for lower ability students – if there isn’t an effective behaviour system/high standards of classroom behaviour and high quality teaching. The chances are a lower ability class will be harder to manage, with more off-task behaviour and less than helpful shared peer attitudes toward schooling. Combine this with low teacher expectations and it is not hard to envisage low outcomes. Mix in a proportion of higher ability students with better classroom behaviour, more positive attitudes and self efficacy and the culture of the class changes – easier to teach = better results. I have seen setting used effectively to help lower ability students, with small class sizes and a highly effective teacher making a difference. I have also seen the positive effects of setting on high ability students – no question they can achieve better without the off-task behaviour of their less able peers. In the end it may be the school culture and teacher expectations which determine whether setting is positive or negative.

