The EEF revisits ability grouping

September 29, 2018

Earlier this year I wrote a couple of posts about the Education Endowment Foundation’s summary of the research on ability grouping.

In a summary of the meta-analyses, they had claimed an average effect size of -0.09 for setting/streaming, which was at odds with Hattie’s claim of a positive effect size of 0.12 for ability grouping and a more recent analysis (Steenbergen-Hu et al, 2016) finding a positive effect size between 0.04 and 0.06.

It took two posts because there were so many odd practices and little errors that it was difficult to find out what had actually gone wrong. Two main ones that I identified were that:

  • Within-class ability grouping (i.e. something that would normally be considered a form of mixed ability teaching) was included because it was “ability grouping” even though the figure was being presented as a figure for “setting and streaming”.
  • A positive figure had been calculated originally based on all the meta-analyses, but so had a negative figure for low attaining students (based on just 2 meta-analyses) and at some point this figure had become used instead.

At the time, schools minister Nick Gibb shared my blogpost on Twitter, and as a result, I was condemned by a variety of educationalists for daring to disagree with the experts. After all, we all know that a mere teacher could not be right about a technical matter?

But the EEF did look at its figures again, and guess what they realised they had got wrong?

  • Within-class ability grouping (i.e. something that would normally be considered a form of mixed ability teaching) was included because it was “ability grouping” even though the figure was being presented as a figure for “setting and streaming”.
  • A positive figure had been calculated originally based on all the meta-analyses, but so had a negative figure for low attaining students (based on just 2 meta-analyses) and at some point this figure had become used instead.

They have now separated out the results for in-class ability grouping, stopped using the figure for low attainers and reduced the “security rating” for their results. Obviously I am expecting those educationalists mentioned earlier to apologise immediately for dismissing me on a matter where I turned out to be right.

Unfortunately, there still seem to be issues with the new figure, which is still negative and still way off the figures from Hattie and Steenbergen-Hu et al. The EEF result is based on the following meta-analyses:

Meta-analysis                         Effect size
Henderson, N. D. (1989)               -0.34
Kulik, J. A., & Kulik, C. C. (1992)    0.03
Rui, N. (2009)                        -0.09
Slavin, R. E. (1987)                   0.00
Slavin, R. E. (1993)                  -0.02
Slavin, R. E. (1990)                  -0.06
Effect size (weighted mean)           -0.08

So we have one study in there, Henderson (1989), that is an extreme outlier. It is an unpublished dissertation cited in Steenbergen-Hu et al, but hardly anywhere else. The only reference to it I could find elsewhere claimed that it “found no achievement difference between students who had been ability-grouped and those who had been heterogeneously grouped”. Without this study, I suspect the overall effect size would be very close to 0.
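As a rough check on that suspicion, here is a minimal sketch in Python. The weights behind the EEF's weighted mean are not given in the table above, so this assumes equal weights for every meta-analysis; the function names are made up for illustration and this is not the EEF's method. With equal weights the six listed figures happen to average -0.08, the same as the reported weighted mean, and dropping Henderson (1989) moves the average to about -0.03.

```python
# Effect sizes exactly as listed in the EEF table above.
effect_sizes = {
    "Henderson (1989)": -0.34,
    "Kulik & Kulik (1992)": 0.03,
    "Rui (2009)": -0.09,
    "Slavin (1987)": 0.00,
    "Slavin (1993)": -0.02,
    "Slavin (1990)": -0.06,
}


def unweighted_mean(sizes):
    """Plain average. ASSUMPTION: equal weights, since the real weights are not given."""
    values = list(sizes.values())
    return sum(values) / len(values)


def leave_one_out(sizes):
    """Average of the remaining effect sizes after dropping each study in turn."""
    return {
        dropped: unweighted_mean({k: v for k, v in sizes.items() if k != dropped})
        for dropped in sizes
    }


all_six = unweighted_mean(effect_sizes)
without_henderson = leave_one_out(effect_sizes)["Henderson (1989)"]

print(f"All six meta-analyses: {all_six:.3f}")        # -0.080
print(f"Without Henderson:     {without_henderson:.3f}")  # -0.028
```

Under a different weighting the numbers would shift, so this only shows how much a single outlier can move an average of six figures.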

Additionally, 2 studies by Kulik and Kulik, both cited by Steenbergen-Hu et al and by the previous EEF analysis, and both finding positive effect sizes of 0.1, have been removed because in both cases “The 2018 update revealed that this study was superseded by Kulik and Kulik 1992”. It is, of course, the case that more up to date meta-analyses by the same authors are to be preferred, but this sort of decision seems inconsistent with the approach of listing all meta-analyses, and it is even more surprising when another author (Slavin, R. E.) is allowed to have 3 studies considered without the newer ones superseding the older ones.

Perhaps I’m now just picking up on flaws with the entire “meta-meta-analysis” approach rather than just this review. Maybe it’s normal for really major differences to arise from highly contestable decisions and this will always be the case. And certainly I have a bias against mixed ability, so I find myself looking far more closely at decisions to include negative studies or exclude positive ones than at the decisions to include positive ones and exclude negative ones. For instance, I also find myself wondering why Rui’s review of “detracking” (changing the academic program for different ability groups) is included, but Gutierrez, R., & Slavin, R. E. (1992)’s study of setting across age groups is excluded. Aren’t both studies in the same category of saying something about ability grouping but not exactly to do with setting and streaming as we use it in our schools?

But while I am biased, might the author of the toolkit have a bias towards getting a result that was as close as possible to the previous flawed result? A published academic article might justify some of these decisions I’ve questioned, whereas a “toolkit” for teachers can’t be expected to. But if a peer-reviewed, published academic article is the standard required, then Steenbergen-Hu et al (2016) is the only result we should listen to (and I’d still have questions about their use of Henderson (1989)).

At the very least, I think the priority of the EEF should be to sort out these issues, perhaps even conducting their own meta-analyses in a case like this, rather than leaving teachers to wonder whether they should listen to Hattie, Steenbergen-Hu et al or the EEF, who between them have now come up with 3 completely different answers to the same question.

2 comments

  1. Three factors that should be taken into consideration when interpreting the results of these studies:

    1. Differentiation is the real issue–which is why in-class ability grouping will have a negative effect in that pupils spend most of their time working (or not working) with little or no guidance. And we know what KSC have to say about that.

    2. With setting and streaming, the least able pupils tend to get the least able teachers–and as one would expect, there is a tendency for the most able sets to prosper and the lowest ones to suffer. Gamoran has found individual studies in the US which demonstrate that performance of the lowest sets improves remarkably when they are taught by the same teachers that teach the top sets.

    3. The lowest sets inevitably have more than their share of the challenging pupils, and relatively few schools have discipline policies that are effective enough to keep classroom climate from drifting into the bottom half of Terry Haydn’s 10-point scale.


  2. Teachers really need these detailed analyses. I don’t think you can trust Hattie on this either. For example, Hattie’s interpretation of the Kulik & Kulik (1992) study is highly questionable. For the category of accelerated students, Kulik & Kulik (1992) report 2 effect sizes of -0.02 and 0.87. The first effect size was calculated by comparing accelerated kids with the kids 1 year ahead, and the second by comparing accelerated kids with the group of kids they came from.

    Hattie reports only that the effect size was +0.02, a clear mistake: it was -0.02!

    But the larger error is that he makes no mention of the 0.87.

    So these researchers have huge biases too!

    Details of Hattie’s analysis of these meta-analyses here:

    https://visablelearning.blogspot.com/p/ability-grouping.html


