Tuesday, 11 August 2020

Cleaning up the tail

I saw an interesting post in a Facebook cricket group recently, in which a Pakistani fan said that Pakistan were the worst team in world cricket at cleaning up the tail. A bunch of Indian fans jumped in to say that India were, in fact, the worst. Then some English fans decided that England were actually the worst at cleaning up the tail.

It led me to run a small poll, and I found that roughly 2/3 of respondents felt that their team was the worst at cleaning up the tail. Most who commented were adamant that not only was their team the worst at it, they were the worst by some margin.

There seemed to be a general cricket-fan type one error. A type one error is the error of seeing a pattern that does not exist (or, more generally, coming to an incorrect conclusion based on evidence that seems conclusive but is not). Perhaps this happens because when we watch our team struggle to clean up the tail, it takes a long time, while a team cleaning up the tail efficiently is over quickly, so it leaves less of a mark on our memory. Or perhaps it is just because cricket teaches us to think negatively. Mark Richardson even wrote a whole book about the power of negative thinking in cricket.

That led me to a question. Which team is actually the worst?

I didn't just want to know which team conceded the highest average against the tail. That might just tell us which team was the worst at bowling overall, rather than the worst at cleaning up the tail.

I decided to compare each team's bowling average against the top 4 partnerships with its average against the bottom 4 partnerships. That seemed to me to be a good gauge of which teams were actually bad at cleaning up the tail, as opposed to just bad at bowling.
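As a rough sketch of that comparison, the two averages can be computed from partnership records as runs conceded divided by wickets taken. Everything here is an assumption for illustration: the record layout, the function names, the placeholder numbers, and the reading of "top 4" as wickets 1 to 4 and "bottom 4" as wickets 7 to 10. It is not the exact method used for this post.

```python
from collections import defaultdict

# Hypothetical records, one per partnership bowled at:
# (bowling_team, wicket_number, runs_added, ended_in_dismissal)
# The numbers are placeholders, not real match data.
records = [
    ("Team A", 1, 40, True),
    ("Team A", 2, 20, True),
    ("Team A", 5, 50, True),   # middle order: ignored in this comparison
    ("Team A", 8, 10, True),
    ("Team A", 9, 6, False),   # unbroken stand: runs count, no wicket
    ("Team A", 10, 4, True),
]

def group_for(wicket_number):
    """Assumed split: wickets 1-4 are 'top', wickets 7-10 are 'tail'."""
    if 1 <= wicket_number <= 4:
        return "top"
    if 7 <= wicket_number <= 10:
        return "tail"
    return None

def averages_conceded(records):
    """Return {(team, group): runs conceded per wicket taken}."""
    runs = defaultdict(int)
    wickets = defaultdict(int)
    for team, wicket_number, runs_added, dismissed in records:
        group = group_for(wicket_number)
        if group is None:
            continue
        runs[(team, group)] += runs_added
        if dismissed:
            wickets[(team, group)] += 1
    # Skip any group with no wickets, where the average is undefined.
    return {key: runs[key] / wickets[key] for key in runs if wickets[key] > 0}

averages = averages_conceded(records)
```

Counting the runs from unbroken partnerships but not a wicket follows the usual convention for bowling averages; a different convention would shift the numbers slightly.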

As most statisticians do, I started out with a scatter plot, to see what the data looked like, and there seemed to be a fairly good linear pattern.

The sensible next step was to look at the residuals. A residual is the difference between a country's actual average against the tail and the average that the model would predict for that country. In other words, it shows how much worse or better against the tail each country was than expected.
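A minimal sketch of the residual calculation follows, assuming the model is a simple least-squares line that predicts the average against the tail from the average against the top order. The function names and the four data points are placeholders chosen for illustration, not the real figures behind the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys):
    """Actual value minus the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Placeholder averages for four hypothetical teams (not real data).
top_avg = [30.0, 35.0, 40.0, 45.0]    # conceded against top 4 partnerships
tail_avg = [14.0, 17.0, 16.0, 21.0]   # conceded against bottom 4 partnerships

slope, intercept = fit_line(top_avg, tail_avg)
tail_residuals = residuals(top_avg, tail_avg)
```

With this sign convention, a positive residual means a team concedes more runs per wicket against the tail than its top-order bowling would suggest, so it is worse than expected at cleaning up the tail; a negative residual means it is better than expected.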

That resulted in the following graph:

South Africa and Bangladesh were very good; Sri Lanka, Australia and Pakistan were quite good; England, India and West Indies were quite bad; and New Zealand and Zimbabwe were horrible.

But a difference of just a few runs per wicket is quite small. There seemed to be a reasonable chance that it was a product of natural variation, rather than of any actual difference between the teams.

I decided to try breaking the data down over time, to see if there was any pattern. It became clear from this that countries tended to go through phases of being good or bad at cleaning up the tail, relative to how they bowled at the top order, but no team was consistently good or consistently bad.

The reason that almost everyone thinks that their country is the worst at cleaning up the tail may just be that at one point in time it was, and that is the memory that has remained.
