Tuesday, 14 July 2020

Good years as an all rounder

Over the last three years, Jason Holder has produced some incredible numbers. His bowling stats look like something out of the 1800s, and at the same time he has averaged over 40 with the bat.

It made me wonder how he compares with other all rounders from history.

Hot on the heels of my first ever animated graph last week, I have another one today. It shows every player who had at least five three-year periods in which they were in the top 23% of both run scorers and wicket takers, and had a batting average over 17. (That last condition was necessary because of a period in the 1880s when batsmen were regularly interchanged, which put some bowlers who averaged below 10 with the bat among the top run scorers.)
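The selection rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind the graph: it assumes hypothetical per-player, per-year records of runs, dismissals and wickets; it counts every rolling three-year window; it treats "top 23%" as the first ceil(23% of players) in each ranking within a window; and it applies the batting average of 17 to the runs and dismissals within that window.

```python
import math
from collections import defaultdict

TOP_FRACTION = 0.23  # top 23% of run scorers and of wicket takers
MIN_BAT_AVG = 17.0   # filters out bowlers who batted high in the order
MIN_PERIODS = 5      # at least five qualifying three-year periods


def top_players(totals, fraction):
    """Return the set of players in the top `fraction` of a {player: total} dict."""
    ranked = sorted(totals, key=totals.get, reverse=True)
    return set(ranked[:math.ceil(len(ranked) * fraction)])


def qualifying_all_rounders(records):
    """records: (player, year, runs, dismissals, wickets) tuples, one per player-year.

    Returns {player: number of qualifying three-year windows} for players
    with at least MIN_PERIODS such windows.
    """
    by_year = defaultdict(list)
    for record in records:
        by_year[record[1]].append(record)

    counts = defaultdict(int)
    # Every rolling window start..start+2 that fits inside the data.
    for start in range(min(by_year), max(by_year) - 1):
        runs, outs, wickets = defaultdict(int), defaultdict(int), defaultdict(int)
        for year in range(start, start + 3):
            for player, _, r, o, w in by_year.get(year, []):
                runs[player] += r
                outs[player] += o
                wickets[player] += w
        if not runs:
            continue
        both = top_players(runs, TOP_FRACTION) & top_players(wickets, TOP_FRACTION)
        for player in both:
            # Batting average within the window must exceed the threshold.
            if outs[player] and runs[player] / outs[player] > MIN_BAT_AVG:
                counts[player] += 1

    return {p: c for p, c in counts.items() if c >= MIN_PERIODS}


# Made-up figures (runs, dismissals, wickets), repeated every year 2000-2006.
per_year = {"A": (400, 10, 30), "B": (300, 10, 0), "C": (50, 10, 20),
            "D": (100, 10, 5), "E": (350, 30, 25)}
records = [(p, y, r, o, w)
           for y in range(2000, 2007) for p, (r, o, w) in per_year.items()]
print(qualifying_all_rounders(records))  # {'A': 5}
```

In this toy data, player E ranks in both top groups but averages under 12 with the bat, so the batting-average filter removes them, which is exactly the job that condition does for the 1880s.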

Tuesday, 7 July 2020

Towards more useful metrics for test bowling - part 2 - defensiveness

In the last article I introduced a replacement for average - wickets per hundred runs. 
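As a quick reminder of how the two figures relate, here is a small Python sketch (the function names are just illustrative). Wickets per hundred runs is simply 100 divided by the bowling average.

```python
def wickets_per_hundred_runs(wickets: int, runs: int) -> float:
    """Higher is better, unlike the bowling average."""
    return 100 * wickets / runs


def bowling_average(wickets: int, runs: int) -> float:
    """Runs conceded per wicket; lower is better."""
    return runs / wickets


# A bowler who concedes 200 runs for 8 wickets:
print(bowling_average(8, 200))           # 25.0
print(wickets_per_hundred_runs(8, 200))  # 4.0, i.e. 100 / 25
```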

This is more intuitive than bowling average for three reasons. Firstly, the higher the number, the better the performance.

Secondly, it puts the rarer event (a wicket) in the numerator rather than the denominator. In mathematical terms, a change in a small denominator has a much bigger impact than the same change in the numerator, so the numbers are less prone to massive swings after a few matches.

Thirdly, it allows a quick estimate of a bowler's figures over a longer period just by averaging their series figures.

I was challenged on that final point, so I ran some simulations with 2000 pretend bowlers, each with 25 series, giving them series averages based on the distribution of averages from the last 600 completed series by bowlers. The graphs below show the results of estimating each bowler's overall figures in two ways: from the mean of their series wickets per hundred runs, and from the mean of their series averages. The top graph shows that the wickets-per-hundred-runs estimates were generally reasonable, although they tended to overstate a bowler's ability. The bottom graph shows that the mean of the series averages often gave a result that did not really resemble the bowler's actual average, and it was always worse.
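To make the two estimates concrete, here is a minimal Python sketch for a single bowler with three made-up series (these are not figures from the simulation). It compares the actual career average with the estimate from the mean of the series wickets per hundred runs and with the plain mean of the series averages. Which estimate lands closer depends on the data; these numbers only show the arithmetic.

```python
# Made-up series for one bowler: (runs conceded, wickets taken)
series = [(300, 10), (200, 2), (100, 5)]

total_runs = sum(runs for runs, _ in series)
total_wickets = sum(wkts for _, wkts in series)
actual_average = total_runs / total_wickets  # 600 / 17

# Estimate 1: mean of the series wickets per hundred runs, converted back to an average.
mean_wphr = sum(100 * w / r for r, w in series) / len(series)
estimate_from_wphr = 100 / mean_wphr

# Estimate 2: plain mean of the series averages.
estimate_from_averages = sum(r / w for r, w in series) / len(series)

print(round(actual_average, 1), round(estimate_from_wphr, 1), round(estimate_from_averages, 1))
# 35.3 32.1 50.0
```

The single expensive series (100 runs per wicket) drags the mean of the series averages a long way off, while in wickets per hundred runs it only contributes a low value of 1.0.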


The next statistic I wanted was one that describes a bowler's style.

I tried a number of formulations, but I had three main criteria:

1. The result needed to be sensible. It had to show that Darren Gough, Waqar Younis and Malcolm Marshall were attacking bowlers, while the likes of Morne Morkel, Lance Gibbs and Ewen Chatfield needed to come out as defensive.

2. It had to be possible to work out with a cellphone calculator: nothing like finding eigenvalues, z-scaling or any complicated calculations with logs or exponents. One of the beauties of the bowling average is that a player with reasonable mental arithmetic skills can calculate a reasonable estimate of it in their head.

3. It had to separate the players based on style, not ability. I wanted a metric that distinguished the bowler's approach, not just how successful it was. That's impossible to do fully without ball-by-ball data (and still very difficult with it), but it is possible to get reasonably close.

The formulation that I ended up using was the following: Balls per run - wickets per hundred balls - 0.5.
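The formula only needs career balls, runs and wickets, so it passes the cellphone-calculator test. A minimal Python sketch, with hypothetical career figures:

```python
def defensiveness(balls: int, runs: int, wickets: int) -> float:
    balls_per_run = balls / runs                       # equals 6 / economy rate
    wickets_per_hundred_balls = 100 * wickets / balls  # equals 100 / strike rate
    return balls_per_run - wickets_per_hundred_balls - 0.5


# Hypothetical career figures (balls, runs, wickets)
print(round(defensiveness(6000, 2500, 100), 2))  # 0.23
print(defensiveness(3000, 1500, 60))             # -0.5
```

Positive values lean defensive (lots of balls per run, few wickets per ball); negative values lean attacking.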

I subtracted 0.5 because, among the top wicket-takers, the difference between those two numbers ranged from -0.7 to 2.6, with a median of roughly 0.5. Subtracting 0.5 puts a typical bowler at 0, so the number shows how far a bowler is from normal.
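The 0.5 is just the median of the raw differences, so subtracting it centres the scale on a typical bowler. A small Python sketch using made-up raw values (the real median comes from the top wicket-takers):

```python
from statistics import median

# Made-up raw values of (balls per run - wickets per hundred balls) for a few bowlers
raw = [-0.4, 0.1, 0.5, 1.0, 1.8]

offset = median(raw)  # the article reports roughly 0.5 for the top wicket-takers
centred = [round(x - offset, 2) for x in raw]
print(offset, centred)  # 0.5 [-0.9, -0.4, 0.0, 0.5, 1.3]
```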

This stat really comes into its own when plotted against wickets per hundred runs.


For this graph, I've coloured the points by strike rate (balls per wicket), with bright red for a strike rate over 100 and bright green for a strike rate under 50. That gives an indication in more familiar terms. The colour gradient shows that strike rate measures both style (attacking or defensive) and effectiveness.
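Strike rate mixes the two axes because balls per wicket = (balls per run) × (runs per wicket), and runs per wicket is 100 divided by wickets per hundred runs. The sketch below shows that identity, plus one possible colour scale. The linear red-green blend between strike rates of 50 and 100 is an assumption; the exact gradient used in the graph isn't stated.

```python
from typing import Tuple


def strike_rate(balls_per_run: float, wphr: float) -> float:
    # balls/wicket = (balls/run) * (runs/wicket), and runs/wicket = 100 / wphr
    return balls_per_run * 100 / wphr


def strike_rate_colour(sr: float) -> Tuple[int, int, int]:
    """Assumed linear gradient: bright green at SR <= 50, bright red at SR >= 100."""
    t = min(max((sr - 50) / 50, 0.0), 1.0)
    return (round(255 * t), round(255 * (1 - t)), 0)


sr = strike_rate(2.4, 4.0)  # hypothetical bowler: 2.4 balls per run, 4 wickets per 100 runs
print(round(sr, 1), strike_rate_colour(sr))  # 60.0 (51, 204, 0)
```

A defensive bowler (high balls per run) and a less effective bowler (low wickets per hundred runs) can end up with the same strike rate, which is why the colours blur across both axes.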

An obvious distinction to look at is the type of bowler in terms of pace. Just using the basic separator of spin/pace gives this graph:


While there's a reasonable overlap, there's a clear difference. The spin bowlers tended to be more defensive and also less effective in general, in terms of wickets per 100 runs.

That can be shown better on these box and whisker graphs:


(I've removed Sobers, Greig and Johnston because they bowled a mixture of pace and spin, and three points do not make a sensible box and whisker graph.)
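Each box and whisker is built from a five-number summary of one group. A minimal Python sketch with hypothetical bowlers and defensiveness scores, dropping the mixed pace/spin bowlers before grouping. The quartile method (`method="inclusive"` in Python's `statistics.quantiles`) is a choice; plotting tools may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_summary(values):
    """Five-number summary used by a box and whisker graph."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3, "max": max(values)}


# Hypothetical (bowler, type, defensiveness) rows; "mixed" bowlers are left out.
bowlers = [
    ("P1", "pace", -0.6), ("P2", "pace", -0.2), ("P3", "pace", 0.0),
    ("P4", "pace", 0.3), ("P5", "pace", 0.8),
    ("S1", "spin", 0.2), ("S2", "spin", 0.6), ("S3", "spin", 1.0),
    ("S4", "spin", 1.5), ("S5", "spin", 2.4),
    ("M1", "mixed", 0.4),
]

groups = {}
for name, kind, score in bowlers:
    if kind != "mixed":
        groups.setdefault(kind, []).append(score)

for kind, scores in groups.items():
    print(kind, box_summary(scores))
```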

It's worth remembering here that these are only the 183 bowlers who have taken the most wickets since 1945. They were good enough and durable enough, and played for teams with enough matches scheduled, to make it into this group. The overall numbers for all players will be different.

That issue feeds into the next set of graphs, which look at the players' eras. I've grouped the players by the decade of their middle year. So a player who played from 1999 to 2018 (i.e. Rangana Herath) is counted as a 2000s player, while Shane Warne (1992-2007) is counted as a 1990s player. This is not a perfect way of grouping players, but it gives a reasonable indication of their era.
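The grouping rule is a couple of lines of integer arithmetic. One detail isn't spelled out: how a half-year midpoint is rounded. This sketch assumes it is rounded down, which matches the Warne example (rounding 1999.5 up would make him a 2000s player).

```python
def era_decade(first_year: int, last_year: int) -> int:
    # Middle year of the career; floor division drops the half-year (an assumption)
    middle = (first_year + last_year) // 2
    return middle // 10 * 10


print(era_decade(1999, 2018))  # 2000 (Rangana Herath)
print(era_decade(1992, 2007))  # 1990 (Shane Warne)
```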


There are roughly three times as many players in the chart from the 2010s as there are from the 1960s. This is more an indication of the proliferation of test matches than a comment on the ability of the players. Very few players from before 1970 played more than 40 matches, so to make this list from that era most players had to take about 3.5 wickets per test. Almost three times as many players have played that number of tests since 2000, so a number of players have made the list without being standout bowlers of their generation. For example, Johnny Wardle played for 9 years and dominated almost every team he played against, yet ended up with fewer test wickets than Paul Harris, who played for 4 years and never truly established himself in his role.

As a result, we would expect the groups from the earlier eras to have taken more wickets per 100 runs, because they were the top 5% rather than the top 12%. That certainly shows for the 1950s, but it is not as evident for other decades.


The key difference is how much more attacking the bowling has become (at least in terms of results; it could be argued that the batting has become more reckless).

Basically, the bowlers are getting similar figures, but they're getting them in fewer overs. This might be the impact of one-day cricket (note the drop in the 1970s when ODIs started), but changes in pitch preparation may also have had a lot to do with it.

Finally, I thought I'd have a go at producing an animated gif showing where all the players sit on this chart.


There's a whole heap more that I'd love to explore with this, so stay tuned this time next week.