Tuesday, 1 September 2020

Dominating

I don't often talk about my own cricket exploits on this page, because it is normally reserved for elite cricketers, and I was anything but. In fact, it would be a great stretch to describe me as average; in reality I was nowhere near good enough for that.

However, there was one batsman who really couldn't face me. It was like I turned into Shane Warne when I saw him at the other end. His name was John, and he was a reasonable quality batsman. Not the best I ever played with or against, but probably in the top 5-10% of guys that I had been on the field with. And yet I had the wood on him.

We played in a little competition every week where we would select captains and they would pick their teams using school yard rules. We played on a proper field, with proper equipment, and every match was scored, but it wasn't part of an official competition. There were guys playing with us who had played top level club cricket, and others who would not have made the 6th grade sides. I was never first picked, but generally I was picked fairly quickly after John, by the opposite captain.

John would often either open or bat at 3 or 4. I was normally the 4th or 5th bowler used, so he was often batting when I was given the ball. He was seldom batting at the end of that over. I had one tactic for John. I would bowl him a top-spinner on leg stump. He would almost always try and hit it out of the park, and get caught doing so. He would then kick over the stumps, say a lot of words that would get him fined in international cricket and/or throw his bat in anger at getting out to me again.

I managed to pick up a hat trick that season. He was the 3rd wicket. I got him with a top-spinner on leg stump, caught at short fine leg.

He found this very, very frustrating. Everyone else found it hilarious. 

What happened that season was a perfect storm of a flaw in his technique being exposed by one thing that I could do, combined with the psychological effect for both of us based on the experiences that we had had against each other. I felt like every ball was a wicket, and he felt like every ball was a chance for him to prove that I didn't have the wood on him. 

Match-ups have been a popular concept in cricket analytics in the past few years, particularly for the players. They want to know how well they match up against different players. Who have they been dominating, and who has the wood on them?

I was working on looking at something else, and generated a list of head to head match-ups over the past few years, and it made me wonder which match-ups were the most one sided.

These are from T20 matches since the start of 2017. They are taken from most of the matches in internationals, IPL, BBL, PSL, CPL and Natwest Blast (I don't yet have ball-by-ball data for every match played) and only feature match-ups that are more than 20 deliveries.
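As a rough sketch of how head-to-head numbers like these can be derived from ball-by-ball data: the row layout `(batsman, bowler, runs_off_bat, dismissed)` and the function name `match_up_table` are assumptions made for this example, not the actual data format used here, and extras such as wides are ignored for simplicity.

```python
from collections import defaultdict
import math


def match_up_table(deliveries, min_balls=20):
    """Summarise batsman-vs-bowler records from ball-by-ball rows.

    Each row is (batsman, bowler, runs_off_bat, dismissed), a hypothetical
    layout chosen for this sketch.
    """
    totals = defaultdict(lambda: {"balls": 0, "runs": 0, "outs": 0})
    for batsman, bowler, runs, dismissed in deliveries:
        record = totals[(batsman, bowler)]
        record["balls"] += 1
        record["runs"] += runs
        record["outs"] += 1 if dismissed else 0

    table = {}
    for pair, record in totals.items():
        if record["balls"] <= min_balls:
            continue  # keep only match-ups of more than min_balls deliveries
        # A batsman who was never dismissed gets an infinite average.
        average = record["runs"] / record["outs"] if record["outs"] else math.inf
        strike_rate = 100 * record["runs"] / record["balls"]
        table[pair] = (average, strike_rate)
    return table


# Made-up deliveries, purely to show the shape of the input.
sample = (
    [("Batsman A", "Bowler B", 1, False)] * 23
    + [("Batsman A", "Bowler B", 0, True)]
    + [("Batsman C", "Bowler B", 4, False)] * 5
)
table = match_up_table(sample)
print(table)
```

In this made-up sample the second pairing faces only 5 balls, so it is dropped by the minimum-deliveries filter.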

I found that the middle third of averages in the match-ups were between 25 and 50 (with a lot of infinite values, where a particular bowler had not dismissed that batsman) and the middle third of the strike rates were between 105 and 144. Looking for match-ups where both values fell in the same outer third (both above the middle third, or both below it) gave only 7 results for each.

Here are the 7 for each.

Most dominant batting match ups

Two names appear three times there: Chris Jordan and Aaron Finch. Ahmed Shehzad vs Samuel Badree only makes the list by one run/one ball, so is probably a dubious addition, but the rest seem to clearly be cases of a batsman having the wood on their rival.

Going the other way, Sunil Narine is clearly able to get on top of some batsmen, and the Shadab Khan vs Kieron Pollard match-up is one that the big West Indian won't be too happy with.

There were 4 match-ups that only just missed out on this list. Babar Azam vs Ish Sodhi: average 25, strike rate 125; Babar Azam vs Carlos Brathwaite: average 22.5, strike rate 112.5; MS Dhoni vs Chris Jordan: average 15, strike rate 120; and Ahmed Shehzad vs Imran Tahir: average 22, strike rate 110.

If they had been included, then Jordan, Shehzad and Dhoni would have all featured on both lists.

Babar Azam's slow strike rate comes to the fore here. Of the 12 bowlers that he has faced 20 or more balls from in the past 3 years, he's scored at less than 7 rpo (116.67 strike rate) off 6 of them.

Here's how he compares to others:


His median scoring rate against the bowlers he's faced the most often is below the average for all match-ups, and those that are above the median are mostly not much above.

There's a risk with looking at match-ups of making big conclusions from very small sets of data. The strike rate and average can both change quite dramatically with one wicket or one six. But just because it can be misleading does not mean that it isn't interesting. For some of these, there will be a real phenomenon behind it, and so it is interesting to look at them and see if those battles are real in future. 


Thursday, 27 August 2020

Greatest ever test bowler?

In cricket we tend to look at career statistics when deciding who is the best at something. For example: Don Bradman averaged 99.94, and so he's the greatest batsman. But there's something fundamentally flawed in this concept. It assumes that a player has a certain level of ability, and that that remains constant throughout their career. That assumption is patently ridiculous once it's broken down.

If Don Bradman had made a comeback in his 60's, and played 10 tests at an average of 40, it would have added to his reputation, but subtracted from his career average. It would not have changed how good he was either side of World War 2. 

Likewise, looking at something like total wickets fails to take into account the differences in scheduling. It is a fine way to compare two players who played for the same team, but across teams the schedules are just too different. In the last 15 years, England have played 190 tests, while New Zealand and Pakistan have played 118 and 119 respectively. Hence, if an English player had played roughly 80% of their team's tests in that time, and taken 3.2 wickets per test (based on Fidel Edwards' career numbers) they would have taken about 480 test wickets. If a New Zealand or Pakistani player had done similarly, but taken wickets at roughly 4.7 wickets per test (à la Dale Steyn) they would have roughly 445 wickets. Taking more wickets is inevitable when you play more matches.
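The arithmetic behind that comparison is just tests played multiplied by wickets per test. A small sketch, using only the figures quoted above (the function name `expected_wickets` is made up for this example):

```python
def expected_wickets(team_tests, share_played, wickets_per_test):
    """Career wickets implied by a team's schedule and a bowler's strike power."""
    return team_tests * share_played * wickets_per_test


# Figures from the paragraph above: 80% of the team's tests in each case.
england_player = expected_wickets(190, 0.8, 3.2)
new_zealand_player = expected_wickets(118, 0.8, 4.7)
pakistan_player = expected_wickets(119, 0.8, 4.7)

print(f"England schedule, 3.2 per test:     {england_player:.0f}")
print(f"New Zealand schedule, 4.7 per test: {new_zealand_player:.0f}")
print(f"Pakistan schedule, 4.7 per test:    {pakistan_player:.0f}")
```

The exact products come out slightly different from the rounded numbers in the text (roughly 486 against 444 to 447), but the conclusion is the same: the weaker bowler on the busier schedule finishes with more wickets.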

To counter that, people talk about comparing players at their peak. Who reached the highest in their career?

I decided to have a look at just that. This list is lacking context - it hasn't accounted for opposition or conditions, but I think it's more useful than looking at overall career statistics.


This is the list of bowlers based on their best 30 consecutive matches.

This is still not perfect. For some of these players, 30 tests was just over 2 years (eg Botham, McGrath, Anderson) while for others it was more than 5 years (Steyn, Lindwall, Briggs, Laker). It's likely to be easier to maintain peak form over 2 years than over 5 years.

It made me wonder if 20 tests was a better measure.


The same problem is here too. It took 8 years for Bobby Peel to play 20 tests, while McGrath and Pollock both took less than 2 years to do it.

However, the name at the top remains the same. Imran Khan has a greater claim to being the best ever than I would have realised.

There are lots of problems with using this as a canonical value for the best ever. But I think it's better than career stats, and certainly adds something to the conversation.
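For anyone wanting to try the "best N consecutive matches" idea themselves, here is a minimal sketch. The post doesn't state which measure the lists above are ranked by, so this assumes bowling average (runs per wicket, lower is better); the input layout and the name `best_window` are also assumptions for this example.

```python
def best_window(match_figures, window=30):
    """Find the best bowling average over any run of consecutive matches.

    match_figures is a chronological list of (runs_conceded, wickets), one
    entry per match. Returns (best_average, start_index), or None if the
    career is shorter than the window or no window contains a wicket.
    """
    if len(match_figures) < window:
        return None
    runs = sum(r for r, _ in match_figures[:window])
    wickets = sum(w for _, w in match_figures[:window])
    best = None
    for start in range(len(match_figures) - window + 1):
        if start > 0:
            # Slide the window: drop the match leaving, add the match entering.
            out_runs, out_wickets = match_figures[start - 1]
            in_runs, in_wickets = match_figures[start + window - 1]
            runs += in_runs - out_runs
            wickets += in_wickets - out_wickets
        if wickets > 0:
            average = runs / wickets
            if best is None or average < best[0]:
                best = (average, start)
    return best


# Made-up figures for a five-match career, with a two-match window.
figures = [(100, 2), (60, 4), (50, 5), (90, 3), (120, 1)]
print(best_window(figures, window=2))
```

Changing `window` from 30 to 20 is all it takes to switch between the two lists, which makes it easy to see how sensitive the ranking is to that choice.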

Tuesday, 11 August 2020

Cleaning up the tail

I saw an interesting post on a Facebook cricket group recently, where a Pakistani fan said that they thought that Pakistan were the worst team at cleaning up the tail in world cricket. A bunch of Indian fans jumped in saying that India was, in fact, the worst. Then some English fans decided that England was actually the worst at cleaning up the tail. 

It led me to run a small poll, and I found that roughly 2/3 of respondents felt that their team was the worst at cleaning up the tail. Most who commented were adamant that not only was their team the worst at it, they were the worst by some margin.

There seemed to be a general cricket fan type one error. A type one error is an error of seeing a pattern that does not exist (or, more generally, coming to an incorrect conclusion based on evidence that seems conclusive but is not). Perhaps this was caused by the fact that when we watch our team struggle to clean up the tail, it takes a long time, while a team cleaning up the tail efficiently does not take as long, so uses up less of our memory space. Or perhaps it is just because cricket teaches us to think negatively. Mark Richardson even wrote a whole book about the power of negative thinking in cricket.

That led me to a question. What team is actually the worst?

Tuesday, 4 August 2020

Changes in test performance

I had a go at using animation to visualise the changes in teams' performances in tests over time.

I really enjoyed making this - I hope you enjoy watching it!


Tuesday, 28 July 2020

T20i batsmen charts updated 2017-2020

A while ago I put together some charts of batsmen's scoring rates and how they scored their runs in various formats.

I thought that it would be time to look at those again.

The two rates that I mention are as follows: scoring rate is the proportion of balls scored off and boundary rate is the proportion of balls hit for a 4 or 6.
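Both rates are simple proportions of balls faced. A minimal sketch, assuming the input is just a list of runs scored off the bat for each ball (the name `batting_rates` is made up here, and any ball worth 4 or 6 is treated as a boundary, which ignores the rare all-run four):

```python
def batting_rates(runs_per_ball):
    """Return (scoring_rate, boundary_rate) for a list of runs off each ball."""
    balls = len(runs_per_ball)
    scoring_rate = sum(1 for runs in runs_per_ball if runs > 0) / balls
    boundary_rate = sum(1 for runs in runs_per_ball if runs in (4, 6)) / balls
    return scoring_rate, boundary_rate


# Made-up ten-ball innings: scored off 6 balls, 3 of them boundaries.
innings = [0, 1, 4, 0, 6, 2, 0, 1, 0, 4]
scoring_rate, boundary_rate = batting_rates(innings)
print(f"scoring rate {scoring_rate:.0%}, boundary rate {boundary_rate:.0%}")
```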

In previous graphs I used activity rate (which gave a bonus for players who ran twos and threes more often). I decided against that this time, as most other analysts tend to use scoring rate, and I'm seeing some value in consistency.

I've also coloured by the balls per dismissal.

I've started with the overall results.


The issue with this is that different times in the innings call for different levels of risk.

So I've also broken it down based on the time. I had to put very low limits on the balls faced at the death to have enough batsmen to actually put together a chart, so there's considerable room for sampling error in the proportions with some of the samples as low as 78 balls.
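To put a rough size on that sampling error, the usual standard error of a proportion can be used. This treats every ball as an independent trial, which is a simplification, and the 50% rate below is a hypothetical worst case rather than any particular batsman's figure.

```python
import math


def proportion_standard_error(proportion, balls):
    """Standard error of an observed proportion, assuming independent balls."""
    return math.sqrt(proportion * (1 - proportion) / balls)


# Hypothetical batsman scoring off half of his balls, from the smallest sample.
standard_error = proportion_standard_error(0.5, 78)
print(f"standard error: {standard_error:.3f}")
```

On those assumptions the standard error is between five and six percentage points, so small gaps between batsmen in the death-overs chart should not be read too closely.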









Tuesday, 21 July 2020

When to declare?

One of the unique things about test cricket is the prospect of declarations.

While there are situations in motor racing where a driver might go slow in order to conserve fuel, or in sailing where a racer might give up the lead in order to get in a more favourable position, there's not really any other sport where a team or individual can opt to stop scoring to ensure that they get a win.

The prospect of a draw encourages positive play from the team that is on top, and allows an out for teams who are losing.

Deciding when to declare also provides interesting talking points for fans and commentators alike, and the decisions are much easier in hindsight.

However, it's essentially a statistics problem. There are two variables, and a whole lot of historical data. The key variables are the overs left at the start of the innings and the target to win. 

While there are other issues (the "throw out a carrot" theory - if the target is close enough, teams will take more risks) and no two teams are the same, we can build a model based on that data and use that to predict the chances of winning, based on when a team declares.

I've built a very basic model, based on the 130 most recent matches where the target was under 400.

Given that data, the optimum declaration point changes based on the runs per over that the team scored.
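The model itself isn't described in detail here, so the following is not that model. It is one very simple way of turning the two variables into outcome probabilities: look up the most similar past chases and count how they finished. The nearest-neighbour approach, the scaling constants, the function name and every row of `toy_history` are assumptions or invented values for illustration only.

```python
from collections import Counter


def outcome_probabilities(history, overs_left, target, k=3,
                          overs_scale=20.0, target_scale=50.0):
    """Estimate result frequencies from the k most similar past chases.

    history rows are (overs_left, target, result), where result is "win",
    "draw" or "loss" from the point of view of the side setting the target.
    """
    def distance(row):
        past_overs, past_target, _ = row
        # Scale the two variables so neither dominates the comparison.
        return (abs(past_overs - overs_left) / overs_scale
                + abs(past_target - target) / target_scale)

    nearest = sorted(history, key=distance)[:k]
    counts = Counter(result for _, _, result in nearest)
    return {result: counts[result] / len(nearest)
            for result in ("win", "draw", "loss")}


# Invented rows, NOT real matches: (overs left, target, result).
toy_history = [
    (90, 300, "win"), (80, 320, "win"), (85, 250, "loss"), (40, 280, "draw"),
    (50, 200, "draw"), (100, 380, "win"), (30, 150, "loss"), (120, 390, "win"),
]

probabilities = outcome_probabilities(toy_history, overs_left=85, target=312)
print(probabilities)
```

With real historical data in place of the toy rows, calling the function for a range of possible declaration points would show how the win probability changes as a team bats on, which is the comparison that matters when deciding when to declare.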

In the end England set West Indies a target of 312 in 85 overs, which was too much, and the model suggested that it would be. It output probabilities of 9.8% for a West Indies victory, 27.5% for a draw and 62.6% for an England victory (all values rounded - which is why they don't add to 100%). The most likely outcome is what happened.

Part of the reason that England had such a good chance of success was that they scored so quickly. 92 runs in 11 overs gave them an excellent chance of success. However, declaring a couple of overs earlier might have been better still: the model gave them a 64.4% chance of winning if they had done so.

I put together a graph showing the impact of the scoring rate on the chance of winning and the optimum time to declare. It's often said that strike rate isn't relevant in test matches. But this match proved the value of scoring quickly in tests.


Tuesday, 14 July 2020

Good years as an all rounder

Over the last three years, Jason Holder has produced some incredible numbers. His bowling stats are like something out of the 1800's, and he has also averaged over 40 with the bat at the same time.

It made me wonder about how well he fitted in compared to other all rounders from history.

Hot on the heels of my first ever animated graph last week, I have another one today, showing every player who had at least 5 three year periods where they were in the top 23% of run scorers and wicket takers, and where they had a batting average over 17. (That last condition was necessary due to a period where the batsmen were interchanged regularly in the 1880's, resulting in the top run scorers including some bowlers who averaged below 10 with the bat.)

Tuesday, 7 July 2020

Towards more useful metrics for test bowling - part 2 - defensiveness

In the last article I introduced a replacement for average - wickets per hundred runs. 

This is more intuitive than bowling average for three reasons. Firstly it means that the higher the number, the better the performance. 

Secondly, it's putting the rarest event (a wicket) as the numerator, rather than the denominator. In mathematical terms, changing the numerator has a smaller impact than changing the denominator, so it means that the numbers are less prone to massive swings after a few matches.

Thirdly it can allow a quick estimate of average over a time period by just looking at series averages.

I was challenged on that final point, so I ran some simulations with 2000 pretend bowlers, each with 25 series, giving them series averages based on the distribution of averages from the last 600 completed series by bowlers. The graphs below show the result of finding the mean of the series averages, and then of basing the estimate on wickets per hundred runs instead. The top graph shows that the estimates were generally reasonable, although they tended to overstate a bowler's ability, while the bottom graph shows that using the series average often got a result that did not really resemble the bowler's actual average, and was always worse.
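A two-series example shows the mechanics of the two estimates. The figures are invented, and one hand-picked case proves nothing on its own (that is what the simulation is for), but it shows how each estimate is formed and compared with the true career figure.

```python
def bowling_average(runs, wickets):
    """Runs conceded per wicket (lower is better)."""
    return runs / wickets


def wickets_per_hundred_runs(runs, wickets):
    """Wickets taken per hundred runs conceded (higher is better)."""
    return 100 * wickets / runs


# Invented series figures: (runs conceded, wickets taken).
series = [(150, 10), (250, 5)]

total_runs = sum(runs for runs, _ in series)
total_wickets = sum(wickets for _, wickets in series)
true_average = bowling_average(total_runs, total_wickets)

# Estimate 1: the plain mean of the series averages.
mean_of_averages = sum(bowling_average(r, w) for r, w in series) / len(series)

# Estimate 2: the mean of series wickets per hundred runs, converted back.
mean_w100 = sum(wickets_per_hundred_runs(r, w) for r, w in series) / len(series)
average_from_w100 = 100 / mean_w100

print(f"true average:               {true_average:.2f}")
print(f"mean of series averages:    {mean_of_averages:.2f}")
print(f"via wickets per 100 runs:   {average_from_w100:.2f}")
```

In this invented case the wickets-per-hundred-runs route lands closer to the true average and errs on the flattering side, while the mean of the series averages is further away and makes the bowler look worse.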


The next statistic that I wanted to get was one that talked about the style of bowler that they were.

I tried a number of formulations to get this, but I had three main criteria:

1. The result needed to be sensible. It had to show that Darren Gough, Waqar Younis and Malcolm Marshall were attacking bowlers, while the likes of Morne Morkel, Lance Gibbs and Ewen Chatfield needed to come out as defensive.

2. It had to be able to be worked out with a cellphone calculator - nothing like finding eigenvalues, z-scaling or any complicated calculations with logs or exponents. One of the beauties of the bowling average is that a player with reasonable mental arithmetic skills can calculate a reasonable estimate of it in their head.

3. It had to separate the players based on style not ability. I wanted to find a metric that distinguished the approach of the bowler, not just how successful it was. That's impossible to do without using ball by ball data (and still very difficult to do with ball by ball data) but it is possible to get reasonably close.

The formulation that I ended up using was the following: Balls per run - wickets per hundred balls - 0.5.

I subtracted 0.5 because when I subtracted those two numbers I got a range (from the top wicket-takers) of -0.7 to 2.6, with a median at roughly 0.5. By subtracting 0.5 it meant that the normal player was at 0, and the number represented how far away from normal/average they were.
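Written out as code, the formulation is only a couple of divisions and subtractions, which fits the cellphone-calculator criterion. The career figures below are invented, and reading larger values as "more defensive" is an inference from the formula (more balls per run pushes the number up, more wickets per hundred balls pulls it down).

```python
def defensiveness(balls, runs, wickets, offset=0.5):
    """Balls per run minus wickets per hundred balls, shifted by the offset."""
    balls_per_run = balls / runs
    wickets_per_hundred_balls = 100 * wickets / balls
    return balls_per_run - wickets_per_hundred_balls - offset


# Invented career figures for two hypothetical bowlers.
economical_bowler = defensiveness(balls=12000, runs=5000, wickets=200)
strike_bowler = defensiveness(balls=9000, runs=5000, wickets=200)

print(f"economical bowler: {economical_bowler:+.2f}")
print(f"strike bowler:     {strike_bowler:+.2f}")
```

Both hypothetical bowlers concede the same runs for the same wickets, so their averages are identical; only the number of balls it took them differs, which is the separation of style from ability that the third criterion asks for.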

This stat really comes into its own when plotted against wickets per hundred runs.


For this graph, I've coloured the points based on the strike rate (balls per wicket) with bright red being a strike rate over 100 and bright green being a strike rate under 50. That gives an indication using more familiar metrics. The colour gradient indicates how strike rate is a measure of both style (attacking or defensive) and effectiveness. 

An obvious distinction to look at is the type of bowler in terms of pace. Just using the basic separator of spin/pace gives this graph:


While there's a reasonable overlap, there's a clear difference. The spin bowlers tended to be more defensive and also be less effective in general, in terms of wickets per 100 runs.

That can be shown better on these box and whisker graphs:


(I've removed Sobers, Greig and Johnston due to them bowling a mixture of pace and spin, and 3 points does not make a sensible box and whisker graph)

It's worth remembering here that these are only the 183 bowlers who have taken the most wickets since 1945. This only represents bowlers who were good enough, durable enough and who played for teams that had enough matches scheduled to make it into this group. The overall numbers for all players will be different.

That issue feeds into the next set of graphs - looking at the players' eras. I've grouped the players by the decade that their middle year was in. So if a player played from 1999 to 2018 (eg Rangana Herath) they would be counted as a 2000s player, while Shane Warne (1992-2007) is counted as a 1990s player. This is not a perfect way of grouping players, but it gives a reasonable indication of their era.
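The grouping rule is easy to express in code. A minimal sketch, with the function name `era_decade` made up for this example and the middle year taken as the midpoint of the first and last years:

```python
def era_decade(first_year, last_year):
    """Decade containing the midpoint of a player's career."""
    middle_year = (first_year + last_year) / 2
    return int(middle_year // 10) * 10


# The two careers mentioned above.
print(f"Rangana Herath: {era_decade(1999, 2018)}s")
print(f"Shane Warne:    {era_decade(1992, 2007)}s")
```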


There are roughly three times as many players in the chart from the 2010's as there are from the 1960's. This is more an indication of the greater proliferation of test matches played, rather than a comment on the ability of the players. Very few players from before 1970 played more than 40 matches, so for most players to make this list from that era they had to take about 3.5 wickets per test. There have been almost three times as many players play that number of tests since 2000, so a number of players have been able to make the list without being stand out bowlers in their generations. As an example Johnny Wardle played for 9 years, dominated almost every team he played against and yet ended up with fewer test wickets than Paul Harris who played for 4 years and never truly established himself in his role.

As a result, we would expect the groups from the earlier eras to have taken more wickets per 100 runs, because they were top 5%, as opposed to the top 12%. And that certainly shows for the 1950's, but it is not as evident for other decades.


The key difference is how much more attacking the bowling is (at least in terms of results - it could be argued that the batting is more reckless).

Basically the bowlers are getting similar figures, but they're getting them in fewer overs. This might be the impact of one day cricket (note the drop in the 1970's when ODIs started) but it's possible that the changes in pitch preparation techniques have also had a lot to do with it.

Finally I thought I'd have a go at producing an animated gif showing where all the players on this are.


There's a whole heap more that I'd love to explore with this, so keep tuned this time next week.