Wednesday 1 November 2023

NZ vs SA match preview

Within a few days of the All Blacks, playing mostly without their captain Cane, losing the rugby world cup final to the Springboks, the two countries' cricket equivalents face off in a different world cup, but one where New Zealand are also going to have to play without captain Kane.

In the 9 matches leading up to the rugby final, South Africa had lost only 2, once to New Zealand and once to the 2nd favourite European side in the tournament. In the 9 matches leading up to tonight's match, South Africa have lost only 2 matches, one to New Zealand and one to the second favourite European team. 

But the similarities end there. There are some contrasts. In rugby, New Zealand are a team that are dramatic to watch. Big plays and plenty of highlights. South Africa, however, are the masters of shutting down their opponents and pressuring them into making mistakes. In cricket the opposite is true. South Africa are the box office side, while New Zealand pressure their opposition into making mistakes. 

New Zealand went into the rugby world cup final $1.72 favourites (ie the bookies gave them a 55% chance of winning). South Africa go into this match as $1.72 favourites. 

As the tournament gets closer to the business end, the matches are starting to get closer. The rugby final was decided by 1 point. Will the cricket match give us an equally tense finish?

New Zealand have won the toss and opted to bowl, bringing in Tim Southee. South Africa have also brought back their most experienced bowler with Rabada coming in for Shamsi.

Sunday 14 November 2021

Win the toss win the match?

In watching this world cup, there have been 6 sides who have looked a step ahead of the others: Australia, England, India, New Zealand, Pakistan and South Africa. There have been 8 matches featuring two of those 6 sides.

Pakistan beat India. Pakistan beat New Zealand. New Zealand beat India. England beat Australia, Australia beat South Africa, South Africa beat England, New Zealand beat England and Australia beat Pakistan.

Interestingly, in 7 out of those 8 matches, the team who won the toss, won the match.

If the toss were independent of the result, the chance of the toss winner winning at least 7 out of 8 matches is about 3.5% (9/256 for anyone who wants a more precise answer). That is very unlikely. But it is not so unlikely that it would be considered impossible.

If I throw 8 coins, every now and again it will come up with either 7 heads or 7 tails.
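The 9/256 figure can be checked with a short sketch. It treats each of the 8 matches as a fair coin toss and counts the ways of getting 7 or 8 "heads". The function name `prob_at_least` is just an illustrative choice.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n):
    """Chance of k or more heads in n fair coin tosses."""
    ways = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(ways, 2 ** n)

# Toss winner wins at least 7 of the 8 matches.
p = prob_at_least(7, 8)
print(p, float(p))  # 9/256, roughly 0.035
```

This is the chance for one named side of the coin. Counting 7 or more heads or 7 or more tails would double it.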

It made me wonder if the toss was a significant contributor to a team's success, and if so, what can be done about it.

The first thing to do is to build a model to predict the outcome of matches. This is important, because England beating Papua New Guinea after winning the toss does not say much about the importance of the toss, because England would probably beat Papua New Guinea if they lost the toss too.

I decided to use logistic regression to build the model. I looked at every international match between two competent (or semi-competent) sides in the last 3 years. Looking back afterwards, I noticed that I had left out Nepal (who did deserve inclusion) but they were the only team that the ICC currently have ranked in the top 20 that I left out. I also included a few lower ranked sides in order to give a better picture of the difference between the teams near the bottom of the rankings for the T20 World Cup. So I also included the likes of Singapore, Kenya, USA and Malaysia.

I chose to use logistic regression because it has been helpful in the past for giving realistic probabilities of winning for limited overs matches. When I tested the model, it explained about 85% of the variation in results. When it said that a team had a 50% chance of winning, they generally won about 50% of the time. When it said that a team had a 70% chance of winning, they generally won about 70% of the time.
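The "50% means about 50%, 70% means about 70%" check is a calibration check. Here is a minimal sketch of one way to do it, assuming you already have a list of predicted probabilities and a matching list of results. The function name, the bin count and the numbers are all made up for illustration and are not the data behind this post.

```python
def calibration_table(predictions, outcomes, n_bins=5):
    """Group predictions into equal-width bins and compare the average
    predicted probability with the observed win rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(predictions, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, won))
    table = []
    for rows in bins:
        if rows:  # skip bins with no predictions in them
            mean_pred = sum(p for p, _ in rows) / len(rows)
            win_rate = sum(won for _, won in rows) / len(rows)
            table.append((round(mean_pred, 3), round(win_rate, 3), len(rows)))
    return table

# Hypothetical predictions and results (1 = win), purely for illustration.
preds = [0.1, 0.3, 0.5, 0.5, 0.7, 0.7, 0.7, 0.9]
results = [0, 0, 1, 0, 1, 1, 0, 1]
for row in calibration_table(preds, results):
    print(row)  # (mean predicted probability, observed win rate, matches)
```

A well calibrated model has the first two numbers in each row close to each other, given enough matches in the bin.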

It is not perfect, but it is simple enough and close enough to tell us about the impact of the toss.

The factors that I used were the team, the opposition, and if the match was home, away or neutral.

The model suggested that the 6 teams that I listed above, along with Afghanistan were the 7 best teams. It probably overstated the strength of Afghanistan, due to them not playing at home at all, and so therefore missing out on the home advantage. Their players are so familiar with their adopted grounds that they have an advantage there that is not accurately reflected in the tag "neutral."

Once I had built the model, I could then make predictions about all the world cup matches.

In this graph, I have the modeled probability of winning on the x-axis and then the actual outcome on the y-axis. The green points are where a team has won the toss and the red ones are where they have lost the toss.

I have divided the data into 3 groups - expected loss, too close to say, and expected win. The numbers are the proportion of wins by the team that won the toss (green) or lost the toss (red).

We can see that the team that won the toss has won more than the team that lost the toss in each of the 3 regions.

This is fairly compelling evidence that there is an advantage in winning the toss. But it is not nearly as dramatic as 7 out of 8 in the first sub-group that I looked at.

This made me wonder if there was some sort of accidental gerrymandering with the way that I selected the data. So I tried 5 groups instead of three.

This time I grouped them together, and looked at the expected number of wins against the actual number of wins.

This time I added in two parallel trend lines, and looked at the difference between them. The groups of teams who won the toss ended up winning about 1.5 more matches than the groups of teams who lost the toss.

This was interesting, but I was not sure what to read into this. So I decided to re-randomise the toss, to see what would have happened with an independent toss.

To do this I randomly assigned to each match one team as the designated toss winner. Then I redid the groups, and saw what the difference was. I wanted to know how rare a difference of 1.5 was.

It turned out to be more common than I would have expected. After 10000 trials, I found that roughly 10% of the sets had a difference of more than 1.478, and roughly 10% had a difference of less than -1.478. For a re-randomisation, 10% is about the cut-off where you say that it was likely or unlikely to have been caused by natural variation.
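A re-randomisation like this can be sketched in a few lines. This is not the code behind the numbers above. It swaps in a simpler statistic (actual wins minus model-expected wins for the designated toss winners) in place of the gap between two trend lines, and the match data is invented for illustration.

```python
import random

def toss_advantage(matches, toss_to_a):
    """Actual minus model-expected wins for the designated toss winners.

    matches   -- list of (p_a, a_won): model probability that team A wins,
                 and whether team A actually won
    toss_to_a -- list of booleans: True if team A is the toss winner
    """
    total = 0.0
    for (p_a, a_won), a_has_toss in zip(matches, toss_to_a):
        if a_has_toss:
            total += (1.0 if a_won else 0.0) - p_a
        else:
            total += (0.0 if a_won else 1.0) - (1.0 - p_a)
    return total

def rerandomise(matches, observed, trials=10000, seed=1):
    """Share of random toss assignments with an advantage >= observed."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        fake_toss = [rng.random() < 0.5 for _ in matches]
        if toss_advantage(matches, fake_toss) >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical matches, for illustration only.
matches = [(0.6, True), (0.5, False), (0.7, True), (0.4, False), (0.55, True)]
real_toss = [True, False, True, False, True]
observed = toss_advantage(matches, real_toss)
print(observed, rerandomise(matches, observed, trials=2000))
```

The share returned is one-sided. Counting assignments at or below the negative of the observed value as well gives the two-tailed version described above.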

This was a surprising result. I was expecting to find that there was clear evidence that winning the toss improved the chance of winning, but instead I found out that it might do, or it might be just natural variation.

There are two major errors in statistics: saying too much or saying too little. This situation looked like one that had the potential to put egg on my face no matter which way I went. There was not quite enough evidence to be very confident that the toss made a difference, but there was enough evidence to be quite confident that it did.

I wanted to try one more test before I decided that I didn't know what to say.

This time I picked 60 random innings from any match in the past 3 years. I applied the model to that innings, and then grouped the innings and found the difference between the two lines. I repeated that 1000 times.

This time I found more like what I was expecting.

Less than 1% of the randomly selected sets of innings had the impact of winning the toss as high as it has been in the world cup. Interestingly, the teams that lost the toss actually had a slight advantage (1.7% of the time losing the toss had an advantage of 1.478 or more matches).

This tells us two interesting things:

Winning the toss seems to have given teams an advantage in this world cup, and it does not normally give teams any advantage whatsoever.

I wondered if that was due to the dew factor. It can often get harder to bowl as the match wears on due to dew in the gulf states. But that does not seem to have been the difference. Winning the toss was roughly as much of an advantage in the daytime as it was in the nighttime.

The biggest single factor seemed to be Dubai International Stadium. Matches there seemed to be much more toss dependent than almost anywhere else.

And that is where the final is being held.

Given that, the toss is likely to give an advantage to whoever wins it. Or perhaps it will revert to type, and there will be no advantage.

Assuming that the toss should be factored in, my model has the following probabilities for the final:

If Australia win the toss: Australia 67%, NZ 33%.

If New Zealand win the toss: Australia 30%, NZ 70%

Neither team is 100% or 0% in either scenario, but there's clearly an advantage.

Now it will be up to the players to see if they can overcome it.

Tuesday 13 October 2020

IPL at halfway

The match between Kolkata Knight Riders and Royal Challengers Bangalore was the 28th match, and represented the halfway point in this year's edition of the IPL. 

Every team has played every other team once, and some interesting patterns have emerged. I'm going to look at a few of those in this article.

The most obvious pattern is that teams have won more often by batting first than by chasing. This is somewhat unusual. In most T20 competitions the toss doesn't make much difference, and generally it's better to field first.

This pattern hasn't really been seen in other competitions, where the results have all been within the expected margin of error.

The most recent BBL was slightly biased towards batting first, and the CPL and PSL were similarly biased towards bowling first, but the difference in the IPL has been much more dramatic. If we were looking at throwing a coin multiple times, there would be a 12% chance of getting a result as extreme as the BBL and a 20% chance of getting a result as extreme as the CPL one.

The probability of getting as extreme a difference as this year's IPL from just randomness is less than 3%. Using technical language we can say that batting first makes a statistically significant difference to a team's probability of winning.

At the start of the tournament, pretty much every captain chose to field. Only one brave soul (David Warner) chose to bat first in the first 13 matches. However, since then teams have chosen to bat first in all but 3 matches, and in 8 of the last 9.

When teams have chosen to field first, they've only won twice. That's a winning record of just 13.3%. When teams have chosen to bat first, they've done much better - winning 7 out of 13. The best outcome at the toss seems to be to lose it, and have the opposition captain decide to field.

Breaking it down by location is interesting too. In Abu Dhabi, there is no clear advantage to batting first. At that ground the chasing team has won 4 out of the 10 matches. At the other two grounds, however, it is a different story. At Sharjah the chasing team has won only once out of 6, while at Dubai the chasing team has won twice out of 12. It's hard to know why this is, and it will be interesting to see if it continues throughout the tournament.
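As a rough guide to how unusual each ground's record is, the same coin-toss comparison can be applied ground by ground. This is a sketch under a big assumption: every match is treated as a 50/50 coin toss, ignoring the strength of the teams. The function name is illustrative, and the doubling of the smaller tail is one common convention for a two-sided figure, not necessarily the one used elsewhere in this post.

```python
from math import comb

def two_sided_coin_p(wins, n):
    """Two-sided chance of a split at least this lopsided from n fair tosses."""
    lower = sum(comb(n, i) for i in range(0, wins + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(wins, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))

# Chasing wins and matches played at each ground, as quoted above.
grounds = {"Abu Dhabi": (4, 10), "Sharjah": (1, 6), "Dubai": (2, 12)}
for name, (chasing_wins, played) in grounds.items():
    print(name, round(two_sided_coin_p(chasing_wins, played), 3))
```

With so few matches at each ground, only a very lopsided record stands out from what coin tosses would produce.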

After about a quarter of the tournament was done, I noticed that there seemed to be a pattern that batsmen who turned the strike over quickly had a bigger impact on their team's chance of winning than players who scored extra boundaries, although I wondered if that was just a statistical anomaly from a small dataset.

It seems that it was just a product of a small dataset. Now that there's more data, it seems that neither activity rate nor boundary rate on its own from an individual batsman makes a significant difference. They both seem to help - teams win more often when their batsmen score more boundaries and run more runs, but neither seems to explain enough to discount the other one anymore.

There is a noticeable difference in scoring rates of batsmen at the three grounds. They tend to score much quicker at Sharjah (probably due to the short boundary) than they do at the other grounds.

An interesting development is that recently teams have found it more difficult to turn over the strike at Dubai than at the other two grounds. 

The median strike rates for innings over 30 at each ground are:

This leads to a suggested good team score at each ground of 174, 176 and 196 respectively.

Those are the team scores equivalent to an average set batsman batting the whole innings. I find that a good guide to reliable winning targets at grounds. They're possibly slightly high at the moment, due to the awful record of chasing sides, but that may change as the tournament progresses.
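The conversion from a strike rate to a team score is simple arithmetic. The sketch below assumes a full 120-ball innings with no extras, which may not be exactly how the scores above were produced. The strike rates it prints are back-calculated from the three quoted scores, not quoted figures themselves.

```python
def team_score_from_strike_rate(strike_rate, balls=120):
    """Runs scored if every ball of the innings goes at this strike rate."""
    return strike_rate * balls / 100

def strike_rate_for_score(score, balls=120):
    """Strike rate needed across the whole innings to reach this score."""
    return 100 * score / balls

# The three suggested team scores quoted above.
for score in (174, 176, 196):
    print(score, round(strike_rate_for_score(score), 1))
```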

Looking at the bowling stats, I find it useful to group the attacks based on their styles.

This leads to this graph. It takes a while to understand, but the squares are all the pace bowlers combined and the triangles are all the spin bowlers combined.

It is possible to use these groupings to predict the success of the teams. The two key statistics to look at are the economy rate of spin bowlers and the strike rate of the pace bowlers.

Looking at these two statistics suggests that CSK are probably the team who have been underperforming the most with the bat, as their bowlers are doing a sufficient job to keep them in the matches.

Finally, I used the data to build a predictive model using logistic regression to assess how good the teams were. As every team has played each other exactly once, the basic model is fairly uninteresting, but the one where batting first is controlled for is much more interesting.

The difference in the coefficients of any two teams gives the log odds of the result for that match (and hence the probability can be calculated from it).

The batting advantage is added and subtracted from the teams. So for example if Mumbai bat first against RCB, they would have an expected value of 2.43 + 1.27, while RCB would have 2.04 - 1.27. This means that for almost every match-up, the team batting first would be favoured to win. The only exception is when Mumbai, Bangalore or Delhi are playing against Kings XI Punjab. There they would still be the favourites, even if batting second.
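To turn those numbers into a probability, take the difference between the two adjusted values as the log odds and run it through the logistic function. Here is a small sketch using the Mumbai and RCB figures from the example. The function name is illustrative.

```python
from math import exp

def win_probability(coef_bat_first, coef_chasing, batting_advantage):
    """Probability that the side batting first wins, from model coefficients."""
    log_odds = (coef_bat_first + batting_advantage) - (coef_chasing - batting_advantage)
    return 1 / (1 + exp(-log_odds))

MUMBAI, RCB, BAT_FIRST = 2.43, 2.04, 1.27

# Mumbai batting first against RCB, then RCB batting first against Mumbai.
print(round(win_probability(MUMBAI, RCB, BAT_FIRST), 3))
print(round(win_probability(RCB, MUMBAI, BAT_FIRST), 3))
```

Because the batting advantage is added to one side and taken off the other, it counts twice in the log odds. A team needs a coefficient more than 2.54 higher than its opponent to stay favourite when chasing.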

This model is only based on 28 matches, so is clearly not perfect. But it is an interesting guide to how well the teams have been playing, and I think somewhat informative.

Tuesday 1 September 2020


I don't often talk about my own cricket exploits on this page for the reason that this is normally reserved for elite cricketers, and I was anything but. In fact, it would be a great stretch to describe me as average; in reality I was nowhere near good enough to be described as average.

However, there was one batsman who really couldn't face me. It was like I turned into Shane Warne when I saw him at the other end. His name was John, and he was a reasonable quality batsman. Not the best I ever played with or against, but probably in the top 5-10% of guys that I had been on the field with. And yet I had the wood on him.

We played in a little competition every week where we would select captains every week and they would pick their team using school yard rules. We played on a proper field, with proper equipment and every match was scored, but it wasn't part of an official competition. There were guys playing with us who had played top level club cricket, and others who would not have made the 6th grade sides. I was never first picked, but generally I was picked fairly quickly after John, by the opposite captain.

John would often either open or bat at 3 or 4. I was normally the 4th or 5th bowler used, so he was often batting when I was given the ball. He was seldom batting at the end of that over. I had one tactic for John. I would bowl him a top-spinner on leg stump. He would almost always try and hit it out of the park, and get caught doing so. He would then kick over the stumps, say a lot of words that would get him fined in international cricket and/or throw his bat in anger at getting out to me again.

I managed to pick up a hat trick that season. He was the 3rd wicket. I got him with a top-spinner on leg stump, caught at short fine leg.

He found this very, very frustrating. Everyone else found it hilarious. 

What happened that season was a perfect storm of a flaw in his technique being exposed by one thing that I could do, combined with the psychological effect for both of us based on the experiences that we had had against each other. I felt like every ball was a wicket, and he felt like every ball was a chance for him to prove that I didn't have the wood on him. 

Match-ups have been a popular concept in cricket analytics in the past few years, particularly with the players. They want to know how well they match up against different players. Who have they been dominating, and who has the wood on them?

I was working on looking at something else, and generated a list of head to head match-ups over the past few years, and it made me wonder which match-ups were the most one sided.

These are from T20 matches since the start of 2017. They are taken from most of the matches in internationals, IPL, BBL, PSL, CPL and Natwest Blast (I don't yet have ball-by-ball data for every match played) and only feature match-ups that are more than 20 deliveries.

I found that the middle third of averages in the match-ups were between 25 and 50 (with a lot of infinite values, where a particular bowler had not dismissed that batsman) and the middle third of the strike rates were between 105 and 144. Looking at players where both values were at the respective third gave only 7 results for each.
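For anyone wanting to reproduce the two numbers for a match-up, here is a minimal sketch. It assumes the usual definitions (average is runs per dismissal, strike rate is runs per 100 balls) and uses infinity for a batsman who has never been dismissed by that bowler. The runs, balls and dismissals are made up.

```python
from math import inf

def matchup_stats(runs, balls, dismissals):
    """Batting average and strike rate for one batsman against one bowler."""
    average = runs / dismissals if dismissals else inf
    strike_rate = 100 * runs / balls
    return average, strike_rate

# Hypothetical match-ups, for illustration only.
print(matchup_stats(25, 20, 1))  # (25.0, 125.0)
print(matchup_stats(30, 24, 0))  # (inf, 125.0)
```

With only 20 or so balls in a match-up, a single extra dismissal halves the average, which is why these figures move around so much.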

Here are the 7 for each.

Most dominant batting match ups

Two names appear three times there: Chris Jordan and Aaron Finch. Ahmed Shehzad vs Samuel Badree only makes the list by one run/one ball, so is probably a dubious addition, but the rest seem to clearly be a case of a batsman having the wood over their rival.

Going the other way, Sunil Narine is clearly able to get on top of some batsmen, and the Shadab Khan vs Kieron Pollard match-up is one that the big West Indian won't be too happy with.

There were 4 match-ups that only just missed out on this list. Babar Azam vs Ish Sodhi: average 25, strike rate 125; Babar Azam vs Carlos Brathwaite: average 22.5, strike rate 112.5; MS Dhoni vs Chris Jordan: average 15, strike rate 120; and Ahmed Shehzad vs Imran Tahir: average 22, strike rate 110.

If they had been included, then Jordan, Shehzad and Dhoni would have all featured on both lists.

Babar Azam's slow strike rate comes to the fore here. Of the 12 bowlers that he has faced 20 or more balls from in the past 3 years, he's scored at less than 7 rpo (116.67 strike rate) off 6 of them.

Here's how he compares to others:

His median scoring rate against the bowlers he's faced the most often is below the average for all match-ups, and those that are above the median, are mostly not much above.

There's a risk with looking at match-ups of making big conclusions from very small sets of data. The strike rate and average can both change quite dramatically with one wicket or one six. But just because it can be misleading does not mean that it isn't interesting. For some of these, there will be a real phenomenon behind it, and so it is interesting to look at them and see if those battles are real in future. 

Thursday 27 August 2020

Greatest ever test bowler?

In cricket we tend to look at career statistics when deciding who is the best at something. For example: Don Bradman averaged 99.94, and so he's the greatest batsman. But there's something fundamentally flawed in this concept. It assumes that a player has a certain level of ability, and that that remains constant throughout their career. That assumption is patently ridiculous once it's broken down.

If Don Bradman had made a comeback in his 60s, and played 10 tests at an average of 40, it would have added to his reputation, but subtracted from his career average. It would not have changed how good he was either side of World War 2.

Likewise, looking at something like total wickets fails to take into account the differences in scheduling. It is a fine way to compare two players who played for the same team, but across teams the schedules are just too different. In the last 15 years, England have played 190 tests, while New Zealand and Pakistan have played 118 and 119 respectively. Hence, if an English player had played roughly 80% of their team's tests in that time, and taken 3.2 wickets per test (based on Fidel Edwards' career numbers) they would have taken about 480 test wickets. If a New Zealand or Pakistani player had done similarly, but taken wickets at roughly 4.7 wickets per test (à la Dale Steyn) they would have roughly 445 wickets. Taking more wickets is inevitable when you play more matches.
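The arithmetic behind that comparison is just tests played multiplied by wickets per test. A quick sketch, with an illustrative function name:

```python
def projected_wickets(team_tests, share_played, wickets_per_test):
    """Wickets taken by a bowler who plays a fixed share of a team's tests."""
    return team_tests * share_played * wickets_per_test

# England's schedule at 3.2 wickets per test.
print(round(projected_wickets(190, 0.8, 3.2)))
# New Zealand's schedule at 4.7 wickets per test.
print(round(projected_wickets(118, 0.8, 4.7)))
```

The results land within a few wickets of the rounded figures in the paragraph above, with the less penetrative bowler on the busier schedule coming out ahead.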

To counter that, people talk about comparing players at their peak. Who reached the highest in their career?

I decided to have a look at just that. This list is lacking context - it hasn't accounted for opposition or conditions, but I think it's more useful than looking at overall career statistics.

This is the list of bowlers based on their best 30 consecutive matches.
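Finding a best run of consecutive matches is a sliding window problem. The sketch below assumes the measure is bowling average (runs conceded per wicket, lower is better), which is my assumption for the example and may not be the measure behind the list. The career data is invented and the window is shortened to 3 so the example stays small.

```python
def best_window(matches, window=30):
    """Lowest bowling average over any `window` consecutive matches.

    matches -- list of (runs_conceded, wickets) per match, in career order
    Returns (average, start_index), or None if no window qualifies.
    """
    if len(matches) < window:
        return None
    runs = sum(r for r, _ in matches[:window])
    wickets = sum(w for _, w in matches[:window])
    best = None
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide the window: drop the oldest match, add the newest one.
            old_r, old_w = matches[start - 1]
            new_r, new_w = matches[start + window - 1]
            runs += new_r - old_r
            wickets += new_w - old_w
        if wickets > 0:
            average = runs / wickets
            if best is None or average < best[0]:
                best = (average, start)
    return best

# Hypothetical five-match career, for illustration only.
career = [(90, 2), (60, 4), (45, 5), (70, 3), (100, 1)]
print(best_window(career, window=3))
```

Keeping running totals means each step only adds one match and removes one, so a long career can be scanned in a single pass.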

This is still not perfect. For some of these players, 30 tests was just over 2 years (eg Botham, McGrath, Anderson) while for others it was more than 5 years (Steyn, Lindwall, Briggs, Laker). It's likely to be easier to maintain peak form for 2 years than for 5 years.

It made me wonder if 20 tests was a better measure.

The same problem is here too. It took 8 years for Bobby Peel to play 20 tests, while McGrath and Pollock both took less than 2 years to do it.

However, the name at the top remains the same. Imran Khan has a greater claim to being the best ever than I would have realised.

There are lots of problems with using this as a canonical value for the best ever. But I think it's better than career stats, and certainly adds something to the conversation.

Tuesday 11 August 2020

Cleaning up the tail

I saw an interesting post on a Facebook cricket group recently, where a Pakistani fan said that they thought that Pakistan were the worst team at cleaning up the tail in world cricket. A bunch of Indian fans jumped in saying that India was, in fact, the worst. Then some English fans decided that England was actually the worst at cleaning up the tail. 

It led me to run a small poll, and I found that roughly 2/3 of respondents felt that their team was the worst at cleaning up the tail. Most who commented were adamant that not only was their team the worst at it, they were the worst by some margin.

There seemed to be a general cricket fan type one error. A type one error is an error of seeing a pattern that does not exist (or, more generally coming to an incorrect conclusion based on evidence that seems conclusive but is not). Perhaps this was caused by the fact that when we watch our team struggle to clean up the tail, it takes a long time, while a team cleaning up the tail efficiently does not take as long, so uses up less of our memory space. Or perhaps it is just because cricket teaches us to think negatively. Mark Richardson even wrote a whole book about the power of negative thinking in cricket.

That led me to a question. What team is actually the worst?

Tuesday 4 August 2020

Changes in test performance

I had a go at using animation to visualise the changes in teams' performances in tests over time.

I really enjoyed making this - I hope you enjoy watching it!