Sunday 14 November 2021

Win the toss, win the match?

Watching this World Cup, there have been 6 sides who have looked a step ahead of the others: Australia, England, India, New Zealand, Pakistan and South Africa. There have been 8 matches between two of those 6 sides.

Pakistan beat India. Pakistan beat New Zealand. New Zealand beat India. England beat Australia. Australia beat South Africa. South Africa beat England. New Zealand beat England. Australia beat Pakistan.

Interestingly, in 7 of those 8 matches, the team that won the toss won the match.

If the toss had no effect on the result, the chance of the toss winner winning 7 or more of the 8 matches would be about 3.5% (9/256, for anyone who wants a more precise answer). That is very unlikely, but not so unlikely that it should be considered impossible.

If I toss 8 coins, every now and again 7 of them will come up heads, or 7 will come up tails.
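For anyone who wants to check the 9/256: if the toss has no effect, each of the 8 matches behaves like a fair coin, with the toss winner winning half the time. The chance of 7 or more "heads" out of 8 is then (8 + 1)/2^8. A minimal Python check (the helper name prob_at_least is just for illustration):

```python
from fractions import Fraction
from math import comb


def prob_at_least(k: int, n: int) -> Fraction:
    """Exact chance of k or more heads in n tosses of a fair coin."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)


# Toss winner also wins the match in 7 or more of 8 matches,
# assuming the toss has no effect on the result.
p = prob_at_least(7, 8)
print(p, float(p))  # 9/256 0.03515625

# The coin version: 7 or more heads OR 7 or more tails.
# With 8 coins these cannot both happen, so the chances simply add.
p_either = 2 * p
print(p_either, float(p_either))  # 9/128 0.0703125
```

So 8 coins land 7 or more the same way up roughly once in every 14 throws.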

It made me wonder whether the toss was a significant contributor to teams' success and, if so, what could be done about it.

The first thing to do is to build a model to predict the outcome of matches. This is important because England beating Papua New Guinea after winning the toss would not say much about the importance of the toss: England would probably beat Papua New Guinea even if they lost it.

I decided to use logistic regression to build the model. I looked at every international match between two competent (or semi-competent) sides in the last 3 years. Looking back afterwards, I noticed that I had left out Nepal, who did deserve inclusion, but they were the only team in the ICC's current top 20 that I missed. I also included a few lower-ranked sides, such as Singapore, Kenya, the USA and Malaysia, to give a better picture of the differences between the teams near the bottom of the rankings at the T20 World Cup.

I chose logistic regression because it has been helpful in the past for giving realistic win probabilities for limited-overs matches. When I tested the model, it explained about 85% of the variation in results, and its probabilities were well calibrated: when it said that a team had a 50% chance of winning, they won about 50% of the time, and when it said 70%, they won about 70% of the time.
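The calibration check described above can be sketched by binning the predicted probabilities and comparing each bin's average prediction with the share of those matches actually won. This is a minimal sketch under my own assumptions (ten equal-width bins, a hypothetical calibration_table helper), not the exact test used for the model:

```python
from collections import defaultdict


def calibration_table(predictions, outcomes, n_bins=10):
    """Compare predicted win probabilities with observed win rates.

    predictions: modelled probability that a team wins (0 to 1)
    outcomes:    1 if that team won, 0 if it lost
    Returns (bin lower edge, mean prediction, observed win rate, matches) per bin.
    """
    bins = defaultdict(list)
    for p, won in zip(predictions, outcomes):
        # min() keeps p == 1.0 in the top bin rather than creating an extra one
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, won))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(won for _, won in pairs) / len(pairs)
        table.append((idx / n_bins, mean_pred, win_rate, len(pairs)))
    return table


# Toy numbers for illustration only, not the real match data.
for row in calibration_table([0.55, 0.55, 0.75, 0.75, 0.75], [1, 0, 1, 1, 0]):
    print(row)
```

A well-calibrated model has the mean prediction and observed win rate roughly equal in every bin with a reasonable number of matches.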

It is not perfect, but it is simple enough and close enough to tell us about the impact of the toss.

The factors that I used were the team, the opposition, and whether the match was home, away or neutral.
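The post does not say exactly how those three factors entered the regression. One common encoding, assumed here, is a Bradley-Terry style logistic model: each team has a single rating, the opposition's rating is subtracted, and a shared home-advantage term is added for home matches and subtracted for away ones. The sketch below fits it with plain gradient ascent; fit_ratings, win_probability and the small ridge penalty are hypothetical choices, not the model actually used here.

```python
import math


def fit_ratings(matches, teams, epochs=2000, lr=0.5, l2=0.01):
    """Fit a Bradley-Terry style logistic model by full-batch gradient ascent.

    matches: (team, opposition, venue, won) tuples, where venue is +1 for a
             home match, -1 for away, 0 for neutral, and won is 1 or 0.
    Model:   logit P(team wins) = rating[team] - rating[opposition] + home * venue
    A small L2 (ridge) penalty keeps ratings finite for teams that rarely lose.
    """
    rating = {t: 0.0 for t in teams}
    home = 0.0
    n = len(matches)
    for _ in range(epochs):
        grad = {t: 0.0 for t in teams}
        grad_home = 0.0
        for team, opp, venue, won in matches:
            p = 1.0 / (1.0 + math.exp(-(rating[team] - rating[opp] + home * venue)))
            err = won - p  # derivative of the log-likelihood with respect to the logit
            grad[team] += err
            grad[opp] -= err
            grad_home += err * venue
        for t in teams:
            rating[t] += lr * (grad[t] / n - l2 * rating[t])
        home += lr * grad_home / n
    return rating, home


def win_probability(rating, home, team, opp, venue=0):
    """Modelled probability that team beats opp at the given venue."""
    z = rating[team] - rating[opp] + home * venue
    return 1.0 / (1.0 + math.exp(-z))


# Toy results for illustration only: A usually beats B and C, B usually beats C.
toy = ([("A", "B", 0, 1)] * 4 + [("A", "B", 0, 0)]
       + [("B", "C", 0, 1)] * 4 + [("B", "C", 0, 0)]
       + [("A", "C", 0, 1)] * 4 + [("A", "C", 0, 0)])
ratings, home_adv = fit_ratings(toy, ["A", "B", "C"])
print(ratings, home_adv)
print(win_probability(ratings, home_adv, "A", "C"))
```

With only neutral matches in the toy data, the home term stays at zero; real data with home and away fixtures would estimate it.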

The model suggested that the 6 teams listed above, along with Afghanistan, were the 7 best teams. It probably overstated Afghanistan's strength. They do not play at home at all, so none of their results are credited to home advantage, yet their players are so familiar with their adopted grounds that they have an advantage there that the tag "neutral" does not reflect.

Once I had built the model, I could then make predictions for all the World Cup matches.

In this graph, the modelled probability of winning is on the x-axis and the actual outcome is on the y-axis. The green points are teams that won the toss and the red ones are teams that lost it.

I have divided the data into 3 groups: expected loss, too close to say, and expected win. The numbers are the proportion of matches won by the team that won the toss (green) or lost the toss (red).
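The three-group split can be sketched as follows. The post does not give the cut-offs, so the 40% and 60% boundaries (and the toss_win_rates name) are assumptions for illustration. Each match contributes two rows, one from each team's point of view.

```python
def toss_win_rates(rows, low=0.4, high=0.6):
    """rows: (modelled win probability, won_toss, won_match), one row per team
    per match. Returns {(band, won_toss): share of those matches won}."""

    def band(p):
        if p < low:
            return "expected loss"
        if p > high:
            return "expected win"
        return "too close to say"

    counts = {}  # (band, won_toss) -> (wins, matches)
    for p, won_toss, won_match in rows:
        key = (band(p), won_toss)
        wins, total = counts.get(key, (0, 0))
        counts[key] = (wins + won_match, total + 1)
    return {key: wins / total for key, (wins, total) in counts.items()}


# Toy rows for illustration: two matches, each seen from both teams' side.
rows = [(0.3, True, 0), (0.7, False, 1),   # favourite lost the toss but won
        (0.5, True, 1), (0.5, False, 0)]   # coin-flip match, toss winner won
print(toss_win_rates(rows))
```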

We can see that the team that won the toss has won more often than the team that lost the toss in each of the 3 regions.

This is fairly compelling evidence that there is an advantage in winning the toss. But it is not nearly as dramatic as the 7 out of 8 in the first subgroup that I looked at.

This made me wonder if there was some sort of accidental gerrymandering in the way that I had grouped the data. So I tried 5 groups instead of 3.

This time I grouped the matches together and looked at the expected number of wins against the actual number of wins.
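A sketch of the expected-versus-actual grouping, done separately for the toss winners and the toss losers. The post does not say how the 5 groups were formed, so this assumes they are equal-sized groups after sorting by modelled probability; expected_vs_actual is a hypothetical name.

```python
def expected_vs_actual(rows, n_groups=5):
    """rows: (modelled win probability, won_match) for one set of teams,
    e.g. all the toss winners. Sorts by probability, splits into n_groups
    groups of (nearly) equal size, and returns (expected wins, actual wins)
    for each group."""
    rows = sorted(rows)
    size, extra = divmod(len(rows), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        # spread any remainder over the first few groups
        end = start + size + (1 if g < extra else 0)
        chunk = rows[start:end]
        groups.append((sum(p for p, _ in chunk), sum(w for _, w in chunk)))
        start = end
    return groups


# Toy rows for illustration only.
toss_winners = [(0.9, 1), (0.1, 0), (0.6, 1), (0.2, 0), (0.5, 0), (0.8, 1), (0.3, 1)]
print(expected_vs_actual(toss_winners))
```

Expected wins are the sum of the modelled probabilities in a group; actual wins are the number of those matches won.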


I then added two parallel trend lines and looked at the difference between them. The groups of teams who won the toss ended up winning about 1.5 more matches than the groups of teams who lost the toss.
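Two parallel trend lines means one shared slope with a separate intercept for each set of points, and the gap between the lines is the difference in intercepts. A minimal ordinary-least-squares sketch (parallel_lines_gap is a hypothetical name; points are (expected wins, actual wins) per group):

```python
def parallel_lines_gap(won_toss_points, lost_toss_points):
    """Fit y = a_g + b * x with one shared slope b and a separate intercept a_g
    for each set of points (ordinary least squares).
    Returns (a_won - a_lost, b)."""

    def centred_sums(points):
        n = len(points)
        mx = sum(x for x, _ in points) / n
        my = sum(y for _, y in points) / n
        sxx = sum((x - mx) ** 2 for x, _ in points)
        sxy = sum((x - mx) * (y - my) for x, y in points)
        return mx, my, sxx, sxy

    mx1, my1, sxx1, sxy1 = centred_sums(won_toss_points)
    mx0, my0, sxx0, sxy0 = centred_sums(lost_toss_points)
    b = (sxy1 + sxy0) / (sxx1 + sxx0)  # pooled within-group slope
    a_won = my1 - b * mx1              # each line passes through its group's mean
    a_lost = my0 - b * mx0
    return a_won - a_lost, b


# Toy points for illustration: the toss-winner line sits 1.5 wins higher.
gap, slope = parallel_lines_gap([(1, 2.5), (2, 3.5), (3, 4.5)], [(1, 1), (2, 2), (3, 3)])
print(gap, slope)  # 1.5 1.0
```

Sharing the slope means the gap is the same at every point along the x-axis, which is what lets it be read as a single number of extra wins.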

This was interesting, but I was not sure what to read into it. So I decided to re-randomise the toss, to see what would have happened with an independent toss.

To do this, I randomly assigned one team in each match as the designated toss winner. Then I redid the groups and recalculated the difference. I wanted to know how rare a difference of 1.5 was.
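The re-randomisation can be sketched as a sign-flip test. To keep it short, this uses a simpler statistic than the grouped trend-line gap: the total of (result minus modelled probability) for the toss winners. Swapping which team is the designated toss winner just flips the sign of that match's value, so each trial flips each sign with probability 1/2. The statistic and the rerandomise_toss name are my substitutions, not the method used above.

```python
import random


def rerandomise_toss(residuals, trials=10000, seed=1):
    """residuals: one per match, equal to (1 if the toss winner won else 0)
    minus the model's probability that the toss winner would win.

    Returns (observed total, share of trials >= observed, share <= -observed).
    """
    rng = random.Random(seed)
    observed = sum(residuals)
    upper = lower = 0
    for _ in range(trials):
        # re-assign the toss at random: each match's residual keeps or flips its sign
        total = sum(r if rng.random() < 0.5 else -r for r in residuals)
        if total >= observed:
            upper += 1
        if total <= -observed:
            lower += 1
    return observed, upper / trials, lower / trials
```

For example, rerandomise_toss(world_cup_residuals), where world_cup_residuals is a hypothetical list built from the model's predictions, gives the two tail shares to compare with a chosen cut-off.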

It turned out to be more common than I would have expected. After 10,000 trials, I found that roughly 10% of the sets had a difference of more than 1.478 (the exact value of the observed difference), and roughly 10% had a difference of less than -1.478. For a re-randomisation, 10% is about the cut-off for deciding whether a result was likely or unlikely to have been caused by natural variation.

This was a surprising result. I was expecting to find clear evidence that winning the toss improved the chance of winning, but instead I found that it might, or the difference might just be natural variation.

There are two major errors in statistics: saying too much or saying too little. This situation looked like one that had the potential to put egg on my face whichever way I went. There was not quite enough evidence to be very confident that the toss made a difference, but there was too much evidence to be confident that it made no difference.

I wanted to try one more test before I decided that I didn't know what to say.

This time I picked 60 random innings from matches in the past 3 years, applied the model to those innings, grouped them, and found the difference between the two lines. I repeated that 1000 times.
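A sketch of that historical comparison, again using the simpler toss-winner residual total in place of the trend-line gap (an assumption on my part), and treating each past result as one entry in a history list. historical_toss_effect is a hypothetical name.

```python
import random


def historical_toss_effect(history, observed, sample_size=60, repeats=1000, seed=2):
    """history:  one residual per past result (result minus modelled probability,
                 from the toss winner's point of view).
    observed: the same statistic for the World Cup matches.
    Draws sample_size past results at random (without replacement) and reports
    how often their total is >= observed and how often it is <= -observed."""
    rng = random.Random(seed)
    upper = lower = 0
    for _ in range(repeats):
        total = sum(rng.sample(history, sample_size))
        if total >= observed:
            upper += 1
        if total <= -observed:
            lower += 1
    return upper / repeats, lower / repeats
```

Unlike the re-randomisation, this keeps the real toss outcomes and asks how often ordinary matches show an effect as large as the World Cup's.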

This time I found something closer to what I was expecting.

Less than 1% of the random selections of innings had an impact of winning the toss as high as it has been at the World Cup. Interestingly, the teams that lost the toss actually had a slight advantage: 1.7% of the time, losing the toss gave an advantage of 1.478 or more matches.

This tells us two interesting things: winning the toss seems to have given teams an advantage at this World Cup, and it does not normally seem to give teams any advantage at all.

I wondered if that was due to the dew factor: in the Gulf states, dew can often make it harder to bowl as the match wears on. But that does not seem to have been the difference. Winning the toss was roughly as much of an advantage in day matches as in night matches.

The biggest single factor seemed to be the Dubai International Stadium. Matches there seemed to be much more toss-dependent than almost anywhere else.

And that is where the final is being held.

Given that, the toss is likely to give an advantage to whoever wins it. Or perhaps it will revert to type, and there will be no advantage.

Assuming that the toss should be factored in, my model gives the following probabilities for the final:

If Australia win the toss: Australia 67%, NZ 33%.

If New Zealand win the toss: Australia 30%, NZ 70%.
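The post does not say how the toss enters the prediction for the final. If, as an assumption, winning the toss shifts a team's log-odds by the same amount whichever side wins it, the two scenarios can be split into a toss-neutral part and a toss effect. A small sketch of that arithmetic (logit, inv_logit, base and toss are illustrative names):

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Probability from log-odds."""
    return 1 / (1 + math.exp(-z))


# Australia's win probability in the two toss scenarios quoted above.
p_aus_wins_toss = 0.67
p_nz_wins_toss = 0.30

# Assumed form: logit(P(Australia win)) = base + toss * (+1 if Australia win the toss, -1 if NZ do)
base = (logit(p_aus_wins_toss) + logit(p_nz_wins_toss)) / 2
toss = (logit(p_aus_wins_toss) - logit(p_nz_wins_toss)) / 2
print(inv_logit(base), toss)
```

Under that assumption, the toss-neutral probability for Australia comes out just under 50%, so nearly all of the gap between the two scenarios would be down to the toss.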

Neither team is 100% or 0% in either scenario, but there's clearly an advantage.

Now it will be up to the players to see if they can overcome it.