CricketGeek: Toss

Sunday, 14 November 2021

Win the toss win the match?

In watching this world cup, there have been 6 sides who have looked a step ahead of the others: Australia, England, India, New Zealand, Pakistan and South Africa. There have been 8 matches featuring two of those 6 sides.

Pakistan beat India. Pakistan beat New Zealand. New Zealand beat India. England beat Australia, Australia beat South Africa, South Africa beat England, New Zealand beat England and Australia beat Pakistan.

Interestingly, in 7 out of those 8 matches, the team who won the toss, won the match.

If the toss was independent, the chances of this happening are about 3.5% (9/256 for anyone who wants a more precise answer). That is very unlikely. But it is not so unlikely that it would be considered impossible.

If I throw 8 coins, every now and again it will come up with either 7 heads or 7 tails.
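For anyone who wants to check the 9/256 figure, here is a minimal sketch. It assumes each toss is a fair coin flip that has no effect on the result, and the function name is just for illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that the toss winner wins at least 7 of 8 matches
# if the toss really makes no difference.
p_seven_plus = prob_at_least(7, 8)
print(p_seven_plus)  # 9/256, about 3.5%
```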

It made me wonder if the toss was a significant contributor to a team's success, and if so, what can be done about it.

The first thing to do is to build a model to predict the outcome of matches. This is important, because England beating Papua New Guinea after winning the toss does not say much about the importance of the toss, because England would probably beat Papua New Guinea if they lost the toss too.

I decided to use logistic regression to build the model. I looked at every international match between two competent (or semi-competent) sides in the last 3 years. Looking back afterwards, I noticed that I had left out Nepal (who did deserve inclusion) but they were the only team that the ICC currently have ranked in the top 20 that I left out. I also included a few lower ranked sides in order to give a better picture of the difference between the teams near the bottom of the rankings for the T20 World Cup. So I also included the likes of Singapore, Kenya, USA and Malaysia.

I chose to use logistic regression because it has been helpful in the past for giving realistic probabilities of winning for limited overs matches. When I tested the model, it explained about 85% of the variation in results. When it said that a team had a 50% chance of winning, they generally won about 50% of the time. When it said that a team had a 70% chance of winning, they generally won about 70% of the time.

It is not perfect, but it is simple enough and close enough to tell us about the impact of the toss.

The factors that I used were the team, the opposition, and if the match was home, away or neutral.
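To give a feel for what a model with those three factors looks like, here is a rough sketch in plain Python. It is not the actual model: it assumes a single strength number per team and one shared home advantage term (venue is +1 at home, -1 away, 0 neutral), fits them with simple gradient steps and a small penalty to keep the numbers stable, and uses made-up results for three hypothetical teams A, B and C.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def fit(matches, steps=2000, lr=0.05, l2=0.01):
    """Fit team strengths and a home term by gradient ascent.

    matches: list of (team, opposition, venue, won), where venue is
    +1 (team at home), -1 (team away) or 0 (neutral) and won is 1 or 0.
    """
    strength = {t: 0.0 for m in matches for t in m[:2]}
    home = 0.0
    for _ in range(steps):
        grad = {t: 0.0 for t in strength}
        grad_home = 0.0
        for team, opp, venue, won in matches:
            p = sigmoid(strength[team] - strength[opp] + home * venue)
            err = won - p  # gradient of the log-likelihood w.r.t. the log odds
            grad[team] += err
            grad[opp] -= err
            grad_home += err * venue
        for t in strength:
            strength[t] += lr * (grad[t] - l2 * strength[t])
        home += lr * (grad_home - l2 * home)
    return strength, home

# Made-up results, purely for illustration.
matches = (
    [("A", "B", 0, w) for w in (1, 1, 1, 0)]
    + [("A", "C", 0, w) for w in (1, 1, 1, 1)]
    + [("B", "C", 1, w) for w in (1, 1)]
    + [("B", "C", -1, w) for w in (1, 0)]
)
strength, home = fit(matches)

# Modelled chance of A beating C at a neutral ground.
p_a_beats_c = sigmoid(strength["A"] - strength["C"])
print(strength, home, p_a_beats_c)
```

The difference between two teams' strengths (plus or minus the home term) is the log odds of the result, which is what makes this kind of model handy for turning a fixture into a probability.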

The model suggested that the 6 teams that I listed above, along with Afghanistan, were the 7 best teams. It probably overstated the strength of Afghanistan, due to them not playing at home at all, and therefore missing out on the home advantage. Their players are so familiar with their adopted grounds that they have an advantage there that is not accurately reflected in the tag "neutral."

Once I had built the model, I could then make predictions about all the world cup matches.

In this graph, I have the modeled probability of winning on the x-axis and then the actual outcome on the y-axis. The green points are where a team has won the toss and the red ones are where they have lost the toss.

I have divided the data into 3 groups - expected loss, too close to say, and expected win. The numbers are the proportion of wins by the team that won the toss (green) or lost the toss (red).

We can see that the team that won the toss has won more than the team that lost the toss in each of the 3 regions.

This is fairly compelling evidence that there is an advantage in winning the toss. But it is not nearly as dramatic as 7 out of 8 in the first sub-group that I looked at.

This made me wonder if there was some sort of accidental gerrymandering with the way that I selected the data. So I tried 5 groups instead of three.

This time I grouped them together, and looked at the expected number of wins against the actual number of wins.

This time I added in two parallel trend lines, and looked at the difference between them. The groups of teams who won the toss ended up winning about 1.5 more matches than the groups of teams who lost the toss.

This was interesting, but I was not sure what to read into this. So I decided to re-randomise the toss, to see what would have happened with an independent toss.

To do this I randomly assigned to each match one team as the designated toss winner. Then I redid the groups, and saw what the difference was. I wanted to know how rare a difference of 1.5 was.

It turned out to be more common than I would have expected. After 10000 trials, I found that roughly 10% of the sets had a difference of more than 1.478, and roughly 10% had a difference of less than -1.478. For a re-randomisation, 10% is about the cut off where you say that it was likely or unlikely to have been caused by natural variation.
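The re-randomisation idea can be sketched in a few lines. This is a simplified version under stated assumptions: instead of the gap between two trend lines, it uses the total number of wins above the modelled expectation for the toss winners, and the matches in the example are made up purely for illustration.

```python
import random

def toss_winner_excess(matches):
    """Total wins above model expectation for the toss winners.

    Each match is (p_team1, team1_won, team1_won_toss), where p_team1 is
    the modelled probability that team 1 wins and team1_won is 1 or 0.
    """
    total = 0.0
    for p_team1, team1_won, team1_won_toss in matches:
        if team1_won_toss:
            total += team1_won - p_team1
        else:
            # Team 2 won the toss: use its outcome and its modelled chance.
            total += (1 - team1_won) - (1 - p_team1)
    return total

def rerandomise(matches, trials=10000, seed=1):
    """Share of random toss assignments at least as extreme as the real one."""
    rng = random.Random(seed)
    observed = toss_winner_excess(matches)
    at_least = 0
    for _ in range(trials):
        shuffled = [(p, won, rng.random() < 0.5) for p, won, _ in matches]
        if toss_winner_excess(shuffled) >= observed:
            at_least += 1
    return observed, at_least / trials

# Made-up matches, purely for illustration.
example = [
    (0.70, 1, True), (0.40, 1, True), (0.55, 0, False),
    (0.30, 0, False), (0.50, 1, True), (0.65, 0, False),
]
observed, share = rerandomise(example, trials=2000)
print(observed, share)
```

A small share means that random toss assignments rarely produce a gap as big as the real one; a share around 10% is the grey zone described above.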

This was a surprising result. I was expecting to find that there was clear evidence that winning the toss improved the chance of winning, but instead I found out that it might do, or it might be just natural variation.

There are two major errors in statistics: saying too much or saying too little. This situation looked like one that had the potential to put egg on my face no matter which way I went. There was not quite enough evidence to be very confident that the toss made a difference, but there was also enough evidence to be quite confident of that fact.

I wanted to try one more test before I decided that I didn't know what to say.

This time I picked 60 random innings from any match in the past 3 years. I applied the model to those innings, and then grouped the innings and found the difference between the two lines. I repeated that 1000 times.

This time I found more like what I was expecting.

Less than 1% of the randomly selected innings had the impact of winning the toss as high as it has been in the world cup. Interestingly, the teams that lost the toss actually had a slight advantage (1.7% of the time losing the toss had an advantage of 1.478 or more matches).

This tells us two interesting things:

Winning the toss seems to have given teams an advantage in this world cup, and it does not normally give teams any advantage whatsoever.

I wondered if that was due to the dew factor. It can often get harder to bowl as the match wears on due to dew in the gulf states. But that does not seem to have been the difference. Winning the toss was roughly as much of an advantage in the daytime as it was in the nighttime.

The biggest single factor seemed to be Dubai International Stadium. Matches there seemed to be much more toss dependent than almost anywhere else.

And that is where the final is being held.

Given that, the toss is likely to give an advantage to whoever wins it. Or perhaps it will revert to type, and there will be no advantage.

Assuming that the toss should be factored in, my model has the following probabilities for the final:

If Australia win the toss: Australia 67%, NZ 33%.

If New Zealand win the toss: Australia 30%, NZ 70%.

Neither team is 100% or 0% in either scenario, but there's clearly an advantage.

Now it will be up to the players to see if they can overcome it.

Tuesday, 13 October 2020

IPL at halfway

The match between Kolkata Knight Riders and Royal Challengers Bangalore was the 28th match, and represented the halfway point in this year's edition of the IPL.

Every team has played every other team once, and some interesting patterns have emerged. I'm going to look at a few of those in this article.

The most obvious pattern is that teams have won more often by batting first than by chasing. This is somewhat unusual. In most T20 competitions the toss doesn't make much difference, and generally it's better to field first.

This pattern hasn't really been seen in other competitions, where the results have all been within the expected margin of error.

The most recent BBL was slightly biased towards batting first, and the CPL and PSL were similarly biased towards bowling first, but the difference in the IPL has been much more dramatic. If we were looking at throwing a coin multiple times, there would be a 12% chance of getting a result as extreme as the BBL and a 20% chance of getting a result as extreme as the CPL one.

The probability of getting as extreme a difference as this year's IPL from just randomness is less than 3%. Using technical language we can say that batting first makes a statistically significant difference to a team's probability of winning.

At the start of the tournament, pretty much every captain chose to field. Only one brave soul (David Warner) chose to bat first in the first 13 matches. However, since then teams have chosen to bat first in all but 3 matches, and in 8 of the last 9.

When teams have chosen to field first, they've only won twice. That's a winning record of just 13.3%. When teams have chosen to bat first, they've done much better - winning 7 out of 13. The best outcome at the toss seems to be to lose it, and have the opposition captain decide to field.

Breaking it down by location is interesting too. In Abu Dhabi, there is no clear advantage to batting first. At that ground the chasing team has won 4 out of the 10 matches. At the other two grounds, however, it is a different story. At Sharjah the chasing team has won only once out of 6, while at Dubai the chasing team has won twice out of 12. It's hard to know why this is, and it will be interesting to see if it continues throughout the tournament.
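Taking the ground figures above at face value (4 + 1 + 2 = 7 chasing wins from 10 + 6 + 12 = 28 matches), an exact two-sided binomial test against a fair coin looks like this. It is a sketch of one way to run such a check, and not necessarily the calculation behind the "less than 3%" figure quoted earlier; tied matches, for example, may have been handled differently.

```python
from math import comb

def two_sided_binomial_p(successes, n):
    """Two-sided p-value against a fair coin, doubling the smaller tail."""
    k = min(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

chasing_wins = 4 + 1 + 2   # Abu Dhabi, Sharjah, Dubai
matches = 10 + 6 + 12
p_value = two_sided_binomial_p(chasing_wins, matches)
print(p_value)  # comfortably under 0.03
```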

After about a quarter of the tournament was done, I noticed that there seemed to be a pattern that batsmen who turned the strike over quickly had a bigger impact on their team's chance of winning than players who scored extra boundaries, although I wondered if that was just a statistical anomaly from a small dataset.

It seems that it was just a product of a small dataset. Now that there's more data, it seems that neither activity rate nor boundary rate on their own from an individual batsman make a significant difference. They both seem to help - teams win more often when their batsmen score more boundaries and run more runs, but neither seems to explain enough to discount the other one anymore.

There is a noticeable difference in scoring rates of batsmen at the three grounds. They tend to score much quicker at Sharjah (probably due to the short boundary) than they do at the other grounds.

An interesting development is that recently teams have found it more difficult to turn over the strike at Dubai than at the other two grounds.

The median strike rates for innings over 30 at each ground are:

This leads to a suggested good team score at each ground of 174, 176 and 196 respectively.

Those are the team scores equivalent to an average set batsman batting the whole innings. I find that a good guide to reliable winning targets at grounds. They're possibly slightly high at the moment, due to the awful record of chasing sides, but that may change as the tournament progresses.

Looking at the bowling stats, I find it useful to group the attacks based on their styles.

This leads to this graph. It takes a while to understand, but the squares are all the pace bowlers combined and the triangles are all the spin bowlers combined.

It is possible to use these groupings to predict the success of the teams. The two key statistics to look at are the economy rate of spin bowlers and the strike rate of the pace bowlers.

Looking at these two statistics suggests that CSK are probably the team who have been underperforming the most with the bat, as their bowlers are doing a sufficient job to keep them in the matches.

Finally, I used the data to build a predictive model using logistic regression to assess how good the teams were. As every team has played each other exactly once, the basic model is fairly uninteresting, but the one where batting first is controlled for is much more interesting.

The difference in the coefficients of any two teams gives the log odds of the result for that match (and hence the probability can be calculated from it).

The batting advantage is added and subtracted from the teams. So for example if Mumbai bat first against RCB, they would have an expected value of 2.43 + 1.27, while RCB would have 2.04 - 1.27. This means that for almost every match up, the team batting first would be favoured to win. The only exception is when Mumbai, Bangalore or Delhi are playing against Kings XI Punjab. There they would still be the favourites, even if batting second.
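To turn those numbers into a probability, take the difference of the two adjusted values as the log odds and push it through the logistic function. A small sketch using the Mumbai and RCB coefficients quoted above (the function and variable names are just for illustration):

```python
from math import exp

def win_probability(coef_a, coef_b, batting_advantage, a_bats_first):
    """Chance that team A wins, given both coefficients and who bats first."""
    shift = batting_advantage if a_bats_first else -batting_advantage
    # The advantage is added to the side batting first and
    # subtracted from the side batting second.
    log_odds = (coef_a + shift) - (coef_b - shift)
    return 1.0 / (1.0 + exp(-log_odds))

mumbai, rcb, bat_first_bonus = 2.43, 2.04, 1.27
p_mumbai_batting_first = win_probability(mumbai, rcb, bat_first_bonus, True)
p_mumbai_chasing = win_probability(mumbai, rcb, bat_first_bonus, False)
print(round(p_mumbai_batting_first, 3), round(p_mumbai_chasing, 3))
```

With these coefficients Mumbai come out at roughly 95% when batting first against RCB, and at roughly 10% when chasing.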

This model is only based on 28 matches, so is clearly not perfect. But it is an interesting guide to how well the teams have been playing, and I think somewhat informative.

Wednesday, 8 March 2017

Win the toss - bowl

Everyone bowls first if they win the toss in a test in New Zealand, but should they?

It seems to be established wisdom that bowling first in New Zealand is the right thing to do, but I’m interested to see if the numbers bear that out. The first thing that people would want to look at is the results. I’ve first of all limited it to the last 5 years, because that’s the period in which teams have always chosen to bowl first.

	Won	Lost	draw	Total
Bowl first	8	6	7	21
Probability	0.381	0.286	0.333	1
CI	0.173-0.589	0.093-0.479	0.134-0.535

While the team bowling first has won 33% more often than the team batting first, that’s only a difference of 2 matches out of 21. We know that if we had a perfectly fair three sided coin (hard to imagine, but go with me) and we flipped it 21 times, it would actually be very unlikely for it to land exactly 7 times on each side (just under 4% probability). Given the data that we have, and assuming that it tells the story about all pitches in New Zealand, we can say that if you bowl first, the probability of winning is likely to be between 17.3% and 58.9%, while the probability of losing is between 9.3% and 47.9%. These are massive confidence intervals, and there’s no way that we can make a call statistically from them. We would need to see more than a difference of two before we could statistically say that there is a difference in the expected result based on whether you batted or bowled first.

So perhaps the issue is the small sample size.

I could extend to all tests in the last 40 years in New Zealand.

	Won	Lost	draw	Total
Bowl first	45	42	56	143
Probability	0.315	0.294	0.392	1
CI	0.239-0.391	0.219-0.369	0.312-0.472

The difference in the experimental probability is 0.021, but the margin of error is much larger. We would need to have a difference of about 0.114 before we could say that there’s a difference statistically.

However this data also includes situations where teams have won the toss and chosen to bat. So eliminating those might make a difference…

	Won	Lost	draw	Total
Bowl first	29	28	34	91
Probability	0.319	0.308	0.374	1

At this point it’s pretty clear that winning or losing is not decided by the toss. Teams who have lost the toss and been sent in have won 28 as opposed to losing 29. We need to look deeper if we’re going to find anything.

I decided to look at what the normal score was in the first and second innings. If bowling first was the right move, then we’d expect the second innings to be more productive than the first.

It seems that it’s more the other way round. I’ve looked at batting average for the innings rather than score to account for declarations. (540/6 should be worth more than 550 all out).

In the first innings, we’d expect teams to get about 350 and in the second we’d expect them to get about 300. Interestingly, there’s actually a statistical difference here. We can say that teams tend to score more in the first innings than in the second innings.

This suggests that all the hype that says that teams should always bowl first in New Zealand is just that: hype. There’s no statistical evidence that says that bowling first is better than batting first, and – strangely – there is some that suggests batting first actually might be better.

Friday, 27 November 2015

Heads or Tails?

The start of a cricket game is actually 30 minutes before the first ball is bowled. The ritual of it has changed a few times over the years, but basically what happens is the captains walk out to the pitch wearing their blazers over their whites with an umpire and often a cameraman and commentator. The umpire gives the coin to the home captain who tosses the coin up in the air, and then the visiting captain calls heads or tails before the coin lands. The coin is left to land on the pitch, and then the captain who has won the toss is asked if he wants to bat or bowl.

It's an old tradition. Using a coin to make a decision has been done for at least 2000 years. Using it in cricket dates back to at least the 1850s (the toss result was recorded in the match between Oxford and Cambridge Universities in 1858). And yet there are now some calls for it to be done away with. The ECB are going to experiment with removing the mandatory toss for the 2016 County Championship season.

There is a thought that the toss is too influential. There is a perception that there are too many matches where "if you win the toss, you win the match." In an ABC poll, just over 50% of respondents wanted the toss to be done away with.

I remember having a conversation with a friend who is a fan of most sports, but he was sick of cricket because "the toss of the coin has too much impact." As a cricket statistician, this is wonderful, as it's something that I can test. What is the impact of the toss on cricket matches? How much more often does the team that wins the toss win the match?

In all test matches, the team that has won the toss has won the match 749 times, and the team that lost the toss has won 671 times. These combined with 734 draws and 2 ties have meant that the team that wins the toss has won the match 34.7% of the time, and the team that lost the toss has won 31.1% of the time. This is a relative probability of 1.116. This means that the team that has won the toss has had an 11.6% higher chance of winning the test.
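The arithmetic behind those figures is short enough to show in full. The variable names are just for illustration.

```python
toss_winner_wins = 749
toss_loser_wins = 671
draws, ties = 734, 2

total = toss_winner_wins + toss_loser_wins + draws + ties
p_toss_winner = toss_winner_wins / total
p_toss_loser = toss_loser_wins / total
relative_probability = p_toss_winner / p_toss_loser

print(total)                           # 2156 tests
print(round(p_toss_winner, 3))         # 0.347
print(round(p_toss_loser, 3))          # 0.311
print(round(relative_probability, 3))  # 1.116
```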

This is slightly lower than I would have expected, but it's still a statistically significant difference. (Statistically significant is a technical term that basically means that we can say that we have enough evidence that there is a difference, and that it's unlikely to be just because of randomness).

But if we delve deeper into these numbers, then some interesting things turn up. First let's break it up by home and away. This is an important distinction, as the home teams will generally be better at reading the conditions. I would have expected that winning the toss at home would provide a bigger advantage than winning it away. It turns out to be so, but not by nearly as much as I expected.

When the home team has won the toss, they've won 41.8% of the time (467/1117 matches), when the home team has lost the toss, they've won 37.9% of the time (394/1039). This is a relative probability of 1.103, or a 10.3% increase in probability of winning. There are different methods of testing for significance, and this is right on the edge of being significant or not. In other words, while there's a 10% difference, if we randomly selected the results, it would be reasonably likely that we would get this sort of difference.

When the away team wins the toss, they've won 27.1% of matches (282/1039) vs 24.8% when losing the toss (277/1117). This is a relative probability of 1.094: away teams have won 9.4% more often when they've won the toss than when they've lost the toss. Again, there is a difference, but it's not statistically significant.
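As one example of a significance test for these splits, here is a sketch of a pooled two-proportion z-test. It is only one of the several possible methods mentioned above, not necessarily the one behind the conclusions in this post, and the function name is just for illustration.

```python
from math import erfc, sqrt

def two_proportion_p_value(wins_a, n_a, wins_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# Home team: won the toss (467/1117) vs lost the toss (394/1039).
p_home = two_proportion_p_value(467, 1117, 394, 1039)
# Away team: won the toss (282/1039) vs lost the toss (277/1117).
p_away = two_proportion_p_value(282, 1039, 277, 1117)
print(round(p_home, 3), round(p_away, 3))
```

On this test the home split sits just above the usual 5% cut off, which fits with it being right on the edge, while the away split is nowhere near it.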

The reason why it can be significant with the full group, but not with any sub group is due to the smaller sample sizes.

The similarity can be represented fairly well graphically.

The result proportions for home and away are very similar, regardless of who won the toss.

Now at this point, I wondered if this was simply due to older tests, where perhaps there was less doctoring of pitches. It is an interesting idea that pitches are doctored more often now. I would have thought that if anything, the nature of pitches round the world is now more similar.

To look at this, I selected a completely arbitrary cut off point of the 1st of January 2000. Looking at matches that started after that date, I found these numbers:

Overall the team that has won the toss has won 260 out of 680, while the team that lost the toss has won 252. That's a relative probability of 1.032 ie. in the past 15 years, the team that won the toss has won 3.2% more often than the team that lost the toss.

That's an almost negligible difference.

Breaking it down to home/away it gets even closer.

When the home team won the toss they've won 46.9%, when they lost the toss, they've won 46.9%. The difference is so small that you have to go to the 4th significant figure to be able to measure it.

When the away team won the toss they've won 29.4%, when they lost the toss, they've won 27.4%. The relative probability is 1.072. Surprisingly this is much larger than the advantage for the home team, which suggests that the difference is just down to the effect of randomness rather than the effect of the toss.

Here it is visually:

Again the similarity is remarkable.

So, to assess our original question: does the toss have a major impact on the result of the match? No. Absolutely not. Winning the toss has historically only given teams an 11% higher chance of winning, and since the year 2000 that's reduced to only 3.2%.

*picture of Bradman and Allen at the coin toss thanks to Wikimedia.